Daoud Clarke
2336ed7f7d
Allow posting extra links with lower score weighting
2023-01-01 20:37:41 +00:00
Daoud Clarke
6edf48693b
Check the domain is correct, potential bug in psql
2023-01-01 01:30:44 +00:00
Daoud Clarke
b7984684c9
Tidy, improve logging
2023-01-01 01:14:05 +00:00
Daoud Clarke
7c14cd99f8
Update the URL queue earlier
2022-12-31 23:37:59 +00:00
Daoud Clarke
0d33b4f68f
Merge pull request #86 from mwmbl/improve-crawling
Improve crawling
2022-12-31 22:56:21 +00:00
Daoud Clarke
a86e172bf3
Reinstate background tasks
2022-12-31 22:52:17 +00:00
Daoud Clarke
d9cd3c585b
Get results from other domains
2022-12-31 22:51:00 +00:00
Daoud Clarke
77f08d8f0a
Update URL status
2022-12-31 22:25:05 +00:00
Daoud Clarke
36af579f7c
Sample domains
2022-12-31 17:04:38 +00:00
Daoud Clarke
ea16e7b5cd
WIP: improve method of getting URLs for crawling
2022-12-31 13:37:40 +00:00
Daoud Clarke
7dae39b780
WIP: improve method of getting URLs for crawling
2022-12-31 13:32:15 +00:00
Daoud Clarke
c69108cfcc
Don't delete an index if the sizes don't match
2022-12-27 10:52:46 +00:00
Daoud Clarke
bb8a36a612
Number of pages is an int
2022-12-27 10:40:53 +00:00
Daoud Clarke
c01129cdb9
Merge branch 'master' of github.com:mwmbl/mwmbl
2022-12-27 10:25:41 +00:00
Daoud Clarke
26351a1072
Use the correct storage location in prod
2022-12-27 10:24:48 +00:00
Daoud Clarke
f3f3831a97
Merge pull request #83 from omasanori/spacy-deps-rework
Rework installation of spaCy models for clarity
2022-12-27 10:20:52 +00:00
Masanori Ogino
71187a3938
Rework installation of spaCy models for clarity
- Install the wheel package for compatibility with future pip
- Use `spacy download` for installing model(s)
- Use `spacy validate` for checking model compatibility explicitly
Signed-off-by: Masanori Ogino <167209+omasanori@users.noreply.github.com>
2022-12-27 11:33:52 +09:00
Daoud Clarke
d85067ec09
Remove apt command
2022-12-24 20:20:53 +00:00
Daoud Clarke
1ef60e8d5d
Put install in correct place
2022-12-24 20:18:02 +00:00
Daoud Clarke
8e613dd368
Install psql client
2022-12-24 20:13:53 +00:00
Daoud Clarke
80282cfc7a
Exclude a domain
2022-12-24 19:59:56 +00:00
Daoud Clarke
8676abbc63
Format fetched url
2022-12-24 19:59:15 +00:00
Daoud Clarke
57295846cb
Update README.md
2022-12-21 21:49:56 +00:00
Daoud Clarke
0a4e1e4aee
Add endpoint to fetch a URL and return title and extract
2022-12-21 21:15:34 +00:00
Daoud Clarke
c7571120cc
Implement validation
2022-12-21 15:32:30 +00:00
Daoud Clarke
061462460b
Separate out the curation to make it easier to store in a comment
2022-12-20 19:11:01 +00:00
Daoud Clarke
6cf27fa47f
Fix serialisation issue
2022-12-19 23:19:32 +00:00
Daoud Clarke
b559a50506
Require the whole result
2022-12-19 22:18:28 +00:00
Daoud Clarke
5eab543f3b
Merge branch 'master' into user-registration
2022-12-19 21:53:11 +00:00
Daoud Clarke
a88a1a3e95
Rename some parameters; return curation ID
2022-12-19 21:51:26 +00:00
Daoud Clarke
efc8e8e383
Merge pull request #78 from mwmbl/make-dev-easier
Make it easier to run mwmbl locally
2022-12-19 21:50:54 +00:00
Daoud Clarke
31c27daca4
Add curations
2022-12-11 18:48:25 +00:00
Daoud Clarke
f89e1d6043
Create a post when beginning curation
2022-12-10 23:45:10 +00:00
Daoud Clarke
eadb7f3e28
Follow a begin curate/update curation workflow
2022-12-10 22:49:06 +00:00
Daoud Clarke
f8ab6092b0
Suggest using dokku instead of docker directly
2022-12-08 22:33:58 +00:00
Daoud Clarke
8aa51e548b
Allow login
2022-12-08 22:23:48 +00:00
Daoud Clarke
cf6ceedfd5
Actually allow registration
2022-12-07 22:56:20 +00:00
Daoud Clarke
a50bc28436
Make it easier to run mwmbl locally
2022-12-07 20:01:31 +00:00
Daoud Clarke
d8d7149f4a
Start to implement user registration using Lemmy as a back end
2022-12-06 22:36:38 +00:00
Daoud Clarke
c0f89ba6c3
Update matrix badge
2022-12-05 18:47:26 +00:00
Daoud Clarke
dd4dd8a752
Exclude an annoying web site
2022-12-02 21:29:06 +00:00
Daoud Clarke
40f9eade9a
Update index name
2022-08-27 09:38:39 +01:00
Daoud Clarke
b6183e00ea
Merge pull request #74 from mwmbl/evaluate-indexing
Evaluate indexing
2022-08-27 09:37:22 +01:00
Daoud Clarke
cf253ae524
Split out URL updating from indexing
2022-08-26 22:20:35 +01:00
Daoud Clarke
f4fb9f831a
Use terms and bigrams from the beginning of the string only
2022-08-26 17:20:11 +01:00
Daoud Clarke
619b6c3a93
Don't remove stopwords
2022-08-24 21:08:33 +01:00
Daoud Clarke
578b705609
Don't replace full stops and commas
2022-08-23 22:06:43 +01:00
Daoud Clarke
4779371cf3
Use a custom tokenizer
2022-08-23 21:57:38 +01:00
Daoud Clarke
b1eea2457f
Script to index local batch for evaluation
2022-08-22 22:47:42 +01:00
Daoud Clarke
480be85cfd
Fix bug in completions with duplicated terms
2022-08-14 22:03:50 +01:00