474 commits

Author SHA1 Message Date
Daoud Clarke
77e39b4a89 Optimise URL update 2023-01-22 20:28:18 +00:00
Daoud Clarke
66700f8a3e Speed up domain parsing 2023-01-20 20:53:50 +00:00
Daoud Clarke
2b36f2ccc1 Try and balance URLs before adding to queue 2023-01-19 21:56:40 +00:00
Daoud Clarke
603fcd4eb2 Create a custom URL queue 2023-01-14 21:59:31 +00:00
Daoud Clarke
01f08fd88d Return updated URLs 2023-01-14 19:17:16 +00:00
Daoud Clarke
bd0cc3863e Don't try and update an empty list of URLs 2023-01-09 21:02:40 +00:00
Daoud Clarke
d347a17d63 Update URL queue separately from the other background process to speed it up 2023-01-09 20:50:28 +00:00
Daoud Clarke
7bd12c1ead Fix some bugs in URL fetching query 2023-01-02 20:51:23 +00:00
Daoud Clarke
a50f1d8ae3 Fix postgres install 2023-01-02 12:19:10 +00:00
Daoud Clarke
1ab16b1fb4 Install postgres client 2023-01-02 12:18:03 +00:00
Daoud Clarke
dda5a25ad0 Add core domains 2023-01-02 12:05:22 +00:00
Daoud Clarke
ab37bbe0a5 Exclude google plus 2023-01-01 22:18:47 +00:00
Daoud Clarke
2336ed7f7d Allow posting extra links with lower score weighting 2023-01-01 20:37:41 +00:00
Daoud Clarke
6edf48693b Check the domain is correct, potential bug in psql 2023-01-01 01:30:44 +00:00
Daoud Clarke
b7984684c9 Tidy, improve logging 2023-01-01 01:14:05 +00:00
Daoud Clarke
7c14cd99f8 Update the URL queue earlier 2022-12-31 23:37:59 +00:00
Daoud Clarke
0d33b4f68f Merge pull request #86 from mwmbl/improve-crawling
Improve crawling
2022-12-31 22:56:21 +00:00
Daoud Clarke
a86e172bf3 Reinstate background tasks 2022-12-31 22:52:17 +00:00
Daoud Clarke
d9cd3c585b Get results from other domains 2022-12-31 22:51:00 +00:00
Daoud Clarke
77f08d8f0a Update URL status 2022-12-31 22:25:05 +00:00
Daoud Clarke
36af579f7c Sample domains 2022-12-31 17:04:38 +00:00
Daoud Clarke
ea16e7b5cd WIP: improve method of getting URLs for crawling 2022-12-31 13:37:40 +00:00
Daoud Clarke
7dae39b780 WIP: improve method of getting URLs for crawling 2022-12-31 13:32:15 +00:00
Daoud Clarke
c69108cfcc Don't delete an index if the sizes don't match 2022-12-27 10:52:46 +00:00
Daoud Clarke
bb8a36a612 Number of pages is an int 2022-12-27 10:40:53 +00:00
Daoud Clarke
c01129cdb9 Merge branch 'master' of github.com:mwmbl/mwmbl 2022-12-27 10:25:41 +00:00
Daoud Clarke
26351a1072 Use the correct storage location in prod 2022-12-27 10:24:48 +00:00
Daoud Clarke
f3f3831a97 Merge pull request #83 from omasanori/spacy-deps-rework
Rework installation of spaCy models for clarity
2022-12-27 10:20:52 +00:00
Masanori Ogino
71187a3938 Rework installation of spaCy models for clarity
- Install the wheel package for compatibility with future pip
- Use `spacy download` for installing model(s)
- Use `spacy validate` for checking model compatibility explicitly

Signed-off-by: Masanori Ogino <167209+omasanori@users.noreply.github.com>
2022-12-27 11:33:52 +09:00
Daoud Clarke
d85067ec09 Remove apt command 2022-12-24 20:20:53 +00:00
Daoud Clarke
1ef60e8d5d Put install in correct place 2022-12-24 20:18:02 +00:00
Daoud Clarke
8e613dd368 Install psql client 2022-12-24 20:13:53 +00:00
Daoud Clarke
80282cfc7a Exclude a domain 2022-12-24 19:59:56 +00:00
Daoud Clarke
8676abbc63 Format fetched url 2022-12-24 19:59:15 +00:00
Daoud Clarke
57295846cb Update README.md 2022-12-21 21:49:56 +00:00
Daoud Clarke
0a4e1e4aee Add endpoint to fetch a URL and return title and extract 2022-12-21 21:15:34 +00:00
Daoud Clarke
c7571120cc Implement validation 2022-12-21 15:32:30 +00:00
Daoud Clarke
061462460b Separate out the curation to make it easier to store in a comment 2022-12-20 19:11:01 +00:00
Daoud Clarke
6cf27fa47f Fix serialisation issue 2022-12-19 23:19:32 +00:00
Daoud Clarke
b559a50506 Require the whole result 2022-12-19 22:18:28 +00:00
Daoud Clarke
5eab543f3b Merge branch 'master' into user-registration 2022-12-19 21:53:11 +00:00
Daoud Clarke
a88a1a3e95 Rename some parameters; return curation ID 2022-12-19 21:51:26 +00:00
Daoud Clarke
efc8e8e383 Merge pull request #78 from mwmbl/make-dev-easier
Make it easier to run mwmbl locally
2022-12-19 21:50:54 +00:00
Daoud Clarke
31c27daca4 Add curations 2022-12-11 18:48:25 +00:00
Daoud Clarke
f89e1d6043 Create a post when beginning curation 2022-12-10 23:45:10 +00:00
Daoud Clarke
eadb7f3e28 Follow a begin curate/update curation workflow 2022-12-10 22:49:06 +00:00
Daoud Clarke
f8ab6092b0 Suggest using dokku instead of docker directly 2022-12-08 22:33:58 +00:00
Daoud Clarke
8aa51e548b Allow login 2022-12-08 22:23:48 +00:00
Daoud Clarke
cf6ceedfd5 Actually allow registration 2022-12-07 22:56:20 +00:00
Daoud Clarke
a50bc28436 Make it easier to run mwmbl locally 2022-12-07 20:01:31 +00:00