Commit graph

19 commits

Author SHA1 Message Date
Daoud Clarke
4917b882d2 Exclude more spam sites 2023-10-18 17:02:45 +01:00
Daoud Clarke
78a9bfbb11 Filter out more spam domains 2023-10-17 22:05:53 +01:00
Daoud Clarke
8c7ddda7d9 Use blacklist on initialisation, add tests 2023-10-17 21:51:23 +01:00
Daoud Clarke
c6d9e6ebb0 Fix some paths, use prod settings in Dockerfile 2023-10-10 20:18:43 +01:00
Daoud Clarke
918eaa8709 Rename django app to mwmbl 2023-10-10 13:51:06 +01:00
Daoud Clarke
41061a695b Add tests 2023-10-04 20:19:42 +01:00
Daoud Clarke
b5b37629ce Clean unicode when formatting result 2023-05-20 22:11:51 +01:00
Rishabh Singh Ahluwalia
e9dfd40ecb Merge pull request #98 from mwmbl/rishabh-fix-trim-data (Fix trimming page size logic while adding to a page) 2023-03-28 08:18:53 -07:00
Rishabh Singh Ahluwalia
8e197a09f9 Fix trimming page size logic while adding to a page 2023-03-26 10:04:05 -07:00
Daoud Clarke
7d0c55c015 Fix broken test 2023-02-25 18:18:09 +00:00
Daoud Clarke
e5c08e0d24 Fix big with other URLs 2023-02-25 16:48:59 +00:00
Daoud Clarke
bc6be8b6d5 Merge branch 'master' into update-urls-queue-quickly 2023-02-24 21:37:54 +00:00
Daoud Clarke
a03b76e5cc Fix broken test 2023-02-24 21:37:32 +00:00
Rishabh Singh Ahluwalia
30aff3b920 Add pytest, unit tests for completer,gh actions ci 2023-02-22 21:37:10 -08:00
Daoud Clarke
2b36f2ccc1 Try and balance URLs before adding to queue 2023-01-19 21:56:40 +00:00
Daoud Clarke
6209382d76 Index batches in memory 2022-07-24 15:44:01 +01:00
Daoud Clarke
680fe1ca0c Fix unicode encoding error 2022-07-16 10:54:25 +01:00
Daoud Clarke
326f7e3d7f Use JSON instead of struct to store metadata 2022-02-18 22:22:47 +00:00
Daoud Clarke
e6273c7f76 WIP: include metadata in index - using struct approach 2022-02-18 22:12:22 +00:00