Commit graph

392 commits

Author SHA1 Message Date
Daoud Clarke
b6fd27352b Add crawler router 2023-10-08 14:13:38 +01:00
Daoud Clarke
ed64ca6c91 Merge branch 'main' into django-rewrite 2023-10-07 19:19:34 +01:00
Daoud Clarke
d716cb347f Merge pull request #114 from mwmbl/exclude-domains-by-keyword: Exclude domains by keyword 2023-10-04 20:20:51 +01:00
Daoud Clarke
41061a695b Add tests 2023-10-04 20:19:42 +01:00
Daoud Clarke
593c71f689 Exclude domains by keyword 2023-10-04 19:51:33 +01:00
Daoud Clarke
a77dc3eb4c Merge pull request #113 from mwmbl/more-stats: Add more stats 2023-10-02 22:19:20 +01:00
Daoud Clarke
988f3fd2a9 Add more stats 2023-10-02 22:19:02 +01:00
Daoud Clarke
7c3aea5ca0 Temporarily just select some URLs at random for initialization 2023-09-29 22:32:31 +01:00
Daoud Clarke
826d3d6ba9 Merge pull request #112 from mwmbl/stats: Stats 2023-09-29 21:49:28 +01:00
Daoud Clarke
ab527c4b58 Use stats manager from redis URL 2023-09-29 21:48:36 +01:00
Daoud Clarke
0d795b7c64 Fix bugs with date method 2023-09-29 21:27:32 +01:00
Daoud Clarke
e1bf423e69 Get stats 2023-09-29 13:58:26 +01:00
Daoud Clarke
a55a027107 Store stats in redis 2023-09-29 13:37:54 +01:00
Daoud Clarke
db658daa88 Store stats in redis 2023-09-28 17:48:29 +01:00
Daoud Clarke
86a6524f0a WIP add search API to Django 2023-09-24 08:09:18 +01:00
Daoud Clarke
177324353f Merge branch 'main' into django-rewrite 2023-09-23 12:56:22 +01:00
Daoud Clarke
bec00cdab5 Exclude additional domain 2023-09-22 23:06:04 +01:00
Daoud Clarke
7e054d0854 Better blacklist 2023-09-22 23:04:37 +01:00
Daoud Clarke
ed96386f05 Merge pull request #110 from mwmbl/update-blacklist: Exclude blacklisted domains 2023-09-22 21:54:12 +01:00
Daoud Clarke
019095a4c1 Exclude blacklisted domains 2023-09-22 21:53:53 +01:00
Daoud Clarke
19cc196e34 Add django ninja 2023-09-22 19:56:42 +01:00
Daoud Clarke
4aefc48716 Add django 2023-08-27 07:37:15 +01:00
Daoud Clarke
18dc760a34 Temp disable CORS 2023-05-20 23:23:43 +01:00
Daoud Clarke
01bf4c21df Temporarily disable lemmy as connection is refused 2023-05-20 22:26:33 +01:00
Daoud Clarke
8851d86ff4 Justext is not extra 2023-05-20 22:17:23 +01:00
Daoud Clarke
b5b37629ce Clean unicode when formatting result 2023-05-20 22:11:51 +01:00
Daoud Clarke
dec7c4853d Whitespace fix 2023-05-20 21:52:33 +01:00
Daoud Clarke
3e08c6e804 Check response status; provide an answer when registering 2023-05-20 21:51:57 +01:00
Daoud Clarke
60980a6bc7 Merge pull request #100 from mwmbl/user-registration: User registration 2023-04-30 20:31:09 +01:00
Daoud Clarke
8d64af4f1b Keep track of curated documents 2023-04-30 18:25:48 +01:00
Daoud Clarke
f0592f99df Require a curated boolean flag 2023-04-13 06:27:51 +01:00
Daoud Clarke
759dbf07b9 Revert index 2023-04-13 05:37:43 +01:00
Daoud Clarke
00b5438492 Track curated items in the index 2023-04-09 06:26:23 +01:00
Daoud Clarke
a87d3d6def Store curated pages in the index 2023-04-09 05:31:23 +01:00
Daoud Clarke
61cdd4dd71 Merge branch 'main' into user-registration 2023-04-01 07:17:29 +01:00
Daoud Clarke
3e1f5da28e Off by one error with page size 2023-04-01 06:40:03 +01:00
Daoud Clarke
91269d5100 Handle a bad batch 2023-04-01 06:35:44 +01:00
Rishabh Singh Ahluwalia
e9dfd40ecb Merge pull request #98 from mwmbl/rishabh-fix-trim-data: Fix trimming page size logic while adding to a page 2023-03-28 08:18:53 -07:00
Rishabh Singh Ahluwalia
f232badd67 fix comma formatting 2023-03-27 22:18:10 -07:00
Rishabh Singh Ahluwalia
8e197a09f9 Fix trimming page size logic while adding to a page 2023-03-26 10:04:05 -07:00
Daoud Clarke
23688bd3ad Merge branch 'master' into user-registration 2023-03-18 22:37:45 +00:00
Daoud Clarke
0838157185 Merge pull request #97 from mwmbl/initialize-with-found-urls: Initialize with found urls 2023-02-25 18:20:11 +00:00
Daoud Clarke
7d0c55c015 Fix broken test 2023-02-25 18:18:09 +00:00
Daoud Clarke
e5c08e0d24 Fix bug with other URLs 2023-02-25 16:48:59 +00:00
Daoud Clarke
a24156ce5c Initialize URLs by processing them like all other URLs to avoid bias 2023-02-25 13:45:03 +00:00
Daoud Clarke
6bb8bdf0c2 Initialize with new URLs 2023-02-25 10:48:22 +00:00
Daoud Clarke
a9e2b48840 Merge pull request #96 from mwmbl/unique-urls-in-queue: Unique URLs in queue 2023-02-25 10:35:32 +00:00
Daoud Clarke
5c94dfa669 Shuffle URLs before batching 2023-02-25 10:35:10 +00:00
Daoud Clarke
6ff62fb119 Ensure URLs in queue are unique 2023-02-25 10:34:09 +00:00
Daoud Clarke
c36e1dffcb Remove picolisp as a top domain since there are duplicate URLs 2023-02-25 09:56:26 +00:00