| Author | Commit | Message | Date |
|---|---|---|---|
| Daoud Clarke | 0838157185 | Merge pull request #97 from mwmbl/initialize-with-found-urls: Initialize with found urls | 2023-02-25 18:20:11 +00:00 |
| Daoud Clarke | 7d0c55c015 | Fix broken test | 2023-02-25 18:18:09 +00:00 |
| Daoud Clarke | e5c08e0d24 | Fix big with other URLs | 2023-02-25 16:48:59 +00:00 |
| Daoud Clarke | a24156ce5c | Initialize URLs by processing them like all other URLs to avoid bias | 2023-02-25 13:45:03 +00:00 |
| Daoud Clarke | 6bb8bdf0c2 | Initialize with new URLs | 2023-02-25 10:48:22 +00:00 |
| Daoud Clarke | a9e2b48840 | Merge pull request #96 from mwmbl/unique-urls-in-queue: Unique URLs in queue | 2023-02-25 10:35:32 +00:00 |
| Daoud Clarke | 5c94dfa669 | Shuffle URLs before batching | 2023-02-25 10:35:10 +00:00 |
| Daoud Clarke | 6ff62fb119 | Ensure URLs in queue are unique | 2023-02-25 10:34:09 +00:00 |
| Daoud Clarke | c36e1dffcb | Remove picolisp as a top domain since there are duplicate URLs | 2023-02-25 09:56:26 +00:00 |
| Daoud Clarke | 362f9bfa9e | Write page to the correct location (metadata size offset bug fix) | 2023-02-24 21:46:18 +00:00 |
| Daoud Clarke | 5616626fc1 | Merge pull request #89 from mwmbl/update-urls-queue-quickly: Update urls queue quickly | 2023-02-24 21:39:40 +00:00 |
| Daoud Clarke | bc6be8b6d5 | Merge branch 'master' into update-urls-queue-quickly | 2023-02-24 21:37:54 +00:00 |
| Daoud Clarke | a03b76e5cc | Fix broken test | 2023-02-24 21:37:32 +00:00 |
| Daoud Clarke | c97d946fcf | Go back to processing 10,000 batches at a time | 2023-02-24 21:29:42 +00:00 |
| Rishabh Singh Ahluwalia | 38a5dbbf3c | Merge pull request #94 from mwmbl/rishabh-port-configuration: Allow configuration of port | 2023-02-23 07:31:07 -08:00 |
| Rishabh Singh Ahluwalia | 2aa61a5121 | Merge pull request #95 from mwmbl/rishabh-unit-testing-with-ci: Add PyUnit dependency + Unit Tests for completer.py + Github Actions CI for running unit tests | 2023-02-23 07:30:48 -08:00 |
| Rishabh Singh Ahluwalia | 30aff3b920 | Add pytest, unit tests for completer,gh actions ci | 2023-02-22 21:37:10 -08:00 |
| Rishabh Singh Ahluwalia | 842aec19e2 | Add port to args | 2023-02-22 19:59:42 -08:00 |
| Daoud Clarke | 50a059410b | Merge pull request #93 from mwmbl/add-code-of-conduct-1: Create CODE_OF_CONDUCT.md | 2023-02-15 20:36:31 +00:00 |
| Rishabh Singh Ahluwalia | 084a870f65 | Merge pull request #92 from mwmbl/rishabh-add-launch-json: Add launch.json for vscode run/debugging | 2023-02-12 07:17:47 -08:00 |
| Daoud Clarke | 68ecdee145 | Create CONTRIBUTING.md | 2023-02-11 15:17:35 +00:00 |
| Daoud Clarke | 3a07fb54b5 | Create CODE_OF_CONDUCT.md | 2023-02-11 15:13:08 +00:00 |
| Daoud Clarke | d8dbe54f9c | Update README.md | 2023-02-11 15:10:30 +00:00 |
| Daoud Clarke | 2daf902ca3 | Merge pull request #90 from mwmbl/m1-mmap-issue-fix-2: Offset by metadata size manually to increase compatibility | 2023-02-11 08:30:46 +00:00 |
| Rishabh Singh Ahluwalia | 7fdc8480bd | add launch.json for vscode debugging | 2023-02-10 20:59:09 -08:00 |
| Daoud Clarke | e890e56661 | Offset by metadata size manually to increase compatibility | 2023-02-05 15:49:09 +00:00 |
| Daoud Clarke | 5783cee6b7 | Fix bugs | 2023-01-24 22:52:58 +00:00 |
| Daoud Clarke | 77e39b4a89 | Optimise URL update | 2023-01-22 20:28:18 +00:00 |
| Daoud Clarke | 66700f8a3e | Speed up domain parsing | 2023-01-20 20:53:50 +00:00 |
| Daoud Clarke | 2b36f2ccc1 | Try and balance URLs before adding to queue | 2023-01-19 21:56:40 +00:00 |
| Daoud Clarke | 603fcd4eb2 | Create a custom URL queue | 2023-01-14 21:59:31 +00:00 |
| Daoud Clarke | 01f08fd88d | Return updated URLs | 2023-01-14 19:17:16 +00:00 |
| Daoud Clarke | bd0cc3863e | Don't try and update an empty list of URLs | 2023-01-09 21:02:40 +00:00 |
| Daoud Clarke | d347a17d63 | Update URL queue separately from the other background process to speed it up | 2023-01-09 20:50:28 +00:00 |
| Daoud Clarke | 7bd12c1ead | Fix some bugs in URL fetching query | 2023-01-02 20:51:23 +00:00 |
| Daoud Clarke | a50f1d8ae3 | Fix postgres install | 2023-01-02 12:19:10 +00:00 |
| Daoud Clarke | 1ab16b1fb4 | Install postgres client | 2023-01-02 12:18:03 +00:00 |
| Daoud Clarke | dda5a25ad0 | Add core domains | 2023-01-02 12:05:22 +00:00 |
| Daoud Clarke | ab37bbe0a5 | Exclude google plus | 2023-01-01 22:18:47 +00:00 |
| Daoud Clarke | 2336ed7f7d | Allow posting extra links with lower score weighting | 2023-01-01 20:37:41 +00:00 |
| Daoud Clarke | 6edf48693b | Check the domain is correct, potential bug in psql | 2023-01-01 01:30:44 +00:00 |
| Daoud Clarke | b7984684c9 | Tidy, improve logging | 2023-01-01 01:14:05 +00:00 |
| Daoud Clarke | 7c14cd99f8 | Update the URL queue earlier | 2022-12-31 23:37:59 +00:00 |
| Daoud Clarke | 0d33b4f68f | Merge pull request #86 from mwmbl/improve-crawling: Improve crawling | 2022-12-31 22:56:21 +00:00 |
| Daoud Clarke | a86e172bf3 | Reinstate background tasks | 2022-12-31 22:52:17 +00:00 |
| Daoud Clarke | d9cd3c585b | Get results from other domains | 2022-12-31 22:51:00 +00:00 |
| Daoud Clarke | 77f08d8f0a | Update URL status | 2022-12-31 22:25:05 +00:00 |
| Daoud Clarke | 36af579f7c | Sample domains | 2022-12-31 17:04:38 +00:00 |
| Daoud Clarke | ea16e7b5cd | WIP: improve method of getting URLs for crawling | 2022-12-31 13:37:40 +00:00 |
| Daoud Clarke | 7dae39b780 | WIP: improve method of getting URLs for crawling | 2022-12-31 13:32:15 +00:00 |