Commit graph

18 commits

Author SHA1 Message Date
Daoud Clarke
e2eb405083 Combine crawler and search servers 2022-06-16 22:49:41 +01:00
Daoud Clarke
14107acc75 Use new server 2022-06-09 22:24:54 +01:00
Daoud Clarke
aaca8b2b6e Record historical batches via the API 2022-06-05 09:15:04 +01:00
Daoud Clarke
f5b20d0128 Index link counts 2022-02-24 20:47:36 +00:00
Daoud Clarke
b5b2005323 Store computed link counts 2022-02-23 22:13:38 +00:00
Daoud Clarke
00d18c3474 Remove unused code 2022-02-23 21:59:24 +00:00
Daoud Clarke
e03e379ccf Refactor to enable easier evaluation 2022-02-09 22:43:47 +00:00
Daoud Clarke
2fc999b402 Count unique domains instead of links 2022-02-02 20:09:59 +00:00
Daoud Clarke
d77b72d7df Analyse links to find most popular ones 2022-02-02 19:47:38 +00:00
Daoud Clarke
ef36513f64 Analyse the pages that are crawled most often 2022-01-29 07:06:53 +00:00
Daoud Clarke
70254ae160 Analyse crawled URLs and domains 2022-01-26 18:51:58 +00:00
Daoud Clarke
171fa645d2 Add script to export top domains 2022-01-23 22:04:30 +00:00
Daoud Clarke
25918e42ef Export URLs to sqlite for evaluation purposes 2022-01-02 20:06:13 +00:00
nitred
11eedcde84 renamed package to mwmbl 2021-12-28 12:35:46 +01:00
- renamed package to mwmbl in pyproject.toml
- tinysearchengine and indexer modules have been moved into mwmbl package folder
- analyse module has been left as is in the root of the repo
- import statements in tinysearchengine now use mwmbl.tinysearchengine
- import statements in indexer now use mwmbl.indexer or mwmbl.tinysearchengine or relative imports like .paths
- import statements in analyse now use mwmbl.indexer or mwmbl.tinysearchengine
- final CMD in Dockerfile now uses updated path mwmbl.tinysearchengine.app
- fixed a couple of import statement errors in tinysearchengine/indexer.py
Daoud Clarke
baede32298 Move indexer code to a separate package 2021-12-26 08:55:09 +00:00
Daoud Clarke
9c65bf3c8f WIP: implement docker image. TODO: copy index and set the correct index path using env var 2021-12-22 23:21:23 +00:00
Daoud Clarke
9ee6f37a60 Analysis to confirm that 'leek and potato soup' page was really missing 2021-12-19 21:09:00 +00:00
Daoud Clarke
4cbed29c08 Show the extract 2021-12-19 20:48:28 +00:00