9 commits

Author SHA1 Message Date
Daoud Clarke
ef36513f64 Analyse the pages that are crawled most often 2022-01-29 07:06:53 +00:00
Daoud Clarke
70254ae160 Analyse crawled URLs and domains 2022-01-26 18:51:58 +00:00
Daoud Clarke
171fa645d2 Add script to export top domains 2022-01-23 22:04:30 +00:00
Daoud Clarke
25918e42ef Export URLs to sqlite for evaluation purposes 2022-01-02 20:06:13 +00:00
nitred
11eedcde84 renamed package to mwmbl 2021-12-28 12:35:46 +01:00
- renamed package to mwmbl in pyproject.toml
- tinysearchengine and indexer modules have been moved into mwmbl package folder
- analyse module has been left as is in the root of the repo
- import statements in tinysearchengine now use mwmbl.tinysearchengine
- import statements in indexer now use mwmbl.indexer or mwmbl.tinysearchengine or relative imports like .paths
- import statements in analyse now use mwmbl.indexer or mwmbl.tinysearchengine
- final CMD in Dockerfile now uses updated path mwmbl.tinysearchengine.app
- fixed a couple of import statement errors in tinysearchengine/indexer.py
Daoud Clarke
baede32298 Move indexer code to a separate package 2021-12-26 08:55:09 +00:00
Daoud Clarke
9c65bf3c8f WIP: implement docker image. TODO: copy index and set the correct index path using env var 2021-12-22 23:21:23 +00:00
Daoud Clarke
9ee6f37a60 Analysis to confirm that 'leek and potato soup' page was really missing 2021-12-19 21:09:00 +00:00
Daoud Clarke
4cbed29c08 Show the extract 2021-12-19 20:48:28 +00:00