447 commits

Author SHA1 Message Date
Daoud Clarke
f5afbed2e5 Handle empty list 2022-02-25 22:11:09 +00:00
Daoud Clarke
efafec5214 Rank using item score as well as match score 2022-02-25 22:08:37 +00:00
Daoud Clarke
e1e9e404a3 Dedupe before indexing 2022-02-24 22:01:42 +00:00
Daoud Clarke
f5b20d0128 Index link counts 2022-02-24 20:47:36 +00:00
Daoud Clarke
b5b2005323 Store computed link counts 2022-02-23 22:13:38 +00:00
Daoud Clarke
00d18c3474 Remove unused code 2022-02-23 21:59:24 +00:00
Daoud Clarke
d19e0e51f7
Merge pull request #47 from mwmbl/include-metadata-in-index
Include metadata in index
2022-02-23 21:10:24 +00:00
Daoud Clarke
04a33a134b Fixes to mwmbl API for changes to the index 2022-02-22 22:27:02 +00:00
Daoud Clarke
ae3b334a7f Fixes for API changes 2022-02-22 22:12:39 +00:00
Daoud Clarke
326f7e3d7f Use JSON instead of struct to store metadata 2022-02-18 22:22:47 +00:00
Daoud Clarke
e6273c7f76 WIP: include metadata in index - using struct approach 2022-02-18 22:12:22 +00:00
Daoud Clarke
82c46b50bc
Merge pull request #46 from mwmbl/refactor-for-evaluation
Refactor to enable easier evaluation
2022-02-16 21:28:21 +00:00
Daoud Clarke
e03e379ccf Refactor to enable easier evaluation 2022-02-09 22:43:47 +00:00
Daoud Clarke
4e36ee198c
Merge pull request #42 from mwmbl/update-readme-for-new-crawler
Update readme for recent changes
2022-02-04 23:26:11 +00:00
Daoud Clarke
c4e86ce313 Update readme for recent changes 2022-02-04 22:07:09 +00:00
Daoud Clarke
51f2dd2690 Merge branch 'master' of github.com:mwmbl/mwmbl 2022-02-04 21:49:40 +00:00
Daoud Clarke
9f78d19c8c
Merge pull request #41 from ColinEspinas/add-branding
Add branding to readme
2022-02-04 21:28:41 +00:00
ColinEspinas
b2e01d33e8 docs: better title display on readme 2022-02-04 20:53:55 +01:00
Colin Espinas
95c9bcfe3b Merge branch 'mwmbl:master' into add-branding 2022-02-04 20:51:38 +01:00
ColinEspinas
cd57372a84 docs: added branding to readme and required assets files 2022-02-04 20:50:43 +01:00
Daoud Clarke
6e5e56f99a New index; more pages 2022-02-04 18:08:23 +00:00
Daoud Clarke
bdf0fd1797
Merge pull request #39 from mwmbl/analyse-links
Analyse links
2022-02-03 19:33:52 +00:00
Daoud Clarke
2fc999b402 Count unique domains instead of links 2022-02-02 20:09:59 +00:00
Daoud Clarke
26e90c6e57 Merge branch 'master' into analyse-links 2022-02-02 19:48:47 +00:00
Daoud Clarke
07d4b36052
Merge pull request #38 from mwmbl/stop-indexing-partial-words
Improve handling of partial words
2022-02-02 19:48:31 +00:00
Daoud Clarke
d77b72d7df Analyse links to find most popular ones 2022-02-02 19:47:38 +00:00
Daoud Clarke
fe6ace93e6 Improve handling of incomplete words:
- Correctly generate regex for incomplete vs complete words
- Return more than one top word from completer
- Correctly handle no terms
2022-01-31 21:20:59 +00:00
Daoud Clarke
7d829bc319 Use python 3.10; complete terms 2022-01-30 23:24:00 +00:00
Daoud Clarke
3c75dd1a74 WIP: implement term completer 2022-01-30 22:20:28 +00:00
Daoud Clarke
01a21337a9 Don't index partial words 2022-01-30 14:30:02 +00:00
Daoud Clarke
2ef8304919 Remove some debug print statements 2022-01-30 13:16:24 +00:00
Daoud Clarke
66696ad76b
Merge pull request #37 from mwmbl/index-mwmbl-crawl
Index mwmbl crawl
2022-01-30 13:12:06 +00:00
Daoud Clarke
5b89bbf05d Index Mwmbl crawled data 2022-01-29 08:26:42 +00:00
Daoud Clarke
ef36513f64 Analyse the pages that are crawled most often 2022-01-29 07:06:53 +00:00
Daoud Clarke
70254ae160 Analyse crawled URLs and domains 2022-01-26 18:51:58 +00:00
Daoud Clarke
171fa645d2 Add script to export top domains 2022-01-23 22:04:30 +00:00
Daoud Clarke
908a9cf0b6
Merge pull request #36 from ColinEspinas/remove-old-frontend
Remove old front-end files and routes
2022-01-20 18:06:54 +00:00
ColinEspinas
3481ad372b Removed old front-end files and routes 2022-01-19 23:33:37 +01:00
Daoud Clarke
a41088ca9a Add CORS; revert back to previous index as it timed out deploying 2022-01-03 18:31:03 +00:00
Daoud Clarke
25918e42ef Export URLs to sqlite for evaluation purposes 2022-01-02 20:06:13 +00:00
Daoud Clarke
ae7312c32a
Merge pull request #31 from nitred/fix-python-m-run
Using the app object to start uvicorn, instead of using a reference like "mwmbl.tinysearchengine.app:app"
2021-12-31 22:11:15 +00:00
nitred
fbdb93c86a Using the app object to start uvicorn, instead of using a reference like "mwmbl.tinysearchengine.app:app"
- fixes the issue when running the server using python -m mwmbl.tinysearchengine.app

When running the server using python -m, uvicorn seems to spawn a new process or interpreter session.
At least it appears that way since already initialized & imported modules and variables appear to be uninitialized.
2021-12-31 02:15:16 +01:00
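
A note on fbdb93c86a: one plausible explanation for the "uninitialized" state, offered as an assumption and not something the commit confirms, is that no new process is involved. Under `python -m`, the module runs under the name `__main__`, so resolving a string reference such as "mwmbl.tinysearchengine.app:app" imports the same file a second time under its real name. That second copy never executes its `__main__` block. The sketch below reproduces the effect with the standard library only; `demo_app`, `index` and `load_by_reference` are hypothetical names, and `load_by_reference` is a stand-in for a server resolving a reference string, not uvicorn's actual code.

```python
import importlib
import runpy
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for a server module; none of these names come from mwmbl.
MODULE_SOURCE = textwrap.dedent('''
    import importlib

    index = None  # module-level state, only set inside the __main__ block

    def load_by_reference(reference):
        """Resolve a "module:attribute" string by importing the module by name."""
        module_name, _, attribute = reference.partition(":")
        return getattr(importlib.import_module(module_name), attribute)

    if __name__ == "__main__":
        index = "loaded"
        seen_directly = index
        seen_by_reference = load_by_reference("demo_app:index")
''')


def run_demo():
    """Run demo_app as if by `python -m demo_app` and report both views of `index`."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_app.py").write_text(MODULE_SOURCE)
        importlib.invalidate_caches()
        sys.path.insert(0, tmp)
        try:
            # Executes the file with __name__ == "__main__", like `python -m`.
            namespace = runpy.run_module("demo_app", run_name="__main__")
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_app", None)
    return namespace["seen_directly"], namespace["seen_by_reference"]


if __name__ == "__main__":
    print(run_demo())
```

Under this assumption, passing the already-initialized object to the server avoids the second import entirely, which matches the fix the commit describes.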
Daoud Clarke
e6655101ef Add a component of the HN domain score when ranking 2021-12-30 22:20:10 +00:00
Daoud Clarke
f347fe29ac Add .gcloudignore file to fix gcloud run deploy 2021-12-30 21:17:18 +00:00
Daoud Clarke
3f74229ae9
Explain pronounciation 2021-12-30 20:35:11 +00:00
Daoud Clarke
02bcef640c
Merge pull request #25 from ColinEspinas/search-debounce
Added debounce on search input
2021-12-29 20:59:29 +00:00
Daoud Clarke
3d7e655ebc
Merge pull request #24 from nitred/config-and-entrypoint
added config and binary/entrypoint for mwmbl.tinysearchengine
2021-12-29 20:54:23 +00:00
ColinEspinas
c636be9089 Added debounce on search input (#8) 2021-12-29 21:03:47 +01:00
nitred
a72a08a7d9 added config and binary/entrypoint for mwmbl.tinysearchengine
- using pydantic to validate the config
- added a default bootstrap config at config/tinysearchengine.yaml
- refactored app.py to include parsing CLI argument using argparse
- refactored app.py to use fewer global variables
- added "mwmbl-tinysearchengine" binary/entrypoint in pyproject.toml
- updated Dockerfile to work with these changes and added comments to it
2021-12-29 15:26:33 +01:00
Daoud Clarke
da8797f5ef
Merge pull request #18 from nitred/mwmbl-package
renamed package to mwmbl
2021-12-29 09:34:05 +00:00