Daoud Clarke
1227ae33c8
Run poetry lock
2023-10-10 20:21:37 +01:00
Daoud Clarke
a55a027107
Store stats in redis
2023-09-29 13:37:54 +01:00
Daoud Clarke
019095a4c1
Exclude blacklisted domains
2023-09-22 21:53:53 +01:00
Daoud Clarke
8d64af4f1b
Keep track of curated documents
2023-04-30 18:25:48 +01:00
Rishabh Singh Ahluwalia
30aff3b920
Add pytest, unit tests for completer, gh actions ci
2023-02-22 21:37:10 -08:00
Daoud Clarke
d400950689
Add script to process historical data
2022-06-18 15:31:35 +01:00
Daoud Clarke
a003914e91
Fix boto3 dependency
2022-06-17 22:14:55 +01:00
Daoud Clarke
e2eb405083
Combine crawler and search servers
2022-06-16 22:49:41 +01:00
Daoud Clarke
aaca8b2b6e
Record historical batches via the API
2022-06-05 09:15:04 +01:00
Daoud Clarke
af6a28fac3
Implement learning to rank feature extraction and thresholding
2022-03-20 22:01:45 +00:00
Daoud Clarke
e6273c7f76
WIP: include metadata in index - using struct approach
2022-02-18 22:12:22 +00:00
Daoud Clarke
7d829bc319
Use python 3.10; complete terms
2022-01-30 23:24:00 +00:00
nitred
a72a08a7d9
added config and binary/entrypoint for mwmbl.tinysearchengine
- using pydantic to validate the config
- added a default bootstrap config at config/tinysearchengine.yaml
- refactored app.py to include parsing CLI argument using argparse
- refactored app.py to use fewer global variables
- added "mwmbl-tinysearchengine" binary/entrypoint in pyproject.toml
- updated Dockerfile to work with these changes and added comments to it
2021-12-29 15:26:33 +01:00
nitred
c02c052281
Fixes #12, Added dependencies for indexer as extra or extra_requires
- dependencies for indexer can be installed using "pip install .[indexer]" or "poetry install -E indexer"
2021-12-27 15:46:24 +01:00
Daoud Clarke
9c65bf3c8f
WIP: implement docker image. TODO: copy index and set the correct index path using env var
2021-12-22 23:21:23 +00:00
Daoud Clarke
23eb341832
Add search page
2021-12-14 22:01:59 +00:00
Daoud Clarke
2844c1df75
Index common crawl data
2021-12-13 11:23:01 +00:00
Daoud Clarke
65b366d30d
Add spacy
2021-12-12 20:58:44 +00:00
Daoud Clarke
c46257c6d1
Use our own filesystem-based queue
2021-12-11 16:57:17 +00:00
Daoud Clarke
14817d7657
Optimise imports
2021-12-05 20:38:05 +00:00
Daoud Clarke
312f32bf61
Add common crawl extract script and dependency management with poetry
2021-12-05 20:31:49 +00:00