Commit graph

30 commits

Author SHA1 Message Date
Daoud Clarke
bd017079d5 Add login using allauth 2023-10-24 10:32:06 +01:00
Daoud Clarke
db658daa88 Store stats in redis 2023-09-28 17:48:29 +01:00
Daoud Clarke
177324353f Merge branch 'main' into django-rewrite 2023-09-23 12:56:22 +01:00
Daoud Clarke
019095a4c1 Exclude blacklisted domains 2023-09-22 21:53:53 +01:00
Daoud Clarke
19cc196e34 Add django ninja 2023-09-22 19:56:42 +01:00
Daoud Clarke
4aefc48716 Add django 2023-08-27 07:37:15 +01:00
Daoud Clarke
8851d86ff4 Justext is not extra 2023-05-20 22:17:23 +01:00
Daoud Clarke
8d64af4f1b Keep track of curated documents 2023-04-30 18:25:48 +01:00
Rishabh Singh Ahluwalia
30aff3b920 Add pytest, unit tests for completer,gh actions ci 2023-02-22 21:37:10 -08:00
Masanori Ogino
71187a3938 Rework installation of spaCy models for clarity
- Install the wheel package for compatibility with future pip
- Use `spacy download` for installing model(s)
- Use `spacy validate` for checking model compatibility explicitly

Signed-off-by: Masanori Ogino <167209+omasanori@users.noreply.github.com>
2022-12-27 11:33:52 +09:00
Daoud Clarke
d400950689 Add script to process historical data 2022-06-18 15:31:35 +01:00
Daoud Clarke
a003914e91 Fix boto3 dependency 2022-06-17 22:14:55 +01:00
Daoud Clarke
363103468e Update Dockerfile for changes 2022-06-17 21:26:21 +01:00
Daoud Clarke
e2eb405083 Combine crawler and search servers 2022-06-16 22:49:41 +01:00
Daoud Clarke
aaca8b2b6e Record historical batches via the API 2022-06-05 09:15:04 +01:00
Daoud Clarke
af6a28fac3 Implement learning to rank feature extraction and thresholding 2022-03-20 22:01:45 +00:00
Daoud Clarke
e6273c7f76 WIP: include metadata in index - using struct approach 2022-02-18 22:12:22 +00:00
Daoud Clarke
7d829bc319 Use python 3.10; complete terms 2022-01-30 23:24:00 +00:00
nitred
a72a08a7d9 added config and binary/entrypoint for mwmbl.tinysearchengine
- using pydantic to validate the config
- added a default bootstrap config at config/tinysearchengine.yaml
- refactored app.py to include parsing CLI argument using argparse
- refactored app.py to use fewer global variables
- added "mwmbl-tinysearchengine" binary/entrypoint in pyproject.toml
- updated Dockerfile to work with these changes and added comments to it
2021-12-29 15:26:33 +01:00
nitred
be40a15b27 Merge branch 'master' into mwmbl-package 2021-12-29 00:25:37 +01:00
nitred
11eedcde84 renamed package to mwmbl
- renamed package to mwmbl in pyproject.toml
- tinysearchengine and indexer modules have been moved into mwmbl package folder
- analyse module has been left as is in the root of the repo
- import statements in tinysearchengine now use mwmbl.tinysearchengine
- import statements in indexer now use mwmbl.indexer or mwmbl.tinysearchengine or relative imports like .paths
- import statements in analyse now use mwmbl.indexer or mwmbl.tinysearchengine
- final CMD in Dockerfile now uses updated path mwmbl.tinysearchengine.app
- fixed a couple of import statement errors in tinysearchengine/indexer.py
2021-12-28 12:35:46 +01:00
nitred
c02c052281 Fixes #12, Added dependencies for indexer as extra or extra_requires
- dependencies for indexer can be installed using "pip install .[indexer]" or "poetry install -E indexer"
2021-12-27 15:46:24 +01:00
Daoud Clarke
8cfb8b7a44 Remove debug print code 2021-12-26 08:47:33 +00:00
Daoud Clarke
9c65bf3c8f WIP: implement docker image. TODO: copy index and set the correct index path using env var 2021-12-22 23:21:23 +00:00
Daoud Clarke
23eb341832 Add search page 2021-12-14 22:01:59 +00:00
Daoud Clarke
2844c1df75 Index common crawl data 2021-12-13 11:23:01 +00:00
Daoud Clarke
65b366d30d Add spacy 2021-12-12 20:58:44 +00:00
Daoud Clarke
c46257c6d1 Use our own filesystem-based queue 2021-12-11 16:57:17 +00:00
Daoud Clarke
14817d7657 Optimise imports 2021-12-05 20:38:05 +00:00
Daoud Clarke
312f32bf61 Add common crawl extract script and dependency management with poetry 2021-12-05 20:31:49 +00:00