| Author | Commit | Message | Date |
|---|---|---|---|
| Daoud Clarke | a003914e91 | Fix boto3 dependency | 2022-06-17 22:14:55 +01:00 |
| Daoud Clarke | 363103468e | Update Dockerfile for changes | 2022-06-17 21:26:21 +01:00 |
| Daoud Clarke | e2eb405083 | Combine crawler and search servers | 2022-06-16 22:49:41 +01:00 |
| Daoud Clarke | 7771657684 | Merge pull request #53 from mwmbl/record-historical-batches: Record historical batches | 2022-06-16 22:09:12 +01:00 |
| Daoud Clarke | 14107acc75 | Use new server | 2022-06-09 22:24:54 +01:00 |
| Daoud Clarke | aaca8b2b6e | Record historical batches via the API | 2022-06-05 09:15:04 +01:00 |
| Daoud Clarke | 617666e3b7 | Merge pull request #51 from mwmbl/learning-to-rank: Learning to rank | 2022-06-04 12:36:15 +01:00 |
| Daoud Clarke | 770b4b945b | Refactor feature extraction | 2022-05-07 22:52:36 +01:00 |
| Daoud Clarke | 87d8b40cad | Make order_results public | 2022-05-06 23:15:50 +01:00 |
| Daoud Clarke | 229819e57e | Refactor to allow LTR ranker | 2022-03-27 22:32:44 +01:00 |
| Daoud Clarke | 94287cec01 | Get features for each string separately | 2022-03-21 21:49:10 +00:00 |
| Daoud Clarke | 4740d89c6a | Add domain score feature | 2022-03-21 21:13:20 +00:00 |
| Daoud Clarke | af6a28fac3 | Implement learning to rank feature extraction and thresholding | 2022-03-20 22:01:45 +00:00 |
| Daoud Clarke | 2d334074af | Make get_results() public for learning to rank | 2022-03-20 17:25:54 +00:00 |
| Daoud Clarke | ee5ca6bcf6 | Experiment with score variations (best is simple weighted domain score) | 2022-02-27 21:24:16 +00:00 |
| Daoud Clarke | 6fb310c363 | Use addition instead of multiplication | 2022-02-25 22:19:26 +00:00 |
| Daoud Clarke | 4e6516ccf1 | Scale by 0.99 | 2022-02-25 22:14:49 +00:00 |
| Daoud Clarke | f5afbed2e5 | Handle empty list | 2022-02-25 22:11:09 +00:00 |
| Daoud Clarke | efafec5214 | Rank using item score as well as match score | 2022-02-25 22:08:37 +00:00 |
| Daoud Clarke | e1e9e404a3 | Dedupe before indexing | 2022-02-24 22:01:42 +00:00 |
| Daoud Clarke | f5b20d0128 | Index link counts | 2022-02-24 20:47:36 +00:00 |
| Daoud Clarke | b5b2005323 | Store computed link counts | 2022-02-23 22:13:38 +00:00 |
| Daoud Clarke | 00d18c3474 | Remove unused code | 2022-02-23 21:59:24 +00:00 |
| Daoud Clarke | d19e0e51f7 | Merge pull request #47 from mwmbl/include-metadata-in-index: Include metadata in index | 2022-02-23 21:10:24 +00:00 |
| Daoud Clarke | 04a33a134b | Fixes to mwmbl API for changes to the index | 2022-02-22 22:27:02 +00:00 |
| Daoud Clarke | ae3b334a7f | Fixes for API changes | 2022-02-22 22:12:39 +00:00 |
| Daoud Clarke | 326f7e3d7f | Use JSON instead of struct to store metadata | 2022-02-18 22:22:47 +00:00 |
| Daoud Clarke | e6273c7f76 | WIP: include metadata in index - using struct approach | 2022-02-18 22:12:22 +00:00 |
| Daoud Clarke | 82c46b50bc | Merge pull request #46 from mwmbl/refactor-for-evaluation: Refactor to enable easier evaluation | 2022-02-16 21:28:21 +00:00 |
| Daoud Clarke | e03e379ccf | Refactor to enable easier evaluation | 2022-02-09 22:43:47 +00:00 |
| Daoud Clarke | 4e36ee198c | Merge pull request #42 from mwmbl/update-readme-for-new-crawler: Update readme for recent changes | 2022-02-04 23:26:11 +00:00 |
| Daoud Clarke | c4e86ce313 | Update readme for recent changes | 2022-02-04 22:07:09 +00:00 |
| Daoud Clarke | 51f2dd2690 | Merge branch 'master' of github.com:mwmbl/mwmbl | 2022-02-04 21:49:40 +00:00 |
| Daoud Clarke | 9f78d19c8c | Merge pull request #41 from ColinEspinas/add-branding: Add branding to readme | 2022-02-04 21:28:41 +00:00 |
| ColinEspinas | b2e01d33e8 | docs: better title display on readme | 2022-02-04 20:53:55 +01:00 |
| Colin Espinas | 95c9bcfe3b | Merge branch 'mwmbl:master' into add-branding | 2022-02-04 20:51:38 +01:00 |
| ColinEspinas | cd57372a84 | docs: added branding to readme and required assets files | 2022-02-04 20:50:43 +01:00 |
| Daoud Clarke | 6e5e56f99a | New index; more pages | 2022-02-04 18:08:23 +00:00 |
| Daoud Clarke | bdf0fd1797 | Merge pull request #39 from mwmbl/analyse-links: Analyse links | 2022-02-03 19:33:52 +00:00 |
| Daoud Clarke | 2fc999b402 | Count unique domains instead of links | 2022-02-02 20:09:59 +00:00 |
| Daoud Clarke | 26e90c6e57 | Merge branch 'master' into analyse-links | 2022-02-02 19:48:47 +00:00 |
| Daoud Clarke | 07d4b36052 | Merge pull request #38 from mwmbl/stop-indexing-partial-words: Improve handling of partial words | 2022-02-02 19:48:31 +00:00 |
| Daoud Clarke | d77b72d7df | Analyse links to find most popular ones | 2022-02-02 19:47:38 +00:00 |
| Daoud Clarke | fe6ace93e6 | Improve handling of incomplete words: Correctly generate regex for incomplete vs complete words; Return more than one top word from completer; Correctly handle no terms | 2022-01-31 21:20:59 +00:00 |
| Daoud Clarke | 7d829bc319 | Use python 3.10; complete terms | 2022-01-30 23:24:00 +00:00 |
| Daoud Clarke | 3c75dd1a74 | WIP: implement term completer | 2022-01-30 22:20:28 +00:00 |
| Daoud Clarke | 01a21337a9 | Don't index partial words | 2022-01-30 14:30:02 +00:00 |
| Daoud Clarke | 2ef8304919 | Remove some debug print statements | 2022-01-30 13:16:24 +00:00 |
| Daoud Clarke | 66696ad76b | Merge pull request #37 from mwmbl/index-mwmbl-crawl: Index mwmbl crawl | 2022-01-30 13:12:06 +00:00 |
| Daoud Clarke | 5b89bbf05d | Index Mwmbl crawled data | 2022-01-29 08:26:42 +00:00 |