Daoud Clarke | f7660bcd27 | 2022-08-13 23:55:22 +01:00 | Merge pull request #73 from mwmbl/completion
    Completion
Daoud Clarke | 627f82d19f | 2022-08-13 23:54:57 +01:00 | Suggest searching Google if there are no search results
Daoud Clarke | f1c77d1389 | 2022-08-13 23:47:48 +01:00 | Search google if there are no results
Daoud Clarke | fe5eff7b64 | 2022-08-13 10:52:31 +01:00 | Exclude web.archive.org as we're only crawling that right now
Daoud Clarke | 00705703f3 | 2022-08-11 23:27:30 +01:00 | Require matching at least half the terms
Daoud Clarke | eda7870788 | 2022-08-11 22:23:14 +01:00 | Restrict to https and strip the prefix and / on the end
Daoud Clarke | 23e47e963b | 2022-08-11 17:34:52 +01:00 | Simplify completions
Daoud Clarke | c6773b46c4 | 2022-08-10 21:43:51 +01:00 | Merge pull request #72 from mwmbl/improve-ranking-with-multi-term-search
    Improve ranking with multi term search
Daoud Clarke | 74107667b4 | 2022-08-10 21:43:13 +01:00 | Improve printing of search results in script
Daoud Clarke | 3bcb7f42c1 | 2022-08-09 22:56:12 +01:00 | Use heuristic ranker
Daoud Clarke | c1b9e70743 | 2022-08-09 22:47:59 +01:00 | Add new LTR model
Daoud Clarke | 57476ed2c8 | 2022-08-09 22:23:36 +01:00 | Tweak features
Daoud Clarke | c99e813398 | 2022-08-09 20:56:15 +01:00 | Get best-performing configuration
Daoud Clarke | 8b50643303 | 2022-08-09 00:08:55 +01:00 | Add in match score feature (although it hurts the results)
Daoud Clarke | c60b73a403 | 2022-08-08 23:42:34 +01:00 | Create a get_features function and make it work like the heuristic approach
Daoud Clarke | c1d361c0a0 | 2022-08-08 22:52:37 +01:00 | New LTR model trained on more data
Daoud Clarke | b99d9d1c6a | 2022-08-08 22:51:09 +01:00 | Search for the term itself as well as its completion
Daoud Clarke | f40d82c449 | 2022-08-01 23:33:02 +01:00 | Allow running with no background script
Daoud Clarke | 046f86f7e3 | 2022-08-01 23:32:24 +01:00 | Merge pull request #71 from mwmbl/fix-missing-scores
    Store the best items, not the worst ones
Daoud Clarke | ae658906dd | 2022-07-31 22:55:15 +01:00 | Store the best items, not the worst ones
Daoud Clarke | aa5878fd2f | 2022-07-31 21:02:05 +01:00 | Merge pull request #70 from mwmbl/reduce-new-batch-contention
    Reduce new batch contention
Daoud Clarke | fc1742e24f | 2022-07-31 00:45:00 +01:00 | Reinstate correct num_pages
Daoud Clarke | bb5186196f | 2022-07-31 00:43:58 +01:00 | Use an in-memory queue
Daoud Clarke | 62ba9ddc7e | 2022-07-30 23:10:37 +01:00 | Use a randomised timeout for getting a new batch
Daoud Clarke | a54e093cf1 | 2022-07-30 17:11:34 +01:00 | Merge pull request #69 from mwmbl/reduce-contention-for-client-queries
    Reduce contention for client queries
Daoud Clarke | 2942d83673 | 2022-07-30 14:35:21 +01:00 | Get URL scores in batches
Daoud Clarke | 3709cb236f | 2022-07-30 11:08:15 +01:00 | Use correct index path; retrieve historical batches
Daoud Clarke | 063ebb4504 | 2022-07-30 10:57:15 +01:00 | args.index no longer exists
Daoud Clarke | ea32c0ba00 | 2022-07-30 10:37:07 +01:00 | Double index size
Daoud Clarke | 2d5235f6f6 | 2022-07-30 10:10:11 +01:00 | More threads for retrieving batches
Daoud Clarke | 218d873654 | 2022-07-30 10:10:03 +01:00 | Delete unused SQL
Daoud Clarke | 6209382d76 | 2022-07-24 15:44:01 +01:00 | Index batches in memory
Daoud Clarke | 1bceeae3df | 2022-07-23 23:19:36 +01:00 | Implement new indexing approach
Daoud Clarke | a8a6c67239 | 2022-07-20 22:21:35 +01:00 | Use URL path to store locally so that we can easily get a local path from a URL
Daoud Clarke | 0d1e7d841c | 2022-07-19 21:18:43 +01:00 | Implement a batch cache to store files locally before preprocessing
Daoud Clarke | 27a4784d08 | 2022-07-19 20:17:20 +01:00 | Merge pull request #68 from mwmbl/fix-missing-query
    Fix missing query
Daoud Clarke | 5ce333cc9a | 2022-07-18 23:46:01 +01:00 | Log at info level
Daoud Clarke | a097ec9fbe | 2022-07-18 23:42:09 +01:00 | Allow more tries so that popular terms can be indexed
Daoud Clarke | cfca015efe | 2022-07-18 22:36:37 +01:00 | Enough preprocessing
Daoud Clarke | 003cd217f4 | 2022-07-18 22:21:20 +01:00 | Run preprocessing
Daoud Clarke | bcd31326b8 | 2022-07-18 22:17:15 +01:00 | Just index a single page for now
Daoud Clarke | a471bc2437 | 2022-07-18 22:05:24 +01:00 | Use a more specific exception in case we're discarding ones we shouldn't
Daoud Clarke | ce9f52267a | 2022-07-18 21:55:27 +01:00 | Run update
Daoud Clarke | 09a9390c92 | 2022-07-18 21:40:38 +01:00 | Catch corrupt data
Daoud Clarke | 93307ad1ec | 2022-07-18 21:37:19 +01:00 | Add util script to send batch; add logging
Daoud Clarke | 3c97fdb3a0 | 2022-07-16 10:59:14 +01:00 | Merge pull request #66 from mwmbl/fix-unicode-encode-error
    Fix unicode encode error; bigger index
Daoud Clarke | 680fe1ca0c | 2022-07-16 10:54:25 +01:00 | Fix unicode encoding error
Daoud Clarke | e1e1b0057b | 2022-07-10 21:06:09 +01:00 | Merge pull request #61 from milovanderlinden/issue-60-consistent-use-of-env-vars
    Fix issue #60
Daoud Clarke | fee5cbb400 | 2022-07-10 17:15:10 +01:00 | 10x index size
milovanderlinden | dfd3f3962e | 2022-07-10 11:10:03 +02:00 | Fix issue #60
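Commit 00705703f3 in the log above ("Require matching at least half the terms") describes a result filter for multi-term queries. The following is a minimal Python sketch of that rule. It assumes naive lower-cased whitespace tokenisation and treats each result as a plain string. The names `matches_enough_terms` and `filter_results` are hypothetical and are not taken from the mwmbl code base.

```python
from typing import Iterable, List, Set


def matches_enough_terms(query_terms: List[str], document_terms: Set[str]) -> bool:
    """True if at least half of the query terms appear in the document.

    Hypothetical helper illustrating commit 00705703f3; not mwmbl's actual code.
    """
    matched = sum(1 for term in query_terms if term in document_terms)
    # Integer form of matched >= len(query_terms) / 2, so three terms need two matches.
    return 2 * matched >= len(query_terms)


def filter_results(query: str, results: Iterable[str]) -> List[str]:
    # Assumption: naive lower-cased whitespace tokenisation for query and results.
    query_terms = list(dict.fromkeys(query.lower().split()))  # de-duplicate, keep order
    return [
        result
        for result in results
        if matches_enough_terms(query_terms, set(result.lower().split()))
    ]


print(filter_results("python search engine", [
    "A search engine written in Python",
    "Python snakes",
    "Cooking recipes",
]))
# prints ['A search engine written in Python']
```

Comparing `2 * matched` with the term count avoids floating-point division and rounds "half" up for odd-length queries.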
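Commit eda7870788 in the log above ("Restrict to https and strip the prefix and / on the end") describes a small URL normalisation applied to completions. The following is a minimal Python sketch of that transformation. It assumes completions arrive as a list of full URL strings and that only a single trailing slash is removed. The function name `normalise_completion_urls` is hypothetical and is not taken from the mwmbl code base.

```python
from typing import Iterable, List

HTTPS_PREFIX = "https://"


def normalise_completion_urls(urls: Iterable[str]) -> List[str]:
    """Keep only https URLs, dropping the scheme prefix and one trailing slash.

    Hypothetical helper illustrating commit eda7870788; not mwmbl's actual code.
    """
    normalised = []
    for url in urls:
        # Restrict to https: anything else (http, ftp, bare domains) is skipped.
        if not url.startswith(HTTPS_PREFIX):
            continue
        # Strip the "https://" prefix, then a single trailing "/" if present.
        stripped = url[len(HTTPS_PREFIX):]
        if stripped.endswith("/"):
            stripped = stripped[:-1]
        normalised.append(stripped)
    return normalised


print(normalise_completion_urls([
    "https://en.wikipedia.org/",
    "http://example.com/",
    "https://docs.python.org/3/library/",
]))
# prints ['en.wikipedia.org', 'docs.python.org/3/library']
```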
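Commit ae658906dd in the log above ("Store the best items, not the worst ones") concerns keeping a bounded set of top-scoring items. A common way to do this is a min-heap of at most n entries whose root is the worst item retained. Each new item evicts the root only if it scores higher. This sketch assumes items arrive as `(score, item)` pairs with higher scores better. The name `keep_best` is hypothetical, and this is not presented as the mwmbl implementation.

```python
import heapq
from typing import Iterable, List, Tuple


def keep_best(scored_items: Iterable[Tuple[float, str]], n: int) -> List[str]:
    """Return the n highest-scoring items, best first.

    Hypothetical helper illustrating commit ae658906dd; not mwmbl's actual code.
    """
    if n <= 0:
        return []
    # Min-heap of at most n (score, item) pairs: heap[0] is the worst item kept so far.
    heap: List[Tuple[float, str]] = []
    for score, item in scored_items:
        if len(heap) < n:
            heapq.heappush(heap, (score, item))
        elif score > heap[0][0]:
            # The new item beats the worst one kept, so replace it.
            heapq.heapreplace(heap, (score, item))
    # Sort descending so the best item comes first.
    return [item for _, item in sorted(heap, reverse=True)]


print(keep_best([(0.2, "a"), (0.9, "b"), (0.5, "c"), (0.7, "d")], 2))
# prints ['b', 'd']
```

For a one-off selection, `heapq.nlargest(n, scored_items)` does the same job. The explicit loop shows the invariant that only the worst retained item is ever evicted.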