Daoud Clarke | 9c65bf3c8f | WIP: implement docker image. TODO: copy index and set the correct index path using env var | 2021-12-22 23:21:23 +00:00
Daoud Clarke | f754b38f71 | Prevent default for up and down keys | 2021-12-20 23:21:40 +00:00
Daoud Clarke | 8f8fc43c9f | Improve focus on reload, back, etc | 2021-12-20 23:15:42 +00:00
Daoud Clarke | 202ef35d7a | Make Enter key work when pressing Enter | 2021-12-20 23:05:22 +00:00
Daoud Clarke | 30a00425ae | Follow selected item on enter | 2021-12-20 23:02:12 +00:00
Daoud Clarke | 5e7c5a905e | Select item with arrow keys | 2021-12-20 21:28:01 +00:00
Daoud Clarke | 2d7bb0efd7 | Add background colour and hover highlighting | 2021-12-20 20:55:46 +00:00
Daoud Clarke | c22f522c07 | Improve styling | 2021-12-19 22:48:53 +00:00
Daoud Clarke | 7c745ef87b | Show the URL | 2021-12-19 22:34:44 +00:00
Daoud Clarke | 585f4bd00c | Format extract differently | 2021-12-19 22:16:01 +00:00
Daoud Clarke | 734798e4de | Prefer items that find the result early on | 2021-12-19 21:38:17 +00:00
Daoud Clarke | 9ee6f37a60 | Analysis to confirm that 'leek and potato soup' page was really missing | 2021-12-19 21:09:00 +00:00
Daoud Clarke | 4cbed29c08 | Show the extract | 2021-12-19 20:48:28 +00:00
Daoud Clarke | 16121d2b19 | Index extracts | 2021-12-18 22:56:39 +00:00
Daoud Clarke | 4fa1c4a39a | Filter results with low scores | 2021-12-18 22:35:59 +00:00
Daoud Clarke | 6b72a056b2 | Improve results ordering | 2021-12-18 12:42:04 +00:00
Daoud Clarke | cc290bfc07 | Bold search terms in results | 2021-12-17 21:31:26 +00:00
Daoud Clarke | e4d2a45d6c | Add css | 2021-12-16 22:26:50 +00:00
Daoud Clarke | 1d8b37add1 | Set cursor at the end of the input | 2021-12-16 21:55:04 +00:00
Daoud Clarke | af29b4c039 | Update results as you type | 2021-12-16 21:36:01 +00:00
Daoud Clarke | 23eb341832 | Add search page | 2021-12-14 22:01:59 +00:00
Daoud Clarke | 869127c6ec | Add an error state | 2021-12-14 19:59:31 +00:00
Daoud Clarke | 2844c1df75 | Index common crawl data | 2021-12-13 11:23:01 +00:00
Daoud Clarke | 65b366d30d | Add spacy | 2021-12-12 20:58:44 +00:00
Daoud Clarke | 16a8356a23 | Run multiple processes in parallel | 2021-12-12 09:09:44 +00:00
Daoud Clarke | 34dc50a6ed | Output processed items to an output queue | 2021-12-11 17:18:00 +00:00
Daoud Clarke | c46257c6d1 | Use our own filesystem-based queue | 2021-12-11 16:57:17 +00:00
Daoud Clarke | a76fd2d8f9 | Use multiprocessing | 2021-12-07 22:56:46 +00:00
Daoud Clarke | 2d554b14e7 | Save results to gzip file | 2021-12-07 22:10:16 +00:00
Daoud Clarke | 2562a5257a | Extract locally | 2021-12-05 22:25:37 +00:00
Daoud Clarke | c151fe3777 | Extract archive info | 2021-12-05 21:42:23 +00:00
Daoud Clarke | a173db319b | Add EMR deploy scripts | 2021-12-05 21:02:17 +00:00
Daoud Clarke | 14817d7657 | Optimise imports | 2021-12-05 20:38:05 +00:00
Daoud Clarke | 312f32bf61 | Add common crawl extract script and dependency management with poetry | 2021-12-05 20:31:49 +00:00
Daoud Clarke | 896f782379 | Improve typing of indexer | 2021-06-13 21:41:19 +01:00
Daoud Clarke | 0578f41a73 | Limit number of chars used in query | 2021-06-11 21:43:12 +01:00
Daoud Clarke | c81fc83900 | Abstract index to allow storing anything | 2021-06-05 22:22:31 +01:00
Daoud Clarke | fb5b6ffd45 | Count terms | 2021-05-30 21:30:34 +01:00
Daoud Clarke | 62d22d9d52 | Optimise imports | 2021-05-30 20:46:39 +01:00
Daoud Clarke | 16aec145d0 | Replace dots in query with spaces | 2021-05-25 21:47:19 +01:00
Daoud Clarke | 550c6f6acc | Check for term in title | 2021-05-25 21:21:38 +01:00
Daoud Clarke | d6cc81278f | Order results by Levenshtein distance to improve recall | 2021-05-23 22:14:07 +01:00
Daoud Clarke | 0e3069fdb3 | Use top urls for performance test | 2021-05-21 11:30:42 +01:00
Daoud Clarke | 974f18647a | Index queued items | 2021-05-19 21:48:03 +01:00
Daoud Clarke | 87fd458218 | Smaller queue | 2021-05-19 21:25:12 +01:00
Daoud Clarke | cc841c8b7e | Use a filesystem-based queue | 2021-05-05 22:16:27 +01:00
Daoud Clarke | 7b4a3897b5 | Set multithreading=True (but it doesn't seem to help) | 2021-05-03 08:37:30 +01:00
Daoud Clarke | ba45d950ef | Catch connection errors | 2021-04-25 11:41:44 +01:00
Daoud Clarke | 61ce4bb832 | Use queues | 2021-04-25 08:55:15 +01:00
Daoud Clarke | e76ce691d0 | Retrieve domain titles | 2021-04-25 07:58:01 +01:00