1308 commits

Author SHA1 Message Date
Mikkel Denker
5dfeafcb0f faster optics 2023-03-21 21:48:43 +01:00
Mikkel Denker
bdd6bc0674 actually load highlightjs when needed 2023-03-21 15:06:40 +01:00
Mikkel Denker
ea231fc780 webgraph didn't properly merge segments. The new segment paths didn't line up with where the segment should actually be stored. This should now be fixed 2023-03-21 11:57:44 +01:00
Mikkel Denker
1f64c14a22 forgot to remove dbg 2023-03-20 16:18:43 +01:00
Mikkel Denker
687db2d8b0 only merge webgraph segments in the end 2023-03-20 15:36:48 +01:00
Mikkel Denker
2aa6b458c1 improve webgraph segment merge speed 2023-03-20 15:20:06 +01:00
Mikkel Denker
8277a0521c sometimes you just gotta say fuck async 2023-03-20 14:31:44 +01:00
Mikkel Denker
3a5b573cc4 use futures executor for single job instead of tokio since we got an error when trying to build the webgraph. It looked like rayon tried to spawn multiple tokio runtimes on the same thread due to the threadpool 2023-03-20 13:49:03 +01:00
Mikkel Denker
b003d21c4f hopefully improve webgraph merges by merging in parallel 2023-03-20 12:58:47 +01:00
Mikkel Denker
816b9660f1 better responsiveness mobile 2023-03-20 11:58:19 +01:00
Mikkel Denker
bec41d4f8b fixed new search button borders 2023-03-20 09:53:45 +01:00
Mikkel Denker
ad984251b4 new ui design 2023-03-20 09:47:36 +01:00
Mikkel Denker
913c5502c3 let number of webgraph segments depend on number of cores 2023-03-17 17:40:42 +01:00
Mikkel Denker
20f822dde4 Stackoverflow snippet box text cutoff 2023-03-17 17:26:23 +01:00
Mikkel Denker
2cd5e6568b a bunch of frontend quirks. Also made frontend lighter by loading less static files unless they are needed 2023-03-17 16:17:10 +01:00
Mikkel Denker
d6cb9cb316 simplify webgraph merge logic a little 2023-03-15 17:14:34 +01:00
Mikkel Denker
29ee8edfeb forgot to remove old index after merge 2023-03-15 15:29:25 +01:00
Mikkel Denker
2b36327fa1 way faster similarity index creation. Also improves inverted index merging by dividing the merge into num_cpu merges where each merge merges into a single segment 2023-03-15 14:20:57 +01:00
Mikkel Denker
0cc4cac100 memory mapped graph store 2023-03-14 15:46:53 +01:00
Mikkel Denker
80ac0bd4bc return ranking signals from api 2023-03-12 17:38:24 +01:00
Mikkel Denker
ba6cdd4da1 boost final score using optics, not just tantivy bm25 2023-03-12 16:25:17 +01:00
Mikkel Denker
44f61b98ce prepare for ltr models 2023-03-11 18:50:46 +01:00
Mikkel Denker
f74fe6934c Rename to stract 2023-03-06 09:43:54 +01:00
Mikkel Denker
2151fcfca8 chitchat dynamic cluster membership 2023-03-01 10:24:25 +01:00
Mikkel Denker
9ae23fc7a6 Log number of failed searches separately from number of successful searches 2023-02-28 11:38:53 +01:00
Mikkel Denker
a8dfdf5df8 Hours are rounded, not stripped 2023-02-28 10:39:39 +01:00
Mikkel Denker
a24715d4d3 Update stored query link 2023-02-28 10:38:12 +01:00
Mikkel Denker
8f021da1f2 Update stored query link 2023-02-28 10:37:41 +01:00
Mikkel Denker
2714d9fcc3 Usage statistics 2023-02-28 10:35:52 +01:00
Mikkel Denker
b91d655b2f change configure command to use new object store 2023-02-26 20:25:53 +01:00
Mikkel Denker
53d72eb2f2 Export number of search requests as a prometheus metric 2023-02-21 16:22:24 +01:00
Mikkel Denker
a2fcf02218 Make index smaller by not storing positions for unnecessary fields 2023-02-21 10:34:17 +01:00
Mikkel Denker
74c9d1a133 Use site rankings for discussion widget 2023-02-20 16:42:43 +01:00
Mikkel Denker
cdc2cd2a6f Make next page arrow inactive when there are no next results 2023-02-20 16:35:13 +01:00
Mikkel Denker
6abf22abad Fix '’' not showing bug in summarizer 2023-02-20 16:22:03 +01:00
Mikkel Denker
650e4b6201 Ability to turn off QA model in config 2023-02-20 15:45:49 +01:00
Mikkel Denker
6d44ec0556 Prefer snippets that don't break sentences 2023-02-20 15:31:21 +01:00
Mikkel Denker
e04b5069b9 removed fastfield cache and introduced a fastfield reader instead 2023-02-18 15:45:03 +01:00
Mikkel Denker
a4c142f5f3 Merge index segments into num_segments/2 to only merge segments once every num_segments/2 index merges 2023-02-17 14:28:07 +01:00
Mikkel Denker
4947fccdfe Adjust just-text default parameters since we now have the option to only index webpages with clean text 2023-02-17 14:12:30 +01:00
Mikkel Denker
20d95f5104 Summarization feature somewhat finished 2023-02-14 14:29:28 +01:00
Mikkel Denker
e6b9cdce5f use beginning of text as extractive summary if query-specific summary fails 2023-02-14 13:51:07 +01:00
Mikkel Denker
963052417c cleanup overlapping passages from extractive summary 2023-02-14 13:33:44 +01:00
Mikkel Denker
38fa0762a9 summarization button and stream summary to frontend 2023-02-14 11:06:26 +01:00
Mikkel Denker
0309386dfa Stream output from summarization in iterator 2023-02-11 11:44:17 +01:00
Mikkel Denker
7d78b2ecde summarization model length norm 2023-02-07 20:16:52 +01:00
Mikkel Denker
2a1fa6109a abstractive summarization model with beam search 2023-02-07 15:11:23 +01:00
Mikkel Denker
785e610db8 Unravel recursive functions to avoid stack overflow. Summarizer is also WIP (there might be some 'never used' warnings when compiling) 2023-02-03 15:57:39 +01:00
Mikkel Denker
a53fd68085 Tokenizer index out of bounds 2023-02-01 14:55:03 +01:00
Mikkel Denker
8b9753d4a0 Optionally skip some warc files when indexing 2023-02-01 13:06:26 +01:00