Commit graph

1308 commits

Author SHA1 Message Date
Mikkel Denker
be4696e57d I forgot to pop from cache in webgraph..... 2023-01-30 13:44:36 +01:00
Mikkel Denker
8a6751cf24 Split centrality building into separate processes. This is a hotfix to reduce the memory for each step 2023-01-30 10:29:47 +01:00
Mikkel Denker
347df04d73 Merge branch 'main' of github.com:Cuely/Cuely 2023-01-30 09:36:20 +01:00
Oliver Bøving
f70d307c3a
Update to Astro 2 (#71)
* Update to Astro 2.0

All other dependencies have been bumped as well.

* Run formatter on frontend using new version of Prettier
2023-01-30 09:35:54 +01:00
Mikkel Denker
f7ad311a0f Improve speed of online similarity index build by only considering top-n nodes by harmonic centrality 2023-01-30 09:31:41 +01:00
Mikkel Denker
c8c79f5672 Hopefully reduce memory fragmentation when building inbound similarity vectors 2023-01-27 10:12:22 +01:00
Mikkel Denker
8c6a6a1a48 Qa model 2023-01-25 11:13:48 +01:00
Mikkel Denker
7e834db765 Use less memory to build centrality store 2023-01-24 13:56:58 +01:00
Mikkel Denker
cebbc2c6a0 Use less memory to build centrality store 2023-01-24 13:52:01 +01:00
Mikkel Denker
28186fb526 Simple calculator widget 2023-01-24 10:52:47 +01:00
Mikkel Denker
72fa54a945 Quantize crossencoder 2023-01-23 12:33:01 +01:00
Mikkel Denker
a74d518bc5 Inbound similarity threshold for nodes (took waaaay too long to calculate for all nodes in graph) 2023-01-23 11:26:09 +01:00
Mikkel Denker
7408ebf505 Phrase query 2023-01-23 11:15:49 +01:00
Mikkel Denker
191652ccd5 Ability to open search or indexer specific centrality store. This reduces the required memory for each part 2023-01-22 10:40:30 +01:00
Mikkel Denker
f0274317b4 Use inbound edges similarity between liked sites and search results as a ranking metric. 2023-01-22 10:03:45 +01:00
Mikkel Denker
79ac702fdc Higher threshold for inbound similarity nodes 2023-01-20 09:24:49 +01:00
Mikkel Denker
8138f5e6f8 Build inbound similarity index 2023-01-18 16:23:04 +01:00
Mikkel Denker
03ea6e5c2d Use best harmonic centrality nodes as proxy nodes for online centrality, and set a limit on number of nodes considered for each proxy node. We had an issue with some nodes having extreeeeeemely many outbound edges 2023-01-18 15:54:44 +01:00
Mikkel Denker
955cf16d0e Someone on reddit suggested that BTreeMap might be more memory efficient. Let's try 2023-01-17 20:04:07 +01:00
Mikkel Denker
fba49145e7 Counter hashmaps should be dropped when we don't need them anymore 2023-01-17 17:39:25 +01:00
Mikkel Denker
792502891c Faster online_harmonic proxy node selection 2023-01-17 16:12:52 +01:00
Mikkel Denker
14172e6d8a More memory efficient shortest path 2023-01-17 15:44:32 +01:00
Mikkel Denker
6c52398a38 Dependabot stuff 2023-01-16 16:58:11 +01:00
Mikkel Denker
85b4c0b14f Empty bang redirect to top search result 2023-01-16 16:47:10 +01:00
Mikkel Denker
f1ad006799 Fixed bug where liked sites would show up in discardall optics, even though they matched none of the rules 2023-01-16 16:16:50 +01:00
Mikkel Denker
58a77e60bd fix offset bug in distributed searcher 2023-01-15 21:07:50 +01:00
Mikkel Denker
d3cb5d3215 collapsible for discussion snippets and more discussions 2023-01-15 18:42:38 +01:00
Mikkel Denker
d9e84c311f Fixed bug where crossencoder scores didn't make sense. For some reason, the output of the model needs to be f32 and can then later be cast to f64. No idea why 2023-01-15 14:23:04 +01:00
Mikkel Denker
5a0f2f60f6 Discussions optic 2023-01-14 19:07:04 +01:00
Mikkel Denker
5d20184b88 refactor 2023-01-14 11:55:10 +01:00
Mikkel Denker
9986d05629 Cleanup search result types 2023-01-13 16:07:12 +01:00
Mikkel Denker
7f40cac3c2 Remove old webgraph files when merged from CLI and merge into 1/2 num_segments during commit to make use of parallelism 2023-01-13 16:03:19 +01:00
Mikkel Denker
6cffad6e28 Major refactor. Generate stackoverflow sidebar in distributed searcher as this allows us to re-use the existing search infrastructure 2023-01-11 17:40:34 +01:00
Mikkel Denker
05407ba200 adjust warc download retry times 2023-01-09 19:56:04 +01:00
Mikkel Denker
c9f938d434 limit exponential backoff 2023-01-09 19:44:55 +01:00
Mikkel Denker
d2caac61c6 treat non-200 response as a download failed when downloading warc files 2023-01-09 11:11:18 +01:00
Mikkel Denker
8478595b72 create segments folder when graph is created 2023-01-09 10:19:42 +01:00
Mikkel Denker
a75b1b62c4 use in_degree + out_degree to choose proxy nodes for online centrality. This will hopefully make online centrality more accurate by having better proxy nodes 2023-01-08 17:02:01 +01:00
Mikkel Denker
689b5b0127 add onnx to dockerfile 2023-01-08 16:20:39 +01:00
Mikkel Denker
ea244b147b save webgraph metadata at all commits 2023-01-04 19:59:41 +01:00
Mikkel Denker
ee7e90f890 add long name to num-segments for webgraph merge 2023-01-04 19:46:31 +01:00
Mikkel Denker
6084ded032 better performance in online-harmonic save by saving directly into file. This prevents double memory allocation 2023-01-04 18:46:42 +01:00
Mikkel Denker
29fe3ad652 webgraph CLI merge segments 2023-01-04 12:43:32 +01:00
Mikkel Denker
0a46504402 crossencoder ranking (finally!) 2023-01-03 16:51:17 +01:00
Mikkel Denker
eee628bea6 simple json api 2022-12-28 22:47:23 +01:00
Mikkel Denker
08f9f322b8 Merge branch 'main' of github.com:Cuely/Cuely 2022-12-28 15:13:39 +01:00
Mikkel Denker
e4a27f27a0 Huge refactor of webgraph to split graph into multiple segments. This allows for more parallelism 2022-12-28 15:13:34 +01:00
Oliver Bøving
5fbb52a946
Frontend: Update dependencies and run Prettier (#70)
Since the last version of some of the Prettier packages, a lot of bugs
regarding Tailwind class sorting, and Astro formatting in general, have
been fixed!

This means that the formatters are now more compatible, and no longer
produce bogus output when formatting certain nested structures.

This commit also adds an npm script for formatting all frontend files,
and should now produce the same formatting as when saving/formatting in
vscode.

Also, some of the packages have bumped quite a few minor version numbers
so other improvements might also be included!
(And hopefully no regressions 🤞)
2022-12-26 17:25:09 +01:00
Mikkel Denker
4ab3ec95ea Update optics testcases 2022-12-23 16:35:51 +01:00
Mikkel Denker
e0162eebcc update optics syntax to allow for multi-stage ranking 2022-12-23 16:34:27 +01:00