Mikkel Denker
be4696e57d
I forgot to pop from cache in webgraph.....
2023-01-30 13:44:36 +01:00
Mikkel Denker
8a6751cf24
Split centrality building into separate processes. This is a hotfix to reduce the memory for each step
2023-01-30 10:29:47 +01:00
Mikkel Denker
347df04d73
Merge branch 'main' of github.com:Cuely/Cuely
2023-01-30 09:36:20 +01:00
Oliver Bøving
f70d307c3a
Update to Astro 2 (#71)
* Update to Astro 2.0
All other dependencies have been bumped as well.
* Run formatter on frontend using new version of Prettier
2023-01-30 09:35:54 +01:00
Mikkel Denker
f7ad311a0f
Improve speed of online similarity index build by only considering top-n nodes by harmonic centrality
2023-01-30 09:31:41 +01:00
Mikkel Denker
c8c79f5672
Hopefully reduce memory fragmentation when building inbound similarity vectors
2023-01-27 10:12:22 +01:00
Mikkel Denker
8c6a6a1a48
Qa model
2023-01-25 11:13:48 +01:00
Mikkel Denker
7e834db765
Use less memory to build centrality store
2023-01-24 13:56:58 +01:00
Mikkel Denker
cebbc2c6a0
Use less memory to build centrality store
2023-01-24 13:52:01 +01:00
Mikkel Denker
28186fb526
Simple calculator widget
2023-01-24 10:52:47 +01:00
Mikkel Denker
72fa54a945
Quantize crossencoder
2023-01-23 12:33:01 +01:00
Mikkel Denker
a74d518bc5
Inbound similarity threshold for nodes (took waaaay too long to calculate for all nodes in graph)
2023-01-23 11:26:09 +01:00
Mikkel Denker
7408ebf505
Phrase query
2023-01-23 11:15:49 +01:00
Mikkel Denker
191652ccd5
Ability to open search or indexer specific centrality store. This reduces the required memory for each part
2023-01-22 10:40:30 +01:00
Mikkel Denker
f0274317b4
Use inbound edges similarity between liked sites and search results as a ranking metric.
2023-01-22 10:03:45 +01:00
Mikkel Denker
79ac702fdc
Higher threshold for inbound similarity nodes
2023-01-20 09:24:49 +01:00
Mikkel Denker
8138f5e6f8
Build inbound similarity index
2023-01-18 16:23:04 +01:00
Mikkel Denker
03ea6e5c2d
Use best harmonic centrality nodes as proxy nodes for online centrality, and set a limit on number of nodes considered for each proxy node. We had an issue with some nodes having extreeeeeemely many outbound edges
2023-01-18 15:54:44 +01:00
Mikkel Denker
955cf16d0e
Someone on reddit suggested that BTreeMap might be more memory efficient. Let's try
2023-01-17 20:04:07 +01:00
Mikkel Denker
fba49145e7
Counter hashmaps should be dropped when we don't need them anymore
2023-01-17 17:39:25 +01:00
Mikkel Denker
792502891c
Faster online_harmonic proxy node selection
2023-01-17 16:12:52 +01:00
Mikkel Denker
14172e6d8a
More memory efficient shortest path
2023-01-17 15:44:32 +01:00
Mikkel Denker
6c52398a38
Dependabot stuff
2023-01-16 16:58:11 +01:00
Mikkel Denker
85b4c0b14f
Empty bang redirect to top search result
2023-01-16 16:47:10 +01:00
Mikkel Denker
f1ad006799
Fixed bug where liked sites would show up in discardall optics, even though they matched none of the rules
2023-01-16 16:16:50 +01:00
Mikkel Denker
58a77e60bd
fix offset bug in distributed searcher
2023-01-15 21:07:50 +01:00
Mikkel Denker
d3cb5d3215
collapsible for discussion snippets and more discussions
2023-01-15 18:42:38 +01:00
Mikkel Denker
d9e84c311f
Fixed bug where crossencoder scores didn't make sense. For some reason, the output of the model needs to be f32 and can then later be cast to f64. No idea why
2023-01-15 14:23:04 +01:00
Mikkel Denker
5a0f2f60f6
Discussions optic
2023-01-14 19:07:04 +01:00
Mikkel Denker
5d20184b88
refactor
2023-01-14 11:55:10 +01:00
Mikkel Denker
9986d05629
Cleanup search result types
2023-01-13 16:07:12 +01:00
Mikkel Denker
7f40cac3c2
Remove old webgraph files when merged from CLI and merge into 1/2 num_segments during commit to make use of parallelism
2023-01-13 16:03:19 +01:00
Mikkel Denker
6cffad6e28
Major refactor. Generate stackoverflow sidebar in distributed searcher as this allows us to re-use the existing search infrastructure
2023-01-11 17:40:34 +01:00
Mikkel Denker
05407ba200
adjust warc download retry times
2023-01-09 19:56:04 +01:00
Mikkel Denker
c9f938d434
limit exponential backoff
2023-01-09 19:44:55 +01:00
Mikkel Denker
d2caac61c6
treat a non-200 response as a failed download when downloading warc files
2023-01-09 11:11:18 +01:00
Mikkel Denker
8478595b72
create segments folder when graph is created
2023-01-09 10:19:42 +01:00
Mikkel Denker
a75b1b62c4
use in_degree + out_degree to choose proxy nodes for online centrality. This will hopefully make online centrality more accurate by having better proxy nodes
2023-01-08 17:02:01 +01:00
Mikkel Denker
689b5b0127
add onnx to dockerfile
2023-01-08 16:20:39 +01:00
Mikkel Denker
ea244b147b
save webgraph metadata at all commits
2023-01-04 19:59:41 +01:00
Mikkel Denker
ee7e90f890
add long name to num-segments for webgraph merge
2023-01-04 19:46:31 +01:00
Mikkel Denker
6084ded032
better performance in online-harmonic save by saving directly into file. This prevents double memory allocation
2023-01-04 18:46:42 +01:00
Mikkel Denker
29fe3ad652
webgraph CLI merge segments
2023-01-04 12:43:32 +01:00
Mikkel Denker
0a46504402
crossencoder ranking (finally!)
2023-01-03 16:51:17 +01:00
Mikkel Denker
eee628bea6
simple json api
2022-12-28 22:47:23 +01:00
Mikkel Denker
08f9f322b8
Merge branch 'main' of github.com:Cuely/Cuely
2022-12-28 15:13:39 +01:00
Mikkel Denker
e4a27f27a0
Huge refactor of webgraph to split graph into multiple segments. This allows for more parallelism
2022-12-28 15:13:34 +01:00
Oliver Bøving
5fbb52a946
Frontend: Update dependencies and run Prettier (#70)
Since the last version of some of the prettier packages, a lot of bugs
regarding tailwind class sorting, and astro formatting in general, have
been fixed!
This means that the formatters are now more compatible, and no longer
produce bogus output when formatting certain nested structures.
This commit also adds an npm script for formatting all frontend files,
and should now produce the same formatting as when saving/formatting in
vscode.
Also, some of the packages have bumped quite a few minor version numbers,
so other improvements might also be included!
(And hopefully no regressions 🤞)
2022-12-26 17:25:09 +01:00
Mikkel Denker
4ab3ec95ea
Update optics testcases
2022-12-23 16:35:51 +01:00
Mikkel Denker
e0162eebcc
update optics syntax to allow for multi-stage ranking
2022-12-23 16:34:27 +01:00