Commit graph

1308 commits

Author SHA1 Message Date
Mikkel Denker
0ccfb24b1b remove image stuff from indexer. When we want it in the future, it should be added to the crawler 2023-06-20 09:41:36 +02:00
Mikkel Denker
b2937661fb refactor crawl_db out from crawl coordinator 2023-06-18 19:42:22 +02:00
Mikkel Denker
aacc1e3dc5 crawler 2023-06-18 12:26:08 +02:00
Mikkel Denker
dd62dfc108 clippy fix 2023-06-10 14:38:16 +02:00
Mikkel Denker
0a32ba6490 if model cites results that don't exist, there is no point in showing them 2023-06-08 19:55:18 +02:00
Mikkel Denker
9636275180 use mozilla suffix list to parse domains 2023-06-07 16:54:09 +02:00
Mikkel Denker
18f7ef1842 Alice; show claim confidence level 2023-06-07 15:43:33 +02:00
Mikkel Denker
87d87bed8d Alice; render citations in frontend 2023-06-07 11:35:44 +02:00
Mikkel Denker
8f3853d2d1 openai alice 2023-06-05 17:26:51 +02:00
Mikkel Denker
25f4b282eb alice improvement endpoint 2023-06-04 16:33:38 +02:00
Mikkel Denker
00768ca3c4 also use inbound similarity during query centrality 2023-06-03 22:02:22 +02:00
Mikkel Denker
18fe4a99e2 show some information about alice at the top of each conversation 2023-06-03 21:39:17 +02:00
Mikkel Denker
c04cef7de7 if Alice stops using <|endoftext|> token, go back to prev state and load ‘\n\n’ 2023-06-03 20:09:11 +02:00
Mikkel Denker
44baa1931f refactor alice into multiple files 2023-06-02 17:37:43 +02:00
Mikkel Denker
7972ad3ef3 'beta' under logo 2023-06-02 14:13:32 +02:00
Mikkel Denker
f4a2395769 allow user to specify optic in alice 2023-06-01 21:11:53 +02:00
Mikkel Denker
b16a1b9629 alice 2023-06-01 15:43:27 +02:00
Mikkel Denker
f1f5cde7c5 show homepage description in explorer 2023-05-17 16:55:39 +02:00
Mikkel Denker
6f0fce06cf More stable and consistent design 2023-05-15 20:53:06 +02:00
Mikkel Denker
09243c9df0 Explore the web, gui 2023-05-15 15:52:21 +02:00
Mikkel Denker
cb64b49ad9 Fixed a bug where distance calculation in online-harmonic used the wrong node from the edge 2023-05-10 16:29:47 +02:00
Mikkel Denker
f0129d724f find similar sites in webgraph 2023-05-09 11:44:25 +02:00
Mikkel Denker
516d350e33 academic optic 2023-05-08 16:07:30 +02:00
Mikkel Denker
e7ea348d7a Privacy policy refer to multiple lines regarding exactly what data is collected 2023-05-08 15:46:48 +02:00
Mikkel Denker
72d1086672 Dual-encoder as passage scorer for extractive summarization 2023-05-08 15:43:22 +02:00
Mikkel Denker
fe713a8737 Move from onnx to libtorch bindings for ML inference.
Fuck onnx. It was an enormous hassle to get onnx to play ball with more advanced models and to execute the onnx models on GPU, since onnx is only compiled against older CUDA versions. This commit removes our dependency on onnx and replaces it with direct bindings to libtorch, which gives us more flexibility and still allows us to easily deploy simple models with tracing. Time will tell if this is sufficiently performant or if we may want to develop some kind of JIT that can fuse matrix operations to increase performance.
2023-05-08 11:11:49 +02:00
Mikkel Denker
560b8bdcbf devdocs optic 2023-05-02 12:40:56 +02:00
Mikkel Denker
be78c1dab5 blogroll optic 2023-05-02 11:30:51 +02:00
Mikkel Denker
1a8f1ec095 10k short optic and optimizations to make large optics faster 2023-05-02 09:48:53 +02:00
Mikkel Denker
7567acdc93 Feedback when optic parsing fails 2023-05-01 16:12:17 +02:00
Mikkel Denker
5ba12367d5 optic descriptions and ability to choose optic from frontpage 2023-05-01 14:29:47 +02:00
Mikkel Denker
e9692044f8 allow blocked sites from site rankings to use FastSiteDomainPatternScorer for faster queries 2023-05-01 11:09:58 +02:00
Mikkel Denker
3633408201 handle more edge cases in optics correctly 2023-05-01 11:07:27 +02:00
Mikkel Denker
5ab900eea5 fixed a bunch of problems with pattern_query implementation and wrote some tests to make sure it works correctly 2023-04-29 18:26:23 +02:00
Mikkel Denker
05242ac69d inbound similarity cache 2023-04-25 11:24:50 +02:00
Mikkel Denker
2258da4ca2 Fixed alignment issue for rkyv. A subslice of a memory-mapped file might have alignment 1 but should have at least 8 to work well with rkyv 2023-04-25 10:28:23 +02:00
Mikkel Denker
4576106fa6 don't *need* beta anymore to contribute 2023-04-24 11:32:14 +02:00
Mikkel Denker
605c1fc286 use rwkv for ltr data and some performance optimizations for personal centrality + inbound similarity 2023-04-13 09:35:42 +02:00
Mikkel Denker
9d3c2836af derank similar in ranking pipeline 2023-04-06 22:57:08 +02:00
Mikkel Denker
5a97ae61cd forgot to comment models for dev 2023-04-05 21:36:12 +02:00
Mikkel Denker
b14088b85c ltr lambdamart all da things 2023-04-05 21:35:15 +02:00
Mikkel Denker
a61bd453b7 ltr linear ranking model 2023-03-31 14:22:03 +02:00
Mikkel Denker
7a19797e2c tests to ensure proximity values are calculated correctly 2023-03-30 11:37:40 +02:00
Mikkel Denker
dcf238f4fc Fixed bug that caused some optics to match websites they shouldn't 2023-03-30 10:23:44 +02:00
Mikkel Denker
2ad3f93ebf Updated tantivy to a release. Currently investigating why some docs show up in searches when they shouldn't 2023-03-29 17:32:25 +02:00
Mikkel Denker
0401bd4cfa got proximity scoring back 2023-03-28 16:46:03 +02:00
Mikkel Denker
693800f817 Fixed bug where monogram term was not defined. This can happen if a monogram is not part of top_n but the same term in a bigram was 2023-03-28 15:38:13 +02:00
Mikkel Denker
2856a5feab [WIP] generate ltr data 2023-03-28 15:16:21 +02:00
Mikkel Denker
dfe618543e bump dependencies 2023-03-28 13:31:41 +02:00
Mikkel Denker
4b7b304306 take bigrams and trigrams into account during spell correction and as separate bm25 fields 2023-03-28 13:15:14 +02:00