Mikkel Denker
0ccfb24b1b
remove image stuff from indexer. When we want it in the future, it should be added to the crawler
2023-06-20 09:41:36 +02:00
Mikkel Denker
b2937661fb
refactor crawl_db out from crawl coordinator
2023-06-18 19:42:22 +02:00
Mikkel Denker
aacc1e3dc5
crawler
2023-06-18 12:26:08 +02:00
Mikkel Denker
dd62dfc108
clippy fix
2023-06-10 14:38:16 +02:00
Mikkel Denker
0a32ba6490
if model cites results that don't exist, there is no point in showing them
2023-06-08 19:55:18 +02:00
Mikkel Denker
9636275180
use mozilla suffix list to parse domains
2023-06-07 16:54:09 +02:00
Mikkel Denker
18f7ef1842
Alice; show claim confidence level
2023-06-07 15:43:33 +02:00
Mikkel Denker
87d87bed8d
Alice; render citations in frontend
2023-06-07 11:35:44 +02:00
Mikkel Denker
8f3853d2d1
openai alice
2023-06-05 17:26:51 +02:00
Mikkel Denker
25f4b282eb
alice improvement endpoint
2023-06-04 16:33:38 +02:00
Mikkel Denker
00768ca3c4
also use inbound similarity during query centrality
2023-06-03 22:02:22 +02:00
Mikkel Denker
18fe4a99e2
show some information about alice at the top of each conversation
2023-06-03 21:39:17 +02:00
Mikkel Denker
c04cef7de7
if Alice stops using <|endoftext|> token, go back to prev state and load ‘\n\n’
2023-06-03 20:09:11 +02:00
Mikkel Denker
44baa1931f
refactor alice into multiple files
2023-06-02 17:37:43 +02:00
Mikkel Denker
7972ad3ef3
'beta' under logo
2023-06-02 14:13:32 +02:00
Mikkel Denker
f4a2395769
allow user to specify optic in alice
2023-06-01 21:11:53 +02:00
Mikkel Denker
b16a1b9629
alice
2023-06-01 15:43:27 +02:00
Mikkel Denker
f1f5cde7c5
show homepage description in explorer
2023-05-17 16:55:39 +02:00
Mikkel Denker
6f0fce06cf
More stable and consistent design
2023-05-15 20:53:06 +02:00
Mikkel Denker
09243c9df0
Explore the web, gui
2023-05-15 15:52:21 +02:00
Mikkel Denker
cb64b49ad9
Fixed a bug where distance calculation in online-harmonic used the wrong node from the edge
2023-05-10 16:29:47 +02:00
Mikkel Denker
f0129d724f
find similar sites in webgraph
2023-05-09 11:44:25 +02:00
Mikkel Denker
516d350e33
academic optic
2023-05-08 16:07:30 +02:00
Mikkel Denker
e7ea348d7a
Privacy policy refers to multiple lines regarding exactly what data is collected
2023-05-08 15:46:48 +02:00
Mikkel Denker
72d1086672
Dual-encoder as passage scorer for extractive summarization
2023-05-08 15:43:22 +02:00
Mikkel Denker
fe713a8737
Move from onnx to libtorch bindings for ML inference.
Fuck onnx. It was an enormous hassle to get onnx to play ball with more advanced models and to execute the onnx models on GPU, since onnx is only compiled against older CUDA versions. This commit removes our dependency on onnx and replaces it with direct bindings to libtorch, which gives us more flexibility and still allows us to easily deploy simple models with tracing. Time will tell if this is sufficiently performant or if we may want to develop some kind of JIT that can fuse matrix operations to increase performance.
2023-05-08 11:11:49 +02:00
Mikkel Denker
560b8bdcbf
devdocs optic
2023-05-02 12:40:56 +02:00
Mikkel Denker
be78c1dab5
blogroll optic
2023-05-02 11:30:51 +02:00
Mikkel Denker
1a8f1ec095
10k short optic and optimizations to make large optics faster
2023-05-02 09:48:53 +02:00
Mikkel Denker
7567acdc93
Feedback when optic parsing fails
2023-05-01 16:12:17 +02:00
Mikkel Denker
5ba12367d5
optic descriptions and ability to choose optic from frontpage
2023-05-01 14:29:47 +02:00
Mikkel Denker
e9692044f8
allow blocked sites from site rankings to use FastSiteDomainPatternScorer for faster queries
2023-05-01 11:09:58 +02:00
Mikkel Denker
3633408201
handle more edge cases in optics correctly
2023-05-01 11:07:27 +02:00
Mikkel Denker
5ab900eea5
fixed a bunch of problems with pattern_query implementation and wrote some tests to make sure it works correctly
2023-04-29 18:26:23 +02:00
Mikkel Denker
05242ac69d
inbound similarity cache
2023-04-25 11:24:50 +02:00
Mikkel Denker
2258da4ca2
Fixed alignment issue for rkyv. A subslice of a memory-mapped file might have alignment 1 but should have at least 8 to work well with rkyv
2023-04-25 10:28:23 +02:00
Mikkel Denker
4576106fa6
don't *need* beta anymore to contribute
2023-04-24 11:32:14 +02:00
Mikkel Denker
605c1fc286
use rwkv for ltr data and some performance optimizations for personal centrality + inbound similarity
2023-04-13 09:35:42 +02:00
Mikkel Denker
9d3c2836af
derank similar in ranking pipeline
2023-04-06 22:57:08 +02:00
Mikkel Denker
5a97ae61cd
forgot to comment models for dev
2023-04-05 21:36:12 +02:00
Mikkel Denker
b14088b85c
ltr lambdamart all da things
2023-04-05 21:35:15 +02:00
Mikkel Denker
a61bd453b7
ltr linear ranking model
2023-03-31 14:22:03 +02:00
Mikkel Denker
7a19797e2c
tests to ensure proximity values are calculated correctly
2023-03-30 11:37:40 +02:00
Mikkel Denker
dcf238f4fc
Fixed bug that caused some optics to match websites they shouldn't
2023-03-30 10:23:44 +02:00
Mikkel Denker
2ad3f93ebf
Updated tantivy to a release. Currently investigating why some docs show up in searches when they shouldn't
2023-03-29 17:32:25 +02:00
Mikkel Denker
0401bd4cfa
got proximity scoring back
2023-03-28 16:46:03 +02:00
Mikkel Denker
693800f817
Fixed bug where monogram term was not defined. This can happen if a monogram is not part of top_n but the same term in a bigram was
2023-03-28 15:38:13 +02:00
Mikkel Denker
2856a5feab
[WIP] generate ltr data
2023-03-28 15:16:21 +02:00
Mikkel Denker
dfe618543e
bump dependencies
2023-03-28 13:31:41 +02:00
Mikkel Denker
4b7b304306
take bigrams and trigrams into account during spell correction and as separate bm25 fields
2023-03-28 13:15:14 +02:00