0ct0pu5/search-engine-stract

Author	SHA1	Message	Date
Mikkel Denker	a1381d667b	fixed bug that caused error model in spell correction to always be empty	2024-05-27 11:45:07 +02:00
Mikkel Denker	e4e3044e47	finally ditch that pesky libtorch dependency!	2024-02-02 13:11:06 +01:00
Mikkel Denker	cc91935d0a	Move entity index out of normal search index and have dedicated search server for it	2024-01-23 14:53:33 +01:00
Mikkel Denker	fbc01ad865	summarization using mistral and a 'chain-of-density' approach. the summarization becomes much better if we allow the model to first generate a candidate summary and then improve on it. doing the improvement step just once seems to significantly improve the summary. we also now use an llm (mistral 7b) for the summarizations, as we can then use the same model for multiple tasks and serve it using gpus, thus significantly decreasing the latency.	2024-01-19 11:08:17 +01:00
Mikkel Denker	7ea3dbcca4	[ranking] add a host_centrality_rank and page_centrality_rank signal. it might be easier to score pages based on their rank in the sorted list of centralities. for instance, the centralities for page A and page B might be very similar numerically, but if a lot of pages are between A and B when looking at the sorted list, the higher-ranking page might in reality be a better result than the lower-ranking one. the rankings are calculated using an external sorting algorithm to account for the fact that we might need to sort more nodes than we can feasibly keep in memory at once.	2024-01-05 12:20:24 +01:00
Oliver Bøving	369d5031df	Refactor `Justfile` and tracing with enabled debug tracing for stract (#87) * Refactor Justfile and tracing with enabled debug tracing for stract * Use `just dev` in `CONTRIBUTING.md`	2023-09-04 08:53:17 +00:00

6 commits
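
The rank-based signal described in commit 7ea3dbcca4 can be illustrated with a small sketch. This is not the code from that commit: the function names (`sorted_runs`, `centrality_ranks`), the `chunk_size` parameter, and the choice of rank 0 for the highest centrality are all assumptions made for illustration. In-memory lists stand in for the on-disk runs that a real external sort would spill to files.

```python
import heapq
from typing import Dict, Iterable, Iterator, List, Tuple

Node = Tuple[str, float]  # (node id, centrality score); hypothetical layout


def sorted_runs(items: Iterable[Node], chunk_size: int) -> Iterator[List[Node]]:
    """Split the input into chunks of at most chunk_size items and sort each
    chunk by descending centrality. A real external sort would write every
    run to disk; here the runs are plain lists (assumption)."""
    chunk: List[Node] = []
    for item in items:
        chunk.append(item)
        if len(chunk) == chunk_size:
            yield sorted(chunk, key=lambda kv: -kv[1])
            chunk = []
    if chunk:
        yield sorted(chunk, key=lambda kv: -kv[1])


def centrality_ranks(items: Iterable[Node], chunk_size: int = 2) -> Dict[str, int]:
    """Assign each node its position in the globally sorted order.
    Rank 0 is the node with the highest centrality (assumption)."""
    runs = list(sorted_runs(items, chunk_size))
    # k-way merge of the pre-sorted runs; only the head of each run is compared.
    merged = heapq.merge(*runs, key=lambda kv: -kv[1])
    return {node: rank for rank, (node, _score) in enumerate(merged)}


centralities = [("a", 0.90), ("b", 0.10), ("c", 0.50), ("d", 0.89)]
ranks = centrality_ranks(centralities, chunk_size=2)
print(ranks)
```

In this toy input, "a" and "d" have nearly identical scores and end up adjacent in rank, whereas two nodes with similar scores but many nodes between them would receive clearly different ranks, which is the distinction the commit message describes.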