Commit graph

6 commits

Author SHA1 Message Date
Mikkel Denker
a1381d667b fixed bug that caused error model in spell correction to always be empty 2024-05-27 11:45:07 +02:00
Mikkel Denker
e4e3044e47 finally ditch that pesky libtorch dependency! 2024-02-02 13:11:06 +01:00
Mikkel Denker
cc91935d0a Move entity index out of normal search index and have dedicated search server for it 2024-01-23 14:53:33 +01:00
Mikkel Denker
fbc01ad865 summarization using mistral and 'chain-of-density' approach.
the summarization becomes much better if we allow the model to first generate a candidate summary and then improve on it.
doing the improvement step just once seems to significantly improve the summary.
we also now use an llm (mistral 7b) for the summarization, as we can then use the same model for multiple tasks and serve it using gpus, thus significantly decreasing the latency.
2024-01-19 11:08:17 +01:00
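The draft-then-improve loop described in commit fbc01ad865 can be sketched roughly as follows. This is a minimal illustration and not the code from the commit: `summarize`, the `generate` callable (standing in for whatever serves the Mistral 7B model), the `refinements` parameter, and the prompt wording are all assumptions made for the example.

```python
from typing import Callable


def summarize(text: str, generate: Callable[[str], str], refinements: int = 1) -> str:
    """Draft a summary, then ask the model to improve it `refinements` times.

    `generate` is a hypothetical stand-in for an LLM call: it takes a prompt
    and returns the model's completion as a string.
    """
    # Step 1: let the model produce a candidate summary.
    summary = generate(f"Summarize the following text.\n\nText:\n{text}\n\nSummary:")

    # Step 2: feed the candidate back and ask for an improved version.
    # The commit message reports that a single pass already helps noticeably.
    for _ in range(refinements):
        summary = generate(
            "Improve the summary below: keep it about the same length, but add "
            "important details from the text that are missing.\n\n"
            f"Text:\n{text}\n\nCurrent summary:\n{summary}\n\nImproved summary:"
        )
    return summary
```

With `refinements=1` the model is called twice per document, which is one reason serving a single shared model on GPUs matters for latency.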
Mikkel Denker
7ea3dbcca4 [ranking] add a host_centrality_rank and page_centrality_rank signal
it might be easier to score pages based on their rank in the list sorted by centrality. for instance, the centralities of page A and page B might be very similar numerically, but if a lot of pages lie between A and B in the sorted list, the higher-ranking page might in reality be a better result than the lower-ranking one.

the rankings are calculated using an external sorting algorithm to account for the fact that we might need to sort more nodes than we can feasibly keep in memory at once.
2024-01-05 12:20:24 +01:00
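The rank signal from commit 7ea3dbcca4 can be illustrated with a small external merge sort. This is only a sketch under stated assumptions, not the implementation in the commit: the function name `centrality_ranks`, the `chunk_size` parameter, the use of temporary files with `pickle`, and the tie-breaking by node id are all choices made for the example.

```python
import heapq
import pickle
import tempfile
from itertools import islice
from typing import Iterable, Iterator, Tuple

Node = Tuple[str, float]  # (node id, centrality); hypothetical record shape


def _read_run(f) -> Iterator[Tuple[float, str]]:
    """Stream one sorted run back from disk, one record at a time."""
    f.seek(0)
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return


def centrality_ranks(nodes: Iterable[Node], chunk_size: int = 100_000) -> Iterator[Tuple[str, int]]:
    """Yield (node id, rank), where rank 0 is the highest centrality.

    Only `chunk_size` nodes are sorted in memory at once; each sorted chunk
    is spilled to a temporary file and the runs are merged lazily.
    """
    it = iter(nodes)
    runs = []
    try:
        while True:
            # Negate the centrality so ascending order means "best first".
            chunk = sorted((-c, n) for n, c in islice(it, chunk_size))
            if not chunk:
                break
            f = tempfile.TemporaryFile()
            for item in chunk:
                pickle.dump(item, f)
            runs.append(f)
        # k-way merge of the sorted runs; the position in the merge is the rank.
        merged = heapq.merge(*(_read_run(f) for f in runs))
        for rank, (_, node) in enumerate(merged):
            yield node, rank
    finally:
        for f in runs:
            f.close()
```

For example, nodes with centralities 0.9002 and 0.9000 differ by very little numerically, yet their ranks differ by however many nodes fall between them, which is the information the rank signal exposes.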
Oliver Bøving
369d5031df Refactor Justfile and tracing with enabled debug tracing for stract (#87)
* Refactor Justfile and tracing with enabled debug tracing for stract

* Use `just dev` in `CONTRIBUTING.md`
2023-09-04 08:53:17 +00:00