search-engine-stract/docs/.gitignore
Mikkel Denker 36f22e801e
Overview docs (#73)
* Begin overview documentation in mdbook format

* Overview of the different docs

* Move overview documentation to mkdocs

* Reduce webgraph segment merges by introducing a webgraph commit mode that commits the live segment directly to the stored segment

* Parallel harmonic centrality calculations

* Even more parallelism in harmonic centrality calculations

* Way faster hyperloglog but also less accurate

* Dynamic exact counting threshold proportional to size of graph

* improve inbound similarity speed and fix hyperloglog out-of-bounds bug

* no need to load all nodes into memory for harmonic centrality

* Use rayon directly in indexer.
Hopefully this fixes the bug where the indexer takes a new job before it has finished the first one. I think what happened was that the indexer thread took a new job when hitting the webgraph executor.

* single threaded webgraph when indexing

* No need for node2id anymore

* Use single thread in tantivy by default.
We introduce a method to optimize the index for search, which currently just sets the tantivy executor to be multithreaded. This should improve the indexing performance.

* Reduce memory arena in tantivy

* try jemalloc

* Revert tantivy memory arena reduction. Caused too many files to be created when indexing warc files
2023-08-08 06:32:44 +00:00

1 line, 5 B, Text