
* Begin overview documentation in mdbook format
* Overview of the different docs
* Move overview documentation to mkdocs
* Reduce webgraph segment merges by introducing a webgraph commit mode that commits the live segment directly to the stored segment
* Parallel harmonic centrality calculations
* Even more parallelism in harmonic centrality calculations
* Way faster hyperloglog, but also less accurate
* Dynamic exact counting threshold proportional to the size of the graph
* Improve inbound similarity speed and fix hyperloglog out-of-bounds bug
* No need to load all nodes into memory for harmonic centrality
* Use rayon directly in the indexer. Hopefully this fixes the bug where the indexer takes a new job before it has finished the first one. I think what happened was that the indexer thread took a new job when hitting the webgraph executor.
* Single-threaded webgraph when indexing
* No need for node2id anymore
* Use a single thread in tantivy by default. We introduce a method to optimize the index for search, which currently just sets the tantivy executor to be multithreaded. This should improve indexing performance.
* Reduce memory arena in tantivy
* Try jemalloc
* Revert tantivy memory arena reduction. It caused too many files to be created when indexing WARC files.