This moves building the Astro frontend from build.rs into the justfile.
This streamlines the build process for both the Astro frontend and the
application itself: cargo watch now rebuilds the Astro frontend first and
then the Rust binary, instead of Astro being built in build.rs.
Preliminary, non-conclusive measurements suggest that this improves build
times from about 13s to 6s, while being more consistent :)
This commit removes the Montserrat font files stored in public and replaces
them with the fonts installed with @fontsource/montserrat.
This also streamlines the font import process, and ensures that correct
typeface formats are loaded.
One thing to consider is using variable fonts instead. I failed to get them
loading with this setup, however, but they seem to be supported.
* goggles parser
* support weird urls in goggle
* quite significant speed boost during indexing (approx. 1.7x)
* merge index explicitly in indexer
* query benchmark
* search performance improvements primarily by not using hash during search
* turn goggle into tantivy query
* goggle benchmark
* Goggles are working!!
* fixed bug where goggle would enforce that sitename must be A and B and C etc. instead of A or B or C
* document some more goggle syntax
* less bold search highlights
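
The site-matching fix listed above (A or B or C instead of A and B and C) can be illustrated with a minimal sketch. The function names are hypothetical and are not taken from the actual goggle code; the sketch only contrasts the two semantics on plain strings.

```rust
/// Hypothetical helper (not from the real code base): intended behaviour.
/// A result matches if its site equals at least one listed site (A or B or C).
fn matches_any_site(site: &str, listed: &[&str]) -> bool {
    listed.iter().any(|s| *s == site)
}

/// Hypothetical helper: the buggy behaviour. With two or more distinct
/// listed sites, a single site can never equal all of them (A and B and C).
fn matches_all_sites(site: &str, listed: &[&str]) -> bool {
    listed.iter().all(|s| *s == site)
}
```

With `listed = ["a.com", "b.com", "c.com"]`, `matches_any_site("b.com", ..)` accepts the result while `matches_all_sites("b.com", ..)` rejects every possible site.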
* Build frontend using Astro
This commit replaces the prior pure Askama templates with templates
statically generated by Astro.
The purpose of this is to enable features such as MD/MDX,
minification, better Tailwind integration, and JSX'ish component syntax.
In the future a frontend framework, like React, could also be added, while
still compiling the frontend templates statically.
* Delete README.md
* Make the searchbar more round
* Run `npm install` in build.rs
* Fix some image paths
* Make search bar suggestion visibility CSS only
The suggestions are only shown when the input has focus and is not empty
* Add prettier astro and tailwind plugin
* Remove meta astro tag
* Move privacy.astro to privacy-and-happy-lawyers.astro
* Convert privacy and about page to MDX
* Remove old about page
Co-authored-by: Mikkel Denker <Mikkeldenker@gmail.com>
* [WIP] refactoring ranking signal coefficients into a trait
* refactoring ranking signal coefficients into a trait
* parser for custom signal aggregator
* ability to customize signal aggregation
* update readme
* minor ranking tweaks
* Remove term proximity ranking.
Term proximity has been saved to a separate branch for the future, but will not be used during ranking.
Experimenting with 10 warc files seemed to indicate that the term proximity ranking actually worsened the
search results quite substantially. We might want to re-introduce something like it in the future, but will
probably have to devote more time to parameter tuning. Maybe doing something naive like searching for bi-grams and
tri-grams with bm25 ranking is sufficient?
* support 'not' search query
* site query
* title query
* body query
* url query
* Bm25 requires at least one term #7
* move empty query handling into the searcher and handle error gracefully
Co-authored-by: Mikkel Denker <mikkel@cuely.io>
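
The bi-gram/tri-gram idea raised in the term proximity note above could be sketched roughly as follows. This is only an illustration of generating the n-grams from already-tokenized query terms; `ngrams` is a hypothetical helper, and how the n-grams would be indexed or scored with bm25 is left open.

```rust
/// Hypothetical helper: returns every contiguous run of `n` terms,
/// joined by a single space. Returns an empty vector when `n` is 0
/// or when there are fewer than `n` terms.
fn ngrams(terms: &[&str], n: usize) -> Vec<String> {
    if n == 0 {
        // `windows(0)` would panic, so handle it explicitly.
        return Vec::new();
    }
    terms.windows(n).map(|w| w.join(" ")).collect()
}
```

For the terms `["rust", "search", "engine"]`, `ngrams(.., 2)` yields `"rust search"` and `"search engine"`, and `ngrams(.., 3)` yields `"rust search engine"`.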
* simple spell corrections
* Only store top n terms in spell dictionary
* remove some weird characters from terms
* split compound words
* small cleanup
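
As a rough illustration of "only store top n terms in spell dictionary", the sketch below keeps the `n` most frequent terms from a term-count map. The name `top_n_terms` and the use of a `HashMap<String, u64>` are assumptions made for the example; the actual dictionary representation may differ.

```rust
use std::collections::HashMap;

/// Hypothetical helper: returns the `n` most frequent terms with their counts.
fn top_n_terms(counts: &HashMap<String, u64>, n: usize) -> Vec<(String, u64)> {
    let mut entries: Vec<(String, u64)> = counts
        .iter()
        .map(|(term, count)| (term.clone(), *count))
        .collect();
    // Sort by descending count; break ties alphabetically so the result is deterministic.
    entries.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    entries.truncate(n);
    entries
}
```

Sorting the full list is the simplest approach; for very large vocabularies a bounded heap would avoid sorting terms that are discarded anyway.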