* add docusaurus scalar api documentation structure
* bump openapi 3.0 to 3.1 so we can mark internal endpoints
* improve search api docs
* webgraph api docs
* point docs to prod
* Update to Svelte 5
This was rather straightforward actually! Just needed to bump deps and replace one `enum` declaration in a `.svelte` file, since that's no longer supported.
* Fix CI errors
There seems to be a bug in Svelte 5 where '<..>' typecasts are incorrectly parsed (issue 13179 in the svelte repo)
* change self-closing tags from <tag /> to <tag></tag>
fixes Svelte warnings
* high contrast theme
* improve '/explore' error messages when site is invalid
* change aria-expanded when search suggestions are displayed
* wrap search suggestions in <ul> and <li> items to ensure screen reader knows how many suggestions there are
* move focus to modal when it opens. trap focus until modal is closed again
* add aria-expanded to each result that is true iff the modal is expanded for that result
* entire navigation bar inside <nav> element
* add skip link to navbar to jump to main content of page
* improve focus indicators for selected /explore sites
* more descriptive titles for explore page interactive elements
* group settings in fieldsets and use title+description as legend
* add language to setting input fields to ensure the required-field error is read in the correct language by screen readers
* add headings to serp
* add title to hamburger menu on mobile
* fix firefox accessibility errors
* only show button outline on tab focus
I'm starting to think that the Discord and Matrix chats were a bad idea. I had originally hoped that a small community would form around them, where people interested in the inner workings of a search engine would share interesting resources and ideas with each other and simply hang out and chat. Instead, the chats have mainly turned into support channels where people ask me questions about Stract directly. There is absolutely nothing wrong with these types of questions, but given that this project is already very constrained on resources (I'm the only one answering questions), I'm coming to the conclusion that running support as a giant group chat was not the best idea after all. The answers I give need to be easy to find so they can benefit other people in a similar situation, instead of me having to answer the same question multiple times.
Hopefully this will instead encourage questions to be posted in issues/discussions on GitHub. There they can be found for posterity and won't introduce a social expectation that all questions must be answered immediately, which I simply cannot fulfill.
Doesn't handle concurrent writes, and flushes to disk after each write. This will cause a lot of fsyncs, which will impact performance, but as this will be used for the live index where each item (a full webpage) is quite large, this will hopefully not be too detrimental.
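
A minimal sketch of the write path described above, assuming a simple length-prefixed append-only file. `DurableLog`, its methods, and the on-disk layout are hypothetical names chosen for illustration, not the actual types in the repository. Taking `&mut self` means callers have to serialise writers themselves, and `sync_all` after every item is what produces one fsync per write.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical append-only store (illustrative, not the real implementation).
pub struct DurableLog {
    file: File,
}

impl DurableLog {
    /// Open the file for appending, creating it if it does not exist.
    pub fn open<P: AsRef<Path>>(path: P) -> io::Result<Self> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Self { file })
    }

    /// Append one item as `u64 little-endian length + bytes`, then fsync.
    /// `&mut self` gives a single writer; there is no internal locking.
    pub fn write(&mut self, item: &[u8]) -> io::Result<()> {
        let len = item.len() as u64;
        self.file.write_all(&len.to_le_bytes())?;
        self.file.write_all(item)?;
        // One fsync per item: durable, but costly for many small writes.
        self.file.sync_all()
    }
}
```

With large items such as full webpages, the fixed cost of the fsync is spread over many bytes, which is the trade-off the message above is betting on.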
e.g. the term 'the' might not be very common in titles, but should still be scaled as a less important term than the other terms in the query. instead of duplicating all text in the index, we approximate the bm25f IDF weight as the highest IDF across the fields
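
A rough sketch of that approximation, taking the message at face value: compute an IDF per field and keep the highest one. The `FieldStats` struct, the function names, and the choice of the common BM25 IDF formula `ln(1 + (N - df + 0.5) / (df + 0.5))` are all assumptions made for illustration; the real code may use different statistics or a different formula.

```rust
/// Hypothetical per-field statistics for a single query term.
pub struct FieldStats {
    /// Number of documents that have this field.
    pub total_docs: u64,
    /// Number of documents whose field contains the term.
    pub docs_with_term: u64,
}

/// Common BM25 IDF variant (assumed here): ln(1 + (N - df + 0.5) / (df + 0.5)).
pub fn bm25_idf(total_docs: u64, docs_with_term: u64) -> f64 {
    let n = total_docs as f64;
    let df = docs_with_term as f64;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// Approximate a single term-level IDF as the highest per-field IDF.
/// Returns 0.0 when no field statistics are available.
pub fn approx_bm25f_idf(fields: &[FieldStats]) -> f64 {
    fields
        .iter()
        .map(|f| bm25_idf(f.total_docs, f.docs_with_term))
        .fold(0.0, f64::max)
}
```

The point of the design is that one weight per term is shared by every field, without needing a combined all-text field in the index to compute a true document-level frequency. How the actual implementation picks the fields that feed the maximum is not stated in the message.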
* ranking diff tool structure
* fix missing icon types
* add admin for queries and experiments
* minor cleanup
* show experiment progress
* upgrade node adapter for svelte
* hopefully fix ci
* display common queries between experiments
* display serp diffs with top signals for each result
* like experiments and show overview in queries
* settings to toggle experiment shuffle and show/hide signals
* keyboard shortcuts
* visualise improvements by query category
* document how to use tool
the probabilistic nature of llms means they have an inherent risk of hallucinating. even if they tend to cite correctly most of the time, the probability of hallucinations is still too large to be able to trust the output, thus defeating the purpose of the summary entirely. until these hallucinations are fixed (or the probability is extremely low) i don't see how it makes sense to include llms in search
this allows us to short-circuit the query by default, which significantly improves performance since we no longer have to iterate over the non-scored results simply to count them
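
One way to picture the short-circuit, as a sketch under assumptions: matching documents arrive as an iterator, only the first `max_scored` are kept for scoring, and an exact total is produced only when explicitly requested. `collect`, `Collected`, and the `exact_count` flag are hypothetical names, not the project's API.

```rust
/// Hypothetical result of the first collection stage.
pub struct Collected {
    /// Documents that would go on to be scored.
    pub scored: Vec<u64>,
    /// Exact number of matches, or `None` if iteration stopped early.
    pub total_matches: Option<usize>,
}

/// Keep at most `max_scored` matches. Unless `exact_count` is set, stop
/// as soon as one more match shows up instead of draining the iterator.
pub fn collect<I: Iterator<Item = u64>>(
    matches: I,
    max_scored: usize,
    exact_count: bool,
) -> Collected {
    let mut scored = Vec::new();
    let mut seen = 0usize;
    for doc in matches {
        if scored.len() < max_scored {
            scored.push(doc);
        } else if !exact_count {
            // Short-circuit: the remaining matches are never visited.
            return Collected { scored, total_matches: None };
        }
        seen += 1;
    }
    Collected { scored, total_matches: Some(seen) }
}
```

With `exact_count` set to `false`, a query with a million matches and `max_scored = 10` pulls only eleven items from the iterator; the price is that the total number of matches is no longer known.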