Commit graph

277 commits

Author SHA1 Message Date
Mikkel Denker
d69fd5b8c3 update documentation links 2024-12-03 15:56:45 +01:00
Mikkel Denker
cd9a794cd5 just update 2024-12-03 15:05:07 +01:00
Mikkel Denker
de7291daa1 just update 2024-12-03 15:00:08 +01:00
Mikkel Denker
3945651330
Check if either the request or response ip is an internal ip. Fail the request if this is the case (#238) 2024-11-29 11:27:30 +01:00
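The check described in this commit can be sketched with Python's standard `ipaddress` module. This is not Stract's (Rust) implementation. The set of address classes treated as "internal" (private, loopback, link-local, reserved, unspecified, multicast) and the names `is_internal` and `check_request` are assumptions for illustration.

```python
import ipaddress


def is_internal(ip: str) -> bool:
    """Return True if `ip` should be treated as internal (assumed classification)."""
    addr = ipaddress.ip_address(ip)
    # Unwrap IPv4-mapped IPv6 addresses (e.g. ::ffff:127.0.0.1) so they are
    # classified by their embedded IPv4 address.
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_unspecified
        or addr.is_multicast
    )


def check_request(request_ip: str, response_ip: str) -> None:
    """Fail if either the resolved request IP or the responding peer IP is internal."""
    for ip in (request_ip, response_ip):
        if is_internal(ip):
            raise PermissionError(f"refusing request involving internal address {ip}")
```

Checking the response side as well, not just the address resolved before connecting, plausibly guards against a hostname that resolves differently between the check and the actual connection.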
Mikkel Denker
12e9502e80
Improve API documentation (#235)
* add docusaurus scalar api documentation structure

* bump openapi 3.0 to 3.1 so we can mark internal endpoints

* improve search api docs

* webgraph api docs

* point docs to prod
2024-11-19 13:43:42 +01:00
Mikkel Denker
0d0405caa6 [webgraph] rename *params -> *query 2024-11-15 10:23:26 +01:00
Mikkel Denker
ac57777c58 add potential meta key to keybindings 2024-11-08 14:42:46 +01:00
Oliver Bøving
2b3f6a13fa
Update to Svelte 5 (#233)
* Update to Svelte 5

This was rather straightforward actually! Just needed to bump deps, and replace one instance of `enum` decl in a `.svelte` file since that's no longer supported.

* Fix CI errors

There seems to be a bug in svelte 5 where '<..>' typecasts are incorrectly parsed (issue 13179 in svelte repo)

* change self-closing tags from <tag /> to <tag></tag>

fixes svelte warnings
2024-11-04 10:44:05 +01:00
Mikkel Denker
cae6339671 just update 2024-11-01 15:35:26 +01:00
Mikkel Denker
4e8426888b just update 2024-11-01 15:28:40 +01:00
Mikkel Denker
9083d600c7 take extra edges to ensure the remote has enough for deduplication 2024-10-30 10:56:39 +01:00
Mikkel Denker
3763c51348 convert RelFlags from a bitset to a vec of enums for public api 2024-10-28 12:26:29 +01:00
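One way to picture this change: internally the flags can stay a compact bitset, while the public API exposes a list of enum values, which serializes to something readable like `["nofollow", "ugc"]` rather than an opaque integer. The Python sketch below uses `enum.Flag` for the bitset. The flag names are hypothetical (borrowed from common HTML `rel` values) and are not taken from Stract's source.

```python
from enum import Enum, Flag, auto


class RelFlags(Flag):
    """Internal bitset representation (hypothetical flag names)."""
    NOFOLLOW = auto()
    SPONSORED = auto()
    UGC = auto()


class RelFlag(Enum):
    """Public representation: one enum value per set flag."""
    NOFOLLOW = "nofollow"
    SPONSORED = "sponsored"
    UGC = "ugc"


def to_public(flags: RelFlags) -> list[RelFlag]:
    # Iterate the single-bit members in definition order and keep the ones that are set.
    return [RelFlag[member.name] for member in RelFlags if member in flags]
```

For example, `[f.value for f in to_public(RelFlags.NOFOLLOW | RelFlags.UGC)]` gives `["nofollow", "ugc"]`.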
Mikkel Denker
375ab7b7f9 npm run format 2024-10-25 10:09:15 +02:00
Mikkel Denker
f521c5d09d update cookie to ^0.7.0 2024-10-25 09:58:15 +02:00
Mikkel Denker
31bfebf2c9 just update 2024-10-25 09:37:45 +02:00
Mikkel Denker
f494a11a1a
Accessibility overhaul (#231)
* high contrast theme

* improve '/explore' error messages when site is invalid

* change aria-expanded when search suggestions are displayed

* wrap search suggestions in <ul> and <li> items to ensure screen reader knows how many suggestions there are

* move focus to modal when it opens. trap focus until modal is closed again

* add aria-expanded to each result that is true iff. the modal is expanded for that result

* entire navigation bar inside <nav> element

* add skip link to navbar to jump to main content of page

* improve focus indicators for selected /explore sites

* more descriptive titles for explore page interactive elements

* group settings in fieldsets and use title+description as legend

* add language to setting input fields to ensure the required-field error is read in the correct language by screen readers

* add headings to serp

* add title to hamburger menu on mobile

* fix firefox accessibility errors

* only show button outline on tab focus
2024-10-09 11:12:09 +02:00
Mikkel Denker
5ebdb24a07 just update 2024-10-01 09:51:11 +02:00
Mikkel Denker
4805aa83e3 removing discord/matrix links
I'm starting to think that the Discord and Matrix chats were a bad idea. I had originally hoped that a small community would form around them, where people interested in the inner workings of a search engine would share interesting resources/ideas with each other and simply hang out and chat. Instead, the chats have mainly turned into support chats where people would ask me questions about Stract directly. There is absolutely nothing wrong with these types of questions, but given that this project is already very constrained on resources as is (I'm the only one to answer questions), I am starting to come to the conclusion that having support as a giant group chat might not have been the best idea after all. The answers I give need to be easily findable so they can benefit other people in a similar situation as well, instead of having to answer the same question multiple times.

Hopefully this will instead encourage questions to be posted in issues/discussions on GitHub. There they are preserved for posterity and won't introduce a social expectation that all questions must be answered immediately, which I simply cannot fulfill.
2024-09-25 15:00:27 +02:00
Mikkel Denker
55b39555aa npm update 2024-09-18 12:19:14 +02:00
Mikkel Denker
c244eacb3a use table in explore page instead of div's 2024-09-17 19:56:02 +02:00
Mikkel Denker
3fefd80408 small accessibility improvements to entity sidebar 2024-09-17 14:28:00 +02:00
Mikkel Denker
bc7f7f14de npm update 2024-09-16 09:34:02 +02:00
Mikkel Denker
21d28ea86d remove 'package-lock.json' from gitignore 2024-09-12 12:54:15 +02:00
Mikkel Denker
c43da8cf5c detail how to exclude stractbot using robots.txt 2024-09-12 12:53:49 +02:00
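For reference, a robots.txt group that blocks the crawler from an entire site might look like the one below, assuming the crawler matches on the `StractBot` user-agent token (the project's webmaster documentation is authoritative for the exact token). The rule is exercised with Python's standard `urllib.robotparser`, which is only a stand-in for Stract's own robots.txt handling.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent token; the rest is standard robots.txt syntax.
ROBOTS_TXT = """\
User-agent: StractBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("StractBot", "https://example.com/any/page"))     # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/any/page"))  # True
```

Other crawlers are unaffected because no `User-agent: *` group is present.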
Mikkel Denker
8d0ad573a7 start politeness factor at 2, decrease iff. we don't receive any 429 responses. also increase max delay to 180 seconds 2024-09-06 13:41:22 +02:00
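The adaptive delay can be sketched as follows. Only three details come from the commit message: the starting factor of 2, the rule "decrease only when no 429s were seen", and the 180-second cap. The base delay, the halving/doubling steps, the lower bound of 1, and all names are assumptions.

```python
from dataclasses import dataclass

MAX_DELAY_SECS = 180.0  # cap from the commit message


@dataclass
class Politeness:
    base_delay_secs: float = 1.0  # hypothetical base delay between requests to one site
    factor: float = 2.0           # starting politeness factor, per the commit message
    min_factor: float = 1.0       # hypothetical lower bound

    def delay_secs(self) -> float:
        """Delay before the next request, never above the cap."""
        return min(self.base_delay_secs * self.factor, MAX_DELAY_SECS)

    def update(self, saw_429: bool) -> None:
        """Adjust the factor after a batch of responses from one site."""
        if saw_429:
            # Assumed back-off: double, but stop growing once the cap is reached.
            self.factor = min(self.factor * 2, MAX_DELAY_SECS / self.base_delay_secs)
        else:
            # Decrease only when no 429 responses were received.
            self.factor = max(self.factor / 2, self.min_factor)
```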
Mikkel Denker
365ed02813
Very simple WAL built on top of file-store primitives (#219)
Doesn't handle concurrent writes, and flushes after each write. This will cause a lot of fsyncs, which will impact performance, but as this will be used for the live index where each item (a full webpage) is quite large, this will hopefully not be too detrimental.
2024-09-05 14:35:52 +02:00
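A minimal sketch of such a log, assuming length-prefixed records in a single file. Stract's file-store primitives and on-disk format are not shown here, and the `Wal` class and its methods are hypothetical. As in the commit, there is no support for concurrent writers, and every write is followed by a flush and an `fsync`.

```python
import os
import struct
from typing import Iterator


class Wal:
    """Append-only log; every write is flushed and fsynced before returning."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._file = open(path, "ab")

    def write(self, payload: bytes) -> None:
        # Length-prefix each record (u64, little-endian) so it can be read back.
        self._file.write(struct.pack("<Q", len(payload)))
        self._file.write(payload)
        self._file.flush()               # push Python's buffer to the OS
        os.fsync(self._file.fileno())    # force the OS to persist it to disk

    def close(self) -> None:
        self._file.close()

    def records(self) -> Iterator[bytes]:
        """Replay all complete records, stopping at a torn trailing record."""
        with open(self.path, "rb") as f:
            while True:
                header = f.read(8)
                if len(header) < 8:
                    return  # end of log, or a partially written header
                (length,) = struct.unpack("<Q", header)
                payload = f.read(length)
                if len(payload) < length:
                    return  # partially written payload after a crash
                yield payload
```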
Mikkel Denker
fef81ee86d fix captcha checkmark color so it looks nice on different themes 2024-09-05 11:15:06 +02:00
Mikkel Denker
ece1640458 forgot to remove a console.log... 2024-09-05 11:01:30 +02:00
Mikkel Denker
9d0dba7da2 move captcha images/audio to correct location during build 2024-09-05 10:49:15 +02:00
Mikkel Denker
0f3de6c0dc make sure correct address is used when frontend is behind nginx 2024-09-04 17:22:06 +02:00
Mikkel Denker
40a6dde924 custom captcha to reduce the number of bots scraping the search results 2024-09-04 15:44:57 +02:00
Mikkel Denker
9c567ab720
add tooltip to show what the eye icon in optics settings does (#218) 2024-09-02 09:56:51 +02:00
Mikkel Denker
9cba6c13fd split 'Signal' trait into 'CoreSignal' and 'Signal' to distinguish between the signals that are calculated initially and later during ranking 2024-08-28 16:33:56 +02:00
Mikkel Denker
c6119e31d7 giant ranking pipeline refactor to separate ranking stages from sorting/offset logic
this should make it easier to implement additional ranking stages in the future
2024-08-27 20:16:17 +02:00
Mikkel Denker
90aa27232c fixed bug where new queries in serp wouldn't re-trigger search 2024-08-19 15:33:09 +02:00
Mikkel Denker
ee516eff95
Remove 'sr' parameter (#215)
* Remove 'sr' parameter and use svelte store directly instead

* remove lz-string dependency
2024-08-19 14:51:54 +02:00
Mikkel Denker
e63d6dc2e6 clamp title to 2 lines on mobile 2024-08-19 13:39:29 +02:00
Mikkel Denker
1f09a4247d ltr experiment
use differential evolution to optimize linear model live
2024-08-19 12:13:59 +02:00
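For readers unfamiliar with the technique, below is a minimal differential evolution loop (the classic DE/rand/1/bin variant) that minimizes an objective over box bounds. The commit does not describe how Stract scores a weight vector live. In a linear ranker, the vector would plausibly be the per-signal weights and the objective some ranking loss; the toy quadratic in the usage note only stands in for that.

```python
import random
from typing import Callable, Sequence


def differential_evolution(
    objective: Callable[[list[float]], float],
    bounds: Sequence[tuple[float, float]],
    pop_size: int = 20,
    mutation: float = 0.8,   # F: scale of the difference vector
    crossover: float = 0.9,  # CR: probability of taking a mutated coordinate
    generations: int = 200,
    seed: int = 0,
) -> tuple[list[float], float]:
    """Minimize `objective` with DE/rand/1/bin; returns (best vector, best score)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct members other than the target vector i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < crossover:
                    value = pop[a][d] + mutation * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip back into the box
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_score = objective(trial)
            # Greedy selection: keep the trial if it is at least as good.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]
```

Usage: `differential_evolution(lambda w: sum((a - b) ** 2 for a, b in zip(w, [0.5, -1.0, 2.0])), [(-5.0, 5.0)] * 3)` returns a weight vector close to `[0.5, -1.0, 2.0]`. DE only needs objective values, not gradients, which suits ranking metrics that are not differentiable.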
Mikkel Denker
efa522a008 add small icon next to liked/disliked sites 2024-07-26 11:55:19 +02:00
Mikkel Denker
f6995402a1 bump crawler version 2024-06-28 13:50:58 +02:00
Mikkel Denker
5beae3b9a9 simplified bm25f that uses same IDF weight across all fields
e.g. the term 'the' might not be very common in titles but should still be scaled as a less important term than other terms in the query. instead of duplicating all text in the index we approximate the bm25f IDF weight as the highest IDF across the fields
2024-06-20 15:01:41 +02:00
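A sketch of the approximation described above. Instead of computing BM25F's IDF over a concatenation of all fields, each field keeps its own document frequency, and the shared IDF is taken as the highest per-field IDF, as the commit message states. The IDF formula (Lucene-style BM25 IDF), the `k1`/`b` defaults, and the field weights are assumptions, not values from Stract.

```python
import math


def bm25_idf(num_docs: int, doc_freq: int) -> float:
    """Lucene-style BM25 IDF; the +1 inside the log keeps it non-negative."""
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def shared_idf(num_docs: int, doc_freq_per_field: dict[str, int]) -> float:
    """One IDF for the term across all fields: the highest per-field IDF."""
    return max(bm25_idf(num_docs, df) for df in doc_freq_per_field.values())


def bm25f_term_score(
    tf: dict[str, int],               # term frequency per field
    field_len: dict[str, int],        # length of each field in this document
    avg_field_len: dict[str, float],  # average length of each field in the index
    field_weight: dict[str, float],   # hypothetical per-field boosts
    idf: float,
    k1: float = 1.2,
    b: float = 0.75,
) -> float:
    # Combine per-field frequencies into one length-normalized, weighted pseudo-frequency...
    pseudo_tf = 0.0
    for field, freq in tf.items():
        norm = 1.0 - b + b * field_len[field] / avg_field_len[field]
        pseudo_tf += field_weight[field] * freq / norm
    # ...then saturate it once and scale by the single shared IDF.
    return idf * pseudo_tf * (k1 + 1.0) / (k1 + pseudo_tf)
```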
Mikkel Denker
265b1b7871
Ranking diff tool (#207)
* ranking diff tool structure

* fix missing icon types

* add admin for queries and experiments

* minor cleanup

* show experiment progress

* upgrade node adapter for svelte

* hopefully fix ci

* display common queries between experiments

* display serp diffs with top signals for each result

* like experiments and show overview in queries

* settings to toggle experiment shuffle and show/hide signals

* keyboard shortcuts

* visualise improvements by query category

* document how to use tool
2024-06-03 15:00:16 +02:00
Mikkel Denker
e39987a2f7 remove summarizer
the probabilistic nature of llms means they have an inherent risk of hallucinating. even if they tend to cite correctly most of the time, the probability of hallucinations is still too large to be able to trust the output, thus defeating the purpose of the summary entirely. until these hallucinations are fixed (or the probability is extremely low) i don't see how it makes sense to include llms in search
2024-05-27 10:04:48 +02:00
Mikkel Denker
17fed5a75c
Show ranking signals (#201) 2024-05-17 16:39:33 +02:00
Mikkel Denker
b302a8d5c7 coordinate changed nodes between workers in distributed harmonic to make sure a node that has been updated in worker A is also considered for updates on worker B 2024-05-10 15:09:44 +02:00
Mikkel Denker
e5e2126e54 align snippet text and date horizontally 2024-05-07 09:33:30 +02:00
Mikkel Denker
3c94cb7f81 approximate number of hits by assuming that each term is independent
this allows us to short-circuit the query by default, which significantly improves performance since we don't have to iterate over the non-scored results simply to count them
2024-05-06 15:21:17 +02:00
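Under the independence assumption, the expected number of documents matching every term is the collection size times the product of each term's document-frequency ratio, roughly N · ∏ (df_i / N). Because document frequencies are already stored in the index, this estimate is cheap, which is presumably what lets the query stop early instead of counting every match. The sketch assumes a conjunctive (all terms must match) query, and the function name is hypothetical.

```python
from math import prod


def estimate_hits(num_docs: int, doc_freqs: list[int]) -> int:
    """Estimate how many documents contain all query terms, assuming independence."""
    if num_docs == 0 or not doc_freqs:
        return 0
    # P(doc contains term i) ~= df_i / N; independence lets us multiply the probabilities.
    match_prob = prod(df / num_docs for df in doc_freqs)
    return round(num_docs * match_prob)
```

For example, `estimate_hits(1_000_000, [100_000, 20_000])` gives `2000`. Correlated terms (e.g. "new" and "york") will be underestimated, which is the accuracy traded for speed.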
Mikkel Denker
bfa4ce9043 make sure long titles are truncated in serp 2024-05-06 12:14:48 +02:00
Mikkel Denker
d687270b78 small serp accessibility improvements
group results and make sure heading is read first
2024-05-06 11:52:05 +02:00
Mikkel Denker
ccc28d7ade make sure keyed each blocks have unique id to prevent infinite render retries from svelte when search results contain duplicate results 2024-05-04 17:54:30 +02:00