0ct0pu5/search-engine-stract

Author	SHA1	Message	Date
Mikkel Denker	d69fd5b8c3	update documentation links	2024-12-03 15:56:45 +01:00
Mikkel Denker	cd9a794cd5	just update	2024-12-03 15:05:07 +01:00
Mikkel Denker	de7291daa1	just update	2024-12-03 15:00:08 +01:00
Mikkel Denker	3945651330	Check if either the request or response ip is an internal ip. Fail the request if this is the case (#238)	2024-11-29 11:27:30 +01:00
Mikkel Denker	12e9502e80	Improve API documentation (#235) * add docusaurus scalar api documentation structure * bump openapi 3.0 to 3.1 so we can mark internal endpoints * improve search api docs * webgraph api docs * point docs to prod	2024-11-19 13:43:42 +01:00
Mikkel Denker	0d0405caa6	[webgraph] rename params -> query	2024-11-15 10:23:26 +01:00
Mikkel Denker	ac57777c58	add potential meta key to keybindings	2024-11-08 14:42:46 +01:00
Oliver Bøving	2b3f6a13fa	Update to Svelte 5 (#233) * Update to Svelte 5 This was rather straightforward actually! Just needed to bump deps, and replace one instance of `enum` decl in a `.svelte` file since that's no longer supported. * Fix CI errors There seems to be a bug in svelte 5 where '<..>' typecasts are incorrectly parsed (issue 13179 in svelte repo) * change selfclosing tags from <tag /> to <tag></tag> fixes svelte warnings	2024-11-04 10:44:05 +01:00
Mikkel Denker	cae6339671	just update	2024-11-01 15:35:26 +01:00
Mikkel Denker	4e8426888b	just update	2024-11-01 15:28:40 +01:00
Mikkel Denker	9083d600c7	take extra edges to ensure the remote has enough for deduplication	2024-10-30 10:56:39 +01:00
Mikkel Denker	3763c51348	convert RelFlags from a bitset to a vec of enums for public api	2024-10-28 12:26:29 +01:00
Mikkel Denker	375ab7b7f9	npm run format	2024-10-25 10:09:15 +02:00
Mikkel Denker	f521c5d09d	update cookie to ^0.7.0	2024-10-25 09:58:15 +02:00
Mikkel Denker	31bfebf2c9	just update	2024-10-25 09:37:45 +02:00
Mikkel Denker	f494a11a1a	Accessibility overhaul (#231) * high contrast theme * improve '/explore' error messages when site is invalid * change aria-expanded when search suggestions are displayed * wrap search suggestions in <ul> and <li> items to ensure screen reader knows how many suggestions there are * move focus to modal when it opens. trap focus until modal is closed again * add aria-expanded to each result that is true iff. the modal is expanded for that result * entire navigation bar inside <nav> element * add skip link to navbar to jump to main content of page * improve focus indicators for selected /explore sites * more descriptive titles for explore page interactive elements * group settings in fieldsets and use title+description as legend * add language to setting input fields to ensure required fields error is read in correct language on screenreaders * add headings to serp * add title to hamburger menu on mobile * fix firefox accessibility errors * only show button outline on tab focus	2024-10-09 11:12:09 +02:00
Mikkel Denker	5ebdb24a07	just update	2024-10-01 09:51:11 +02:00
Mikkel Denker	4805aa83e3	removing discord/matrix links I'm starting to think that the discord and matrix chats were a bad idea. I had originally hoped that a small community would form around them, where people interested in the inner workings of a search engine would share interesting resources/ideas with each other and simply hang out and chat. Instead, the chats have mainly turned into support chats where people would ask me questions about Stract directly. There is absolutely nothing wrong with these types of questions, but given that this project is already very constrained on resources as is (I'm the only one to answer questions) I am starting to come to the conclusion that having support as a giant group chat might not have been the best idea after all. The answers I give need to be easily findable so they can benefit other people in a similar situation as well, instead of having to answer the same question multiple times. Hopefully this will instead encourage questions to be posted in issues/discussions on github. Here they can be found in posterity and won't introduce a social expectation that all questions must be answered immediately, which I simply cannot fulfill.	2024-09-25 15:00:27 +02:00
Mikkel Denker	55b39555aa	npm update	2024-09-18 12:19:14 +02:00
Mikkel Denker	c244eacb3a	use table in explore page instead of div's	2024-09-17 19:56:02 +02:00
Mikkel Denker	3fefd80408	small accessibility improvements to entity sidebar	2024-09-17 14:28:00 +02:00
Mikkel Denker	bc7f7f14de	npm update	2024-09-16 09:34:02 +02:00
Mikkel Denker	21d28ea86d	remove 'package-lock.json' from gitignore	2024-09-12 12:54:15 +02:00
Mikkel Denker	c43da8cf5c	detail how to exclude stractbot using robots.txt	2024-09-12 12:53:49 +02:00
Mikkel Denker	8d0ad573a7	start politeness factor at 2, decrease iff. we don't receive any 429 responses. also increase max delay to 180 seconds	2024-09-06 13:41:22 +02:00
Mikkel Denker	365ed02813	Very simple WAL built on top of file-store primitives (#219) Doesn't handle concurrent writes and flushes after each write. This will cause a lot of fsync's which will impact performance, but as this will be used for the live index where each item (a full webpage) is quite large, this will hopefully not be too detrimental.	2024-09-05 14:35:52 +02:00
Mikkel Denker	fef81ee86d	fix captcha checkmark color so it looks nice on different themes	2024-09-05 11:15:06 +02:00
Mikkel Denker	ece1640458	forgot to remove a console.log...	2024-09-05 11:01:30 +02:00
Mikkel Denker	9d0dba7da2	move captcha images/audio to correct location during build	2024-09-05 10:49:15 +02:00
Mikkel Denker	0f3de6c0dc	make sure correct address is used when frontend is behind nginx	2024-09-04 17:22:06 +02:00
Mikkel Denker	40a6dde924	custom captcha to reduce the number of bots scraping the search results	2024-09-04 15:44:57 +02:00
Mikkel Denker	9c567ab720	add tooltip to show what the eye icon in optics settings does (#218)	2024-09-02 09:56:51 +02:00
Mikkel Denker	9cba6c13fd	split 'Signal' trait into 'CoreSignal' and 'Signal' to distinguish between the signals that are calculated initially and later during ranking	2024-08-28 16:33:56 +02:00
Mikkel Denker	c6119e31d7	giant ranking pipeline refactor to separate ranking stages from sorting/offset logic this should make it easier to implement additional ranking stages in the future	2024-08-27 20:16:17 +02:00
Mikkel Denker	90aa27232c	fixed bug where new queries in serp wouldn't re-trigger search	2024-08-19 15:33:09 +02:00
Mikkel Denker	ee516eff95	Remove 'sr' parameter (#215) * Remove 'sr' parameter and use svelte store directly instead * remove lz-string dependency	2024-08-19 14:51:54 +02:00
Mikkel Denker	e63d6dc2e6	clamp title to 2 lines on mobile	2024-08-19 13:39:29 +02:00
Mikkel Denker	1f09a4247d	ltr experiment use differential evolution to optimize linear model live	2024-08-19 12:13:59 +02:00
Mikkel Denker	efa522a008	add small icon next to liked/disliked sites	2024-07-26 11:55:19 +02:00
Mikkel Denker	f6995402a1	bump crawler version	2024-06-28 13:50:58 +02:00
Mikkel Denker	5beae3b9a9	simplified bm25f that uses same IDF weight across all fields e.g. the term 'the' might not be very common in titles but should still be scaled as a less important term than other terms in the query. instead of duplicating all text in the index we approximate the bm25f IDF weight as the highest IDF across the fields	2024-06-20 15:01:41 +02:00
Mikkel Denker	265b1b7871	Ranking diff tool (#207) * ranking diff tool structure * fix missing icon types * add admin for queries and experiments * minor cleanup * show experiment progress * upgrade node adapter for svelte * hopefully fix ci * display common queries between experiments * display serp diffs with top signals for each result * like experiments and show overview in queries * settings to toggle experiment shuffle and show/hide signals * keyboard shortcuts * visualise improvements by query category * document how to use tool	2024-06-03 15:00:16 +02:00
Mikkel Denker	e39987a2f7	remove summarizer the probabilistic nature of llms means they have an inherent risk of hallucinating. even if they tend to cite correctly most of the time, the probability of hallucinations is still too large to be able to trust the output, thus defeating the purpose of the summary entirely. until these hallucinations are fixed (or the probability is extremely low) i don't see how it makes sense to include llms in search	2024-05-27 10:04:48 +02:00
Mikkel Denker	17fed5a75c	Show ranking signals (#201)	2024-05-17 16:39:33 +02:00
Mikkel Denker	b302a8d5c7	coordinate changed nodes between workers in distributed harmonic to make sure a node that has been updated in worker A is also considered for updates on worker B	2024-05-10 15:09:44 +02:00
Mikkel Denker	e5e2126e54	align snippet text and date horizontally	2024-05-07 09:33:30 +02:00
Mikkel Denker	3c94cb7f81	approximate number of hits by assuming that each term is independent this allows us to short-circuit the query by default which significantly improves performance as we therefore don't have to iterate the non-scored results simply to count them	2024-05-06 15:21:17 +02:00
Mikkel Denker	bfa4ce9043	make sure long titles are truncated in serp	2024-05-06 12:14:48 +02:00
Mikkel Denker	d687270b78	small serp accessibility improvements group results and make sure heading is read first	2024-05-06 11:52:05 +02:00
Mikkel Denker	ccc28d7ade	make sure keyed each blocks have unique id to prevent infinite render retries from svelte when search results contain duplicate results	2024-05-04 17:54:30 +02:00
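
The hit-count approximation described in commit 3c94cb7f81 can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's code: the function name `estimate_hits`, its parameters, and the choice to treat the query as a conjunction of terms are all hypothetical. The idea is that if each term occurs independently, the fraction of documents matching every term is roughly the product of the per-term fractions, so no full iteration over matches is needed.

```python
from math import prod


def estimate_hits(total_docs: int, doc_freqs: list[int]) -> int:
    """Estimate how many documents contain ALL query terms.

    Hypothetical helper: `doc_freqs` holds, for each query term, the number
    of documents containing that term. Terms are assumed to be independent.
    """
    if total_docs <= 0 or not doc_freqs:
        return 0
    # P(document contains term) = df / N; independence lets us multiply.
    probability = prod(df / total_docs for df in doc_freqs)
    return round(total_docs * probability)
```

For example, with 1000 documents and two terms appearing in 100 and 50 documents, the estimate is 1000 × 0.1 × 0.05 = 5 hits. Real terms are often correlated, so the estimate can be well off for phrases whose words tend to appear together.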


277 commits