Commit graph

277 commits

Author SHA1 Message Date
Oliver Bøving
16e10e421a
Fix iOS searchbar blurring and HTML escape suggestions (#95)
iOS Safari has a behavior where it issues the blur event for focused
elements before handling onclick events. This means that
clicking the suggestions would blur the input field, remove the
`focus-within:flex` on the suggestions, and then handle the click event
which now clicks nothing.

To prevent this, we manually control the focus, and prevent the blur
if the blur event was issued by an element within the suggestions.
It thus becomes the job of the suggestion to issue the blur after
updating the query and submitting the form.
2023-09-11 13:11:03 +00:00
Mikkel Denker
7bae5aa686 related entities in sidebar had the title of the original entity 2023-09-11 13:18:38 +02:00
Mikkel Denker
61b7a3deb2 remove webpage body from api and use snippets in alice and qa model to generate answers 2023-09-11 11:31:47 +02:00
Mikkel Denker
c02924d05c Move snippet highlighting into svelte.
This also means Svelte no longer has to trust that the api has correctly sanitized the snippet.
2023-09-11 11:11:42 +02:00
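Moving highlighting into the frontend can be sketched as splitting a snippet into plain and highlighted segments instead of building an HTML string. This is an illustrative assumption about the approach, not Stract's code: `Segment` and `highlightSegments` are hypothetical names, and matching is a simple case-insensitive substring search.

```typescript
// One piece of a snippet. Rendering these as text (not as an HTML string)
// lets the UI framework escape any markup contained in the snippet itself.
interface Segment {
  text: string;
  highlighted: boolean;
}

// Split `snippet` into segments, marking case-insensitive occurrences of
// `terms` as highlighted. At each position the longest matching term wins.
// Assumes lowercasing does not change string length (true for ASCII).
function highlightSegments(snippet: string, terms: string[]): Segment[] {
  const needles = terms.filter((t) => t.length > 0).map((t) => t.toLowerCase());
  const lower = snippet.toLowerCase();
  const segments: Segment[] = [];
  let plainStart = 0; // start of the current run of unhighlighted text
  let i = 0;

  while (i < snippet.length) {
    let matchLen = 0;
    for (const needle of needles) {
      if (needle.length > matchLen && lower.startsWith(needle, i)) {
        matchLen = needle.length;
      }
    }
    if (matchLen === 0) {
      i += 1;
      continue;
    }
    if (plainStart < i) {
      segments.push({ text: snippet.slice(plainStart, i), highlighted: false });
    }
    segments.push({ text: snippet.slice(i, i + matchLen), highlighted: true });
    i += matchLen;
    plainStart = i;
  }
  if (plainStart < snippet.length) {
    segments.push({ text: snippet.slice(plainStart), highlighted: false });
  }
  return segments;
}
```

In a Svelte template each `segment.text` would be rendered with a plain `{...}` expression, which Svelte escapes, rather than `{@html ...}`, so any markup inside the snippet shows up as literal text.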
Mikkel Denker
87a54eb20a add 'http' protocol to site links in /explore 2023-09-10 21:28:17 +02:00
Mikkel Denker
2784789a39 webmasters 2023-09-10 21:09:32 +02:00
Mikkel Denker
d4cb9cd81c fix improvements 2023-09-10 20:24:34 +02:00
Mikkel Denker
95fb9a871d autosuggest browser 2023-09-10 19:56:56 +02:00
Mikkel Denker
83fd76c63b bang redirect in new frontend 2023-09-10 19:52:33 +02:00
Mikkel Denker
f70dee9e98 need svelte-kit tsconfig 2023-09-10 19:35:48 +02:00
Oliver Bøving
2e2aff3da0
🥬 Svelte frontend (#91)
* remove deno frontend

* Add Svelte frontend

* change frontend port to 8000 and autofocus searchbar on frontpage

* Setup formatting of the new frontend with the new monorepo

* Add "show more" button to explore

* Add searchbar arrow key navigation

* Update query based on navigation in search bar

* Highlight matching prefix in search results

* Add toggling of site rankings to search results

* Fix crashing when having multiple semi-identical optics

* Refactor searchbar visibility

---------

Co-authored-by: Mikkel Denker <mikkel@trystract.com>
2023-09-10 16:32:03 +00:00
Oliver Bøving
2ae815177e
Send the API_BASE from env to the client during hydration (#89) 2023-09-04 14:21:30 +00:00
Mikkel Denker
6fa3ac8d78 make searchbar and ranking adjuster more visible on low-res displays 2023-09-04 15:16:52 +02:00
Mikkel Denker
eef1f763d3 if optic is empty it should be null to correctly trigger discussions search 2023-09-04 13:50:49 +02:00
Mikkel Denker
f8480ada94 control api_base with env variables for frontend 2023-09-04 13:03:28 +02:00
Oliver Bøving
9358db9933
Move all injectGlobal into components (#88)
Turns out they cannot be called from a global context when running
`deno run -A main.ts` since twind is not setup by the time the files are
loaded.

Fortunately, it turns out that the `injectGlobal` does not duplicate the
CSS for every component it is rendered in!
2023-09-04 09:00:26 +00:00
Oliver Bøving
369d5031df
Refactor Justfile and tracing with enabled debug tracing for stract (#87)
* Refactor Justfile and tracing with enabled debug tracing for stract

* Use `just dev` in `CONTRIBUTING.md`
2023-09-04 08:53:17 +00:00
Mikkel Denker
4f4f97eb8c don't show alice when disabled 2023-09-04 08:59:29 +02:00
Oliver Bøving
072a6323e9
🍋 Fresh frontend (#84)
* Add fresh frontend

This reimplements the existing frontend using Fresh. Primary highlights of
this new frontend are:

- Uses deno instead of node/npm for fewer dependencies. Deno for example
  includes a formatter and linter, and dependencies are downloaded
  automatically.
- Everything is TypeScript. There is no more .astro or similar, which
  reduces complexity.
- The frontend is built up of components entirely, which can either be
  server side rendered only, or rehydrated on the client for
  interactivity (islands).
- Fresh server side renders all requests, populated by using the API,
  which is typesafe and generated from the OpenAPI spec.
- Combining the last two, it becomes much easier to add high levels of
  interactivity, which previously had to be written in external JS files. Now
  these are Preact components and can use all the benefits that come
  from this.

Future work includes:
- [ ] Integrating Alice in the new UI
- [ ] Direct answers UI
- [ ] Default Optics. Should they come from the API or the frontend?
- [ ] Integrating the new fresh server with the existing backend
- [ ] Routes supplying `queryUrlPart` to `Header`

* Update fresh frontend to use "type" rather than "@type"

* Add placeholder Tailwind config for VSCode intellisense

* Add discussions UI

* Clean up some left over template `{{...}}`

* './icons' might not exist before generation

* some UI/UX changes for consistency with old frontend

* Remove unused ENABLE_CSP flag since it is always enabled now

* Store icons used for the frontend in the repository

* Don't generate icons when starting the frontend

* Fix chat textarea sizing in Firefox

* Add Chat UI to new frontend

* Only allow one of liked, disliked, blocked at a time

* Add `cursor-pointer` to safe search radio buttons

* Add `leading-6` to articles to get more line spacing

Almost equivalent to the old frontend

* Prefix explore and site ranking links with https://

Perhaps we should determine the protocol in a more robust way?

* Fix explore sites regressions from adding tailwind-forms

* Refactor manage optics UI

* Add API endpoint for exporting optic from site rankings

`/beta/api/sites/export` is a JSON equivalent of the existing
`/settings/sites/export` endpoint.

* Add "Export as optic" and "Clear all and export as optic" buttons

These new buttons use the new `/beta/api/sites/export` endpoint to
download the generated optic

* Store site rankings in URL and send it during searching

* Use the tailwind config to extend the twind theme

* Add `/beta/api/explore/export` API endpoint

* Fix optics export button on explore

* Reflect the currently searched optic in the optic selector

* Add `noscript:hidden` class to hide, e.g., search result adjust buttons

* Re-search when changing ranking of a webpage

* Refactor searchbar interaction and suggestion highlighting

We now do the highlighting on the frontend

* Change site blocking to be domain blocking when converting site rankings to optics.
The domain field uses the public suffix list which already handles suffixes that can be shared by multiple users (netlify.app etc.).
In other words, the domain of 'site.netlify.app' is 'site.netlify.app', so users of stract can still block specific netlify sites without blocking them all.

* Pass around `queryUrlPart` between pages

* Do syntax highlighting server-side using HighlightJS

* Remove `facebook.com` as default site in explore

* Add webmasters page to new frontend

* Remove old frontend

* Remove dead code from old Rust frontend

* Rename webstract to frontend

* remove more stuff from old frontend

---------

Co-authored-by: Mikkel Denker <mikkel@trystract.com>
2023-09-04 05:59:28 +00:00
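The domain-blocking change in #84 relies on a domain that is computed from the Public Suffix List, so that `site.netlify.app` is its own domain. A minimal sketch of that lookup, assuming a hypothetical five-entry suffix set in place of the real list (which also has wildcard and exception rules that are ignored here); `registrableDomain` is a hypothetical name.

```typescript
// Hypothetical stand-in for the Public Suffix List (https://publicsuffix.org).
// The real list is far larger, includes private suffixes such as
// "netlify.app", and has wildcard/exception rules this sketch ignores.
const PUBLIC_SUFFIXES = new Set(["com", "app", "uk", "co.uk", "netlify.app"]);

// Registrable domain = the longest matching public suffix plus one more label.
// Returns null when the host is itself a suffix or matches no known suffix.
function registrableDomain(host: string): string | null {
  const labels = host.toLowerCase().replace(/\.$/, "").split(".");
  // Scanning from the left tries the longest candidate suffix first.
  for (let i = 0; i < labels.length; i++) {
    if (PUBLIC_SUFFIXES.has(labels.slice(i).join("."))) {
      return i === 0 ? null : labels.slice(i - 1).join(".");
    }
  }
  return null;
}
```

With this lookup `a.example.com` and `b.example.com` share the domain `example.com`, while `site.netlify.app` and `other.netlify.app` get different domains, so a block rule keyed on the domain affects only one of the two Netlify sites.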
Mikkel Denker
5c6e552dcc forgot to update js files when renaming type in api 2023-08-28 13:30:09 +02:00
Mikkel Denker
24ffa32983
Safe search (#79)
* This should fix the byte/char index mixups identified in issue 77

* script to generate dataset

* naive bayes classification with tf-idf features

* Add prediction confidence to naive bayes.
We report the confidence as $\texttt{log\_probs}[\texttt{best}] / \sum \texttt{log\_probs}$.
I'm not really sure this confidence calculation can be seen as a probability that the model has predicted the correct label, but should still give a picture of the confidence of the prediction. It's therefore named confidence and not probability.

Also, even though naive bayes is a pretty decent classifier, some people on stackexchange report that it's a pretty bad probability estimator. Further tests will determine if this confidence score is actually useful.

* naive bayes benchmark

* store safe search classification in index

* search preferences page where users can control safe search settings
2023-08-26 16:36:33 +00:00
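A direct transcription of the confidence formula from #79, assuming `logProbs` holds one log-probability per class (all ≤ 0, at least one negative); `naiveBayesConfidence` is a hypothetical helper name, not Stract's API.

```typescript
// Confidence as described in the commit: log_probs[best] / sum(log_probs),
// where `best` is the index of the largest log-probability.
// Assumes a non-empty array of log-probabilities, all <= 0 and not all 0
// (if every entry were 0 the ratio would be 0 / 0 = NaN).
function naiveBayesConfidence(logProbs: number[]): { best: number; confidence: number } {
  if (logProbs.length === 0) {
    throw new Error("need at least one class");
  }
  let best = 0;
  for (let i = 1; i < logProbs.length; i++) {
    if (logProbs[i] > logProbs[best]) best = i; // ties keep the earlier class
  }
  const sum = logProbs.reduce((acc, lp) => acc + lp, 0);
  return { best, confidence: logProbs[best] / sum };
}
```

Because every term is non-positive, the ratio lies between 0 and 1. The ratio gets smaller as the best class's log-probability moves further above the others: `[-1, -3]` gives 0.25, while the tie `[-2, -2]` gives 0.5.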
Mikkel Denker
7706dcdfa2 Partial support for compounded words.
the query "wishlist" now also matches search results that have the terms "wish list". This is done using the bigram and trigram fields.

Support for "wish list" to match "wishlist" results is not included in this commit, as this would require each term in the query to be aware of the succeeding terms, and it is not immediately clear how best to approach this.
2023-08-25 09:23:53 +02:00
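The compounded-words matching can be illustrated as follows, assuming the bigram and trigram fields store adjacent words concatenated without a separator. The commit does not say how those fields are built, so this is an assumption. `indexDoc`, `termMatches` and the whitespace tokenizer are hypothetical.

```typescript
// Hypothetical tokenizer: lowercase, split on whitespace, no punctuation handling.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
}

// Concatenate every run of `n` adjacent tokens, so "wish list" yields "wishlist".
function ngrams(tokens: string[], n: number): string[] {
  const out: string[] = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    out.push(tokens.slice(i, i + n).join(""));
  }
  return out;
}

// A document indexed into three hypothetical fields.
interface IndexedDoc {
  unigrams: Set<string>;
  bigrams: Set<string>;
  trigrams: Set<string>;
}

function indexDoc(text: string): IndexedDoc {
  const tokens = tokenize(text);
  return {
    unigrams: new Set(tokens),
    bigrams: new Set(ngrams(tokens, 2)),
    trigrams: new Set(ngrams(tokens, 3)),
  };
}

// A single query term matches if it is a word in the document or equals a
// concatenated bigram/trigram. The reverse direction ("wish list" matching
// "wishlist") is not handled, mirroring the limitation noted in the commit.
function termMatches(term: string, doc: IndexedDoc): boolean {
  const t = term.toLowerCase();
  return doc.unigrams.has(t) || doc.bigrams.has(t) || doc.trigrams.has(t);
}
```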
Oliver Bøving
50304467c4
Add #[serde(tag = "type", content = "value")] to OpenAPI exposed types (#76)
* Add `serde(tag = "type", content = "value")` to OpenAPI exposed types

Makes them more ergonomic to work with in TypeScript in some scenarios.

* Add `#[serde(rename_all="camelCase")]` to all types deriving `ToSchema`

Currently two types are exempt: `Region` and `Expr`

* Update schema names to camelCase in external files
2023-08-21 20:02:56 +00:00
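With `tag = "type", content = "value"`, serde uses its adjacently tagged enum representation, so each newtype variant serializes as `{"type": <variant name>, "value": <payload>}`. On the TypeScript side that maps onto a discriminated union that a `switch` can narrow. The enum and the `render` function below are hypothetical, not types from the Stract API.

```typescript
// Hypothetical Rust type, serialized with serde's adjacently tagged representation:
//   #[derive(Serialize)]
//   #[serde(tag = "type", content = "value", rename_all = "camelCase")]
//   enum DisplayValue { Text(String), Number(f64), List(Vec<String>) }
// On the wire: {"type": "text", "value": "hello"}, {"type": "list", "value": ["a", "b"]}, ...
type DisplayValue =
  | { type: "text"; value: string }
  | { type: "number"; value: number }
  | { type: "list"; value: string[] };

function render(v: DisplayValue): string {
  // Checking `type` narrows `value` to the matching payload type.
  switch (v.type) {
    case "text":
      return v.value;
    case "number":
      return v.value.toFixed(2);
    case "list":
      return v.value.join(", ");
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a variant is added.
      const unreachable: never = v;
      throw new Error(`unknown variant: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Without the attribute, serde's default externally tagged form (`{"text": "hello"}`) would force the client to check which key is present instead of switching on a single `type` field.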
Mikkel Denker
f1403fa7aa fix broken link highlights in settings 2023-08-18 13:51:20 +02:00
Mikkel Denker
912dcc5a8e cannot use alpine when having strict CSP headers.
User security > developer convenience
2023-08-18 13:37:19 +02:00
Mikkel Denker
d2dc28215e forgot to move explore script to separate file 2023-08-17 18:13:55 +02:00
Mikkel Denker
13ad5d7834 Moved more inline JavaScript into files 2023-08-17 18:08:39 +02:00
Mikkel Denker
8819e6e6db Moved all inline scripts into separate files.
This allows us to set CSP headers that only allow JS files from self, which reduces the XSS attack surface quite substantially.
2023-08-17 15:22:43 +02:00
Mikkel Denker
7eb5387c90 ability to easily export site rankings as an optic 2023-08-17 13:40:12 +02:00
Mikkel Denker
22a8e7d4df preliminary api docs 2023-08-16 14:57:25 +02:00
Mikkel Denker
d2719b333e remove alice improvements 2023-08-16 10:09:21 +02:00
Mikkel Denker
cb9a04508b less locking in crawldb 2023-08-15 12:26:26 +02:00
Mikkel Denker
49db3704f7 entity sidebar went off-screen on mobile 2023-08-14 09:02:00 +02:00
Mikkel Denker
ce1e15a11b Fix url root domain parsing and faster memory mapped crawldb. 2023-08-09 14:44:28 +02:00
Mikkel Denker
2e885e9d11 rocksdb bloom filters caused OOM errors. Updated filter sizes to more sane defaults 2023-08-03 15:01:09 +02:00
Mikkel Denker
b323fa9e4b limit cache of robots.txt files 2023-08-02 14:14:01 +02:00
Mikkel Denker
69e06bb574 Some domain states got set to "Pending" even though a worker was already crawling them.
This primarily happened for popular sites that have many incoming links from many different sites.
2023-08-01 08:48:26 +02:00
Mikkel Denker
05b87c95dd http protocol in explore 2023-07-20 15:48:15 +02:00
Mikkel Denker
7c95231011 update crawler description text 2023-07-19 17:00:22 +02:00
Mikkel Denker
a545770b6e Change user agent of crawler.
Reddit seems to look for "bot" in the user agent. If they cannot find the substring, they return a page that updates the title with javascript. This causes us to have a bunch of reddit pages with the title "Reddit - Dive into anything" in the search results.
2023-07-19 16:40:30 +02:00
Mikkel Denker
330c02a96f rust update and new git screenshot 2023-07-18 11:04:58 +02:00
Mikkel Denker
1af94339bb increase max politeness, faster setup and some documentation 2023-07-13 17:53:46 +02:00
Mikkel Denker
14b922902e Don't show alice settings when alice is disabled.
That would just be confusing, as the user might not know what alice is
2023-06-29 13:39:16 +02:00
Mikkel Denker
9aba33cb99 Fuuuuuuuck had a bug that caused the crawler not to respect robots.txt :(. I'm terribly sorry if this has caused any inconvenience (luckily it was caught before any big crawls) 2023-06-27 14:14:34 +02:00
Mikkel Denker
4e100322fc Forgot to update the changelog from yesterday's privacy policy updates 2023-06-26 09:06:20 +02:00
Mikkel Denker
62eb00804e Make Alice configurable in frontend 2023-06-25 22:17:46 +02:00
Mikkel Denker
50fb827f72 Update privacy policy with Alice stuff 2023-06-25 17:45:47 +02:00
Mikkel Denker
20d6bb562f fix astrojs mdx to a specific version 2023-06-24 11:33:32 +02:00
Mikkel Denker
b5b1467631 added warning image to alice 2023-06-20 10:50:16 +02:00
Mikkel Denker
aacc1e3dc5 crawler 2023-06-18 12:26:08 +02:00