Update readme for recent changes

Daoud Clarke 2022-02-04 22:07:09 +00:00
parent 51f2dd2690
commit c4e86ce313

@@ -14,6 +14,19 @@ crawler is still to be implemented.
Our vision is a community working to provide top quality search
particularly for hackers, funded purely by donations.
Crawling
========
**Update 2022-02-05:** We now have a distributed crawler that runs on
our volunteers' machines! If you have Firefox you can help out by
[installing our
extension](https://addons.mozilla.org/en-GB/firefox/addon/mwmbl-web-crawler/). This
will crawl the web in the background, retrieving one page a second. It
does not use or access any of your personal data. Instead it crawls
the web at random, using the top scoring sites on Hacker News as seed
pages. After extracting a summary of each page, it batches these up
and sends the data to a central server to be stored and indexed.
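The crawl loop described above (pick a page at random, summarise it,
batch the results, wait a second) can be sketched roughly as follows.
This is only an illustration, not the extension's actual code: the
names `crawl`, `fetch` and `send_batch`, the batch size, and the page
limit are all assumptions made for the example, and fetching and
uploading are passed in as functions so the sketch runs without any
network access.

```python
import random
import time
from typing import Callable, Optional


def crawl(
    seeds: list[str],
    fetch: Callable[[str], tuple[str, list[str]]],
    send_batch: Callable[[list[dict]], None],
    batch_size: int = 3,          # hypothetical; the real batch size is not stated
    max_pages: int = 10,          # hypothetical stopping condition for the sketch
    delay_seconds: float = 1.0,   # "one page a second"
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> int:
    """Crawl at random from the seed pages and return the number of pages crawled.

    `fetch(url)` is assumed to return `(summary, outgoing_links)`.
    `send_batch(pages)` is assumed to upload a list of page summaries.
    """
    rng = rng or random.Random()
    frontier = list(seeds)   # pages discovered but not yet crawled
    seen = set(frontier)     # every URL ever queued, to avoid repeats
    batch: list[dict] = []
    crawled = 0

    while frontier and crawled < max_pages:
        # Pick the next page at random rather than in discovery order.
        url = frontier.pop(rng.randrange(len(frontier)))
        summary, links = fetch(url)
        crawled += 1
        batch.append({"url": url, "summary": summary})

        # Queue any links we have not come across before.
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)

        # Send a full batch to the server and start a new one.
        if len(batch) >= batch_size:
            send_batch(batch)
            batch = []

        # Rate limit: wait before fetching the next page.
        sleep(delay_seconds)

    # Flush whatever is left over at the end.
    if batch:
        send_batch(batch)
    return crawled
```

Passing `fetch`, `send_batch` and `sleep` in as arguments keeps the
sketch easy to try out with a small in-memory "web" instead of real
pages.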
Why a non-profit search engine?
===============================
@@ -87,26 +100,16 @@ single term and maintain an index smaller than the inverted index
design. Well, that's the theory. This idea has yet to be tested out on
a large scale.
Crawling
========
Our current index is a small sample of the excellent Common Crawl,
restricted to English content and domains which score highly on
average in Hacker News submissions. It is likely for a variety of
reasons that we will want to go beyond Common Crawl data at some
point, so building a crawler becomes inevitable. We plan to start work
on a distributed crawler, probably implemented as a browser extension
that can be installed by volunteers.
How to contribute
=================
There are lots of ways to help:
- [Help us crawl the
web](https://addons.mozilla.org/en-GB/firefox/addon/mwmbl-web-crawler/)
- [Donate some money](https://opencollective.com/mwmbl) towards
hosting costs and supporting our volunteers
- Give feedback/suggestions
- Volunteer to test out the distributed crawler when it's ready
- Help out with development of the engine itself
- Donate some money towards hosting costs and/or founding an official
non-profit organisation
If you would like to help in any of these or other ways, thank you!
Please join our [Matrix chat
@@ -123,7 +126,7 @@ Development
4. Run `$ docker run -p 8080:8080 mwmbl`
### Local Testing
1. Create and activate a Python (3.9) environment using any tool you like, e.g. poetry, venv, conda etc.
1. Create and activate a Python (3.10) environment using any tool you like, e.g. poetry, venv, conda etc.
2. Run `$ pip install .`
3. Run `$ mwmbl-tinysearchengine --config config/tinysearchengine.yaml`
@@ -132,4 +135,7 @@ Frequently Asked Question
### How do you pronounce "mwmbl"?
Like "mumble". I live in [Mumbles](https://en.wikipedia.org/wiki/Mumbles), which is spelt "Mwmbwls" in Welsh. But the intended meaning is "to mumble", as in "don't search, just mwmbl!"
Like "mumble". I live in
[Mumbles](https://en.wikipedia.org/wiki/Mumbles), which is spelt
"Mwmbwls" in Welsh. But the intended meaning is "to mumble", as in
"don't search, just mwmbl!"