Update readme for recent changes
parent 51f2dd2690
commit c4e86ce313
1 changed file with 22 additions and 16 deletions
38
README.md
@@ -14,6 +14,19 @@ crawler is still to be implemented.
 Our vision is a community working to provide top quality search
 particularly for hackers, funded purely by donations.
 
+Crawling
+========
+
+**Update 2022-02-05:** We now have a distributed crawler that runs on
+our volunteers' machines! If you have Firefox you can help out by
+[installing our
+extension](https://addons.mozilla.org/en-GB/firefox/addon/mwmbl-web-crawler/). This
+will crawl the web in the background, retrieving one page a second. It
+does not use or access any of your personal data. Instead it crawls
+the web at random, using the top scoring sites on Hacker News as seed
+pages. After extracting a summary of each page, it batches these up
+and sends the data to a central server to be stored and indexed.
+
 Why a non-profit search engine?
 ===============================
 
@@ -87,26 +100,16 @@ single term and maintain an index smaller than the inverted index
 design. Well, that's the theory. This idea has yet to be tested out on
 a large scale.
 
-Crawling
-========
-
-Our current index is a small sample of the excellent Common Crawl,
-restricted to English content and domains which score highly on
-average in Hacker News submissions. It is likely for a variety of
-reasons that we will want to go beyond Common Crawl data at some
-point, so building a crawler becomes inevitable. We plan to start work
-on a distributed crawler, probably implemented as a browser extension
-that can be installed by volunteers.
-
 How to contribute
 =================
 
 There are lots of ways to help:
+- [Help us crawl the
+web](https://addons.mozilla.org/en-GB/firefox/addon/mwmbl-web-crawler/)
+- [Donate some money](https://opencollective.com/mwmbl) towards
+hosting costs and supporting our volunteers
 - Give feedback/suggestions
-- Volunteer to test out the distributed crawler when it's ready
 - Help out with development of the engine itself
-- Donate some money towards hosting costs and/or founding an official
-non-profit organisation
 
 If you would like to help in any of these or other ways, thank you!
 Please join our [Matrix chat
@@ -123,7 +126,7 @@ Development
 4. Run `$ docker run -p 8080:8080 mwmbl`
 
 ### Local Testing
-1. Create and activate a python (3.9) environment using any tool you like e.g. poetry,venv, conda etc.
+1. Create and activate a python (3.10) environment using any tool you like e.g. poetry,venv, conda etc.
 2. Run `$ pip install .`
 3. Run `$ mwmbl-tinysearchengine --config config/tinysearchengine.yaml`
 
@@ -132,4 +135,7 @@ Frequently Asked Question
 
 ### How do you pronounce "mwmbl"?
 
-Like "mumble". I live in [Mumbles](https://en.wikipedia.org/wiki/Mumbles), which is spelt "Mwmbwls" in Welsh. But the intended meaning is "to mumble", as in "don't search, just mwmbl!"
+Like "mumble". I live in
+[Mumbles](https://en.wikipedia.org/wiki/Mumbles), which is spelt
+"Mwmbwls" in Welsh. But the intended meaning is "to mumble", as in
+"don't search, just mwmbl!"