
Update readme for recent changes

Daoud Clarke, 3 years ago
Parent
Commit
c4e86ce313
1 changed file with 22 additions and 16 deletions

README.md (+22 -16)

@@ -14,6 +14,19 @@ crawler is still to be implemented.
 Our vision is a community working to provide top quality search
 particularly for hackers, funded purely by donations.
 
+Crawling
+========
+
+**Update 2022-02-05:** We now have a distributed crawler that runs on
+our volunteers' machines! If you have Firefox you can help out by
+[installing our
+extension](https://addons.mozilla.org/en-GB/firefox/addon/mwmbl-web-crawler/). This
+will crawl the web in the background, retrieving one page a second. It
+does not use or access any of your personal data. Instead it crawls
+the web at random, using the top scoring sites on Hacker News as seed
+pages. After extracting a summary of each page, it batches these up
+and sends the data to a central server to be stored and indexed.
+
 Why a non-profit search engine?
 ===============================
 
@@ -87,26 +100,16 @@ single term and maintain an index smaller than the inverted index
 design. Well, that's the theory. This idea has yet to be tested out on
 a large scale.
 
-Crawling
-========
-
-Our current index is a small sample of the excellent Common Crawl,
-restricted to English content and domains which score highly on
-average in Hacker News submissions. It is likely for a variety of
-reasons that we will want to go beyond Common Crawl data at some
-point, so building a crawler becomes inevitable. We plan to start work
-on a distributed crawler, probably implemented as a browser extension
-that can be installed by volunteers.
-
 How to contribute
 =================
 
 There are lots of ways to help:
+ - [Help us crawl the
+   web](https://addons.mozilla.org/en-GB/firefox/addon/mwmbl-web-crawler/)
+ - [Donate some money](https://opencollective.com/mwmbl) towards
+   hosting costs and supporting our volunteers
  - Give feedback/suggestions
- - Volunteer to test out the distributed crawler when it's ready
  - Help out with development of the engine itself
- - Donate some money towards hosting costs and/or founding an official
-   non-profit organisation
 
 If you would like to help in any of these or other ways, thank you!
 Please join our [Matrix chat
@@ -123,7 +126,7 @@ Development
 4. Run `$ docker run -p 8080:8080 mwmbl`
 
 ### Local Testing
-1. Create and activate a python (3.9) environment using any tool you like e.g. poetry,venv, conda etc.
+1. Create and activate a python (3.10) environment using any tool you like e.g. poetry,venv, conda etc.
 2. Run `$ pip install .`
 3. Run `$ mwmbl-tinysearchengine --config config/tinysearchengine.yaml`
 
@@ -132,4 +135,7 @@ Frequently Asked Question
 
 ### How do you pronounce "mwmbl"?
 
-Like "mumble". I live in [Mumbles](https://en.wikipedia.org/wiki/Mumbles), which is spelt "Mwmbwls" in Welsh. But the intended meaning is "to mumble", as in "don't search, just mwmbl!"
+Like "mumble". I live in
+[Mumbles](https://en.wikipedia.org/wiki/Mumbles), which is spelt
+"Mwmbwls" in Welsh. But the intended meaning is "to mumble", as in
+"don't search, just mwmbl!"
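
The Crawling section added in this commit describes a loop: start from seed pages, pick pages at random, fetch about one page a second, extract a summary of each page, and send summaries to a central server in batches. The following is a minimal Python sketch of that loop, not the extension's actual code. Every name here (`SummaryParser`, `summarise`, `crawl`, `fetch`, `send_batch`) and the summary fields (`url`, `title`, `extract`) are assumptions made for illustration; the README does not specify the data format or the server API.

```python
import random
import time
from html.parser import HTMLParser


class SummaryParser(HTMLParser):
    """Collects the page title, visible text, and absolute outgoing links."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text_parts = []
        self.links = []
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href and href.startswith("http"):
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()
        elif self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def summarise(url, html, max_chars=200):
    """Return (summary, links). The summary fields are hypothetical."""
    parser = SummaryParser()
    parser.feed(html)
    extract = " ".join(parser.text_parts)[:max_chars]
    return {"url": url, "title": parser.title, "extract": extract}, parser.links


def crawl(seeds, fetch, send_batch, max_pages=10, batch_size=3,
          delay=1.0, sleep=time.sleep, rng=random):
    """Crawl at random from the seeds, sending summaries in batches.

    fetch(url) returns HTML or None; send_batch(list) stands in for the
    upload to a central server. Both are supplied by the caller.
    """
    frontier = list(seeds)
    seen = set()
    batch = []
    crawled = 0
    while frontier and crawled < max_pages:
        # Pick the next page at random rather than in discovery order.
        url = frontier.pop(rng.randrange(len(frontier)))
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)
        crawled += 1
        if html is not None:
            summary, links = summarise(url, html)
            batch.append(summary)
            frontier.extend(link for link in links if link not in seen)
            if len(batch) >= batch_size:
                send_batch(batch)
                batch = []
        sleep(delay)  # with delay=1.0 this gives roughly one page a second
    if batch:
        send_batch(batch)  # flush the final partial batch
    return crawled
```

Passing `fetch` and `send_batch` in as callables keeps the sketch independent of any particular HTTP client or server endpoint, and lets the loop be exercised against an in-memory dictionary of pages (for example `fetch=pages.get`) before it touches the network.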