
Merge pull request #42 from mwmbl/update-readme-for-new-crawler

Update readme for recent changes
Daoud Clarke, 3 years ago
parent
commit
4e36ee198c
1 file changed with 22 additions and 16 deletions
README.md (+22 -16)

@@ -14,6 +14,19 @@ crawler is still to be implemented.
 Our vision is a community working to provide top quality search
 particularly for hackers, funded purely by donations.
 
+Crawling
+========
+
+**Update 2022-02-05:** We now have a distributed crawler that runs on
+our volunteers' machines! If you have Firefox you can help out by
+[installing our
+extension](https://addons.mozilla.org/en-GB/firefox/addon/mwmbl-web-crawler/). This
+will crawl the web in the background, retrieving one page a second. It
+does not use or access any of your personal data. Instead it crawls
+the web at random, using the top scoring sites on Hacker News as seed
+pages. After extracting a summary of each page, it batches these up
+and sends the data to a central server to be stored and indexed.
+
 Why a non-profit search engine?
 ===============================
 
@@ -87,26 +100,16 @@ single term and maintain an index smaller than the inverted index
 design. Well, that's the theory. This idea has yet to be tested out on
 a large scale.
 
-Crawling
-========
-
-Our current index is a small sample of the excellent Common Crawl,
-restricted to English content and domains which score highly on
-average in Hacker News submissions. It is likely for a variety of
-reasons that we will want to go beyond Common Crawl data at some
-point, so building a crawler becomes inevitable. We plan to start work
-on a distributed crawler, probably implemented as a browser extension
-that can be installed by volunteers.
-
 How to contribute
 =================
 
 There are lots of ways to help:
+ - [Help us crawl the
+   web](https://addons.mozilla.org/en-GB/firefox/addon/mwmbl-web-crawler/)
+ - [Donate some money](https://opencollective.com/mwmbl) towards
+   hosting costs and supporting our volunteers
  - Give feedback/suggestions
- - Volunteer to test out the distributed crawler when it's ready
  - Help out with development of the engine itself
- - Donate some money towards hosting costs and/or founding an official
-   non-profit organisation
 
 If you would like to help in any of these or other ways, thank you!
 Please join our [Matrix chat
@@ -123,7 +126,7 @@ Development
 4. Run `$ docker run -p 8080:8080 mwmbl`
 
 ### Local Testing
-1. Create and activate a python (3.9) environment using any tool you like e.g. poetry,venv, conda etc.
+1. Create and activate a python (3.10) environment using any tool you like e.g. poetry,venv, conda etc.
 2. Run `$ pip install .`
 3. Run `$ mwmbl-tinysearchengine --config config/tinysearchengine.yaml`
 
@@ -132,4 +135,7 @@ Frequently Asked Question
 
 ### How do you pronounce "mwmbl"?
 
-Like "mumble". I live in [Mumbles](https://en.wikipedia.org/wiki/Mumbles), which is spelt "Mwmbwls" in Welsh. But the intended meaning is "to mumble", as in "don't search, just mwmbl!"
+Like "mumble". I live in
+[Mumbles](https://en.wikipedia.org/wiki/Mumbles), which is spelt
+"Mwmbwls" in Welsh. But the intended meaning is "to mumble", as in
+"don't search, just mwmbl!"
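
Note on the new "Crawling" section: the behaviour it describes (start from seed pages, pick the next page at random, fetch about one page a second, extract a summary, send summaries to a server in batches) can be illustrated with a minimal sketch. This is not the extension's code, which runs in Firefox. It is a Python illustration under stated assumptions: the names `PageSummary`, `summarise`, `crawl`, `fetch` and `send_batch` are hypothetical, the batch size and summary length are arbitrary, and fetching and uploading are passed in as callables so that nothing here depends on a real endpoint.

```python
import random
import time
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urljoin


@dataclass
class PageSummary:
    """Hypothetical summary record; the real payload format is not given in the README."""
    url: str
    title: str
    extract: str


class _SummaryParser(HTMLParser):
    """Collects the <title> text, all other text, and the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text inside <title> goes to the title, everything else to the extract.
        (self.title_parts if self._in_title else self.text_parts).append(data)


def summarise(url, html, max_chars=200):
    """Return a PageSummary and the absolute URLs of the links found on the page."""
    parser = _SummaryParser()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    title = " ".join("".join(parser.title_parts).split())
    extract = " ".join(" ".join(parser.text_parts).split())[:max_chars]
    links = [urljoin(url, href) for href in parser.links]
    return PageSummary(url, title, extract), links


def crawl(seeds, fetch, send_batch, max_pages=10, batch_size=3,
          delay=1.0, sleep=time.sleep, rng=None):
    """Crawl at random from the seeds, pausing `delay` seconds between pages.

    `fetch(url)` returns the page HTML or None; `send_batch(list)` receives
    each full batch of summaries. Both are assumptions made for this sketch.
    """
    rng = rng or random.Random()
    frontier = list(seeds)
    seen = set(frontier)
    batch = []
    crawled = 0
    while frontier and crawled < max_pages:
        # Pick the next page uniformly at random from the frontier.
        url = frontier.pop(rng.randrange(len(frontier)))
        html = fetch(url)
        crawled += 1
        if html is not None:
            summary, links = summarise(url, html)
            batch.append(summary)
            for link in links:
                if link.startswith("http") and link not in seen:
                    seen.add(link)
                    frontier.append(link)
        if len(batch) >= batch_size:
            send_batch(batch)
            batch = []
        sleep(delay)  # one page per `delay` seconds
    if batch:
        send_batch(batch)  # send whatever is left over
    return crawled
```

For a dry run, `fetch` can be the `get` method of a dictionary mapping URLs to HTML strings and `send_batch` can be `some_list.append`, with `sleep=lambda s: None` to skip the pauses. Injecting these callables keeps the loop testable without touching the network.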