Update guide.html

wibyweb 2022-10-07 00:10:20 -04:00 committed by GitHub
parent b4eb3d8926
commit 61139dd566

@@ -242,7 +242,7 @@ You may want to run this on startup, easiest way to set that is with a cron job
<br>
<h3>Start the Crawler</h3>
It is best to run the crawler in a screen session so that you can monitor its output. You can have more than one crawler running as long as you keep them in separate directories, include a symlink to the same robots folder, and also set the correct parameters on each.
-To view the parameters, type './cr -h'. Without any parameters set, you can only run one crawler (which is probably all you need anyway). If necessary, you can change the connection from 'localhost' to a different IP from inside cr.c, then rebuild.
+To view the parameters, type './cr -h'. Without any parameters set, you can only run one crawler (which is probably all you need anyway). If necessary, you can change the database connection from 'localhost' to a different IP from inside cr.c, then rebuild.
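As a rough sketch only (the directory names here are assumptions, and the real parameters for each crawler should be taken from './cr -h'), a second crawler could be set up along these lines:
<pre>
# first crawler already built and running in ~/crawler1;
# make a second copy in its own directory
cp -r ~/crawler1 ~/crawler2
cd ~/crawler2
rm -rf robots
ln -s ~/crawler1/robots robots   # symlink to the same robots folder

# run it in a screen session so you can monitor its output
screen -S crawler2
./cr [parameters]                # set the correct parameters per './cr -h'
</pre>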
<br>
<br>
Note that you may need to change the crawler's user-agent if you have issues indexing some websites. Pages that fail to index are noted in abandoned.txt.
@@ -413,7 +413,7 @@ If you need to stop the web crawler in a situation where it was accidentally queue
<hr>
<h2><a name="scale">Scaling the Search Engine</a></h2>
<br>
-You can help ensure sub-second search queries as your index grows by building MySQL replica servers on a local netowork close to eachother, run the core application AND replication tracker (rt) on one or more replica servers and point your reverse proxy to use it.
+You can help ensure sub-second search queries as your index grows by building MySQL replica servers on a local network close to each other. Run the core application AND the replication tracker (rt) on one or more replica servers, and point your reverse proxy to use them.
Edit the servers.csv file for rt to indicate all available replica servers. If you have a machine with a huge amount of resources and cores, adding duplicate entries for the same server to servers.csv (e.g. one per core) also works.
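For illustration only (these addresses are made up, and the actual file layout may differ), servers.csv might list one entry per replica, with a machine that has many cores repeated once per core it can dedicate:
<pre>
10.0.0.2
10.0.0.3
10.0.0.3
10.0.0.3
10.0.0.3
</pre>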
<br>
<br>