diff --git a/html/about/guide.html b/html/about/guide.html
index aa40209..94b10dc 100755
--- a/html/about/guide.html
+++ b/html/about/guide.html
@@ -46,9 +46,9 @@ Wiby is a search engine for the World Wide Web. The source code is now free as o
It includes a web interface allowing guardians to control where, how far, and how often it crawls websites and follows hyperlinks. The search index is stored in a MySQL full-text index.

-Fast queries are maintained by concurrently reading different sections of the index across multiple replication servers or across duplicate server connections, returning a list of top results from each connection,
+Fast queries are maintained by concurrently searching different sections of the index across multiple replication servers or across duplicate server connections, returning a list of top results from each connection,
then searching the combined list to ensure correct ordering. Replicas that fail are automatically excluded; new replicas are easy to include.
-As new pages are crawled, they are stored randomly across the index, ensuring each replica can obtain relevant results.
+As new pages are crawled, they are stored randomly across the index, ensuring each search section can obtain relevant results.
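
To make the fan-out concrete, here is a minimal Go sketch of the idea described above. It is NOT the actual core.go implementation: the windex table, its tags/body columns, the DSNs, and the modulo-based sectioning are all assumptions made for illustration only; the real schema, query text, and failover logic live in core.go.

// sketch.go - a hypothetical illustration of fan-out search and merge,
// not part of the Wiby source. Table/column names are assumed.
package main

import (
	"database/sql"
	"fmt"
	"log"
	"sort"
	"sync"

	_ "github.com/go-sql-driver/mysql" // the driver installed later in this guide
)

// result holds one row returned by a section search.
type result struct {
	url   string
	score float64
}

func main() {
	// One DSN per connection (duplicates to the same machine are allowed).
	dsns := []string{
		"user:pass@tcp(10.0.0.2:3306)/wiby",
		"user:pass@tcp(10.0.0.3:3306)/wiby",
	}
	term := "example query"
	lim := 12              // search results per page
	per := lim / len(dsns) // top results requested from each section

	var (
		mu       sync.Mutex
		combined []result
		wg       sync.WaitGroup
	)
	for i, dsn := range dsns {
		wg.Add(1)
		go func(section int, dsn string) {
			defer wg.Done()
			db, err := sql.Open("mysql", dsn)
			if err != nil {
				log.Println(err) // a failed replica is skipped, mirroring automatic exclusion
				return
			}
			defer db.Close()
			// Restrict this connection to its own slice of the index; because
			// pages are stored randomly, every slice holds relevant results.
			rows, err := db.Query(
				"SELECT url, MATCH(tags, body) AGAINST (?) AS score FROM windex "+
					"WHERE id % ? = ? AND MATCH(tags, body) AGAINST (?) "+
					"ORDER BY score DESC LIMIT ?",
				term, len(dsns), section, term, per)
			if err != nil {
				log.Println(err)
				return
			}
			defer rows.Close()
			for rows.Next() {
				var r result
				if err := rows.Scan(&r.url, &r.score); err == nil {
					mu.Lock()
					combined = append(combined, r)
					mu.Unlock()
				}
			}
		}(i, dsn)
	}
	wg.Wait()

	// Merge step: sort the combined shortlist so the final page is ordered correctly.
	sort.Slice(combined, func(a, b int) bool { return combined[a].score > combined[b].score })
	for _, r := range combined {
		fmt.Println(r.score, r.url)
	}
}

Because replication gives every server a full copy of the index, slicing by id is what lets each connection scan a different section while still returning relevant results for any query.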

The search engine is not meant to index the entire web and then sort it with a ranking algorithm. It prefers to seed its index through human submissions made by guests, or by the guardian(s) of the search engine.
@@ -114,16 +114,7 @@ If you don't have a physical server, you can rent computing space by looking for

Install the following additional packages:

-
-apt install build-essential php-gd libcurl4-openssl-dev libmysqlclient-dev mysql-server golang git
-
-For Ubuntu 20:
-go get -u github.com/go-sql-driver/mysql
-
-For Ubuntu 22 or latest Golang versions:
-go install github.com/go-sql-driver/mysql@latest
-go mod init mysql
-go get github.com/go-sql-driver/mysql
-
+
+apt install build-essential php-gd libcurl4-openssl-dev libmysqlclient-dev mysql-server golang git

Get Wiby Source Files

Download the source directly from Wiby here, or from GitHub. The source is released under the GPLv2 license. Copy the source files for Wiby to your server.
@@ -141,8 +132,15 @@ This could happen if you are not using Ubuntu 20. You might have to locate the c

Build the core server application:

-Inside the go folder:
+The core application is located inside the go folder. Run the following commands after copying the files over to your preferred location:
+For Ubuntu 20:
+go get -u github.com/go-sql-driver/mysql
+
+For Ubuntu 22 or latest Golang versions:
+go install github.com/go-sql-driver/mysql@latest
+go mod init mysql
+go get github.com/go-sql-driver/mysql
 
 go build core.go
 go build 1core.go
@@ -423,7 +421,7 @@ If you need to stop the web crawler in a situation where it was accidently queue
 

Scaling the Search Engine


You can help ensure sub-second search queries as your index grows by building MySQL replica servers on a local network close to each other. Run the core application AND replication tracker (rt) on one or more replica servers and point your reverse proxy to use it.
-Edit the servers.csv file for rt to indicate all available replica servers. If you have a machine with a huge amount of resources and cores, entering multiple duplicate entries to the same sever inside servers.csv (e.g. one for each core) works also.
+Edit the servers.csv file for rt to indicate all available replica servers. If you have a machine with a huge amount of resources and cores, entering multiple duplicate entries for the same server inside servers.csv (e.g. one for each CPU core) also works.
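
As an illustration (a hypothetical servers.csv; check the sample file shipped with rt for the exact format it expects), a LAN with one single-core replica at 10.0.0.2 and one four-core machine at 10.0.0.3 given three duplicate entries might look like this:

10.0.0.2
10.0.0.3
10.0.0.3
10.0.0.3

That totals four connections, which divides evenly into the default results-per-page limit of 12 discussed below.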

The core application checks the replication tracker (rt) output to determine if any replicas are online; it will initiate a connection to those replicas and task each one to search a different section of the index,
@@ -431,15 +429,15 @@ drastically speeding up search speeds especially for multi-word queries. By defa
on line 373 and rebuild the core application.

-The number of available replicas must divide evenly into the search results per page limit (lim), OR, the search results per page limit must divide evenly into the number of available replicas. If there
-is an excess of available replicas such that they do not divide evenly, those will remain in synch but will not be used for searches unless another replica fails. You can adjust the search results per page limit (lim) to a different value (default 12),
-and then rebuild to make excess available replicas divide evenly (if necessary).
+The search results per page limit must divide evenly into, or be evenly divisible by, the total number of connections defined in servers.csv. If there is an excess of available replicas such that
+they do not divide evenly, those will remain in sync but will not be used for searches unless another replica fails. You can adjust the search results per page limit ('lim' inside core.go) to a different value (default 12),
+then rebuild core.go and restart rt. Include the new page limit when you run rt since it is no longer the default (e.g. for a limit of 10: './rt 10') to make excess available replicas divide evenly (if necessary).
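
For example, with the default limit of 12, listing 2, 3, 4, 6, or 12 connections puts every one of them to work. With 5 connections listed, neither number divides evenly into the other, so only 4 would be searched and the fifth held in reserve; rebuilding with lim set to 10 and starting the tracker with './rt 10' would use all 5, since 5 divides evenly into 10.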

The reverse proxy and replica servers can be connected through a VPN such as WireGuard or OpenVPN; however, the IPs in servers.csv should be the local IPs for the LAN that the replicas are all connected on. Here is a tutorial for setting up MySQL replicas.

-Full instructions below:
+Instructions for Building a MySQL Replica:

On the primary server, add these lines to my.cnf under [mysqld], but only once you have a VPN to reach your replicas. Replace my.vpn.ip with your own.