public obeyRobotsTxt($mode)
Parameters:
$mode | bool | Set to TRUE if you want the crawler to obey robots.txt-files.

Returns:
bool
If this is set to TRUE, the crawler looks for a robots.txt-file on every host that pages or files will be received
from during the crawling process. If a robots.txt-file is found for a host, the directives it contains that apply to the
useragent-identification of the crawler
("PHPCrawl" by default, or the string set manually by calling setUserAgentString()) will be obeyed.
The default value is FALSE (for compatibility reasons).
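A minimal sketch of how this setting is typically enabled. The class name "MyCrawler", the target URL and the include path are assumptions for illustration; obeyRobotsTxt() and setUserAgentString() are the methods described here, and the handleDocumentInfo() override follows the usual PHPCrawl subclassing pattern.

<?php
// Assumed include path to the PHPCrawl library.
require_once("libs/PHPCrawler.class.php");

// Hypothetical subclass; PHPCrawl calls handleDocumentInfo() for every received document.
class MyCrawler extends PHPCrawler
{
  function handleDocumentInfo($DocInfo)
  {
    echo $DocInfo->url."\n";
  }
}

$crawler = new MyCrawler();
$crawler->setURL("http://www.example.com/"); // placeholder URL

// Obey robots.txt-files; directives are matched against this useragent-identification.
$crawler->setUserAgentString("PHPCrawl");
$crawler->obeyRobotsTxt(true);

$crawler->go();
?>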
Please note that the directives found in a robots.txt-file have a higher priority than other settings made by the user.
If, for example, addFollowMatch("#http://foo\.com/path/file\.html#") was set, but a directive in the robots.txt-file of the host
foo.com says "Disallow: /path/", the URL http://foo.com/path/file.html will be ignored by the crawler anyway.
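A short sketch of that priority rule, using the hostname and paths from the example above ("MyCrawler" is the assumed subclass from the previous sketch):

<?php
$crawler = new MyCrawler();
$crawler->setURL("http://foo.com/");
$crawler->obeyRobotsTxt(true);

// Even though this follow-rule explicitly matches the URL ...
$crawler->addFollowMatch("#http://foo\.com/path/file\.html#");

// ... a "Disallow: /path/" directive in http://foo.com/robots.txt that applies to the
// crawler's useragent-identification takes precedence, so that URL is never requested.
$crawler->go();
?>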