@@ -0,0 +1,139 @@
+<?xml version='1.0' encoding='UTF-8'?>
+<document>
+ <properties>
+ <title>General crawl settings</title>
+ <author>Shinsuke Sugaya</author>
+ </properties>
+ <body>
+ <section name='General crawl settings'>
+ <p>Describes the settings related to crawling.</p>
+ <subsection name='How to set up'>
+ <p>After logging in with an administrator account, click General under Crawl in the menu.</p>
+ <img alt='Crawl General' src='/images/ja/4.0/crawl-1.png'/>
+ <p>You can specify the path to the generated index and enable the replication feature.</p>
+ <img alt='Replication features' src='/images/ja/4.0/crawl-2.png'/>
+ </subsection>
+ <subsection name='Scheduled full crawl frequency'>
+ <p>You can set the interval at which web sites and file systems are crawled. The default is the following.</p>
+ <source><![CDATA[
+0 0 0 * * ?
+]]></source>
+ <p>The fields represent, from left to right: seconds, minutes, hours, day of the month, month, and day of the week. The format is similar to Unix cron settings. In this example, crawling runs every day at 0:00 a.m.</p>
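+ <p>To illustrate, the default value above breaks down into fields as follows.</p>
+ <source><![CDATA[
+0 0 0 * * ?
+| | | | | +--- day of the week ("?" = no specific value)
+| | | | +----- month ("*" = every month)
+| | | +------- day of the month ("*" = every day)
+| | +--------- hour (0 = 0 a.m.)
+| +----------- minute (0)
++------------- second (0)
+]]></source>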
+ <p>The following are some example settings.</p>
+ <table class='table table-striped table-bordered table-condensed'>
+ <tbody>
+ <tr class='a'>
+ <td align='left'>0 0 12 * * ?</td>
+ <td align='left'>Runs every day at 12:00 p.m. (noon)</td>
+ </tr>
+ <tr class='b'>
+ <td align='left'>0 15 10 ? * *</td>
+ <td align='left'>Runs every day at 10:15 a.m.</td>
+ </tr>
+ <tr class='a'>
+ <td align='left'>0 15 10 * * ?</td>
+ <td align='left'>Runs every day at 10:15 a.m.</td>
+ </tr>
+ <tr class='b'>
+ <td align='left'>0 15 10 * * ? *</td>
+ <td align='left'>Runs every day at 10:15 a.m.</td>
+ </tr>
+ <tr class='a'>
+ <td align='left'>0 15 10 * * ? 2005</td>
+ <td align='left'>Runs every day at 10:15 a.m. during 2005</td>
+ </tr>
+ <tr class='b'>
+ <td align='left'>0 * 14 * * ?</td>
+ <td align='left'>Runs every minute from 2:00 p.m. to 2:59 p.m., every day</td>
+ </tr>
+ <tr class='a'>
+ <td align='left'>0 0/5 14 * * ?</td>
+ <td align='left'>Runs every 5 minutes from 2:00 p.m. to 2:55 p.m., every day</td>
+ </tr>
+ <tr class='b'>
+ <td align='left'>0 0/5 14,18 * * ?</td>
+ <td align='left'>Runs every 5 minutes from 2:00 p.m. to 2:55 p.m. and from 6:00 p.m. to 6:55 p.m., every day</td>
+ </tr>
+ <tr class='a'>
+ <td align='left'>0 0-5 14 * * ?</td>
+ <td align='left'>Runs every minute from 2:00 p.m. to 2:05 p.m., every day</td>
+ </tr>
+ <tr class='b'>
+ <td align='left'>0 10,44 14 ? 3 WED</td>
+ <td align='left'>Runs at 2:10 p.m. and 2:44 p.m. every Wednesday in March</td>
+ </tr>
+ <tr class='a'>
+ <td align='left'>0 15 10 ? * MON-FRI</td>
+ <td align='left'>Runs at 10:15 a.m. Monday through Friday</td>
+ </tr>
+ </tbody>
+ </tbody>
+ </table>
+ <p>Note that the schedule is checked at 60-second intervals by default, so the seconds field is only honored to that granularity. If you need to schedule with exact seconds, customize the taskScanIntervalTime value in webapps/fess/WEB-INF/classes/chronosCustomize.dicon; if hour-level precision is enough, the default is fine.</p>
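+ <p>As an illustrative sketch only (the actual layout of chronosCustomize.dicon may differ between versions), shortening the check interval to 10 seconds could look like the following, assuming taskScanIntervalTime is defined as a millisecond-valued component:</p>
+ <source><![CDATA[
+<!-- webapps/fess/WEB-INF/classes/chronosCustomize.dicon: illustrative sketch only -->
+<components>
+  <!-- interval between schedule checks, in milliseconds (default: 60000 = 60 seconds) -->
+  <component name="taskScanIntervalTime" class="java.lang.Long">
+    <arg>10000</arg>
+  </component>
+</components>
+]]></source>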
+ </subsection>
+ <subsection name='Search log'>
+ <p>When a user runs a search, the search is written to a log. Enable this if you want to collect search statistics.</p>
+ </subsection>
+ <subsection name='Add search parameters'>
+ <p>Appends the search terms to the links in search results. This makes it possible, for example, to show the searched terms when displaying a PDF.</p>
+ </subsection>
+ <subsection name='XML response'>
+ <p>Search results can be retrieved in XML format. You can get them by accessing http://localhost:8080/fess/xml?query=search-term.</p>
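+ <p>For example, results can be fetched from the command line; the query value Fess below is just a placeholder search term:</p>
+ <source><![CDATA[
+# request search results for the term "Fess" as XML
+curl "http://localhost:8080/fess/xml?query=Fess"
+]]></source>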
+ </subsection>
+ <subsection name='JSON response'>
+ <p>Search results are also available in JSON format. You can get them by accessing http://localhost:8080/fess/json?query=search-term.</p>
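+ <p>Similarly, a JSON response can be requested from the command line (the search term Fess is again a placeholder):</p>
+ <source><![CDATA[
+# request search results for the term "Fess" as JSON
+curl "http://localhost:8080/fess/json?query=Fess"
+]]></source>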
+ </subsection>
+ <subsection name='Mobile translation'>
+ <p>Search results from a web site built for PCs may not display correctly on mobile devices. If you select a mobile conversion service, PC sites can be converted for display on mobile devices. For example, if you select Google, the Google Wireless Transcoder is used to display the content on mobile phones: the links in the search results are passed through the Google Wireless Transcoder, so pages found by a mobile search can be browsed smoothly on the device.</p>
+ </subsection>
+ <subsection name='The default label value'>
+ <p>You can specify the label that is used by default when no label is selected. Specify the value of the label.</p>
+ </subsection>
+ <subsection name='Search support'>
+ <p>You can specify whether or not to provide the search screens. If you select Web, the mobile search screen is not available. If you select Unavailable, no search screen is available. Select Unavailable if you want to build a dedicated index server.</p>
+ </subsection>
+ <subsection name='Featured keyword response'>
+ <p>Frequently searched words can be retrieved in JSON format. You can get them by accessing http://localhost:8080/fess/hotsearchword.</p>
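+ <p>For example, using the URL given above:</p>
+ <source><![CDATA[
+# fetch frequently searched words as JSON
+curl "http://localhost:8080/fess/hotsearchword"
+]]></source>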
+ </subsection>
+ <subsection name='Number of days before session logs are deleted'>
+ <p>Deletes session logs older than the specified number of days. Old logs are removed by the log purge, which runs once a day.</p>
+ </subsection>
+ <subsection name='Number of days before search logs are deleted'>
+ <p>Deletes search logs older than the specified number of days. Old logs are removed by the log purge, which runs once a day.</p>
+ </subsection>
+ <subsection name='Bot names for log deletion'>
+ <p>Specifies, separated by commas (,), the names of bots whose entries you want to remove from the search log, matched against the user agent. The logs are deleted by the log purge once a day.</p>
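+ <p>For example, the following illustrative value removes entries from three crawlers; the bot names are placeholders, so use the names that actually appear in the user agents in your logs:</p>
+ <source><![CDATA[
+Googlebot,Slurp,bingbot
+]]></source>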
+ </subsection>
+ <subsection name='CSV encoding'>
+ <p>Specifies the encoding of the CSV files used for backup and restore.</p>
+ </subsection>
+ <subsection name='Replication features'>
+ <p>Enabling the replication feature lets this server apply a copy of a Solr index generated elsewhere. For example, you can use it when a front-facing search server should only serve searches while crawling and indexing run on a different server.</p>
+ </subsection>
+ <subsection name='Index commit, optimize'>
+ <p>After data is registered to Solr, a commit or an optimize is issued to make the registered data searchable. If you choose optimize, the Solr index is optimized; if you choose commit, a commit is issued.</p>
+ </subsection>
+ <subsection name='Server switchovers'>
+ <p>Fess can combine multiple Solr servers into a group and manage multiple such groups. Different groups are used for updates and for searches. For example, with two groups, group 2 may be used for updates while group 1 serves searches. After a crawl completes, the roles are switched: updates go to group 1 and searches to group 2. This setting is effective only when multiple Solr server groups are registered.</p>
+ </subsection>
+ <subsection name='Document count per commit'>
+ <p>To improve indexing performance, Fess sends documents to Solr in batches of 20 while crawling. Because adding documents to Solr indefinitely without committing affects performance, Fess issues a commit each time the number of documents specified here has been added. By default, a commit is issued after every 1000 documents.</p>
+ </subsection>
+ <subsection name='Number of concurrent crawl settings'>
+ <p>Fess crawls documents through web crawls and file system crawls. Of the crawl settings you have defined, only the number specified here run at the same time. For example, if the number of concurrent crawls is 3 and web crawl settings 1 through 10 exist, crawl settings 1 through 3 run first. When any of them completes, crawl setting 4 starts; likewise, each time one of the remaining settings finishes, the next one starts, up to setting 10.</p>
+ <p>Note that a number of threads can be specified in each crawl setting; the number of concurrent crawls is not the number of threads. For example, if the number of concurrent crawls is 3 and each crawl setting uses 5 threads, up to 3 x 5 = 15 threads run while crawling.</p>
+ </subsection>
+ <subsection name='Expiration date of the index'>
+ <p>You can automatically delete data after it has been indexed. If you select 5 days, indexed data that was registered at least 5 days ago and has not been updated since is removed. You can use this to drop documents whose content has been removed.</p>
+ </subsection>
+ <subsection name='Failure types to exclude'>
+ <p>A URL registered as a failure URL is excluded from the next crawl once it exceeds the failure count. By specifying failure types here, failures that do not need attention are not counted, so those URLs are still crawled next time.</p>
+ </subsection>
+ <subsection name='Failure count'>
+ <p>A failure URL that exceeds this failure count is excluded from crawling.</p>
+ </subsection>
+ <subsection name='Snapshot path'>
+ <p>The snapshot path is where index information is copied from the index directory. It is used when the replication feature is enabled.</p>
+ </subsection>
+ </section>
+ </body>
+</document>