move english pages.
This commit is contained in:
parent
729ab8c814
commit
c24deb14ad
1059 changed files with 27121 additions and 27468 deletions
@ -1,111 +0,0 @@
------------------
Installation
------------------
Shinsuke Sugaya
------------------
2010-11-12
------------------

Installation

  The system requirements for running a Fess server are:

  * OS: an operating system with a Java runtime environment, such as Windows or Unix.

  * Java: Java 7 or above

  []

  You can download Java 7 from {{{http://java.sun.com/}http://java.sun.com/}}.
Download

  Download the latest Fess package from {{{http://sourceforge.jp/projects/fess/releases/}http://sourceforge.jp/projects/fess/releases/}}.

Installation

  Unzip fess-server-x.y.zip.
  On Unix environments, grant execution permission to the script files in the bin directory.

+---------------------------------+
$ unzip fess-server-x.y.zip
$ cd fess-server-x.y
$ chmod +x bin/*.sh # (Unix only)
+---------------------------------+
Start Fess

  Run the startup script to start the Fess server.

+---------------------------------+
$ ./bin/startup.sh # (Use startup.bat on Windows)
+---------------------------------+

  Access {{{http://localhost:8080/fess/}http://localhost:8080/fess/}} to check that Fess is running.

  The administrative UI is at {{{http://localhost:8080/fess/admin/}http://localhost:8080/fess/admin/}}.
  The default username/password is admin/admin.
  The "admin" account has the "fess" role, which is used as the administrative role.

Stop Fess

  Run the shutdown script to stop the Fess server.

+---------------------------------+
$ ./bin/shutdown.sh # (Use shutdown.bat on Windows)
+---------------------------------+
Change Password for Administrator User

  The "admin" account is managed by the application server. The Fess server bundles Tomcat as the application server, so changing the password follows Tomcat's procedure. To change the password, modify the password of the admin user in conf/tomcat-user.xml.

+---------------------------------+
<user username="admin" password="admin" roles="fess"/>
+---------------------------------+

Change Password for Solr Server

  The Fess server bundles Apache Solr as its search engine, and a password is needed to access it. In a production environment, CHANGE the password.

  First, change the password attribute of the solradmin user in conf/tomcat-user.xml.

+---------------------------------+
<user username="solradmin" password="solradmin" roles="solr"/>
+---------------------------------+

  Next, set the same password in the corresponding elements of webapps/fess/WEB-INF/classes/solrlib.dicon, webapps/fess/WEB-INF/classes/fess_suggest.dicon and solr/core1/conf/solrconfig.xml.
  solrlib.dicon:

+---------------------------------+
<component class="org.apache.commons.httpclient.UsernamePasswordCredentials">
  <arg>"solradmin"</arg> <!-- Username -->
  <arg>"solradmin"</arg> <!-- Password -->
</component>
+---------------------------------+

  fess_suggest.dicon:

+---------------------------------+
<component name="suggestCredentials" class="org.apache.http.auth.UsernamePasswordCredentials">
  <arg>"solradmin"</arg> <!-- Username -->
  <arg>"solradmin"</arg> <!-- Password -->
</component>
+---------------------------------+

  solrconfig.xml:

+---------------------------------+
<suggest>
  <solrServer class="org.codelibs.solr.lib.server.SolrLibHttpSolrServer">
    <arg>http://localhost:8080/solr/core1-suggest</arg>
    <credentials>
      <username>solradmin</username> <!-- Username -->
      <password>solradmin</password> <!-- Password -->
    </credentials>
  </solrServer>
</suggest>
+---------------------------------+
|
||||
|
|
@ -1,19 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Setting the browser type</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Setting the browser type'>
|
||||
<p>Describes the settings related to browser types. A browser type can be attached to documents added to the search index, so that search results can be served separately for each type of browser.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Browser Types in the menu.</p>
|
||||
<img alt='Setting the browser type' src='/images/ja/2.0/browserType-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Browser type'>
|
||||
<p>You can set a display name and a value. This is used when you want to support additional device types. No special customization is required; use it only where necessary.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,100 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>The General crawl settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='The General crawl settings'>
|
||||
<p>Describes the settings related to crawling.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Crawl General in the menu.</p>
|
||||
<img alt='Crawl General' src='/images/ja/2.0/crawl-1.png'/>
|
||||
<p>You can specify the path for the generated index and enable the replication feature.</p>
|
||||
<img alt='Replication features' src='/images/ja/2.0/crawl-2.png'/>
|
||||
</subsection>
|
||||
<subsection name='Scheduled full crawl frequency'>
|
||||
<p>You can set the interval at which web sites and file systems are crawled. The default is as follows.</p>
|
||||
<source><![CDATA[
|
||||
0 0 0 * * ?
|
||||
]]></source>
|
||||
<p>The fields are, from left to right: seconds, minutes, hours, day of month, month, and day of week. The format is similar to Unix cron settings. This example starts crawling daily at 0:00 am.</p>
|
||||
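<p>As another example (an illustrative assumption, not a shipped default), a weekly crawl at 2:00 am every Sunday could be scheduled as follows; the ? in the day-of-month field means "no specific value" because the day-of-week field is used instead.</p>
<source><![CDATA[
0 0 2 ? * SUN
]]></source>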
<p>The following are examples.</p>
|
||||
<table class='table table-striped table-bordered table-condensed'>
<tbody>
<tr class='a'>
<td align='left'>0 0 12 * * ?</td>
<td align='left'>Fires every day at 12:00 pm (noon)</td>
</tr>
<tr class='b'>
<td align='left'>0 15 10 ? * *</td>
<td align='left'>Fires every day at 10:15 am</td>
</tr>
<tr class='a'>
<td align='left'>0 15 10 * * ?</td>
<td align='left'>Fires every day at 10:15 am</td>
</tr>
<tr class='b'>
<td align='left'>0 15 10 * * ? *</td>
<td align='left'>Fires every day at 10:15 am</td>
</tr>
<tr class='a'>
<td align='left'>0 15 10 * * ? 2005</td>
<td align='left'>Fires every day at 10:15 am during the year 2005</td>
</tr>
<tr class='b'>
<td align='left'>0 * 14 * * ?</td>
<td align='left'>Fires every minute from 2:00 pm to 2:59 pm, every day</td>
</tr>
<tr class='a'>
<td align='left'>0 0/5 14 * * ?</td>
<td align='left'>Fires every 5 minutes from 2:00 pm to 2:59 pm, every day</td>
</tr>
<tr class='b'>
<td align='left'>0 0/5 14,18 * * ?</td>
<td align='left'>Fires every 5 minutes from 2:00 pm to 2:59 pm and from 6:00 pm to 6:59 pm, every day</td>
</tr>
<tr class='a'>
<td align='left'>0 0-5 14 * * ?</td>
<td align='left'>Fires every minute from 2:00 pm to 2:05 pm, every day</td>
</tr>
<tr class='b'>
<td align='left'>0 10,44 14 ? 3 WED</td>
<td align='left'>Fires at 2:10 pm and 2:44 pm every Wednesday in March</td>
</tr>
<tr class='a'>
<td align='left'>0 15 10 ? * MON-FRI</td>
<td align='left'>Fires at 10:15 am every Monday through Friday</td>
</tr>
</tbody>
</table>
|
||||
<p>Note that by default the scheduler checks whether to run at 60-second intervals. If you need second-level precision, customize the taskScanIntervalTime value in webapps/fess/WEB-INF/classes/chronosCustomize.dicon; if hourly checks are sufficient, you can increase the value instead.</p>
|
||||
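<p>A minimal sketch of the customization mentioned above. Only the taskScanIntervalTime property name is taken from this document; the surrounding component definition in the shipped chronosCustomize.dicon should be checked before editing. The value is in milliseconds, so 60000 means a 60-second scan interval.</p>
<source><![CDATA[
<!-- inside the existing component definition in chronosCustomize.dicon -->
<property name="taskScanIntervalTime">60000</property>
]]></source>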
</subsection>
|
||||
<subsection name='Mobile conversion'>
|
||||
<p>A web site designed for PCs may not display correctly when opened from search results on a mobile device. If you select a mobile conversion service, links to PC sites are converted for mobile devices. For example, if you select Google, the links in mobile search results are passed through Google Wireless Transcoder, which renders the content for mobile phones, so PC sites can be browsed smoothly from mobile search results.</p>
|
||||
</subsection>
|
||||
<subsection name='Replication features'>
|
||||
<p>Enabling the replication feature lets you apply a Solr index that has already been generated elsewhere. For example, you can crawl and index on one server and search only on a separate search server placed in front of it.</p>
|
||||
</subsection>
|
||||
<subsection name='Index commit, optimize'>
|
||||
<p>After documents are registered in Solr, a commit or optimize must be issued before the registered data becomes searchable. If you choose optimize, Solr index optimization is issued; if you choose commit, a commit is issued.</p>
|
||||
</subsection>
|
||||
<subsection name='Server switchovers'>
|
||||
<p>Fess can combine multiple Solr servers into a group and manage multiple groups, using different groups for updates and for searches. For example, with two groups, group 2 may be used for updates while group 1 serves searches. After a crawl completes, the roles switch: group 1 receives updates and group 2 serves searches. This is only effective when multiple Solr server groups are registered.</p>
|
||||
</subsection>
|
||||
<subsection name='Commit interval (number of documents)'>
|
||||
<p>To improve indexing performance, Fess sends documents to Solr in batches of 20 while crawling. Because continuously adding documents without committing degrades Solr's performance, Fess issues a commit each time the number of documents specified here has been added. By default, a commit is issued after every 1000 documents.</p>
|
||||
</subsection>
|
||||
<subsection name='Number of concurrent crawls'>
|
||||
<p>Fess crawls documents by web crawling and file system crawling. The value specified here sets how many crawl configurations run simultaneously. For example, if the number of concurrent crawls is 3 and web crawl configurations 1 through 10 exist, configurations 1 to 3 run first; when any of them completes, configuration 4 starts, and so on until configuration 10 has completed.</p>
|
||||
<p>Note that the number of threads is specified per crawl configuration; the concurrency value here is not a thread count. For example, with 3 concurrent crawl configurations each using 5 threads, up to 3 x 5 = 15 threads crawl at the same time.</p>
|
||||
</subsection>
|
||||
<subsection name='Expiration date of the index'>
|
||||
<p>Indexed data can be deleted automatically after a period of time. If you select 5 days, indexed documents registered at least 5 days ago and not updated since are removed. This keeps documents whose source content has been removed from lingering in the index.</p>
|
||||
</subsection>
|
||||
<subsection name='Snapshot path'>
|
||||
<p>Specifies the snapshot path to which index information is copied from the index directory. It is applied when replication is enabled.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,34 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Set session information</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Set session information'>
|
||||
<p>Describes the settings related to session information. Each crawl run is saved as one session information record, in which you can check the execution time and the number of indexed documents.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Session Information in the menu.</p>
|
||||
</subsection>
|
||||
<subsection name='Session information list'>
|
||||
<img alt='Session information list' src='/images/ja/2.0/crawlingSession-1.png'/>
|
||||
<p>Clicking the Delete All link removes all session information records that are not currently running.</p>
|
||||
</subsection>
|
||||
<subsection name='Session details'>
|
||||
<img alt='Session details' src='/images/ja/2.0/crawlingSession-2.png'/>
|
||||
<p>By selecting a session ID, you can view the details of that crawl.</p>
|
||||
<ul>
<li>Crawler*: information about the entire crawl</li>
<li>FsCrawl*: information about file system crawling</li>
<li>WebCrawl*: information about web crawling</li>
<li>Optimize*: information about optimize requests issued to the Solr server</li>
<li>Commit*: information about commit requests issued to the Solr server</li>
<li>*StartTime: start time</li>
<li>*EndTime: end time</li>
<li>*ExecTime: execution time (ms)</li>
<li>*IndexSize: number of documents indexed</li>
</ul>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,33 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Configuration backup and restore</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Configuration backup and restore'>
|
||||
<p>This section describes how to back up and restore Fess configuration data.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Backup and Restore in the menu.</p>
|
||||
<img alt='Backup and restore' src='/images/ja/2.0/data-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Backup settings'>
|
||||
<p>Clicking the download link outputs the Fess configuration in XML format. The saved settings are:</p>
|
||||
<ul>
|
||||
<li>The General crawl settings</li>
|
||||
<li>Web crawl settings</li>
|
||||
<li>File system Crawl settings</li>
|
||||
<li>Path mapping</li>
|
||||
<li>Web authentication</li>
|
||||
<li>Compatible browsers</li>
|
||||
<li>Session information</li>
|
||||
</ul>
|
||||
<p>The Solr index data and data being crawled are not backed up. They can be regenerated by crawling again after the Fess configuration is restored.</p>
|
||||
</subsection>
|
||||
<subsection name='Restore settings'>
|
||||
<p>You can restore the configuration by uploading the XML file output by the backup. Specify the XML file and click the Restore button.</p>
|
||||
<p>If overwriting is enabled and the same data already exists, the existing data is updated.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,69 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Appearance settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Appearance settings'>
|
||||
<p>Here are settings for the design of search screens.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Design in the menu.</p>
|
||||
<img alt='Design' src='/images/ja/2.0/design-1.png'/>
|
||||
<p>You can edit the search screen in the screen below.</p>
|
||||
<img alt='JSP compilation screen' src='/images/ja/2.0/design-2.png'/>
|
||||
</subsection>
|
||||
<subsection name='Image file'>
|
||||
<p>You can upload image files to use on the search screens. Supported image formats are jpg, gif, and png.</p>
|
||||
</subsection>
|
||||
<subsection name='Image file name'>
|
||||
<p>Specify the file name to use for the uploaded image file. If omitted, the uploaded file's own name is used.</p>
|
||||
</subsection>
|
||||
<subsection name='Design JSP files'>
|
||||
<p>You can edit the JSP files of the search screens. Pressing the Edit button for a JSP file lets you edit the current file; pressing the Default button lets you edit the JSP file as it was at installation time. Saving with the Update button on the edit screen applies the changes.</p>
|
||||
<p>The editable JSP files are listed below.</p>
|
||||
<table class='table table-striped table-bordered table-condensed'>
<tbody>
<tr class='a'>
<td align='left'>Top page (frame)</td>
<td align='left'>The JSP file for the search top page frame. This JSP includes the JSP files of the individual parts.</td>
</tr>
<tr class='b'>
<td align='left'>Top page (within the Head tags)</td>
<td align='left'>The JSP file for the content inside the head tag of the search top page. Edit it to change meta tags, title tags, script tags, and so on.</td>
</tr>
<tr class='a'>
<td align='left'>Top page (content)</td>
<td align='left'>The JSP file for the body tag content of the search top page.</td>
</tr>
<tr class='b'>
<td align='left'>Search results pages (frames)</td>
<td align='left'>The JSP file for the search results list page frame. This JSP includes the JSP files of the individual parts.</td>
</tr>
<tr class='a'>
<td align='left'>Search results page (within the Head tags)</td>
<td align='left'>The JSP file for the content inside the head tag of the search results list page. Edit it to change meta tags, title tags, script tags, and so on.</td>
</tr>
<tr class='b'>
<td align='left'>Search results page (header)</td>
<td align='left'>The JSP file for the header of the search results list page. It contains the search form at the top.</td>
</tr>
<tr class='a'>
<td align='left'>Search results page (footer)</td>
<td align='left'>The JSP file for the footer of the search results list page. It contains the copyright notice at the bottom.</td>
</tr>
<tr class='b'>
<td align='left'>Search results pages (content)</td>
<td align='left'>The JSP file for the search results section of the results list page, used when there are search results. Edit it to customize how results are displayed.</td>
</tr>
<tr class='a'>
<td align='left'>Search results page (no results)</td>
<td align='left'>The JSP file for the search results section of the results list page, used when there are no search results.</td>
</tr>
</tbody>
</table>
|
||||
<p>Mobile screens can be edited in the same way as PC screens.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,96 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>File system crawl settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='File system crawl settings'>
|
||||
<p>This section describes the settings for crawling a file system.</p>
|
||||
<p>If you want to index more than 100,000 documents, we recommend splitting them across multiple crawl configurations of up to several tens of thousands of documents each. Indexing performance degrades when a single crawl configuration targets more than 100,000 documents.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click File System in the menu.</p>
|
||||
<img alt='Setting file system Crawl' src='/images/ja/2.0/fileCrawlingConfig-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Setting name'>
|
||||
<p>The name that appears on the list page.</p>
|
||||
</subsection>
|
||||
<subsection name='Specifying a path'>
|
||||
<p>You can specify multiple paths. Each path must start with file:. For example:</p>
|
||||
<source><![CDATA[
|
||||
file:/home/taro/
|
||||
file:/home/documents/
|
||||
]]></source>
|
||||
<p>Directories below the specified paths are crawled.</p>
|
||||
<p>On Windows, the path must be written as a URI: specify c:\Documents\taro as file:/c:/Documents/taro.</p>
|
||||
</subsection>
|
||||
<subsection name='Path filtering'>
|
||||
<p>By specifying regular expressions you can exclude the crawl and search for given path pattern.</p>
|
||||
<table>
|
||||
<tbody>
|
||||
<tr>
|
||||
<th>Path to crawl</th>
|
||||
<td>Paths matching the specified regular expression are crawled.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>The path to exclude from being crawled</th>
|
||||
<td>Paths matching the specified regular expression are not crawled. This takes precedence even over paths specified as crawl targets.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>Path to be searched</th>
|
||||
<td>Paths matching the specified regular expression are searchable. Paths excluded from search take precedence even over paths specified here.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>Path to exclude from searches</th>
|
||||
<td>Paths matching the specified regular expression are not searchable. Use this when a path must still be crawled so that its links can be followed, but should not itself appear in search results.</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<p>For example, to crawl only paths under /home/, specify the following as the path to crawl:</p>
|
||||
<source><![CDATA[
|
||||
file:/home/.*
|
||||
]]></source>
|
||||
<p>To exclude files with the png extension, specify the following as the path to exclude:</p>
|
||||
<source><![CDATA[
|
||||
.*\.png$
|
||||
]]></source>
|
||||
<p>Multiple patterns can be specified, one per line.</p>
|
||||
<p>URIs are handled as in java.io.File, for example:</p>
|
||||
<source><![CDATA[
|
||||
/home/taro -> file:/home/taro
|
||||
c:\memo.txt -> file:/c:/memo.txt
|
||||
\\server\memo.txt -> file:////server/memo.txt
|
||||
]]></source>
|
||||
</subsection>
|
||||
<subsection name='Depth'>
|
||||
<p>Specify the depth of a directory hierarchy.</p>
|
||||
</subsection>
|
||||
<subsection name='Maximum access'>
|
||||
<p>Specifies the maximum number of documents to retrieve when crawling.</p>
|
||||
</subsection>
|
||||
<subsection name='Number of threads'>
|
||||
<p>Specifies the number of crawler threads. A value of 5 means 5 threads crawl at the same time.</p>
|
||||
</subsection>
|
||||
<subsection name='Interval'>
|
||||
<p>The interval between document retrievals, in milliseconds. With one thread and a value of 5000, one document is retrieved every 5 seconds.</p>
<p>With 5 threads and an interval of 1000 milliseconds, up to 5 documents are retrieved per second.</p>
|
||||
</subsection>
|
||||
<subsection name='Boost value'>
|
||||
<p>You can weight the documents crawled by this configuration so that they rank above others in search results. The default is 1; higher values rank higher in the results. To rank these documents above all others, specify a sufficiently large value such as 10000.</p>
<p>The value must be an integer greater than 0. It is used as the boost value when documents are added to Solr.</p>
|
||||
</subsection>
|
||||
<subsection name='Browser type'>
|
||||
<p>Registers the selected browser types for the crawled documents. If you select only PC, the documents will not appear in search results on mobile devices. Use this when documents should appear only on specific devices.</p>
|
||||
</subsection>
|
||||
<subsection name='Role'>
|
||||
<p>You can restrict documents so that they appear in search results only for users with a particular role. Roles must be set up beforehand. This is useful, for example, in systems that require a login, such as portal servers, where search results should differ per user.</p>
|
||||
</subsection>
|
||||
<subsection name='Label'>
|
||||
<p>You can attach labels to search results. If labels are registered, searching by label becomes available on the search screen.</p>
|
||||
</subsection>
|
||||
<subsection name='State'>
|
||||
<p>If enabled, the configuration is crawled at crawl time. Disable it to skip crawling temporarily.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,12 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Management UI Guide</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Fess 2.0 administrative UI Guide'>
|
||||
<p>This is a description of the Fess 2.0 administrative UI.</p>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,23 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Setting a label</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Setting a label'>
|
||||
<p>Here are the settings for labels. Labels classify the documents that appear in search results; they are selected in the crawl settings. If labels are registered, a label drop-down box is shown to the right of the search box.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Label in the menu.</p>
|
||||
<img alt='List of labels' src='/images/ja/2.0/labelType-1.png'/>
|
||||
<img alt='Setting a label' src='/images/ja/2.0/labelType-2.png'/><!-- TODO -->
|
||||
</subsection>
|
||||
<subsection name='Display name'>
|
||||
<p>Specifies the name displayed in the label drop-down on the search screen.</p>
|
||||
</subsection>
|
||||
<subsection name='Value'>
|
||||
<p>Specifies the identifier used when classifying documents. This value is sent to Solr. It must consist of alphanumeric characters.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,19 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Log file download</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Log file download'>
|
||||
<p>Describes how to download the log files output by Fess.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Log File in the menu.</p>
|
||||
<img alt='Session information' src='/images/ja/2.0/log-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Download'>
|
||||
<p>Click a log file name to download it.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,23 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Duplicate host settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Duplicate host settings'>
|
||||
<p>Here are the settings for duplicate hosts. Use this when different host names should be treated as the same host while crawling; for example, when www.example.com and example.com serve the same site.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Duplicate Host in the menu.</p>
|
||||
<img alt='A list of the duplicate host' src='/images/ja/2.0/overlappingHost-1.png'/>
|
||||
<img alt='Duplicate host settings' src='/images/ja/2.0/overlappingHost-2.png'/><!-- TODO -->
|
||||
</subsection>
|
||||
<subsection name='Canonical name'>
|
||||
<p>Specify the canonical host name. Duplicate host names are replaced with the canonical host name.</p>
|
||||
</subsection>
|
||||
<subsection name='Duplicate names'>
|
||||
<p>Specify the duplicated host names, i.e. the host names to be replaced by the canonical name.</p>
|
||||
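<p>For example, to treat www.example.com as example.com (hypothetical host names for illustration), register:</p>
<source><![CDATA[
Canonical name: example.com
Duplicate name: www.example.com
]]></source>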
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,26 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Path mapping settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Path mapping settings'>
|
||||
<p>Here are the settings for path mapping. Use path mapping when you want to replace the links that appear in search results.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Path Mapping in the menu.</p>
|
||||
<img alt='List of path mapping' src='/images/ja/2.0/pathMapping-1.png'/>
|
||||
<img alt='Path mapping settings' src='/images/ja/2.0/pathMapping-2.png'/><!-- TODO -->
|
||||
</subsection>
|
||||
<subsection name='Path mapping'>
|
||||
<p>Path mapping replaces the parts of a link that match the specified regular expression with the replacement string. When crawling a local file system, the links in search results may not be valid in your environment; in such cases, path mapping lets you control the result links. Multiple path mappings can be specified.</p>
|
||||
</subsection>
|
||||
<subsection name='Regular expressions'>
|
||||
<p>Specifies the pattern to replace. The syntax follows <a href='http://java.sun.com/javase/ja/6/docs/ja/api/java/util/regex/Pattern.html'>Java 6 regular expressions</a>.</p>
|
||||
</subsection>
|
||||
<subsection name='Replacement character'>
|
||||
<p>Specifies the string that replaces the parts matched by the regular expression.</p>
|
||||
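<p>For example, when file system crawl results should be served through a web server (the path and host below are hypothetical), a mapping like the following rewrites result links:</p>
<source><![CDATA[
Regular expression:    file:/home/share/
Replacement character: http://fileserver/share/
]]></source>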
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,26 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Setting a request header</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Setting a request header'>
|
||||
<p>This section covers request headers. The request header feature adds header information to the requests issued while crawling documents. It is useful, for example, with authentication systems that log users in automatically when certain header values are present.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Request Header in the menu.</p>
|
||||
<img alt='A list of request headers' src='/images/ja/2.0/requestHeader-1.png'/>
|
||||
<img alt='Setting a request header' src='/images/ja/2.0/requestHeader-2.png'/><!-- TODO -->
|
||||
</subsection>
|
||||
<subsection name='The name'>
|
||||
<p>Specifies the request header name to append to the request.</p>
|
||||
</subsection>
|
||||
<subsection name='Value'>
|
||||
<p>Specifies the request header value to append to the request.</p>
|
||||
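<p>For example, to send a fixed authentication token with every crawl request (the header name and value below are hypothetical):</p>
<source><![CDATA[
Name:  X-Auth-Token
Value: 0123456789abcdef
]]></source>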
</subsection>
|
||||
<subsection name='Web name'>
|
||||
<p>Select the web crawl configuration to which the request header applies. The header is appended only to requests from the selected crawl configuration.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,23 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Settings for a role</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Settings for a role'>
|
||||
<p>Here are the settings for roles. Roles are selected in the crawl settings to classify which documents appear in search results. For how to use them, see <a href='../config/role-setting.html'>Settings for a role</a>.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Role in the menu.</p>
|
||||
<img alt='The list of roles' src='/images/ja/2.0/roleType-1.png'/>
|
||||
<img alt='Settings for a role' src='/images/ja/2.0/roleType-2.png'/><!-- TODO -->
|
||||
</subsection>
|
||||
<subsection name='Display name'>
|
||||
<p>Specifies the name that appears in the list.</p>
|
||||
</subsection>
|
||||
<subsection name='Value'>
|
||||
<p>Specifies the identifier used to classify documents. This value is sent to Solr and must consist of alphanumeric characters.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,28 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>System settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='System settings'>
|
||||
<p>This section describes the Solr-related settings registered in Fess. Solr servers are registered as groups, as defined in the configuration file.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click the Solr menu.</p>
|
||||
<img alt='System settings' src='/images/ja/2.0/system-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Process state'>
|
||||
<p>The update server is shown as running while documents are being added. The crawl process displays its session ID while running. You can shut down the Fess server safely when nothing is running. If you shut down Fess while a crawl is running, the process may not terminate until the crawl finishes.</p>
|
||||
</subsection>
|
||||
<subsection name='Search for the update server'>
|
||||
<p>Displays the server group names used for searching and for updating.</p>
|
||||
</subsection>
|
||||
<subsection name='The status of the server'>
|
||||
<p>When a server becomes unavailable, its status changes to disabled. For example, if the Solr server cannot be accessed, its status changes to disabled. After the server recovers, set the status back to enabled to make it available again.</p>
|
||||
</subsection>
|
||||
<subsection name='Actions on the Solr server'>
|
||||
<p>You can issue index commit and optimize operations for a server group. You can also delete the documents indexed under a specific session ID.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,37 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Web authentication settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Web authentication settings'>
|
||||
<p>This section describes the settings used when authentication is required for Web crawling. Fess supports crawling sites protected by BASIC authentication or DIGEST authentication.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click the Web Authentication menu.</p>
|
||||
<img alt='Configuring Web authentication' src='/images/ja/2.0/webAuthentication-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Host name'>
|
||||
<p>Specifies the host name of the site that requires authentication. If omitted, the setting applies to any host name in the selected Web crawl setting.</p>
|
||||
</subsection>
|
||||
<subsection name='Port'>
|
||||
<p>Specifies the port of the site that requires authentication. Specify -1 to apply to all ports. If omitted, the setting applies to any port in the selected Web crawl setting.</p>
|
||||
</subsection>
|
||||
<subsection name='Realm'>
|
||||
<p>Specifies the realm name of the site that requires authentication. If omitted, the setting applies to any realm in the selected Web crawl setting.</p>
|
||||
</subsection>
|
||||
<subsection name='Authentication methods'>
|
||||
<p>Select the authentication method. You can use BASIC authentication or DIGEST authentication.</p>
|
||||
</subsection>
|
||||
<subsection name='User name'>
|
||||
<p>Specifies the user name to log in authentication.</p>
|
||||
</subsection>
|
||||
<subsection name='Password'>
|
||||
<p>Specifies the password used to log in to the authentication site.</p>
|
||||
</subsection>
|
||||
<subsection name='Web name'>
|
||||
<p>Select the Web crawl setting to which the above authentication settings apply. The Web crawl setting must be registered in advance.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,99 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Settings for Web crawling</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Settings for Web crawling'>
|
||||
<p>This section describes the settings used for Web crawling.</p>
|
||||
<p>If you want to index more than 100,000 documents, we recommend splitting them across multiple crawl settings of one to several tens of thousands of documents each. Indexing performance degrades when a single crawl setting targets more than 100,000 documents.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click the Web menu.</p>
|
||||
<img alt='Web crawl settings' src='/images/ja/2.0/webCrawlingConfig-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Setting name'>
|
||||
<p>The name that appears on the list page.</p>
|
||||
</subsection>
|
||||
<subsection name='Specify a URL'>
|
||||
<p>You can specify multiple URLs. Each URL must start with http: or https:. For example:</p>
|
||||
<source><![CDATA[
|
||||
http://localhost/
|
||||
http://localhost:8080/
|
||||
]]></source>
|
||||
<p>Specify them as shown above.</p>
|
||||
</subsection>
|
||||
<subsection name='URL filtering'>
|
||||
<p>By specifying regular expressions, you can include or exclude specific URL patterns from crawling and searching.</p>
|
||||
<table>
|
||||
<tbody>
|
||||
<tr>
|
||||
<th>URL to crawl</th>
|
||||
<td>URLs matching the specified regular expression are crawled.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>Excluded from the crawl URL</th>
|
||||
<td>URLs matching the specified regular expression are not crawled. This setting takes precedence even over URLs matched by 'URL to crawl'.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>To search for URL</th>
|
||||
<td>URLs matching the specified regular expression appear in search results. This setting takes precedence even over URLs matched by 'URL to exclude from the search'.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>To exclude from the search URL</th>
|
||||
<td>URLs matching the specified regular expression do not appear in search results. They are not excluded from crawling, so they are still crawled in order to follow their links; they are only excluded from the search results.</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<p>For example, to crawl only URLs under http://localhost/, specify the following as the URL to crawl:</p>
|
||||
<source><![CDATA[
|
||||
http://localhost/.*
|
||||
]]></source>
|
||||
<p>To exclude URLs with the png extension, specify the following as the URL to exclude from the crawl:</p>
|
||||
<source><![CDATA[
|
||||
.*\.png$
|
||||
]]></source>
|
||||
<p>Multiple patterns can be specified, one per line.</p>
|
||||
</subsection>
|
||||
<subsection name='Depth'>
|
||||
<p>Specifies how deep the crawler follows links contained in the crawled documents.</p>
|
||||
</subsection>
|
||||
<subsection name='Maximum access'>
|
||||
<p>Specifies the maximum number of documents to retrieve in the crawl.</p>
|
||||
</subsection>
|
||||
<subsection name='User agent'>
|
||||
<p>You can specify the user agent to use when crawling.</p>
|
||||
</subsection>
|
||||
<subsection name='Number of threads'>
|
||||
<p>Specifies the number of crawler threads. A value of 5 means that five threads crawl the website at the same time.</p>
|
||||
</subsection>
|
||||
<subsection name='Interval'>
|
||||
<p>The interval, in milliseconds, between document retrievals. With one thread and a value of 5000, one document is retrieved every 5 seconds.</p>
|
||||
<p>With 5 threads and an interval of 1000 milliseconds, up to 5 documents are retrieved per second. When crawling an external website, set a value that does not overload the Web server.</p>
|
||||
</subsection>
|
||||
<subsection name='Boost value'>
|
||||
<p>Assigns a weight to the URLs indexed by this crawl setting. Use it when you want these documents ranked above others in the search results. The default is 1; documents with higher values appear higher in the results. To rank these documents above all others, specify a sufficiently large value such as 10000.</p>
|
||||
<p>The value must be an integer greater than 0. It is used as the boost value when documents are added to Solr.</p>
|
||||
</subsection>
|
||||
<subsection name='Browser type'>
|
||||
<p>Crawled documents are registered under the selected browser types. If you select only PC, the documents do not appear in search results on mobile devices. You can also restrict documents to specific mobile devices.</p>
|
||||
</subsection>
|
||||
<subsection name='Role'>
|
||||
<p>Restricts the documents so that they appear in search results only for users with a particular role. Roles must be set up in advance. This is useful, for example, when you want to restrict search results per user on a system that requires a login, such as a portal server.</p>
|
||||
</subsection>
|
||||
<subsection name='Label'>
|
||||
<p>Attaches labels to the search results. If a label is specified on the search screen, the search can be narrowed to that label.</p>
|
||||
</subsection>
|
||||
<subsection name='State'>
|
||||
<p>The crawl runs only when the state is set to enabled. Set it to disabled if you want to skip the crawl temporarily.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
<section name='Other'>
|
||||
<subsection name='Sitemap'>
|
||||
<p>Fess crawls sitemap files listed in the URLs to crawl, following the Sitemaps specification (<a href='http://www.sitemaps.org/'>http://www.sitemaps.org/</a>). The supported formats are XML Sitemap, XML Sitemap Index, and plain text (one URL per line).</p>
|
||||
<p>Specify the sitemap URL as a URL to crawl. Because a sitemap can be an XML file or a text file, an ordinary URL cannot be distinguished from a sitemap URL at crawl time. By default, URLs whose file names match sitemap.*.xml, sitemap.*.gz, or sitemap.*txt are therefore treated as sitemaps (this can be customized in webapps/fess/WEB-INF/classes/s2robot_rule.dicon).</p>
|
||||
<p>The sitemap file is crawled in the same way as an HTML file with links: the URLs listed in it are crawled as the next crawl targets.</p>
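<p>As a reference, a minimal XML sitemap in the sitemaps.org format looks like the following (the URL is an example):</p>
<source><![CDATA[
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://localhost/</loc>
  </url>
</urlset>
]]></source>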
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,12 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Set up Guide</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Fess 2.0 Configuration Guide'>
|
||||
<p>These are the setup instructions for Fess 2.0.</p>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,18 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Log settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Log settings'>
|
||||
<p>Fess writes its log to webapps/fess/WEB-INF/logs/fess.out (the Solr log is written to logs/catalina.out). The content written to fess.out is configured in webapps/fess/WEB-INF/classes/log4j.xml. By default, the output level is INFO.</p>
|
||||
<p>For example, to also log the documents that Fess sends to Solr, uncomment the following section in log4j.xml:</p>
|
||||
<source><![CDATA[
|
||||
<logger name="jp.sf.fess.solr" >
|
||||
<level value ="debug" />
|
||||
</logger>
|
||||
]]></source>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,23 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Memory-related settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Changing the maximum heap memory'>
|
||||
<p>Depending on the crawl settings, an OutOfMemory error similar to the following may occur:</p>
|
||||
<source><![CDATA[
|
||||
java.lang.OutOfMemoryError: Java heap space
|
||||
]]></source>
|
||||
<p>If it occurs, increase the maximum heap memory. Change the -Xmx value in bin/setenv.[sh|bat] (the following sets the maximum to 1024 MB with -Xmx1024m):</p>
|
||||
<source><![CDATA[
|
||||
On Windows
|
||||
...-Dpdfbox.cjk.support=true -Xmx1024m
|
||||
|
||||
On Unix
|
||||
...-Dpdfbox.cjk.support=true -Xmx1024m"
|
||||
]]></source>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,17 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Mobile device information settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Mobile phone information update'>
|
||||
<p>The mobile device information is provided by <a class='externalLink' href='http://valueengine.jp/'>ValueEngine Inc.</a>. To use the latest mobile device information, download the device profiles, rename the files by removing the _YYYY-MM-DD suffix as shown below, and save them in webapps/fess/WEB-INF/classes/device. Restart Fess to apply the change.</p>
|
||||
<source><![CDATA[
|
||||
ProfileData_YYYY-MM-DD.csv -> ProfileData.csv
|
||||
UserAgent_YYYY-MM-DD.csv -> UserAgent.csv
|
||||
DisplayInfo_YYYY-MM-DD.csv -> DisplayInfo.csv
|
||||
]]></source>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,17 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Stemming settings</title>
|
||||
<author>Sone, Takaaki</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='About stemming'>
|
||||
<p>Fess applies a stemming process when indexing and searching.</p>
|
||||
<p>Stemming normalizes English words; for example, words such as recharging and rechargable are normalized to the form recharg. As a result, a search for recharging also hits documents containing rechargable, which reduces missed results.</p>
|
||||
</section>
|
||||
<section name='About protwords.txt'>
|
||||
<p>Because stemming is a rule-based process, it can normalize words in unintended ways. For example, the word Maine (the state name) is normalized to main.</p>
|
||||
<p>In such cases, adding Maine to protwords.txt excludes it from the stemming process.</p>
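<p>protwords.txt lists one protected word per line. For example, the following entry keeps Maine from being stemmed:</p>
<source><![CDATA[
Maine
]]></source>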
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,57 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Proxy settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
|
||||
<section name='For the crawler proxy settings'>
|
||||
<p>When crawling external sites from inside an intranet, the firewall may block the crawl. In that case, set a proxy for the crawler.</p>
|
||||
</section>
|
||||
<subsection name='How to set up'>
|
||||
<p>Set the proxy by creating webapps/fess/WEB-INF/classes/s2robot_client.dicon with the following contents:</p>
|
||||
<source><![CDATA[
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE components PUBLIC "-//SEASAR//DTD S2Container 2.4//EN"
|
||||
"http://www.seasar.org/dtd/components24.dtd">
|
||||
<components>
|
||||
<include path="s2robot_robotstxt.dicon"/>
|
||||
<include path="s2robot_contentlength.dicon"/>
|
||||
|
||||
<component name="httpClient" class="org.seasar.robot.client.http.CommonsHttpClient" instance="prototype">
|
||||
<property name="cookiePolicy">@org.apache.commons.httpclient.cookie.CookiePolicy@BROWSER_COMPATIBILITY</property>
|
||||
<property name="proxyHost">"proxy host name"</property>
|
||||
<property name="proxyPort">proxy port</property>
|
||||
<!-- If the proxy requires authentication
|
||||
<property name="proxyCredentials">
|
||||
<component class="org.apache.commons.httpclient.UsernamePasswordCredentials">
|
||||
<arg>"proxy user name"</arg>
|
||||
<arg>"proxy password"</arg>
|
||||
</component>
|
||||
</property>
|
||||
-->
|
||||
</component>
|
||||
|
||||
<component name="fsClient" class="org.seasar.robot.client.fs.FileSystemClient" instance="prototype">
|
||||
<property name="charset">"UTF-8"</property>
|
||||
</component>
|
||||
|
||||
<component name="clientFactory" class="org.seasar.robot.client.S2RobotClientFactory" instance="prototype">
|
||||
<initMethod name="addClient">
|
||||
<arg>{"http:.*", "https:.*"}</arg>
|
||||
<arg>httpClient</arg>
|
||||
</initMethod>
|
||||
<initMethod name="addClient">
|
||||
<arg>"file:.*"</arg>
|
||||
<arg>fsClient</arg>
|
||||
</initMethod>
|
||||
</component>
|
||||
|
||||
</components>
|
||||
|
||||
]]></source>
|
||||
</subsection>
|
||||
|
||||
</body>
|
||||
</document>
|
|
@ -1,25 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Setting up replication</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Setting up replication'>
|
||||
<p>Fess can replicate Solr index data from a specified path. By setting up two Fess servers, one for crawling and index creation and one for searching, you can distribute the load of index creation.</p>
|
||||
<p>To use the replication feature of Fess, the Solr index files must be placed on a shared disk, such as NFS, that each Fess server can access.</p>
|
||||
</section>
|
||||
<section name='How to set up'>
|
||||
<subsection name='Building indexes for Fess'>
|
||||
<p>Download and install Fess. The following assumes that it is installed in <code>/net/server1/usr/local/fess</code>.</p>
|
||||
<p>After starting Fess, register the crawl settings and create the index by crawling, just as in a normal setup (the procedure for building the indexing Fess is the same as the normal procedure).</p>
|
||||
</subsection>
|
||||
<subsection name='Search for Fess building'>
|
||||
<p>Download and install Fess. The following assumes that it is installed in <code>/net/server2/usr/local/fess</code>.</p>
|
||||
<p>After starting Fess, select the 'snapshot path' check box in the crawl settings of the management screen to enable the replication feature. The snapshot path designates the index location of the indexing Fess; in this case, it is <code>/net/server1/usr/local/fess/solr/core1/data/index</code>.</p>
|
||||
<img alt='Replication' src='/images/ja/2.0/crawl-2.png'/>
|
||||
<p>Press the update button to save the settings. Replication of the index is then performed at the times set in the schedule.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,90 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Setting role-based search</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='About role-based search'>
|
||||
<p>Fess can partition search results based on the credentials of users authenticated by an authentication system. For example, a document assigned role A appears in the search results of a user who has role A, but not in those of user B, who does not. Using this feature in an environment where users log in, such as a portal or single sign-on system, you can restrict searches by department or job title.</p>
|
||||
<p>Role-based search in Fess can obtain role information from the following sources:</p>
|
||||
<ul>
|
||||
<li>Request parameter</li>
|
||||
<li>Request header</li>
|
||||
<li>Cookies</li>
|
||||
<li>J2EE authentication information</li>
|
||||
</ul>
|
||||
<p>If Fess runs in a portal or agent-based single sign-on system that stores authentication information in cookies, role information can be retrieved from cookies whose domain and path cover Fess. With a reverse-proxy type single sign-on system, role information can be retrieved from authentication information added to the request headers or request parameters of requests to Fess.</p>
|
||||
</section>
|
||||
<section name='Setting role-based search'>
|
||||
<p>This section describes how to set up role-based search using J2EE authentication information.</p>
|
||||
<subsection name='Tomcat-users.xml settings'>
|
||||
<p>Add roles and users to conf/tomcat-users.xml. In this example, the role1 role is used for role-based search, and the role1 user logs in with it.</p>
|
||||
<source><![CDATA[
|
||||
<?xml version='1.0' encoding='utf-8'?>
|
||||
<tomcat-users>
|
||||
<role rolename="fess"/>
|
||||
<role rolename="solr"/>
|
||||
<role rolename="role1"/>
|
||||
<user username="admin" password="admin" roles="fess"/>
|
||||
<user username="solradmin" password="solradmin" roles="solr"/>
|
||||
<user username="role1" password="role1" roles="role1"/>
|
||||
</tomcat-users>
|
||||
]]></source>
|
||||
</subsection>
|
||||
<subsection name='app.dicon settings'>
|
||||
<p>Set webapps/fess/WEB-INF/classes/app.dicon as shown below:</p>
|
||||
<source><![CDATA[
|
||||
:
|
||||
<component name="systemHelper" class="jp.sf.fess.helper.SystemHelper">
|
||||
<property name="authenticatedRoles">"role1"</property>
|
||||
</component>
|
||||
:
|
||||
<component name="roleQueryHelper" class="jp.sf.fess.helper.impl.RoleQueryHelperImpl">
|
||||
<property name="defaultRoleList">
|
||||
{"guest"}
|
||||
</property>
|
||||
</component>
|
||||
:
|
||||
]]></source>
|
||||
<p>authenticatedRoles can list multiple roles separated by commas (,). Setting defaultRoleList assigns role information to users who have no authentication information; in the example above, users who are not logged in are given the guest role, so documents that require a user role are not shown to them.</p>
|
||||
</subsection>
|
||||
<subsection name='Web.xml settings'>
|
||||
<p>Set webapps/fess/WEB-INF/web.xml as shown below:</p>
|
||||
<source><![CDATA[
|
||||
:
|
||||
<security-constraint>
|
||||
<web-resource-collection>
|
||||
<web-resource-name>Fess Authentication</web-resource-name>
|
||||
<url-pattern>/login/login</url-pattern>
|
||||
</web-resource-collection>
|
||||
<auth-constraint>
|
||||
<role-name>fess</role-name>
|
||||
<role-name>role1</role-name>
|
||||
</auth-constraint>
|
||||
</security-constraint>
|
||||
:
|
||||
<security-role>
|
||||
<role-name>fess</role-name>
|
||||
</security-role>
|
||||
|
||||
<security-role>
|
||||
<role-name>role1</role-name>
|
||||
</security-role>
|
||||
:
|
||||
]]></source>
|
||||
</subsection>
|
||||
<subsection name='Settings in the Administration screen of the Fess'>
|
||||
<p>Start Fess and log in as an administrator. From the Role menu, register a role with the name Role1 (any name will do) and the value role1. Then, in each crawl setting that you want to make available to users with role1, select Role1 as the role.</p>
|
||||
</subsection>
|
||||
<subsection name='Logging in as role1'>
|
||||
<p>Log out from the management screen and log in as the role1 user. After a successful login, you are redirected to the top of the search screen.</p>
|
||||
<p>Search as usual; only documents from the crawl settings assigned the Role1 role are displayed.</p>
|
||||
<p>Searches by users who are not logged in are performed as the guest user.</p>
|
||||
</subsection>
|
||||
<subsection name='Logging out'>
|
||||
<p>When a user logged in with a non-admin role accesses http://localhost:8080/fess/admin, the logout screen appears. Press the logout button to log out.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,30 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Changing ports</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Changing the port'>
|
||||
<p>By default, Fess uses port 8080. To change it, follow the steps below.</p>
|
||||
<subsection name='Tomcat port changes'>
|
||||
<p>Change the ports of the Tomcat on which Fess runs. Modify the following ports described in conf/server.xml:</p>
|
||||
<ul>
|
||||
<li>8080: HTTP access port</li>
|
||||
<li>8005: shutdown port</li>
|
||||
<li>8009: AJP port</li>
|
||||
<li>8443: SSL HTTP access port (disabled by default)</li>
|
||||
</ul>
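<p>For example, to change the HTTP access port to 8081, edit the corresponding Connector element in conf/server.xml (other attributes, abbreviated here, stay as they are):</p>
<source><![CDATA[
<Connector port="8081" protocol="HTTP/1.1" ... />
]]></source>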
|
||||
</subsection>
|
||||
<subsection name='Solr configuration'>
|
||||
<p>In the standard configuration, Solr runs in the same Tomcat, so if you change the Tomcat port, you may also need to change the Solr server URL that Fess references. Change it in webapps/fess/WEB-INF/classes/fess_solr.dicon:</p>
|
||||
<source><![CDATA[
|
||||
<arg>"http://localhost:8080/solr"</arg>
|
||||
]]></source>
|
||||
<p>
|
||||
<b>Note: if you change the Tomcat port but do not change the port in the URL above accordingly, Fess cannot access the Solr server and errors are displayed on search and index update.</b>
|
||||
</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,37 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>SOLR failure operation</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='SOLR failure operation'>
|
||||
<p>Fess manages Solr servers in groups, and multiple groups can be managed. Fess keeps server and group status information and changes the status of a server and its group when the Solr server becomes inaccessible.</p>
|
||||
<p>The Solr server status can be changed on the system settings screen. maxErrorCount, maxRetryStatusCheckCount, maxRetryUpdateQueryCount, and minActiveServer can be defined in webapps/fess/WEB-INF/classes/fess_solr.dicon.</p>
|
||||
<subsection name='Solr group failure behavior'>
|
||||
<ul>
|
||||
<li>When the number of enabled Solr servers in a Solr group falls below minActiveServer, the Solr group is disabled.</li>
|
||||
<li>If the Solr group is not disabled even though the number of enabled Solr servers is at or below minActiveServer, Fess checks up to maxRetryStatusCheckCount times whether each disabled Solr server is accessible, and changes the status of an accessible server from disabled back to enabled. If a server is accessible but its status does not return to enabled, it is marked as index corrupted.</li>
|
||||
<li>A disabled Solr group cannot be used.</li>
|
||||
<li>To re-enable a Solr group, change the status of the Solr servers in the group to enabled on the system settings screen.</li>
|
||||
</ul>
|
||||
</subsection>
|
||||
<subsection name='Behavior of search failures'>
|
||||
<ul>
|
||||
<li>Search queries can be sent only to enabled Solr groups.</li>
|
||||
<li>Search queries are sent only to enabled Solr servers.</li>
|
||||
<li>If multiple Solr servers are registered in a Solr group, the search query is sent to the server with the fewest accesses.</li>
|
||||
<li>If search queries sent to a Solr server fail more than maxErrorCount times, the Solr server status is changed to disabled.</li>
|
||||
</ul>
|
||||
</subsection>
|
||||
<subsection name='Behavior of update failures'>
|
||||
<ul>
|
||||
<li>Update queries can be sent only to enabled Solr groups.</li>
|
||||
<li>Update queries are sent only to enabled Solr servers.</li>
|
||||
<li>If multiple Solr servers are registered in a Solr group, the update query is sent to each enabled Solr server.</li>
|
||||
<li>If update queries sent to a Solr server fail more than maxRetryUpdateQueryCount times, the Solr server status is changed to index corrupted.</li>
|
||||
</ul>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,36 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Settings for the index string extraction</title>
|
||||
<author>Sone, Takaaki</author>
|
||||
</properties>
|
||||
<body>
|
||||
|
||||
<section name='About the index string extraction'>
|
||||
<p>When creating an index for searching, documents must be split into tokens before they are registered in the index.</p>
|
||||
<p>A tokenizer is used for this.</p>
|
||||
<p>Basically, a search for a unit smaller than the units produced by the tokenizer will not hit.</p>
|
||||
<p>For example, suppose the Japanese sentence 東京都に住んでいます ('I live in Tokyo') is split by the tokenizer into units such as 東京 (Tokyo) and 住んで (live). In this case, a search for the word 東京 hits. However, a search for the word 京都 (Kyoto), which does not match any of the tokens, does not hit.</p>
|
||||
<p>Therefore, the choice of tokenizer is important.</p>
|
||||
<p>By default, Fess uses CJKTokenizer. You can change the tokenizer by editing the analyzer section of schema.xml.</p>
|
||||
</section>
|
||||
|
||||
<subsection name='About CJKTokenizer'>
|
||||
<p>CJKTokenizer indexes multibyte strings, such as Japanese, as bi-grams, that is, in units of two characters. In this case, single-character words cannot be found.</p>
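<p>For example, bi-gram tokenization splits the three-character string 東京都 into the following overlapping two-character units:</p>
<source><![CDATA[
東京都 -> 東京 / 京都
]]></source>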
|
||||
</subsection>
|
||||
|
||||
<subsection name='About StandardTokenizer'>
|
||||
<p>StandardTokenizer indexes multibyte strings, such as Japanese, as uni-grams, that is, one character at a time. This reduces missed results, and single-character search queries that fail with CJKTokenizer succeed with StandardTokenizer.</p>
|
||||
<p>To use StandardTokenizer, change the analyzer section of schema.xml as in the following example:</p>
|
||||
<source><![CDATA[
|
||||
:
|
||||
<types>
|
||||
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
|
||||
<analyzer>
|
||||
<tokenizer class="solr.StandardTokenizerFactory"/>
|
||||
:
|
||||
]]></source>
|
||||
</subsection>
|
||||
|
||||
</body>
|
||||
</document>
|
|
@ -1,45 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Register for the Windows service</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Registered as a Windows service'>
|
||||
<p>In a Windows environment, you can register Fess as a Windows service. The registration procedure is the same as for Tomcat.</p>
|
||||
<subsection name='How to register'>
|
||||
<p>First, after installing Fess, run service.bat from the command prompt (on Windows Vista and similar versions, you must launch the prompt as administrator). The following assumes Fess is installed in C:\Java\fess-server-2.0.0.</p>
|
||||
<source><![CDATA[
|
||||
> cd C:\Java\fess-server-2.0.0\bin
|
||||
> service.bat install fess
|
||||
...
|
||||
The service 'fess' has been installed.
|
||||
]]></source>
|
||||
<p>Next, set the properties for Fess. Run the following to open the Tomcat properties window:</p>
|
||||
<source><![CDATA[
|
||||
> tomcat6w.exe //ES//fess
|
||||
]]></source>
|
||||
<p>Set the following as the Java Options on the Java tab:</p>
|
||||
<source><![CDATA[
|
||||
-Dcatalina.base=C:\Java\fess-server-2.0.0
|
||||
-Dcatalina.home=C:\Java\fess-server-2.0.0
|
||||
-Djava.endorsed.dirs=C:\Java\fess-server-2.0.0\endorsed
|
||||
-Djava.io.tmpdir=C:\Java\fess-server-2.0.0\temp
|
||||
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
|
||||
-Djava.util.logging.config.file=C:\Java\fess-server-2.0.0\conf\logging.properties
|
||||
-Dsolr.solr.home=C:\Java\fess-server-2.0.0\solr
|
||||
-Dsolr.data.dir=C:\Java\fess-server-2.0.0\solr\data
|
||||
-Dfess.log.file=C:\Java\fess-server-2.0.0\webapps\fess\WEB-INF\logs\fess.out
|
||||
-Djava.awt.headless=true
|
||||
-XX:+UseGCOverheadLimit
|
||||
-XX:+UseConcMarkSweepGC
|
||||
-XX:+CMSIncrementalMode
|
||||
-XX:+UseTLAB
|
||||
-Dpdfbox.cjk.support=true
|
||||
-XX:MaxPermSize=128m
|
||||
]]></source>
|
||||
<p>Change the maximum memory pool value to 512. Press the OK button to save the settings. After that, you can start and stop Fess as a normal Windows service.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,12 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Search Guide</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Fess 2.0 Search Guide'>
|
||||
<p>These are the instructions on how to search with Fess 2.0.</p>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,57 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Search by specifying a search field</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Field searches'>
|
||||
<p>Fess saves crawl results in fields such as title and content. You can search within a specific field.</p>
|
||||
<p>The following fields can be searched by default.</p>
|
||||
<table>
|
||||
<tbody>
|
||||
<tr>
|
||||
<th>URL</th>
|
||||
<td>The crawled URL</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>host</th>
|
||||
<td>The host name contained in the crawled URL</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>site</th>
|
||||
<td>The site name contained in the crawled URL</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>title</th>
|
||||
<td>Title</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>content</th>
|
||||
<td>Text</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>contentLength</th>
|
||||
<td>The size of the crawled content</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>lastModified</th>
|
||||
<td>The last modified time of the crawled content</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>mimetype</th>
|
||||
<td>The MIME type of the content</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<p>If you do not specify a field, the title and content fields are searched.</p>
|
||||
<subsection name='How to search'>
|
||||
<p>To search a specific field, enter "fieldname:search terms" in the search form.</p>
|
||||
<p>For example, to search for Fess in the title field, enter the following.</p>
|
||||
<source><![CDATA[
|
||||
title:Fess
|
||||
]]></source>
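<p>A field search can also be combined with ordinary terms; for example, to require Fess in the title while searching the remaining term normally (an illustrative query, not from the original text):</p>
<source><![CDATA[
title:Fess crawl
]]></source>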
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,14 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Search by label</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Search by label'>
|
||||
<p>Labels registered in the management screen enable searching by label on the search screen. Use labels to narrow down search results. If no labels are registered, the label drop-down box is not displayed.</p>
|
||||
<img alt='Search by label' src='/images/ja/2.0/search-label-1.png'/>
|
||||
<p>Labels are assigned when the index is created, so you can search the documents of each crawl setting by its specified label. A search without a label returns all results, as usual.</p>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,44 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Search sort</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Sort search'>
|
||||
<p>You can sort search results by specifying a field such as the crawl time.</p>
|
||||
<p>The following fields can be used for sorting by default.</p>
|
||||
<table>
|
||||
<tbody>
|
||||
<tr>
|
||||
<th>tstamp</th>
|
||||
<td>The time of the crawl</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>contentLength</th>
|
||||
<td>The size of the crawled content</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>lastModified</th>
|
||||
<td>The last modified time of the crawled content</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<subsection name='How to sort'>
|
||||
<p>To sort, enter "sort:fieldname" in the search form along with your search terms.</p>
|
||||
<p>For example, to search for Fess and sort by content size in ascending order, enter the following.</p>
|
||||
<source><![CDATA[
|
||||
Fess sort:contentLength
|
||||
]]></source>
|
||||
<p>To sort in descending order, enter the following.</p>
|
||||
<source><![CDATA[
|
||||
Fess sort:contentLength.desc
|
||||
]]></source>
|
||||
<p>To sort by multiple fields, separate them with commas as shown below.</p>
|
||||
<source><![CDATA[
|
||||
Fess sort:contentLength.desc,lastModified
|
||||
]]></source>
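<p>Sorting can also be combined with a field search; for example, to search titles for Fess and show the newest documents first (an illustrative query, not from the original text):</p>
<source><![CDATA[
title:Fess sort:lastModified.desc
]]></source>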
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,19 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Setting the browser type</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Setting the browser type'>
|
||||
<p>This section describes the browser type settings. Crawled data can be registered with browser types, so that search results can be separated by the type of browser used to view them.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Browser Types in the menu.</p>
|
||||
<img alt='Setting the browser type' src='/images/ja/3.0/browserType-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Browser type'>
|
||||
<p>You can set a display name and a value. Add entries here when you need to support new devices; no changes are needed unless such customization is required.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,34 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Settings Wizard</title>
|
||||
<author>Sone, Takaaki</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Settings Wizard'>
|
||||
<p>This section introduces the Settings Wizard.</p>
|
||||
<p>You can use the Settings Wizard to set up Fess easily.</p>
|
||||
<subsection name='How to use Setup Wizard'>
|
||||
<p>After logging in with an administrator account, click Settings Wizard in the menu.</p>
|
||||
<img alt='Settings Wizard' src='/images/ja/3.0/config-wizard-1.png'/>
|
||||
<p>First, set a schedule.</p>
|
||||
<p>Fess crawls and builds the index at the scheduled time.</p>
|
||||
<p>The default is 0:00 every day.</p>
|
||||
<img alt='Setting a schedule' src='/images/ja/3.0/config-wizard-2.png'/>
|
||||
<p>Next, configure the crawl settings.</p>
|
||||
<p>A crawl setting registers the URI to crawl.</p>
|
||||
<p>For the crawl setting name, enter any name that is easy to identify.</p>
|
||||
<p>For the URI, enter the location you want indexed and made searchable.</p>
|
||||
<img alt='Crawl settings' src='/images/ja/3.0/config-wizard-3.png'/>
|
||||
<p>For example, to search http://example.com, the settings look like the following.</p>
|
||||
<img alt='Crawl settings example' src='/images/ja/3.0/config-wizard-4.png'/>
|
||||
<p>This is the last step.</p>
|
||||
<p>Press the Start Crawling button to begin crawling immediately. If you press the Finish button instead, crawling does not start until the time specified in the schedule settings.</p>
|
||||
<img alt='Crawl started' src='/images/ja/3.0/config-wizard-5.png'/>
|
||||
</subsection>
|
||||
<subsection name='Changes to settings'>
|
||||
<p>Settings made in the Settings Wizard can later be changed from the Crawl General, Web, and File System pages.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,100 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>The General crawl settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='The General crawl settings'>
|
||||
<p>This section describes the general settings related to crawling.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Crawl General in the menu.</p>
|
||||
<img alt='Crawl General' src='/images/ja/3.0/crawl-1.png'/>
|
||||
<p>You can specify the path for the generated index and enable the replication feature.</p>
|
||||
<img alt='Replication features' src='/images/ja/3.0/crawl-2.png'/>
|
||||
</subsection>
|
||||
<subsection name='Scheduled full crawl frequency'>
|
||||
<p>You can set the interval at which web sites and file systems are crawled. The default is the following.</p>
|
||||
<source><![CDATA[
|
||||
0 0 0 * * ?
|
||||
]]></source>
|
||||
<p>The fields are, from left to right: seconds, minutes, hours, day of month, month, and day of week. The format is similar to Unix cron settings. This example crawls every day at 0:00 a.m.</p>
|
||||
<p>The following are examples.</p>
|
||||
<table class='table table-striped table-bordered table-condensed'>
|
||||
<tbody>
|
||||
<tr class='a'>
|
||||
<td align='left'>0 0 12 * * ?</td>
|
||||
<td align='left'>Starts every day at 12:00 pm</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>0 15 10 ? * *</td>
|
||||
<td align='left'>Starts every day at 10:15 am</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>0 15 10 * * ?</td>
|
||||
<td align='left'>Starts every day at 10:15 am</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>0 15 10 * * ? *</td>
|
||||
<td align='left'>Starts every day at 10:15 am</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>0 15 10 * * ? 2005</td>
|
||||
<td align='left'>Starts every day at 10:15 am during 2005</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>0 * 14 * * ?</td>
|
||||
<td align='left'>Starts every minute from 2:00 pm to 2:59 pm every day</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>0 0/5 14 * * ?</td>
|
||||
<td align='left'>Starts every 5 minutes from 2:00 pm to 2:55 pm every day</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>0 0/5 14,18 * * ?</td>
|
||||
<td align='left'>Starts every 5 minutes from 2:00 pm to 2:55 pm and from 6:00 pm to 6:55 pm every day</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>0 0-5 14 * * ?</td>
|
||||
<td align='left'>Starts every minute from 2:00 pm to 2:05 pm every day</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>0 10,44 14 ? 3 WED</td>
|
||||
<td align='left'>Starts at 2:10 pm and 2:44 pm every Wednesday in March</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>0 15 10 ? * MON-FRI</td>
|
||||
<td align='left'>Starts at 10:15 am Monday through Friday</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<p>Note that the schedule is checked at 60-second intervals by default. If you need the seconds field to be honored exactly, customize the taskScanIntervalTime value in webapps/fess/WEB-INF/classes/chronosCustomize.dicon; if coarser precision is sufficient, the default can be left unchanged.</p>
|
||||
</subsection>
|
||||
<subsection name='Mobile translation'>
|
||||
<p>Search results for PC web sites may not display correctly on mobile devices. Selecting a mobile conversion service converts PC sites for display on mobile devices. If you choose Google Wireless Transcoder, content is converted so that it can be displayed on mobile phones; when browsing PC-site search results on a mobile device, the result links pass through Google Wireless Transcoder, enabling smooth viewing on the device.</p>
|
||||
</subsection>
|
||||
<subsection name='Replication features'>
|
||||
<p>Enabling the replication feature lets you apply a copy of a Solr index generated elsewhere. For example, you can use it when crawling and indexing run on one server while a separate, search-only server is placed in front.</p>
|
||||
</subsection>
|
||||
<subsection name='Index commit, optimize'>
|
||||
<p>After documents are registered in Solr, a commit or optimize makes them available for search. If optimize is selected, an index optimization is issued to Solr; if commit is selected, a commit is issued.</p>
|
||||
</subsection>
|
||||
<subsection name='Server switchovers'>
|
||||
<p>Fess can combine multiple Solr servers into a group and manage multiple groups. Different groups are used for updates and for searches. For example, with two groups, if group 2 is used for updates, group 1 is used for searches; after a crawl completes, updates switch to group 1 and searches switch to group 2. This setting is effective only when multiple Solr server groups are registered.</p>
|
||||
</subsection>
|
||||
<subsection name='Committed to the document number of each'>
|
||||
<p>To improve indexing performance, Fess sends documents to Solr in batches of 20 while crawling. Because adding documents without committing is faster, Fess issues a commit to Solr only after the number of documents specified here has been added. By default, a commit is issued after every 1000 documents.</p>
|
||||
</subsection>
|
||||
<subsection name='Number of concurrent crawls settings'>
|
||||
<p>Fess crawls documents by web crawling and file system crawling. Only as many crawl settings as specified here run simultaneously. For example, if the number of concurrent crawls is 3 and web crawl settings 1 through 10 exist, crawling starts with settings 1 through 3; when any of them completes, crawl setting 4 starts, and so on until all 10 have run.</p>
|
||||
<p>A number of threads can also be specified per crawl setting; the number of concurrent crawls is not the number of threads. For example, with 3 concurrent crawls and 5 threads per crawl setting, up to 3 x 5 = 15 threads run while crawling.</p>
|
||||
</subsection>
|
||||
<subsection name='Expiration date of the index'>
|
||||
<p>Indexed data can be deleted automatically after a period of time. If you select 5 days, documents indexed at least 5 days ago and not updated since are removed. This is useful for dropping data whose content has been removed.</p>
|
||||
</subsection>
|
||||
<subsection name='Snapshot path'>
|
||||
<p>Index information is copied from the index directory to the snapshot path. When replication is enabled, the copy at the snapshot path is the one applied.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,34 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Set session information</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Set session information'>
|
||||
<p>This section describes session information. The results of one crawl run are saved as a single session information record. You can check the execution time and the number of documents indexed.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Session Information in the menu.</p>
|
||||
</subsection>
|
||||
<subsection name='Session information list'>
|
||||
<img alt='Session information list' src='/images/ja/3.0/crawlingSession-1.png'/>
|
||||
<p>You can remove all session information records except those currently running by clicking the Delete All link.</p>
|
||||
</subsection>
|
||||
<subsection name='Session details'>
|
||||
<img alt='Session details' src='/images/ja/3.0/crawlingSession-2.png'/>
|
||||
<p>By specifying a session ID, you can view the details of that crawl.</p>
|
||||
<ul>
|
||||
<li>Crawler*: information about the entire crawl</li>
|
||||
<li>FsCrawl*: information about file system crawling</li>
|
||||
<li>WebCrawl*: information about web crawling</li>
|
||||
<li>Optimize*: information about optimize requests issued to the Solr server</li>
|
||||
<li>Commit*: information about commit requests issued to the Solr server</li>
|
||||
<li>*StartTime: start time</li>
|
||||
<li>*EndTime: end time</li>
|
||||
<li>*ExecTime: execution time (ms)</li>
|
||||
<li>*IndexSize: number of documents indexed</li>
|
||||
</ul>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,33 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Configuration backup and restore</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Configuration backup and restore'>
|
||||
<p>This section describes how to back up and restore Fess settings.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Backup/Restore in the menu.</p>
|
||||
<img alt='Backup and restore' src='/images/ja/3.0/data-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Backup settings'>
|
||||
<p>Click the download link to export the Fess settings in XML format. The saved settings are listed below.</p>
|
||||
<ul>
|
||||
<li>The General crawl settings</li>
|
||||
<li>Web crawl settings</li>
|
||||
<li>File system Crawl settings</li>
|
||||
<li>Path mapping</li>
|
||||
<li>Web authentication</li>
|
||||
<li>Compatible browsers</li>
|
||||
<li>Session information</li>
|
||||
</ul>
|
||||
<p>The Solr index data and the data being crawled are not backed up. They can be regenerated by crawling again after the Fess settings are restored.</p>
|
||||
</subsection>
|
||||
<subsection name='Restore settings'>
|
||||
<p>You can restore the settings by uploading the XML file produced by the backup. Specify the XML file and click the Restore button.</p>
|
||||
<p>If overwriting is enabled and the same data already exists, the existing data is updated.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,129 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Data store configuration</title>
|
||||
<author>Sone, Takaaki</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Data store configuration'>
|
||||
<p>Fess can crawl databases. This section describes the required data store settings.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Data Store in the menu.</p>
|
||||
<img alt='Data store configuration' src='/images/ja/3.0/dataStoreCrawling-1.png'/>
|
||||
<p>As an example, assume the following table in a MySQL database named testdb, connected with user name hoge and password fuga.</p>
|
||||
<source><![CDATA[
|
||||
CREATE TABLE job (
|
||||
id BIGINT NOT NULL AUTO_INCREMENT
|
||||
, title VARCHAR(100) NOT NULL
|
||||
, content VARCHAR(255) NOT NULL
|
||||
, versionNo INTEGER NOT NULL
|
||||
, PRIMARY KEY (id)
|
||||
);
|
||||
]]></source>
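<p>To try the crawl end to end you might seed the table with a few rows; the values below are hypothetical examples, not from the original text:</p>
<source><![CDATA[
INSERT INTO job (title, content, versionNo)
  VALUES ('Engineer', 'Develops the search server.', 1);
INSERT INTO job (title, content, versionNo)
  VALUES ('Writer', 'Maintains the documentation.', 1);
]]></source>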
|
||||
</subsection>
|
||||
<subsection name='Parameter'>
|
||||
<p>An example of the parameter settings follows.</p>
|
||||
<source><![CDATA[
|
||||
driver=com.mysql.jdbc.Driver
|
||||
url=jdbc:mysql://localhost:3306/testdb?useUnicode=true&characterEncoding=UTF-8
|
||||
username=hoge
|
||||
password=fuga
|
||||
sql=select * from job
|
||||
]]></source>
|
||||
<p>Parameters are given in "key=value" format. The keys are described below.</p>
|
||||
<table class='table table-striped table-bordered table-condensed'>
|
||||
<tbody>
|
||||
<tr class='a'>
|
||||
<td align='left'>driver</td>
|
||||
<td align='left'>The JDBC driver class name</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>url</td>
|
||||
<td align='left'>The connection URL</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>username</td>
|
||||
<td align='left'>The user name used to connect to the database</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>password</td>
|
||||
<td align='left'>The password used to connect to the database</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>sql</td>
|
||||
<td align='left'>The SQL statement that retrieves the data to crawl</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</subsection>
|
||||
<subsection name='Script'>
|
||||
<p>An example of the script settings follows.</p>
|
||||
<source><![CDATA[
|
||||
url="http://localhost/" + id
|
||||
host="localhost"
|
||||
site="localhost"
|
||||
title=title
|
||||
content=content
|
||||
cache=content
|
||||
digest=content
|
||||
anchor=
|
||||
contentLength=content.length()
|
||||
lastModified=content.length()
|
||||
]]></source>
|
||||
<p>
|
||||
Script entries are also given in "key=value" format.
|
||||
The keys are described below.</p>
|
||||
<p>
|
||||
The value side is written in OGNL. Enclose strings in double quotation marks.
|
||||
The value of a database column is accessed by its column name.</p>
|
||||
<table class='table table-striped table-bordered table-condensed'>
|
||||
<tbody>
|
||||
<tr class='a'>
|
||||
<td align='left'>url</td>
|
||||
<td align='left'>The URL (the link shown in search results)</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>host</td>
|
||||
<td align='left'>Host name</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>site</td>
|
||||
<td align='left'>The site path</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>title</td>
|
||||
<td align='left'>Title</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>content</td>
|
||||
<td align='left'>The content (the indexed text)</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>cache</td>
|
||||
<td align='left'>Content cache (not indexed)</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>digest</td>
|
||||
<td align='left'>The digest shown in search results</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>anchor</td>
|
||||
<td align='left'>Anchor links to the content (not usually required)</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>contentLength</td>
|
||||
<td align='left'>The length of the content</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>lastModified</td>
|
||||
<td align='left'>The last modified time of the content</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</subsection>
|
||||
<subsection name='Driver'>
|
||||
<p>A JDBC driver is required to connect to the database. Place the driver jar file in webapps/fess/WEB-INF/cmd/lib.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,101 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Appearance settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Appearance settings'>
|
||||
<p>This section describes settings for the design of the search screens.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Design in the menu.</p>
|
||||
<img alt='Design' src='/images/ja/3.0/design-1.png'/>
|
||||
<p>You can edit the search screens in the screen shown below.</p>
|
||||
<img alt='JSP compilation screen' src='/images/ja/3.0/design-2.png'/>
|
||||
</subsection>
|
||||
<subsection name='Image file'>
|
||||
<p>You can upload image files to use on the search screens. Supported image file extensions are jpg, gif, and png.</p>
|
||||
</subsection>
|
||||
<subsection name='Image file name'>
|
||||
<p>Specify a file name for the uploaded image if you want one; if omitted, the name of the uploaded file is used.</p>
|
||||
</subsection>
|
||||
<subsection name='Design JSP files'>
|
||||
<p>You can edit the JSP files of the search screens. Press the Edit button of a JSP file to edit its current contents, or press the default edit button to edit the file as it was at installation. Save with the Update button on the edit screen and the changes are reflected.</p>
|
||||
<p>The editable JSP files are listed below.</p>
|
||||
<table class='table table-striped table-bordered table-condensed'>
|
||||
<tbody>
|
||||
<tr class='a'>
|
||||
<td align='left'>Top page (frame)</td>
|
||||
<td align='left'>The JSP file of the search top page. It includes the JSP files of the individual parts.</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>Top page (within the Head tags)</td>
|
||||
<td align='left'>The JSP file for the inside of the head tag on the search top page. Edit it to change meta tags, title tags, script tags, and so on.</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>Top page (content)</td>
|
||||
<td align='left'>The JSP file for the body tag content of the search top page.</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>Search results pages (frames)</td>
|
||||
<td align='left'>The JSP file of the search results list page. It includes the JSP files of the individual parts.</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>Search results page (within the Head tags)</td>
|
||||
<td align='left'>The JSP file for the inside of the head tag on the search results page. Edit it to change meta tags, title tags, script tags, and so on.</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>Search results page (header)</td>
|
||||
<td align='left'>The JSP file for the header of the search results page. It contains the search form at the top.</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>Search results page (footer)</td>
|
||||
<td align='left'>The JSP file for the footer of the search results page. It contains the copyright notice at the bottom.</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>Search results pages (content)</td>
|
||||
<td align='left'>The JSP file for the results section of the search results page, used when there are search results. Edit it to customize how results are displayed.</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>Search results page (result no)</td>
|
||||
<td align='left'>The JSP file for the results section of the search results page, used when there are no search results.</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<p>Screens for mobile devices can be edited in the same way as those for PCs.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
<section name='How to set up'>
|
||||
<subsection name='To view the registration date and modified date'>
|
||||
<p>To display each document's registration time in Fess and its last modified time in the search results, edit the search results page (content) as follows.</p>
|
||||
<source><![CDATA[
|
||||
:
|
||||
:
|
||||
<ol>
|
||||
<c:forEach var="doc" varStatus="s" items="${documentItems}">
|
||||
<%
|
||||
java.util.Map docMap = (java.util.Map)pageContext.getAttribute("doc");
|
||||
Long tstampValue = (Long)docMap.get("tstamp");
|
||||
java.util.Date tstampDate = new java.util.Date(tstampValue);
|
||||
Long lastModifiedValue = (Long)docMap.get("lastModified");
|
||||
java.util.Date lastModifiedDate = new java.util.Date(lastModifiedValue);
|
||||
java.text.SimpleDateFormat sdf = new java.text.SimpleDateFormat("yyyy/MM/dd HH:mm");
|
||||
%>
|
||||
<li>
|
||||
<h3 class="title">
|
||||
<a href="${doc.urlLink}">${f:h(doc.contentTitle)}</a>
|
||||
</h3>
|
||||
<div class="body">
|
||||
${doc.contentDescription}
|
||||
<br/>
|
||||
<cite>${f:h(doc.site)}</cite>
|
||||
<br>Registered: <%= sdf.format(tstampDate) %>
|
||||
<br>Last Modified: <%= sdf.format(lastModifiedDate) %>
|
||||
:
|
||||
:
|
||||
]]></source>
|
||||
<p>tstampDate holds the registration time and lastModifiedDate holds the last modified time. The output date format is specified with SimpleDateFormat.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,96 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>File system crawl settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='File system crawl settings'>
|
||||
<p>This section describes the settings for crawling a file system.</p>
|
||||
<p>If you want to index more than 100,000 documents, we recommend splitting them across crawl settings so that each one targets up to a few tens of thousands of documents. Indexing performance degrades when a single crawl setting targets 100,000 documents or more.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click File System in the menu.</p>
|
||||
<img alt='Setting file system Crawl' src='/images/ja/3.0/fileCrawlingConfig-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Setting name'>
|
||||
<p>This is the name that appears on the list page.</p>
|
||||
</subsection>
|
||||
<subsection name='Specifying a path'>
|
||||
<p>You can specify multiple paths, each starting with file:. For example:</p>
|
||||
<source><![CDATA[
|
||||
file:/home/taro/
|
||||
file:/home/documents/
|
||||
]]></source>
|
||||
<p>Everything below the specified directories is crawled.</p>
|
||||
<p>On Windows, the path must be written as a URI; for example, c:\Documents\taro is specified as file:/c:/Documents/taro.</p>
|
||||
</subsection>
|
||||
<subsection name='Path filtering'>
|
||||
<p>By specifying regular expressions, you can restrict or exclude paths from crawling and searching.</p>
|
||||
<table>
|
||||
<tbody>
|
||||
<tr>
|
||||
<th>Path to crawl</th>
|
||||
<td>Paths matching the specified regular expressions are crawled.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>The path to exclude from being crawled</th>
|
||||
<td>Paths matching the specified regular expressions are not crawled. This takes precedence over the paths to crawl.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>Path to be searched</th>
|
||||
<td>Paths matching the specified regular expressions are searchable. This takes precedence even over paths excluded from search.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>Path to exclude from searches</th>
|
||||
<td>Paths matching the specified regular expressions are not searchable. Excluding a path from crawling also prevents following any links from it, so use this when you want to exclude only part of the crawled content from search.</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<p>For example, to crawl only paths under /home/, specify the following as a path to crawl:</p>
|
||||
<source><![CDATA[
|
||||
file:/home/.*
|
||||
]]></source>
|
||||
<p>To exclude files with the png extension, specify the following as a path to exclude:</p>
|
||||
<source><![CDATA[
|
||||
.*\.png$
|
||||
]]></source>
|
||||
<p>Multiple patterns can be specified, one per line.</p>
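<p>For instance, to exclude several image extensions at once, each pattern goes on its own line (illustrative patterns, not from the original text):</p>
<source><![CDATA[
.*\.png$
.*\.gif$
.*\.jpg$
]]></source>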
|
||||
<p>URIs are handled as by java.io.File and are specified as follows:</p>
|
||||
<source><![CDATA[
|
||||
/home/taro -> file:/home/taro
|
||||
c:\memo.txt -> file:/c:/memo.txt
|
||||
\\server\memo.txt -> file:////server/memo.txt
|
||||
]]></source>
|
||||
</subsection>
|
||||
<subsection name='Depth'>
|
||||
<p>Specify the depth of the directory hierarchy to crawl.</p>
|
||||
</subsection>
|
||||
<subsection name='Maximum access'>
|
||||
<p>You can specify the maximum number of documents to retrieve in a crawl.</p>
|
||||
</subsection>
|
||||
<subsection name='Number of threads'>
|
||||
<p>Specifies the number of crawler threads. A value of 5 means that 5 threads crawl at the same time.</p>
|
||||
</subsection>
|
||||
<subsection name='Interval'>
|
||||
<p>This is the interval between document retrievals, in milliseconds. With one thread and a value of 5000, a document is retrieved every 5 seconds.</p>
|
||||
<p>With 5 threads and an interval of 1000 milliseconds, up to 5 documents are retrieved per second.</p>
|
||||
</subsection>
|
||||
<subsection name='Boost value'>
|
||||
<p>You can weight the URLs in this crawl setting. Use it to rank these results above those of other crawl settings. The default is 1; the higher the value, the nearer the top of the search results the documents appear. To rank these results above all others, specify a sufficiently large value such as 10000.</p>
|
||||
<p>The value must be an integer greater than 0. It is used as the boost value when documents are added to Solr.</p>
|
||||
</subsection>
|
||||
<subsection name='Browser type'>
|
||||
<p>Crawled documents are registered with the selected browser types. If only PC is selected, the documents do not appear in search results on mobile devices. You can also restrict documents to specific mobile devices.</p>
|
||||
</subsection>
|
||||
<subsection name='Role'>
|
||||
<p>You can restrict documents so that they appear in search results only for users with a particular role. Roles must be set up beforehand. This is useful on systems that require login, such as portal servers, when you want to filter search results per user.</p>
|
||||
</subsection>
|
||||
<subsection name='Label'>
|
||||
<p>You can attach labels to crawled documents. Registering labels enables searching by label on the search screen by specifying the label.</p>
|
||||
</subsection>
|
||||
<subsection name='State'>
|
||||
<p>Crawling runs at crawl time only when this is set to Enabled. Disable it to skip crawling temporarily.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,12 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Management UI Guide</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Fess 3.0 management UI Guide'>
|
||||
<p>This section describes the Fess 3.0 management UI.</p>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,23 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Setting a label</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Setting a label'>
|
||||
<p>This section describes label settings. Labels classify documents by being selected in crawl settings and can be used to filter search results. When labels are registered, a label drop-down box appears to the right of the search box.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Label in the menu.</p>
|
||||
<img alt='List of labels' src='/images/ja/3.0/labelType-1.png'/>
|
||||
<img alt='Setting a label' src='/images/ja/3.0/labelType-2.png'/><!-- TODO -->
|
||||
</subsection>
|
||||
<subsection name='Display name'>
|
||||
<p>Specifies the name displayed in the label drop-down on the search screen.</p>
|
||||
</subsection>
|
||||
<subsection name='Value'>
|
||||
<p>Specifies the identifier used to classify documents. This value is sent to Solr and must consist of alphanumeric characters.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,19 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Log file download</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Log file download'>
|
||||
<p>This section describes how to download the log files output by Fess.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Log File in the menu.</p>
|
||||
<img alt='Session information' src='/images/ja/3.0/log-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Download'>
|
||||
<p>Click a log file name to download that log file.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,23 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Duplicate host settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Duplicate host settings'>
|
||||
<p>This section describes duplicate host settings. Use them when different host names should be treated as the same host during crawling; for example, when www.example.com and example.com serve the same site.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Duplicate Host in the menu.</p>
|
||||
<img alt='A list of the duplicate host' src='/images/ja/3.0/overlappingHost-1.png'/>
|
||||
<img alt='Duplicate host settings' src='/images/ja/3.0/overlappingHost-2.png'/><!-- TODO -->
|
||||
</subsection>
|
||||
<subsection name='Canonical name'>
|
||||
<p>Specifies the canonical host name. Duplicate host names are replaced with the canonical host name.</p>
|
||||
</subsection>
|
||||
<subsection name='Duplicate names'>
|
||||
<p>Specifies the duplicate host names, that is, the host names to be replaced by the canonical name.</p>
|
||||
</subsection>
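As an illustration of the effect only (a sketch, not Fess's actual implementation), with canonical name www.example.com and duplicate name example.com, a crawled URL is normalized like this:

```java
import java.util.regex.Pattern;

public class DuplicateHostSketch {
    public static void main(String[] args) {
        String canonical = "www.example.com";  // canonical name
        String duplicate = "example.com";      // duplicate name
        String crawled = "http://example.com/docs/index.html";
        // The duplicate host name is replaced with the canonical one.
        String normalized = crawled.replaceFirst(
                "://" + Pattern.quote(duplicate) + "/",
                "://" + canonical + "/");
        System.out.println(normalized);  // http://www.example.com/docs/index.html
    }
}
```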
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,26 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Path mapping settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Path mapping settings'>
|
||||
<p>This section describes path mapping settings. Use path mapping when you want to replace the links that appear in search results.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Path Mapping in the menu.</p>
|
||||
<img alt='List of path mapping' src='/images/ja/3.0/pathMapping-1.png'/>
|
||||
<img alt='Path mapping settings' src='/images/ja/3.0/pathMapping-2.png'/><!-- TODO -->
|
||||
</subsection>
|
||||
<subsection name='Path mapping'>
|
||||
<p>Path mapping replaces the parts of a URL that match the specified regular expression with a replacement string. When crawling a local file system, the links in search results may not be valid in the user's environment; in such cases, path mapping lets you rewrite the search result links. You can specify multiple path mappings.</p>
|
||||
</subsection>
|
||||
<subsection name='Regular expressions'>
|
||||
<p>Specifies the pattern to match. The syntax follows <a href='http://java.sun.com/javase/ja/6/docs/ja/api/java/util/regex/Pattern.html'>Java 6 regular expressions</a>.</p>
|
||||
</subsection>
|
||||
<subsection name='Replacement character'>
|
||||
<p>Specifies the string to replace the matched regular expression.</p>
|
||||
</subsection>
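As a sketch of the replacement semantics (the class and values below are hypothetical examples, not part of Fess), the rewrite behaves like Java's String.replaceAll:

```java
public class PathMappingSketch {
    public static void main(String[] args) {
        // Hypothetical mapping: expose a crawled file path as an HTTP link.
        String regex = "file:/home/share/";              // "Regular expressions" field
        String replacement = "http://www.example.com/";  // "Replacement character" field
        String crawled = "file:/home/share/docs/manual.pdf";
        // The part of the URL matching the pattern is replaced.
        System.out.println(crawled.replaceAll(regex, replacement));
        // prints http://www.example.com/docs/manual.pdf
    }
}
```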
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,26 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Setting a request header</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Setting a request header'>
|
||||
<p>This section describes request header settings. The request header feature adds header information to the requests sent while crawling documents. This is useful, for example, with authentication systems that log users in automatically when certain header values are present.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Request Header in the menu.</p>
|
||||
<img alt='A list of request headers' src='/images/ja/3.0/requestHeader-1.png'/>
|
||||
<img alt='Setting a request header' src='/images/ja/3.0/requestHeader-2.png'/><!-- TODO -->
|
||||
</subsection>
|
||||
<subsection name='Name'>
|
||||
<p>Specifies the request header name to append to the request.</p>
|
||||
</subsection>
|
||||
<subsection name='Value'>
|
||||
<p>Specifies the request header value to append to the request.</p>
|
||||
</subsection>
|
||||
<subsection name='Web name'>
|
||||
<p>Selects the web crawl configuration to which this request header applies. The header is appended to requests only for the selected crawl configuration.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,23 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Settings for a role</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Settings for a role'>
|
||||
<p>This section describes role settings. Roles, selected in crawl settings, classify the documents that appear in search results. For how to use roles, see <a href='../config/role-setting.html'>Settings for a role</a>.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Role in the menu.</p>
|
||||
<img alt='The list of roles' src='/images/ja/3.0/roleType-1.png'/>
|
||||
<img alt='Settings for a role' src='/images/ja/3.0/roleType-2.png'/><!-- TODO -->
|
||||
</subsection>
|
||||
<subsection name='Display name'>
|
||||
<p>Specifies the name that appears in the list.</p>
|
||||
</subsection>
|
||||
<subsection name='Value'>
|
||||
<p>Specifies the identifier used to classify documents. This value is sent to Solr and must consist of alphanumeric characters.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,28 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>System settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='System settings'>
|
||||
<p>This section describes the settings related to the Solr servers registered in Fess. Solr servers are registered as groups, as defined in a configuration file.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Solr in the menu.</p>
|
||||
<img alt='System settings' src='/images/ja/3.0/system-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Process state'>
|
||||
<p>The update server is shown as running while documents are being added. The crawl process displays its session ID while running. To shut down safely, stop the Fess server while no process is running. If you shut down Fess while a crawl is running, the crawl process keeps running until it finishes.</p>
|
||||
</subsection>
|
||||
<subsection name='Search for the update server'>
|
||||
<p>Displays the server group names used for searching and for updating.</p>
|
||||
</subsection>
|
||||
<subsection name='The status of the server'>
|
||||
<p>When a server becomes unavailable, its status changes to disabled; for example, when the Solr server cannot be reached. Once the server recovers, enable it again to make it available.</p>
|
||||
</subsection>
|
||||
<subsection name='Actions on the Solr server'>
|
||||
<p>You can issue index commit and optimize operations against a server group. You can also delete the indexed documents for a specific session ID.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,37 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Web authentication settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Web authentication settings'>
|
||||
<p>This section describes the web authentication settings used when web crawling requires authentication. Fess supports crawling sites protected by BASIC and DIGEST authentication.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Web Authentication in the menu.</p>
|
||||
<img alt='Configuring Web authentication' src='/images/ja/3.0/webAuthentication-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Host name'>
|
||||
<p>Specifies the host name of the site that requires authentication. If not specified, the settings apply to any host name in the web crawl configuration.</p>
|
||||
</subsection>
|
||||
<subsection name='Port'>
|
||||
<p>Specifies the port of the site that requires authentication. Specify -1 to apply the settings to all ports.</p>
|
||||
</subsection>
|
||||
<subsection name='Realm'>
|
||||
<p>Specifies the realm name of the site that requires authentication. If not specified, the settings apply to any realm name.</p>
|
||||
</subsection>
|
||||
<subsection name='Authentication methods'>
|
||||
<p>Select the authentication method. You can use BASIC authentication or DIGEST authentication.</p>
|
||||
</subsection>
|
||||
<subsection name='User name'>
|
||||
<p>Specifies the user name used to log in.</p>
|
||||
</subsection>
|
||||
<subsection name='Password'>
|
||||
<p>Specifies the password used to log in to the site.</p>
|
||||
</subsection>
|
||||
<subsection name='Web name'>
|
||||
<p>Selects the web crawl configuration to which the above authentication settings apply. The web crawl configuration must be registered in advance.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,99 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Web crawl settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Web crawl settings'>
|
||||
<p>This section describes the settings used for web crawling.</p>
|
||||
<p>If you want to index more than 100,000 documents, we recommend splitting them across several crawl configurations of at most a few tens of thousands of documents each. Indexing performance degrades when a single crawl configuration targets more than 100,000 documents.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Web in the menu.</p>
|
||||
<img alt='Web crawl settings' src='/images/ja/3.0/webCrawlingConfig-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Setting name'>
|
||||
<p>The name that appears on the list page.</p>
|
||||
</subsection>
|
||||
<subsection name='Specify a URL'>
|
||||
<p>You can specify multiple URLs. Each URL must start with http: or https:. For example:</p>
|
||||
<source><![CDATA[
|
||||
http://localhost/
|
||||
http://localhost:8080/
|
||||
]]></source>
|
||||
<p>Specify them as shown above.</p>
|
||||
</subsection>
|
||||
<subsection name='URL filtering'>
|
||||
<p>By specifying regular expressions, you can include or exclude specific URL patterns from crawling and searching.</p>
|
||||
<table>
|
||||
<tbody>
|
||||
<tr>
|
||||
<th>URL to crawl</th>
|
||||
<td>URLs matching the specified regular expression are crawled.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>Excluded from the crawl URL</th>
|
||||
<td>URLs matching the specified regular expression are not crawled. This takes precedence even over URLs matched by "URL to crawl".</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>To search for URL</th>
|
||||
<td>URLs matching the specified regular expression are included in search results. This takes precedence even if the URL is also matched by "URL to exclude from the search".</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>To exclude from the search URL</th>
|
||||
<td>URLs matching the specified regular expression are excluded from search results. Excluding a URL from crawling also prevents its links from being followed, so use this when you want a page crawled for its links but not shown in search results.</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<p>For example, to crawl only URLs under http://localhost/, specify the following as the URL to crawl:</p>
|
||||
<source><![CDATA[
|
||||
http://localhost/.*
|
||||
]]></source>
|
||||
<p>To exclude URLs with the png extension, specify:</p>
|
||||
<source><![CDATA[
|
||||
.*\.png$
|
||||
]]></source>
|
||||
<p>You can specify multiple patterns, one per line.</p>
|
||||
</subsection>
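To see how the two example patterns above behave, here is a small check using Java's java.util.regex (the sample URLs are hypothetical):

```java
import java.util.regex.Pattern;

public class UrlFilterCheck {
    public static void main(String[] args) {
        // Patterns taken from the examples above.
        Pattern include = Pattern.compile("http://localhost/.*");
        Pattern exclude = Pattern.compile(".*\\.png$");

        String page = "http://localhost/docs/index.html";
        String image = "http://localhost/images/logo.png";

        // The page is crawled: it matches the include pattern and not the exclude one.
        System.out.println(include.matcher(page).matches()
                && !exclude.matcher(page).matches());        // true
        // The image is filtered out by the exclusion pattern.
        System.out.println(exclude.matcher(image).matches()); // true
    }
}
```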
|
||||
<subsection name='Depth'>
|
||||
<p>Specifies how many levels of links to follow from the starting URLs during the crawl.</p>
|
||||
</subsection>
|
||||
<subsection name='Maximum access'>
|
||||
<p>Specifies the maximum number of documents to retrieve during the crawl.</p>
|
||||
</subsection>
|
||||
<subsection name='User agent'>
|
||||
<p>You can specify the user agent to use when crawling.</p>
|
||||
</subsection>
|
||||
<subsection name='Number of threads'>
|
||||
<p>Specifies the number of crawler threads. A value of 5 means five threads crawl the website simultaneously.</p>
|
||||
</subsection>
|
||||
<subsection name='Interval'>
|
||||
<p>The interval, in milliseconds, between document retrievals. With one thread, a value of 5000 retrieves one document every 5 seconds.</p>
|
||||
<p>With 5 threads and a 1000 millisecond interval, up to 5 documents are retrieved per second. When crawling an external website, choose a value that does not overload the web server.</p>
|
||||
</subsection>
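The arithmetic above can be sketched as follows, using the values from the text:

```java
public class CrawlRateSketch {
    public static void main(String[] args) {
        int threads = 5;        // "Number of threads" setting
        int intervalMs = 1000;  // "Interval" setting, per thread
        // Each thread fetches at most one document per interval,
        // so the upper bound on throughput is:
        double docsPerSecond = threads * (1000.0 / intervalMs);
        System.out.println(docsPerSecond);  // 5.0
    }
}
```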
|
||||
<subsection name='Boost value'>
|
||||
<p>Weights the documents crawled by this configuration in search results. The default is 1; documents with higher values are displayed nearer the top of the search results. If you want these results always ranked above others, specify a sufficiently large value such as 10,000.</p>
|
||||
<p>The value must be an integer greater than 0. It is used as the boost value when documents are added to Solr.</p>
|
||||
</subsection>
|
||||
<subsection name='Browser type'>
|
||||
<p>Crawled documents are registered with the selected browser types. If you select only PC, the documents do not appear in results when searching from a mobile device. You can also restrict documents to specific mobile devices.</p>
|
||||
</subsection>
|
||||
<subsection name='Role'>
|
||||
<p>You can restrict search results so that they appear only for users with a particular role. Roles must be registered in advance. This is useful, for example, on systems that require a login, such as portal servers, where you want to filter search results per user.</p>
|
||||
</subsection>
|
||||
<subsection name='Label'>
|
||||
<p>You can attach labels to search results. If labels are registered, the search screen lets users restrict a search to a specific label.</p>
|
||||
</subsection>
|
||||
<subsection name='State'>
|
||||
<p>Specifies whether this setting is enabled at crawl time. Use this if you want to disable crawling temporarily.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
<section name='Other'>
|
||||
<subsection name='Sitemap'>
|
||||
<p>Fess crawls sitemap files defined in the URLs to crawl. Sitemaps follow the <a href='http://www.sitemaps.org/'>http://www.sitemaps.org/</a> specification. Supported formats are XML sitemaps, XML sitemap index files, and text sitemaps (one URL per line).</p>
|
||||
<p>Specify the sitemap URL as a URL to crawl. Since sitemaps are XML or text files, Fess cannot distinguish them from ordinary XML or text files when crawling; by default, URLs whose file names match sitemap.*.xml, sitemap.*.gz, or sitemap.*txt are handled as sitemaps (this can be customized in webapps/fess/WEB-INF/classes/s2robot_rule.dicon).</p>
|
||||
<p>URLs found in a crawled sitemap file are crawled in the next crawl, just like links found in HTML files.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,28 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Crawl file size settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='File size settings'>
|
||||
<p>You can limit the size of files Fess crawls. By default, HTML files are handled up to 2.5 MB and other files up to 10 MB. To change these limits, edit webapps/fess/WEB-INF/classes/s2robot_contentlength.dicon. The standard s2robot_contentlength.dicon is as follows.</p>
|
||||
<source><![CDATA[
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE components PUBLIC "-//SEASAR//DTD S2Container 2.4//EN"
|
||||
"http://www.seasar.org/dtd/components24.dtd">
|
||||
<components>
|
||||
<component name="contentLengthHelper" class="org.seasar.robot.helper.ContentLengthHelper" instance="singleton" >
|
||||
<property name="defaultMaxLength">10485760L</property><!-- 10M -->
|
||||
<initMethod name="addMaxLength">
|
||||
<arg>"text/html"</arg>
|
||||
<arg>2621440L</arg><!-- 2.5M -->
|
||||
</initMethod>
|
||||
</component>
|
||||
</components>
|
||||
]]></source>
|
||||
<p>To change the default limit, change the value of defaultMaxLength. The maximum file size can also be specified per content type; the example above sets the limit for HTML files (text/html).</p>
|
||||
<p>When raising the maximum file size, also consider the amount of heap memory in use. For how to configure it, see <a href='memory-config.html'>Memory-related settings</a>.</p>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,13 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Index backup and restore</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Index data backup and restore'>
|
||||
<p>Index data is managed by Solr. Backing up from the Fess administration screen may not be possible when the index data reaches several gigabytes in size.</p>
|
||||
<p>To back up the index data, stop Fess and copy the solr/core1/data directory. To restore, put the backed-up index data back in place.</p>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,12 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Set up Guide</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Fess 3.0 Configuration Guide'>
|
||||
<p>This section provides the Fess 3.0 setup instructions.</p>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,43 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Log settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Log file'>
|
||||
<p>The following table summarizes the log files output by Fess.</p>
|
||||
<table>
|
||||
<tbody>
|
||||
<tr>
|
||||
<th>File name</th>
|
||||
<th>Contents</th>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>webapps/fess/WEB-INF/logs/fess.out</td>
|
||||
<td>Fess server log. Logs operations in the management and search screens, etc.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>webapps/fess/WEB-INF/logs/fess_crawler.out</td>
|
||||
<td>Crawler log. Logs crawling activity.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>logs/Catalina.out</td>
|
||||
<td>Fess server (Tomcat) log. Logs Solr-related output.</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<p>If you encounter problems, check these logs.</p>
|
||||
</section>
|
||||
<section name='Log settings'>
|
||||
<p>Log output is configured in webapps/fess/WEB-INF/classes/log4j.xml. By default, logs are output at the INFO level.</p>
|
||||
<p>For example, to log the documents that Fess sends to Solr, uncomment the following section in log4j.xml:</p>
|
||||
<source><![CDATA[
|
||||
<logger name="jp.sf.fess.solr" >
|
||||
<level value ="debug" />
|
||||
</logger>
|
||||
]]></source>
|
||||
<p>See the Log4J documentation if you need detailed settings for the log output.</p>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,42 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Use memory-related settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='The maximum value of heap memory changes'>
|
||||
<p>Depending on the crawl configuration, an OutOfMemory error like the following may occur:</p>
|
||||
<source><![CDATA[
|
||||
java.lang.OutOfMemoryError: Java heap space
|
||||
]]></source>
|
||||
<p>If it occurs, increase the maximum heap memory. In bin/setenv.[sh|bat], change the setting to -Xmx1024m (this sets the maximum to 1024 MB):</p>
|
||||
<source><![CDATA[
|
||||
On Windows
|
||||
...-Dpdfbox.cjk.support=true -Xmx1024m
|
||||
|
||||
On Unix
|
||||
...-Dpdfbox.cjk.support=true -Xmx1024m"
|
||||
]]></source>
|
||||
</section>
|
||||
<section name='Crawler side memory maximum value changes'>
|
||||
<p>
|
||||
The maximum heap memory used on the crawler side can also be changed.
|
||||
The default is 512 MB.</p>
|
||||
<p>
|
||||
To change it, uncomment crawlerJavaOptions in webapps/fess/WEB-INF/classes/fess.dicon and change the value to -Xmx1024m (this sets the maximum to 1024 MB):
|
||||
</p>
|
||||
<source><![CDATA[
|
||||
<component name="systemHelper" class="jp.sf.fess.helper.SystemHelper">
|
||||
<property name="adminRole">"fess"</property>
|
||||
<property name="authenticatedRoles">"role1"</property>
|
||||
<property name="crawlerJavaOptions">new String[] {
|
||||
"-Djava.awt.headless=true", "-server", "-XX:+UseGCOverheadLimit",
|
||||
"-XX:+UseConcMarkSweepGC", "-XX:+CMSIncrementalMode",
|
||||
"-XX:+UseTLAB", "-Dpdfbox.cjk.support=true", "-Xmx1024m",
|
||||
"-XX:MaxPermSize=128m" }</property>
|
||||
</component>
|
||||
]]></source>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,17 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Mobile device information settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Mobile phone information update'>
|
||||
<p>The mobile device information is provided by <a class='externalLink' href='http://valueengine.jp/'>ValueEngine Inc.</a> To use the latest mobile device information, download the device profiles, remove the _YYYY-MM-DD suffix from the file names, and save them under webapps/fess/WEB-INF/classes/device. Restart Fess to apply the change.</p>
|
||||
<source><![CDATA[
|
||||
ProfileData_YYYY-MM-DD.csv -> ProfileData.csv
|
||||
UserAgent_YYYY-MM-DD.csv -> UserAgent.csv
|
||||
DisplayInfo_YYYY-MM-DD.csv -> DisplayInfo.csv
|
||||
]]></source>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,24 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Path encoding change</title>
|
||||
<author>Sone, Takaaki</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Path encoding change'>
|
||||
<p>For non-HTML files, if the character set of the file contents differs from that of the file name, the link text in search results may be garbled.</p>
|
||||
<p>For example, if the contents of test.txt are written in UTF-8 but the file name is encoded in Shift_JIS, the link text is garbled.</p>
|
||||
</section>
|
||||
<subsection name='How to set up'>
|
||||
<p>For example, revising webapps/fess/WEB-INF/classes/s2robot_transformer.dicon as shown below resolves paths as Shift_JIS:</p>
|
||||
<source><![CDATA[
|
||||
<component name="fessFileTransformer" class="jp.sf.fess.transformer.FessFileTransformer" instance="singleton">
|
||||
<property name="name">"fessFileTransformer"</property>
|
||||
<property name="ignoreEmptyContent">true</property>
|
||||
<property name="encoding">"Shift_JIS"</property>
|
||||
</component>
|
||||
]]></source>
|
||||
</subsection>
|
||||
|
||||
</body>
|
||||
</document>
|
|
@ -1,17 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Stemming settings</title>
|
||||
<author>Sone, Takaaki</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='About stemming'>
|
||||
<p>Fess applies a stemming process when indexing and searching.</p>
|
||||
<p>Stemming normalizes English words; for example, recharging and rechargable are both normalized to the form recharg. As a result, a search for recharging also hits documents containing rechargable, reducing missed results.</p>
|
||||
</section>
|
||||
<section name='about protwords.txt'>
|
||||
<p>Because stemming is a basic rule-based process, it may normalize words in unintended ways. For example, the word Maine (the state name) is normalized to main.</p>
|
||||
<p>In such cases, you can exclude a word from stemming by adding it to protwords.txt.</p>
|
||||
</section>
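As a minimal sketch, protwords.txt lists one protected word per line, and lines starting with # are comments (the file's location depends on your Solr configuration; solr/core1/conf is an assumption based on the default layout mentioned elsewhere in this guide):

```
# protwords.txt -- words listed here are protected from the stemmer
Maine
```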
|
||||
</body>
|
||||
</document>
|
|
@ -1,57 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Proxy settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
|
||||
<section name='For the crawler proxy settings'>
|
||||
<p>When crawling external sites from inside an intranet, the firewall may block the crawl. In that case, configure a proxy for the crawler.</p>
|
||||
</section>
|
||||
<subsection name='How to set up'>
|
||||
<p>Configure the proxy by creating webapps/fess/WEB-INF/classes/s2robot_client.dicon with the following contents:</p>
|
||||
<source><![CDATA[
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE components PUBLIC "-//SEASAR//DTD S2Container 2.4//EN"
|
||||
"http://www.seasar.org/dtd/components24.dtd">
|
||||
<components>
|
||||
<include path="s2robot_robotstxt.dicon"/>
|
||||
<include path="s2robot_contentlength.dicon"/>
|
||||
|
||||
<component name="httpClient" class="org.seasar.robot.client.http.CommonsHttpClient" instance="prototype">
|
||||
<property name="cookiePolicy">@org.apache.commons.httpclient.cookie.CookiePolicy@BROWSER_COMPATIBILITY</property>
|
||||
<property name="proxyHost">"proxy host name"</property>
|
||||
<property name="proxyPort">proxy port number</property>
|
||||
<!-- If the proxy requires authentication
|
||||
<property name="proxyCredentials">
|
||||
<component class="org.apache.commons.httpclient.UsernamePasswordCredentials">
|
||||
<arg>"proxy username"</arg>
|
||||
<arg>"proxy password"</arg>
|
||||
</component>
|
||||
</property>
|
||||
-->
|
||||
</component>
|
||||
|
||||
<component name="fsClient" class="org.seasar.robot.client.fs.FileSystemClient" instance="prototype">
|
||||
<property name="charset">"UTF-8"</property>
|
||||
</component>
|
||||
|
||||
<component name="clientFactory" class="org.seasar.robot.client.S2RobotClientFactory" instance="prototype">
|
||||
<initMethod name="addClient">
|
||||
<arg>{"http:.*", "https:.*"}</arg>
|
||||
<arg>httpClient</arg>
|
||||
</initMethod>
|
||||
<initMethod name="addClient">
|
||||
<arg>"file:.*"</arg>
|
||||
<arg>fsClient</arg>
|
||||
</initMethod>
|
||||
</component>
|
||||
|
||||
</components>
|
||||
|
||||
]]></source>
|
||||
</subsection>
|
||||
|
||||
</body>
|
||||
</document>
|
|
@ -1,25 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Setting up replication</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Setting up replication'>
|
||||
<p>Fess can replicate Solr index data from a specified path. By setting up two Fess servers, one for crawling and index creation and one for searching, you can distribute the indexing load.</p>
|
||||
<p>To use the Fess replication feature, the Solr index files must be on a shared disk, such as NFS, that both Fess servers can access.</p>
|
||||
</section>
|
||||
<section name='How to build'>
|
||||
<subsection name='Building the indexing Fess'>
|
||||
<p>Download and install Fess. This example assumes it is installed at <code>/net/server1/usr/local/fess</code>.</p>
|
||||
<p>After starting Fess, register crawl settings and run a crawl to create the index, just as in a normal setup (the procedure for building the indexing Fess is the same as the normal setup procedure).</p>
|
||||
</subsection>
|
||||
<subsection name='Building the search Fess'>
|
||||
<p>Download and install Fess. This example assumes it is installed at <code>/net/server2/usr/local/fess</code>.</p>
|
||||
<p>After starting Fess, open the crawl settings in the management screen, select the check box to enable the replication feature, and set the snapshot path. The snapshot path points to the index location of the indexing Fess; in this case, <code>/net/server1/usr/local/fess/solr/core1/data/index</code>.</p>
|
||||
<img alt='Replication' src='/images/ja/3.0/crawl-2.png'/>
|
||||
<p>Press the update button to save the settings; the index is then replicated at the times set in the schedule.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,97 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Setting role-based search</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='With role-based search'>
|
||||
<p>Fess can partition search results based on the credentials of users authenticated by any authentication system. For example, documents carrying role A appear in search results for a user who has role A, but not for a user B who lacks it. Using this feature, you can scope searches by department or job title in portal and single sign-on environments where users log in.</p>
|
||||
<p>Role-based search in Fess can obtain role information from the following sources:</p>
|
||||
<ul>
|
||||
<li>Request parameter</li>
|
||||
<li>Request header</li>
|
||||
<li>Cookies</li>
|
||||
<li>J2EE authentication information</li>
|
||||
</ul>
|
||||
<p>When Fess runs behind a portal or agent-based single sign-on system that stores authentication information in cookies, role information can be retrieved from cookies whose domain and path cover Fess. With a reverse-proxy type single sign-on system, role information can be retrieved from authentication information added to the request headers or request parameters on access to Fess.</p>
|
||||
</section>
|
||||
<section name='Setting role-based search'>
|
||||
<p>Describes how to set up role-based search using J2EE authentication information.</p>
|
||||
<subsection name='Tomcat-users.xml settings'>
|
||||
<p>Add roles and users to conf/tomcat-users.xml. In this example, role-based search is performed with the role1 role, logging in as the user role1.</p>
|
||||
<source><![CDATA[
|
||||
<?xml version='1.0' encoding='utf-8'?>
|
||||
<tomcat-users>
|
||||
<role rolename="fess"/>
|
||||
<role rolename="solr"/>
|
||||
<role rolename="role1"/>
|
||||
<user username="admin" password="admin" roles="fess"/>
|
||||
<user username="solradmin" password="solradmin" roles="solr"/>
|
||||
<user username="role1" password="role1" roles="role1"/>
|
||||
</tomcat-users>
|
||||
]]></source>
|
||||
</subsection>
|
||||
<subsection name='app.dicon settings'>
|
||||
<p>Configure webapps/fess/WEB-INF/classes/app.dicon as shown below:</p>
|
||||
<source><![CDATA[
|
||||
:
|
||||
<component name="roleQueryHelper" class="jp.sf.fess.helper.impl.RoleQueryHelperImpl">
|
||||
<property name="defaultRoleList">
|
||||
{"guest"}
|
||||
</property>
|
||||
</component>
|
||||
:
|
||||
]]></source>
|
||||
<p>Setting defaultRoleList assigns role information when no authentication information is present. The setting above ensures that users who are not logged in cannot see search results that require other roles.</p>
|
||||
</subsection>
|
||||
<subsection name='Fess.dicon settings'>
|
||||
<p>Configure webapps/fess/WEB-INF/classes/fess.dicon as shown below:</p>
|
||||
<source><![CDATA[
|
||||
:
|
||||
<component name="systemHelper" class="jp.sf.fess.helper.SystemHelper">
|
||||
<property name="authenticatedRoles">"role1"</property>
|
||||
</component>
|
||||
:
|
||||
]]></source>
|
||||
<p>authenticatedRoles can list multiple roles separated by commas (,).</p>
|
||||
</subsection>
|
||||
<subsection name='Web.xml settings'>
|
||||
<p>Edit webapps/fess/WEB-INF/web.xml as shown below.</p>
|
||||
<source><![CDATA[
|
||||
:
|
||||
<security-constraint>
|
||||
<web-resource-collection>
|
||||
<web-resource-name>Fess Authentication</web-resource-name>
|
||||
<url-pattern>/login/login</url-pattern>
|
||||
</web-resource-collection>
|
||||
<auth-constraint>
|
||||
<role-name>fess</role-name>
|
||||
<role-name>role1</role-name>
|
||||
</auth-constraint>
|
||||
</security-constraint>
|
||||
:
|
||||
<security-role>
|
||||
<role-name>fess</role-name>
|
||||
</security-role>
|
||||
|
||||
<security-role>
|
||||
<role-name>role1</role-name>
|
||||
</security-role>
|
||||
:
|
||||
]]></source>
|
||||
</subsection>
|
||||
<subsection name='Settings in the Fess administration screen'>
|
||||
<p>Start Fess and log in as an administrator. From the Role menu, register a role with the name Role1 (any name will do) and the value role1. Then, in each crawl configuration that users with role1 should be able to search, select Role1 as the role.</p>
|
||||
</subsection>
|
||||
<subsection name='Login'>
|
||||
<p>Log out of the management screen, then log in as the user role1. If login succeeds, you are redirected to the top of the search screen.</p>
|
||||
<p>Search as usual; only documents from the crawl configurations assigned the Role1 role are displayed.</p>
|
||||
<p>Searches performed without logging in are executed as the guest user.</p>
|
||||
</subsection>
|
||||
<subsection name='Logout'>
|
||||
<p>If a user logged in with a non-administrator role accesses http://localhost:8080/fess/admin, a logout screen appears. Pressing the logout button logs the user out.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,30 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Ports changes</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Changing the port'>
|
||||
<p>By default, Fess uses port 8080. To change the port, follow the steps below.</p>
|
||||
<subsection name='Tomcat port changes'>
|
||||
<p>Change the port of the Tomcat in which Fess runs by editing conf/server.xml. The following ports may need changes.</p>
|
||||
<ul>
|
||||
<li>8080: HTTP access port</li>
|
||||
<li>8005: shutdown port</li>
|
||||
<li>8009: AJP port</li>
|
||||
<li>8443: SSL HTTP access port (disabled by default)</li>
|
||||
</ul>
|
||||
</subsection>
|
||||
<subsection name='SOLR configuration'>
|
||||
<p>In the standard configuration, Solr runs in the same Tomcat, so if you change the Tomcat port you may also need to change the Solr server information that Fess references. Edit webapps/fess/WEB-INF/classes/fess_solr.dicon.</p>
|
||||
<source><![CDATA[
|
||||
<arg>"http://localhost:8080/solr"</arg>
|
||||
]]></source>
|
||||
<p>
|
||||
<b>Note: if you change the Tomcat port as above but do not change this setting, Fess cannot access the Solr server and errors are displayed on search and index update.</b>
|
||||
</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,48 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>How to use the dynamic field of SOLR</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Dynamic field of SOLR'>
|
||||
<p>Solr registers documents according to the fields defined in its schema. The Solr schema used by Fess is defined in solr/core1/conf/schema.xml. It defines standard fields such as title and content, as well as dynamic fields whose field names can be chosen freely. The dynamic fields available in Fess's schema.xml are listed below. For details on the parameter values, see the Solr documentation.</p>
|
||||
<source><![CDATA[
|
||||
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
|
||||
<dynamicField name="*_t" type="text" indexed="true" stored="true"/>
|
||||
<dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>
|
||||
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
|
||||
<dynamicField name="*_l" type="long" indexed="true" stored="true"/>
|
||||
<dynamicField name="*_f" type="float" indexed="true" stored="true"/>
|
||||
<dynamicField name="*_d" type="double" indexed="true" stored="true"/>
|
||||
<dynamicField name="*_ti" type="tint" indexed="true" stored="true"/>
|
||||
<dynamicField name="*_tl" type="tlong" indexed="true" stored="true"/>
|
||||
<dynamicField name="*_tf" type="tfloat" indexed="true" stored="true"/>
|
||||
<dynamicField name="*_td" type="tdouble" indexed="true" stored="true"/>
|
||||
<dynamicField name="*_tdt" type="tdate" indexed="true" stored="true"/>
|
||||
<dynamicField name="*_pi" type="pint" indexed="true" stored="true"/>
|
||||
<dynamicField name="*_pl" type="plong" indexed="true" stored="true"/>
|
||||
<dynamicField name="*_pf" type="pfloat" indexed="true" stored="true"/>
|
||||
<dynamicField name="*_pd" type="pdouble" indexed="true" stored="true"/>
|
||||
<dynamicField name="*_pdt" type="pdate" indexed="true" stored="true"/>
|
||||
<dynamicField name="*_si" type="sint" indexed="true" stored="true"/>
|
||||
<dynamicField name="*_sl" type="slong" indexed="true" stored="true"/>
|
||||
<dynamicField name="*_sf" type="sfloat" indexed="true" stored="true"/>
|
||||
<dynamicField name="*_sd" type="sdouble" indexed="true" stored="true"/>
|
||||
<dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
|
||||
]]></source>
|
||||
</section>
|
||||
<section name='How to use'>
|
||||
<p>Dynamic fields are often used when registering data through a data store crawl, such as a database crawl. For example, in a database crawl configuration you can place a script setting such as other_t = hoge to put the data of the hoge column into the Solr field other_t.</p>
|
||||
<p>To retrieve dynamic field data from Solr, you need to add the field in webapps/fess/WEB-INF/classes/app.dicon as follows. This example adds other_t.</p>
|
||||
<source><![CDATA[
|
||||
<component name="queryHelper" class="jp.sf.fess.helper.impl.QueryHelperImpl">
|
||||
<property name="responseFields">new String[]{"id", "score", "boost",
|
||||
"contentLength", "host", "site", "lastModified", "mimetype",
|
||||
"tstamp", "title", "digest", "url", "other_t" }</property>
|
||||
</component>
|
||||
]]></source>
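The effect of responseFields can be pictured as projecting a Solr document onto the listed fields. The following Python sketch uses made-up sample data and is purely illustrative, not Fess internals:

```python
# Fields returned to the search page, mirroring the app.dicon list above.
RESPONSE_FIELDS = [
    "id", "score", "boost", "contentLength", "host", "site",
    "lastModified", "mimetype", "tstamp", "title", "digest",
    "url", "other_t",
]

def project(doc: dict) -> dict:
    """Keep only the fields that are listed in responseFields."""
    return {k: v for k, v in doc.items() if k in RESPONSE_FIELDS}

# Hypothetical Solr document; "internal" is not in responseFields.
doc = {"id": "1", "title": "Sample", "other_t": "hoge", "internal": "x"}
print(project(doc))  # {'id': '1', 'title': 'Sample', 'other_t': 'hoge'}
```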
|
||||
<p>With the above setting the value is returned from Solr, so edit the JSP file to display it on the page. Log in to the management screen and open the design page. Edit the JSP file for the search results display (the content part). Where you want to show the value, write ${f:h(doc.other_t)} to display the registered value.</p>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,37 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>SOLR failure operation</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='SOLR failure operation'>
|
||||
<p>Fess manages Solr servers in groups and can manage multiple groups. Fess keeps server and group status information, and when a Solr server becomes inaccessible it changes the status of that server and its group.</p>
|
||||
<p>The Solr server status can be changed on the system settings page. maxErrorCount, maxRetryStatusCheckCount, maxRetryUpdateQueryCount and minActiveServer can be defined in webapps/fess/WEB-INF/classes/fess_solr.dicon.</p>
|
||||
<subsection name='Solr group failure behavior'>
|
||||
<ul>
|
||||
<li>If the number of Solr servers in an active state within a Solr group falls below minActiveServer, the Solr group is disabled.</li>
|
||||
<li>If the number of active Solr servers in a Solr group is at or below minActiveServer but the group has not been disabled, Fess checks the status of the disabled Solr servers up to maxRetryStatusCheckCount times when it can access them; a server that responds is changed from the disabled state back to the active state. A server that could be accessed but was not changed back to the active state is marked as index corrupted.</li>
|
||||
<li>A disabled Solr group cannot be used.</li>
|
||||
<li>To re-enable a Solr group, change the status of the Solr servers in the group to enabled on the system settings management screen.</li>
|
||||
</ul>
|
||||
</subsection>
|
||||
<subsection name='Behavior of search failures'>
|
||||
<ul>
|
||||
<li>Search queries are sent only to active Solr groups.</li>
|
||||
<li>Search queries are sent only to active Solr servers.</li>
|
||||
<li>If multiple Solr servers are registered in a Solr group, search queries are sent to the available Solr server with the fewest accesses.</li>
|
||||
<li>If search queries sent to a Solr server fail more than maxErrorCount times, that Solr server is changed to the disabled state.</li>
|
||||
</ul>
|
||||
</subsection>
|
||||
<subsection name='Behavior of update failures'>
|
||||
<ul>
|
||||
<li>Update queries are sent only to active Solr groups.</li>
|
||||
<li>Update queries are sent only to active Solr servers.</li>
|
||||
<li>If multiple Solr servers are registered in a Solr group, the update query is sent to every active Solr server.</li>
|
||||
<li>If update queries sent to a Solr server fail more than maxRetryUpdateQueryCount times, that Solr server is changed to the index corrupted state.</li>
|
||||
</ul>
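The failure rules above amount to a small state machine per server. The following Python sketch is a simplification for illustration; the threshold names mirror fess_solr.dicon, but the class is not Fess code:

```python
# Simplified per-server failure tracking: a server is disabled after
# too many search failures and marked index-corrupted after too many
# failed update queries.
class SolrServerStatus:
    def __init__(self, max_error_count=3, max_retry_update_query_count=3):
        self.max_error_count = max_error_count
        self.max_retry_update_query_count = max_retry_update_query_count
        self.search_failures = 0
        self.update_failures = 0
        self.state = "active"

    def search_failed(self):
        self.search_failures += 1
        if self.search_failures > self.max_error_count:
            self.state = "disabled"

    def update_failed(self):
        self.update_failures += 1
        if self.update_failures > self.max_retry_update_query_count:
            self.state = "index_corrupted"

server = SolrServerStatus(max_error_count=2)
for _ in range(3):
    server.search_failed()
print(server.state)  # disabled
```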
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,36 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Settings for the index string extraction</title>
|
||||
<author>Sone, Takaaki</author>
|
||||
</properties>
|
||||
<body>
|
||||
|
||||
<section name='About the index string extraction'>
|
||||
<p>To create an index for search, documents must be split into tokens that are registered in the index.</p>
|
||||
<p>A tokenizer is used for this purpose.</p>
|
||||
<p>Basically, a search will not hit with units smaller than the tokens produced by the tokenizer.</p>
|
||||
<p>For example, take the Japanese sentence 東京都に住む ('living in Tokyo'). Suppose the tokenizer splits it into tokens such as 東京 (Tokyo) and 住む (live). A search for the word 東京 (Tokyo) then hits, but a search for the word 京都 (Kyoto) does not.</p>
|
||||
<p>The choice of tokenizer is therefore important.</p>
|
||||
<p>Fess uses CJKTokenizer by default; you can change the tokenizer by editing the analyzer section of schema.xml.</p>
|
||||
</section>
|
||||
|
||||
<subsection name='About CJKTokenizer'>
|
||||
<p>CJKTokenizer indexes multibyte strings such as Japanese as bi-grams, in other words in units of two characters. As a result, single-character words cannot be found.</p>
|
||||
</subsection>
|
||||
|
||||
<subsection name='About StandardTokenizer'>
|
||||
<p>StandardTokenizer indexes multibyte strings such as Japanese as uni-grams, in other words one character at a time, so there are fewer missed results. Single-character search queries that cannot be found with CJKTokenizer can be found with StandardTokenizer.</p>
|
||||
<p>The following example changes the analyzer section of schema.xml so that StandardTokenizer is used.</p>
|
||||
<source><![CDATA[
|
||||
:
|
||||
<types>
|
||||
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
|
||||
<analyzer>
|
||||
<tokenizer class="solr.StandardTokenizerFactory"/>
|
||||
:
|
||||
]]></source>
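The difference between the two tokenizers for CJK text can be illustrated with a short sketch. This is plain Python, not the actual Lucene implementation, and "ABCD" stands in for a run of CJK characters:

```python
# Bi-gram (CJKTokenizer-style) vs uni-gram (StandardTokenizer-style
# for CJK text) tokenization, simplified for illustration.

def bigrams(text: str) -> list[str]:
    """Split a string into overlapping two-character tokens."""
    return [text[i:i + 2] for i in range(len(text) - 1)]

def unigrams(text: str) -> list[str]:
    """Split a string into single-character tokens."""
    return list(text)

print(bigrams("ABCD"))   # ['AB', 'BC', 'CD']
print(unigrams("ABCD"))  # ['A', 'B', 'C', 'D']

# A one-character query matches no bi-gram token, which is why
# CJKTokenizer cannot find single-character words.
print("A" in bigrams("ABCD"))   # False
print("A" in unigrams("ABCD"))  # True
```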
|
||||
</subsection>
|
||||
|
||||
</body>
|
||||
</document>
|
|
@ -1,49 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Register for the Windows service</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Registered as a Windows service'>
|
||||
<p>In a Windows environment you can register Fess as a Windows service. The registration procedure is the same as for Tomcat.</p>
|
||||
<subsection name='Advance preparation'>
|
||||
<p>When Fess is registered as a Windows service, the crawling process reads the Windows system environment variables. You must therefore <b>register JAVA_HOME as a system environment variable</b> and <b>add %JAVA_HOME%\bin to Path</b>.</p>
|
||||
</subsection>
|
||||
<subsection name='Setting'>
|
||||
<p>Edit webapps\fess\WEB-INF\classes\fess.dicon and remove the -server option. (The pdfbox.cjk.support option is no longer present as of 3.1.0.)</p>
|
||||
<source><![CDATA[
|
||||
<component name="systemHelper" class="jp.sf.fess.helper.SystemHelper">
|
||||
<!--
|
||||
<property name="adminRole">"fess"</property>
|
||||
<property name="authenticatedRoles">"role1"</property>
|
||||
-->
|
||||
<property name="crawlerJavaOptions">new String[] {
|
||||
"-Djava.awt.headless=true", "-XX:+UseGCOverheadLimit",
|
||||
"-XX:+UseConcMarkSweepGC", "-XX:+CMSIncrementalMode",
|
||||
"-XX:+UseTLAB", "-Xmx512m", "-XX:MaxPermSize=128m"
|
||||
}</property>
|
||||
</component>
|
||||
]]></source>
|
||||
</subsection>
|
||||
<subsection name='How to register'>
|
||||
<p>First, after installing Fess, run service.bat from a command prompt (on Vista and later you must launch the prompt as administrator). In this example, Fess is installed in C:\Java\fess-server-3.0.0.</p>
|
||||
<source><![CDATA[
|
||||
> cd C:\Java\fess-server-3.0.0\bin
|
||||
> service.bat install fess
|
||||
...
|
||||
The service 'fess' has been installed.
|
||||
]]></source>
|
||||
</subsection>
|
||||
<subsection name='How to check the setting'>
|
||||
<p>You can review the service properties for Fess by running the following command, which opens the Tomcat properties window.</p>
|
||||
<source><![CDATA[
|
||||
> tomcat6w.exe //ES//fess
|
||||
]]></source>
|
||||
</subsection>
|
||||
<subsection name='Service settings'>
|
||||
<p>Open Control Panel - Administrative Tools - Services; there you can configure automatic startup just like any other Windows service.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,12 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Search Guide</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Fess 3.0 Search Guide'>
|
||||
<p>This guide explains how to search with Fess 3.0.</p>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,57 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Search by specifying a search field</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Field searches'>
|
||||
<p>Fess saves crawl results in fields such as title and content. You can search within a specific field.</p>
|
||||
<p>The following fields can be searched by default.</p>
|
||||
<table>
|
||||
<tbody>
|
||||
<tr>
|
||||
<th>url</th>
|
||||
<td>The crawled URL</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>host</th>
|
||||
<td>The host name contained in the crawled URL</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>site</th>
|
||||
<td>The site name contained in the crawled URL</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>title</th>
|
||||
<td>Title</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>content</th>
|
||||
<td>Text</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>contentLength</th>
|
||||
<td>The size of the crawled content</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>lastModified</th>
|
||||
<td>The last modified time of the crawled content</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>mimetype</th>
|
||||
<td>The MIME type of the content</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<p>If no field is specified, the title and content fields are searched.</p>
|
||||
<subsection name='How to search'>
|
||||
<p>To search within a field, enter 'fieldname:searchterm' in the search form.</p>
|
||||
<p>To search for Fess in the title field, enter the following.</p>
|
||||
<source><![CDATA[
|
||||
title:Fess
|
||||
]]></source>
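As an illustration only (not part of Fess), a small Python helper can assemble 'field:term' query strings and check the field name against the default list above:

```python
# Default searchable fields, mirroring the table in this document.
DEFAULT_FIELDS = {
    "url", "host", "site", "title", "content",
    "contentLength", "lastModified", "mimetype",
}

def field_query(field: str, term: str) -> str:
    """Build a Fess field-search query string such as 'title:Fess'."""
    if field not in DEFAULT_FIELDS:
        raise ValueError(f"not a default searchable field: {field}")
    return f"{field}:{term}"

print(field_query("title", "Fess"))  # title:Fess
```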
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,14 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Search by label</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Search by label'>
|
||||
<p>Registering labels on the management screen enables searching by label on the search screen. Labels can be used to categorize search results. If no labels are registered, the label drop-down box is not displayed.</p>
|
||||
<img alt='Search by label' src='/images/ja/3.0/search-label-1.png'/>
|
||||
<p>Labels are assigned per crawl configuration when the index is created, so a label search returns the results of the crawl configurations that specify that label. A search without a label searches all results as usual.</p>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,15 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>NOT search</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='NOT search (3.1 or later)'>
|
||||
<p>To find documents that do not contain a word, use a NOT search.
|
||||
Write NOT in front of the word you want to exclude. NOT must be in uppercase, with a space before and after it.</p>
|
||||
<p>For example, to find documents that contain search term 1 but not search term 2, enter 'search term 1 NOT search term 2'.</p>
|
||||
<p>Note that NOT searches are expensive; use them with care.</p>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,15 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>OR search</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='OR search (3.1 or later)'>
|
||||
<p>To find documents that contain any of the search terms, use an OR search.
|
||||
By default, multiple words entered in the search box are combined as an AND search.
|
||||
To use an OR search, write OR between the search words. OR must be in uppercase, with a space before and after it.</p>
|
||||
<p>For example, to find documents that contain either search term 1 or search term 2, enter 'search term 1 OR search term 2'. OR can be placed between more than two terms.</p>
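The OR syntax above can be sketched with a trivial helper. This is illustrative only; Fess itself parses the query string:

```python
# Join search terms with an uppercase OR, spaces on both sides,
# as the OR search syntax requires.
def or_query(terms: list[str]) -> str:
    return " OR ".join(terms)

print(or_query(["term1", "term2"]))        # term1 OR term2
print(or_query(["java", "solr", "fess"]))  # java OR solr OR fess
```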
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,44 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Search sort</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Sort search'>
|
||||
<p>You can sort search results by specifying fields such as the crawl time.</p>
|
||||
<p>The following fields are available for sorting by default.</p>
|
||||
<table>
|
||||
<tbody>
|
||||
<tr>
|
||||
<th>tstamp</th>
|
||||
<td>The time of the crawl</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>contentLength</th>
|
||||
<td>The size of the crawled content</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>lastModified</th>
|
||||
<td>The last modified time of the crawled content</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<subsection name='How to sort'>
|
||||
<p>To sort, enter 'sort:fieldname' in the search form.</p>
|
||||
<p>To search for Fess and sort the results by content size in ascending order, enter the following.</p>
|
||||
<source><![CDATA[
|
||||
Fess sort:contentLength
|
||||
]]></source>
|
||||
<p>To sort in descending order, enter the following.</p>
|
||||
<source><![CDATA[
|
||||
Fess sort:contentLength.desc
|
||||
]]></source>
|
||||
<p>To sort by multiple fields, separate them with commas as shown below.</p>
|
||||
<source><![CDATA[
|
||||
Fess sort:contentLength.desc,lastModified
|
||||
]]></source>
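The sort expressions above can also be assembled programmatically. The following helper is an illustration only, not a Fess API:

```python
# Build a "sort:field[.desc][,field...]" expression as described above.
# Each argument is a (field_name, descending) pair.
def sort_expr(*fields: tuple) -> str:
    parts = [f"{name}.desc" if desc else name for name, desc in fields]
    return "sort:" + ",".join(parts)

print(sort_expr(("contentLength", False)))
# sort:contentLength
print(sort_expr(("contentLength", True), ("lastModified", False)))
# sort:contentLength.desc,lastModified
```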
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,19 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Setting the browser type</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Setting the browser type'>
|
||||
<p>Describes the settings related to browser types. Browser type information can be attached to search result data, so that search results can be presented separately for each type of browsing terminal.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Browser Type in the menu.</p>
|
||||
<img alt='Setting the browser type' src='/images/ja/4.0/browserType-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Browser type'>
|
||||
<p>You can set a display name and a value. Use this if you want to support additional terminal types; no settings are required unless such customization is necessary.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,34 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Settings Wizard</title>
|
||||
<author>Sone, Takaaki</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Settings Wizard'>
|
||||
<p>This section introduces the Settings Wizard.</p>
|
||||
<p>With the Settings Wizard, you can easily set up Fess.</p>
|
||||
<subsection name='How to use Setup Wizard'>
|
||||
<p>After logging in with an administrator account, click Settings Wizard in the menu.</p>
|
||||
<img alt='Settings Wizard' src='/images/ja/4.0/config-wizard-1.png'/>
|
||||
<p>First, set the schedule.</p>
|
||||
<p>Fess crawls and builds the index at the scheduled time.</p>
|
||||
<p>By default, it runs at 0:00 every day.</p>
|
||||
<img alt='Setting a schedule' src='/images/ja/4.0/config-wizard-2.png'/>
|
||||
<p>Next, create a crawl configuration.</p>
|
||||
<p>A crawl configuration registers the URI you want to make searchable.</p>
|
||||
<p>Give the crawl configuration any name that is easy to identify.</p>
|
||||
<p>Enter the URI of the content you want indexed and searchable.</p>
|
||||
<img alt='Crawl settings' src='/images/ja/4.0/config-wizard-3.png'/>
|
||||
<p>For example, to make http://example.com searchable, the configuration looks like the following.</p>
|
||||
<img alt='Crawl settings example' src='/images/ja/4.0/config-wizard-4.png'/>
|
||||
<p>This is the last step.</p>
|
||||
<p>Press the Start Crawling button to begin crawling immediately. If you press the Finish button instead, crawling does not start until the time specified in the schedule settings.</p>
|
||||
<img alt='Crawl started' src='/images/ja/4.0/config-wizard-5.png'/>
|
||||
</subsection>
|
||||
<subsection name='Changes to settings'>
|
||||
<p>Settings made in the Settings Wizard can later be changed from the Crawl General, Web, and File System configuration pages.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,139 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>The General crawl settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='The General crawl settings'>
|
||||
<p>Describes the settings related to crawling.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Crawl General in the menu.</p>
|
||||
<img alt='Crawl General' src='/images/ja/4.0/crawl-1.png'/>
|
||||
<p>You can specify the path where the index is generated and enable the replication feature.</p>
|
||||
<img alt='Replication features' src='/images/ja/4.0/crawl-2.png'/>
|
||||
</subsection>
|
||||
<subsection name='Scheduled full crawl frequency'>
|
||||
<p>You can set the interval at which web sites and file systems are crawled. The default is the following.</p>
|
||||
<source><![CDATA[
|
||||
0 0 0 * * ?
|
||||
]]></source>
|
||||
<p>The values represent, from left to right: seconds, minutes, hours, day of month, month, and day of week. The format is similar to Unix cron settings. In this example, crawling runs every day at 0:00 am.</p>
|
||||
<p>The following are examples of schedule expressions.</p>
|
||||
<table class='table table-striped table-bordered table-condensed'>
|
||||
<tbody>
|
||||
<tr class='a'>
|
||||
<td align='left'>0 0 12 * * ?</td>
|
||||
<td align='left'>Runs every day at 12:00 pm</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>0 15 10 ? * *</td>
|
||||
<td align='left'>Runs every day at 10:15 am</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>0 15 10 * * ?</td>
|
||||
<td align='left'>Runs every day at 10:15 am</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>0 15 10 * * ? *</td>
|
||||
<td align='left'>Runs every day at 10:15 am</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>0 15 10 * * ? 2005</td>
|
||||
<td align='left'>Runs every day at 10:15 am during 2005</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>0 * 14 * * ?</td>
|
||||
<td align='left'>Runs every minute from 2:00 pm to 2:59 pm every day</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>0 0/5 14 * * ?</td>
|
||||
<td align='left'>Runs every 5 minutes from 2:00 pm to 2:59 pm every day</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>0 0/5 14,18 * * ?</td>
|
||||
<td align='left'>Runs every 5 minutes from 2:00 pm to 2:59 pm and from 6:00 pm to 6:59 pm every day</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>0 0-5 14 * * ?</td>
|
||||
<td align='left'>Runs every minute from 2:00 pm to 2:05 pm every day</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>0 10,44 14 ? 3 WED</td>
|
||||
<td align='left'>Runs at 2:10 pm and 2:44 pm every Wednesday in March</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>0 15 10 ? * MON-FRI</td>
|
||||
<td align='left'>Runs at 10:15 am every Monday through Friday</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
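The expression format can be made explicit with a small sketch that labels the space-separated fields. This is illustrative; the real parsing is done by the scheduler, not this code:

```python
# The schedule expression has six fields, plus an optional year field.
FIELDS = ["seconds", "minutes", "hours", "day of month",
          "month", "day of week", "year"]

def describe(expr: str) -> dict:
    """Map each field of a schedule expression to its name."""
    return dict(zip(FIELDS, expr.split()))

print(describe("0 0 0 * * ?"))
# {'seconds': '0', 'minutes': '0', 'hours': '0',
#  'day of month': '*', 'month': '*', 'day of week': '?'}
```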
|
||||
<p>Note that the schedule is checked at 60-second intervals by default, so second values may not be honored exactly. If you need precise seconds, customize the taskScanIntervalTime value in webapps/fess/WEB-INF/classes/chronosCustomize.dicon; for schedules in one-hour increments the default is sufficient.</p>
|
||||
</subsection>
|
||||
<subsection name='Search log'>
|
||||
<p>When a user performs a search, the search is written to a log. Enable this if you want to collect search statistics.</p>
|
||||
</subsection>
|
||||
<subsection name='Add search parameters'>
|
||||
<p>Appends the search term to search result links. This makes it possible, for example, to highlight the search terms when displaying PDF files.</p>
|
||||
</subsection>
|
||||
<subsection name='XML response'>
|
||||
<p>Search results can be retrieved in XML format by accessing http://localhost:8080/fess/xml?query=searchterm.</p>
|
||||
</subsection>
|
||||
<subsection name='JSON response'>
|
||||
<p>Search results can be retrieved in JSON format by accessing http://localhost:8080/fess/json?query=searchterm.</p>
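A client might construct these response URLs as follows. The host, port, and paths are the defaults assumed in this document, and the sketch only builds the URL rather than contacting a server:

```python
from urllib.parse import urlencode

# Default base URL assumed in this document.
BASE = "http://localhost:8080/fess"

def search_url(fmt: str, query: str) -> str:
    """Build the XML or JSON response URL for a search query."""
    return f"{BASE}/{fmt}?" + urlencode({"query": query})

print(search_url("json", "Fess"))
# http://localhost:8080/fess/json?query=Fess
print(search_url("xml", "full text"))
# http://localhost:8080/fess/xml?query=full+text
```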
|
||||
</subsection>
|
||||
<subsection name='Mobile conversion'>
|
||||
<p>Search results for PC websites may not display correctly on mobile devices. If you select a mobile conversion service, PC sites are converted for display on mobile terminals. For example, if you select Google, the Google Wireless Transcoder is used to render content on mobile phones: in search results for mobile terminals, the result links pass through the Google Wireless Transcoder, enabling smooth browsing of PC sites from mobile search.</p>
|
||||
</subsection>
|
||||
<subsection name='The default label value'>
|
||||
<p>You can specify a label to be selected by default. Specify the value of the label.</p>
|
||||
</subsection>
|
||||
<subsection name='Search support'>
|
||||
<p>You can specify whether to provide a search screen. If you select Web, the mobile search screen is not available. If you select Unavailable, no search screen is provided; select this if you want to run the instance as a dedicated index server.</p>
|
||||
</subsection>
|
||||
<subsection name='Featured keyword response'>
|
||||
<p>Frequently searched words can be retrieved in JSON format by accessing http://localhost:8080/fess/hotsearchword.</p>
|
||||
</subsection>
|
||||
<subsection name='Specify the number of days before session information is deleted'>
|
||||
<p>Session information older than the specified number of days is deleted. The log purge runs once a day and removes old logs.</p>
|
||||
</subsection>
|
||||
<subsection name='Specify the number of days before search logs are deleted'>
|
||||
<p>Search logs older than the specified number of days are deleted. The log purge runs once a day and removes old logs.</p>
|
||||
</subsection>
|
||||
<subsection name='Bot names for log deletion'>
|
||||
<p>Specifies, separated by commas (,), the bot names whose entries should be removed from the search log, matched against the user agent. The logs are deleted by the log purge once a day.</p>
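The purge behavior can be sketched as filtering log entries by user agent. The bot names and log entries below are made-up examples, not Fess internals:

```python
# Drop search-log entries whose user agent contains one of the
# comma-separated bot names, as the log purge does.
def purge_bot_logs(entries, bot_names: str):
    bots = [b.strip() for b in bot_names.split(",") if b.strip()]
    return [e for e in entries
            if not any(b in e["userAgent"] for b in bots)]

logs = [
    {"query": "fess", "userAgent": "Mozilla/5.0"},
    {"query": "solr", "userAgent": "Googlebot/2.1"},
]
print(purge_bot_logs(logs, "Googlebot, Yahoo! Slurp"))
# [{'query': 'fess', 'userAgent': 'Mozilla/5.0'}]
```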
|
||||
</subsection>
|
||||
<subsection name='CSV encoding'>
|
||||
<p>Specifies the encoding of the CSV files used for backup and restore.</p>
|
||||
</subsection>
|
||||
<subsection name='Replication features'>
|
||||
<p>Enables the replication feature, which copies the generated Solr index so that it can be applied on another instance. For example, use it when a front-facing search server should serve searches only, using an index crawled and created on a different server.</p>
|
||||
</subsection>
|
||||
<subsection name='Index commit, optimize'>
|
||||
<p>After data is registered in Solr, a commit or optimize must be issued before the registered data becomes searchable. If you select optimize, Solr index optimization is issued; if you select commit, a commit is issued.</p>
|
||||
</subsection>
|
||||
<subsection name='Server switchovers'>
|
||||
<p>Fess can combine multiple Solr servers into a group and manage multiple groups. Different Solr server groups are used for updates and for searches. For example, with two groups, group 2 may be used for updates while group 1 serves searches. After a crawl completes, the groups are switched: group 1 receives updates and group 2 serves searches. This setting is effective only when multiple Solr server groups are registered.</p>
|
||||
</subsection>
|
||||
<subsection name='Documents per commit'>
|
||||
<p>To improve indexing performance, Fess sends documents to Solr in batches of 20 while crawling. Because adding documents continuously without committing degrades Solr performance, Fess issues a commit each time the number of documents specified here has been added. By default, a commit is issued after every 1000 documents.</p>
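The commit policy can be sketched as a simple counter. The threshold below mirrors the default in the text, but the code is illustrative, not Fess internals:

```python
# Count how many commits are issued while adding documents one by
# one, committing every `commit_every` documents (default 1000).
def count_commits(total_docs: int, commit_every: int = 1000) -> int:
    commits = 0
    added = 0
    for _ in range(total_docs):
        added += 1
        if added % commit_every == 0:
            commits += 1
    return commits

print(count_commits(2500))  # 2 (after 1000 and 2000 documents)
```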
|
||||
</subsection>
|
||||
<subsection name='Number of concurrent crawls settings'>
|
||||
<p>Fess crawls documents via web crawling and file system crawling. Only as many crawl configurations as the number specified here run simultaneously. For example, if the number of concurrent crawls is 3 and web crawl configurations 1 through 10 are defined, configurations 1 to 3 run first; when any of them completes, crawl configuration 4 starts, and so on, one at a time, through configuration 10.</p>
|
||||
<p>Each crawl configuration also has its own thread count; the number of concurrent crawls is not the number of threads started. For example, with 3 concurrent crawls and 5 threads per crawl configuration, up to 3 x 5 = 15 threads run while crawling.</p>
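The thread arithmetic above can be written as a one-line sketch (illustrative only):

```python
# Maximum crawler threads = concurrent crawl configurations
# multiplied by threads per crawl configuration.
def max_crawler_threads(concurrent_crawls: int, threads_per_crawl: int) -> int:
    return concurrent_crawls * threads_per_crawl

print(max_crawler_threads(3, 5))  # 15
```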
|
||||
</subsection>
|
||||
<subsection name='Expiration date of the index'>
|
||||
<p>Indexed data can be deleted automatically after a set period. If you select 5 days, documents that were indexed at least 5 days ago and have not been updated since are removed. This is useful for expiring documents whose source content has been removed.</p>
|
||||
</subsection>
|
||||
<subsection name='Failure types to exclude'>
|
||||
<p>A URL registered as a failure URL is excluded from the next crawl once its failure count is exceeded. By specifying failure types here, you control which failure types cause a URL to be excluded from the next crawl.</p>
|
||||
</subsection>
|
||||
<subsection name='Failure count'>
|
||||
<p>A failure URL whose failures exceed this count is excluded from crawling.</p>
|
||||
</subsection>
|
||||
<subsection name='Snapshot path'>
|
||||
<p>Index information is copied from the index directory to the snapshot path specified here; when replication is enabled, this copy is what gets applied.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,34 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Set session information</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Set session information'>
|
||||
<p>This section describes the settings related to session information. The results of one crawl are saved as one session information record. You can check the execution time and the number of indexed documents.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Session Information in the menu.</p>
|
||||
</subsection>
|
||||
<subsection name='Session information list'>
|
||||
<img alt='Session information list' src='/images/ja/4.0/crawlingSession-1.png'/>
|
||||
<p>Clicking the Delete All link removes all session information records that are not currently running.</p>
|
||||
</subsection>
|
||||
<subsection name='Session details'>
|
||||
<img alt='Session details' src='/images/ja/4.0/crawlingSession-2.png'/>
|
||||
<p>By clicking a session ID, you can see the details of that crawl.</p>
|
||||
<ul>
|
||||
<li>Crawler*: information about the entire crawl</li>
|
||||
<li>FsCrawl*: information about file system crawling</li>
|
||||
<li>WebCrawl*: information about Web crawling</li>
|
||||
<li>Optimize*: information about optimize requests issued to the Solr server</li>
|
||||
<li>Commit*: information about commits issued to the Solr server</li>
|
||||
<li>* StartTime: start time</li>
|
||||
<li>* EndTime: end time</li>
|
||||
<li>*ExecTime: execution time (ms)</li>
|
||||
<li>* IndexSize: number of documents indexed</li>
|
||||
</ul>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,33 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Configuration backup and restore</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Configuration backup and restore'>
|
||||
<p>This section describes how to back up and restore Fess configuration data.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Backup/Restore in the menu.</p>
|
||||
<img alt='Backup and restore' src='/images/ja/4.0/data-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Backup settings'>
|
||||
<p>Click the download link to output the Fess configuration data in XML format. The saved configuration data is listed below.</p>
|
||||
<ul>
|
||||
<li>The General crawl settings</li>
|
||||
<li>Web crawl settings</li>
|
||||
<li>File system Crawl settings</li>
|
||||
<li>Path mapping</li>
|
||||
<li>Web authentication</li>
|
||||
<li>Compatible browsers</li>
|
||||
</ul>
|
||||
<p>Session information, search logs, and click logs can be downloaded in CSV format.</p>
|
||||
<p>The Solr index data and the data being crawled are not backed up. Those data can be regenerated by crawling again after restoring the Fess configuration.</p>
|
||||
</subsection>
|
||||
<subsection name='Restore settings'>
|
||||
<p>You can restore configuration data and various logs by uploading the XML or CSV files output by the backup. Specify the file and click the Restore button on the Data screen.</p>
|
||||
<p>If you enable overwriting when restoring XML configuration data, existing data is updated when the uploaded file contains the same data.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,129 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Data store configuration</title>
|
||||
<author>Sone, Takaaki</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Data store configuration'>
|
||||
<p>Fess can crawl databases. This section describes the data store settings required to do so.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Data Store in the menu.</p>
|
||||
<img alt='Data store configuration' src='/images/ja/4.0/dataStoreCrawling-1.png'/>
|
||||
<p>As an example, we will connect to a MySQL database named testdb, with user name hoge and password fuga, containing the following table.</p>
|
||||
<source><![CDATA[
|
||||
CREATE TABLE job (
|
||||
id BIGINT NOT NULL AUTO_INCREMENT
|
||||
, title VARCHAR(100) NOT NULL
|
||||
, content VARCHAR(255) NOT NULL
|
||||
, versionNo INTEGER NOT NULL
|
||||
, PRIMARY KEY (id)
|
||||
);
|
||||
]]></source>
|
||||
</subsection>
|
||||
<subsection name='Parameter'>
|
||||
<p>An example of the parameter settings is shown below.</p>
|
||||
<source><![CDATA[
|
||||
driver=com.mysql.jdbc.Driver
|
||||
url=jdbc:mysql://localhost:3306/testdb?useUnicode=true&characterEncoding=UTF-8
|
||||
username=hoge
|
||||
password=fuga
|
||||
sql=select * from job
|
||||
]]></source>
|
||||
<p>Parameters are specified in "key=value" format. The keys are described below.</p>
|
||||
<table class='table table-striped table-bordered table-condensed'>
|
||||
<tbody>
|
||||
<tr class='a'>
|
||||
<td align='left'>driver</td>
|
||||
<td align='left'>Driver class name</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>url</td>
|
||||
<td align='left'>URL</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>username</td>
|
||||
<td align='left'>User name used to connect to the database</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>password</td>
|
||||
<td align='left'>Password used to connect to the database</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>sql</td>
|
||||
<td align='left'>SQL statement that retrieves the data to crawl</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</subsection>
|
||||
<subsection name='Script'>
|
||||
<p>A script configuration example is shown below.</p>
|
||||
<source><![CDATA[
|
||||
url="http://localhost/" + id
|
||||
host="localhost"
|
||||
site="localhost"
|
||||
title=title
|
||||
content=content
|
||||
cache=content
|
||||
digest=content
|
||||
anchor=
|
||||
contentLength=content.length()
|
||||
lastModified=content.length()
|
||||
]]></source>
|
||||
<p>Scripts are also specified in "key=value" format. The keys are described below.</p>
|
||||
<p>The value side is written in OGNL. Enclose strings in double quotation marks. You can access a database column's value by its column name.</p>
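<p>As an illustrative sketch (this expression is an assumption, not part of the original settings), OGNL also lets you combine column values; for example, the digest could concatenate the title and content columns:</p>
<source><![CDATA[
digest=title + " : " + content
]]></source>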
|
||||
<table class='table table-striped table-bordered table-condensed'>
|
||||
<tbody>
|
||||
<tr class='a'>
|
||||
<td align='left'>url</td>
|
||||
<td align='left'>URL (the link shown in search results)</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>host</td>
|
||||
<td align='left'>Host name</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>site</td>
|
||||
<td align='left'>Site path</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>title</td>
|
||||
<td align='left'>Title</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>content</td>
|
||||
<td align='left'>Content (indexed as a string)</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>cache</td>
|
||||
<td align='left'>Content cache (not indexed)</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>digest</td>
|
||||
<td align='left'>Digest part shown in search results</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>anchor</td>
|
||||
<td align='left'>Links to content (not usually required)</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>contentLength</td>
|
||||
<td align='left'>The length of the content</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>lastModified</td>
|
||||
<td align='left'>Content last updated</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</subsection>
|
||||
<subsection name='Driver'>
|
||||
<p>A driver is required to connect to the database. Place the driver's jar file in webapps/fess/WEB-INF/cmd/lib.</p>
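<p>For example, for the MySQL setup above you would copy the JDBC driver jar into that directory (the jar file name below is illustrative; use the version you actually downloaded):</p>
<source><![CDATA[
$ cp mysql-connector-java-x.y.z-bin.jar webapps/fess/WEB-INF/cmd/lib/
]]></source>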
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,101 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Appearance settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Appearance settings'>
|
||||
<p>This section describes the settings for the design of the search screens.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Design in the menu.</p>
|
||||
<img alt='Design' src='/images/ja/4.0/design-1.png'/>
|
||||
<p>You can edit the search screen in the screen below.</p>
|
||||
<img alt='JSP compilation screen' src='/images/ja/4.0/design-2.png'/>
|
||||
</subsection>
|
||||
<subsection name='Image file'>
|
||||
<p>You can upload image files to use on the search screens. Supported image file formats are jpg, gif, and png.</p>
|
||||
</subsection>
|
||||
<subsection name='Image file name'>
|
||||
<p>Specify a file name for the uploaded image file if you want to use a particular name. If omitted, the name of the uploaded file is used.</p>
|
||||
</subsection>
|
||||
<subsection name='Design JSP files'>
|
||||
<p>You can edit the JSP files of the search screens. Press the Edit button of a JSP file to edit its current contents, or press the Default button to edit the JSP file as it was at install time. Changes take effect when you save them with the Update button on the edit screen.</p>
|
||||
<p>The editable JSP files are described below.</p>
|
||||
<table class='table table-striped table-bordered table-condensed'>
|
||||
<tbody>
|
||||
<tr class='a'>
|
||||
<td align='left'>Top page (frame)</td>
|
||||
<td align='left'>The JSP file of the search home page. This JSP includes the JSP files of the other parts.</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>Top page (within the Head tags)</td>
|
||||
<td align='left'>The JSP file for the head tag of the search home page. Edit it to change meta tags, title tags, script tags, and so on.</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>Top page (content)</td>
|
||||
<td align='left'>The JSP file for the body tag of the search home page.</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>Search results pages (frames)</td>
|
||||
<td align='left'>The JSP file of the search results list page. This JSP includes the JSP files of the other parts.</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>Search results page (within the Head tags)</td>
|
||||
<td align='left'>The JSP file for the head tag of the search results list page. Edit it to change meta tags, title tags, script tags, and so on.</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>Search results page (header)</td>
|
||||
<td align='left'>The JSP file for the header of the search results list page. It contains the search form at the top.</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>Search results page (footer)</td>
|
||||
<td align='left'>The JSP file for the footer of the search results page. It contains the copyright notice at the bottom.</td>
|
||||
</tr>
|
||||
<tr class='b'>
|
||||
<td align='left'>Search results pages (content)</td>
|
||||
<td align='left'>The JSP file for the search results section of the list page. It is used when there are search results. Edit it to customize how search results are rendered.</td>
|
||||
</tr>
|
||||
<tr class='a'>
|
||||
<td align='left'>Search results page (result no)</td>
|
||||
<td align='left'>The JSP file for the search results section of the list page. It is used when there are no search results.</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<p>You can edit the mobile screens in the same way as the PC screens.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
<section name='How to set up'>
|
||||
<subsection name='To view the registration date and modified date'>
|
||||
<p>To display the date a file was registered or modified by the Fess crawler in the search results, write the following in the search results page (content).</p>
|
||||
<source><![CDATA[
|
||||
:
|
||||
:
|
||||
<ol>
|
||||
<c:forEach var="doc" varStatus="s" items="${documentItems}">
|
||||
<%
|
||||
java.util.Map docMap = (java.util.Map)pageContext.getAttribute("doc");
|
||||
Long tstampValue = (Long)docMap.get("tstamp");
|
||||
java.util.Date tstampDate = new java.util.Date(tstampValue);
|
||||
Long lastModifiedValue = (Long)docMap.get("lastModified");
|
||||
java.util.Date lastModifiedDate = new java.util.Date(lastModifiedValue);
|
||||
java.text.SimpleDateFormat sdf = new java.text.SimpleDateFormat("yyyy/MM/dd HH:mm");
|
||||
%>
|
||||
<li>
|
||||
<h3 class="title">
|
||||
<a href="${doc.urlLink}">${f:h(doc.contentTitle)}</a>
|
||||
</h3>
|
||||
<div class="body">
|
||||
${doc.contentDescription}
|
||||
<br/>
|
||||
<cite>${f:h(doc.site)}</cite>
|
||||
<br>Registered: <%= sdf.format(tstampDate) %>
|
||||
<br>Last Modified: <%= sdf.format(lastModifiedDate) %>
|
||||
:
|
||||
:
|
||||
]]></source>
|
||||
<p>tstampDate holds the registration date and lastModifiedDate holds the last modified date. The output date format is specified with SimpleDateFormat.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,21 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Failure URL</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Failure URL'>
|
||||
<p>This section describes the failure URL. URLs that could not be fetched at crawl time are recorded and can be reviewed as failure URLs.</p>
|
||||
<subsection name='How to display'>
|
||||
<p>After logging in with an administrator account, click Failure URL in the menu.</p>
|
||||
<img alt='Failure URL' src='/images/ja/4.0/failureUrl-1.png'/>
|
||||
<p>Click the confirmation link of a failure URL to display its details.</p>
|
||||
<img alt='Details of the failure URL' src='/images/ja/4.0/failureUrl-2.png'/>
|
||||
</subsection>
|
||||
<subsection name='List'>
|
||||
<p>The list shows the URLs that could not be crawled and the dates of the failures.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,40 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Settings for file system authentication</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Settings for file system authentication'>
|
||||
<p>This section describes how to set up file system authentication, which is required when crawling file systems that need authentication. Fess supports crawling Windows shared folders.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click File System Authentication in the menu.</p>
|
||||
<img alt='File system settings' src='/images/ja/4.0/fileAuthentication-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Host name'>
|
||||
<p>Specifies the host name of the site that requires authentication. If omitted, the authentication applies to any host name in the specified file system crawl settings.</p>
|
||||
</subsection>
|
||||
<subsection name='Port'>
|
||||
<p>Specifies the port of the site that requires authentication. Specify -1 to apply to all ports; in that case the authentication applies to any port in the specified file system crawl settings.</p>
|
||||
</subsection>
|
||||
<subsection name='Authentication methods'>
|
||||
<p>Select the authentication method. You can use SAMBA (Windows shared folder authentication).</p>
|
||||
</subsection>
|
||||
<subsection name='User name'>
|
||||
<p>Specifies the user name used to log in.</p>
|
||||
</subsection>
|
||||
<subsection name='Password'>
|
||||
<p>Specifies the password used to log in to the authentication site.</p>
|
||||
</subsection>
|
||||
<subsection name='Parameter'>
|
||||
<p>Sets additional parameters if the authentication site requires them at login. For SAMBA, you can set the domain value; write it as follows.</p>
|
||||
<source><![CDATA[
|
||||
domain=FUGA
|
||||
]]></source>
|
||||
</subsection>
|
||||
<subsection name='File system name'>
|
||||
<p>Select the file system crawl setting name to which the above authentication settings apply. The file system crawl setting must be registered beforehand.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,98 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Settings for crawling a file system using</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Settings for crawling a file system using'>
|
||||
<p>This section describes the settings for crawling using a file system.</p>
|
||||
<p>If you want to index more than 100,000 documents, we recommend splitting them across several crawl settings of tens of thousands of documents each. Indexing performance degrades when a single crawl setting targets more than 100,000 documents.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click File System in the menu.</p>
|
||||
<img alt='Setting file system Crawl' src='/images/ja/4.0/fileCrawlingConfig-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Setting name'>
|
||||
<p>The name that appears on the list page.</p>
|
||||
</subsection>
|
||||
<subsection name='Specifying a path'>
|
||||
<p>You can specify multiple paths. Each path must start with file: or smb:. For example:</p>
|
||||
<source><![CDATA[
|
||||
file:/home/taro/
|
||||
file:/home/documents/
|
||||
smb://host1/share/
|
||||
]]></source>
|
||||
<p>Everything below the specified directories is crawled.</p>
|
||||
<p>On Windows, the path must be written as a URI; for example, specify the path c:\Documents\taro as file:/c:/Documents/taro.</p>
|
||||
<p>For a Windows shared folder, for example to crawl the share folder on host1, specify smb://host1/share/ in the crawl settings (with a trailing /). If the shared folder requires authentication, set the authentication information on the file system authentication screen.</p>
|
||||
</subsection>
|
||||
<subsection name='Path filtering'>
|
||||
<p>By specifying regular expressions, you can include or exclude given path patterns from crawling and searching.</p>
|
||||
<table>
|
||||
<tbody>
|
||||
<tr>
|
||||
<th>Path to crawl</th>
|
||||
<td>Paths matching the specified regular expressions are crawled.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>The path to exclude from being crawled</th>
|
||||
<td>Paths matching the specified regular expressions are not crawled. If a path matches both this and the paths to crawl, this exclusion takes precedence.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>Path to be searched</th>
|
||||
<td>Paths matching the specified regular expressions are searchable. If a path matches both this and the paths excluded from searches, the exclusion takes precedence.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>Path to exclude from searches</th>
|
||||
<td>Paths matching the specified regular expressions are not searched. Use this when you want a path crawled, so that its links can be followed, but not shown in search results; if you exclude it from crawling instead, none of the pages linked from it can be found.</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<p>For example, to crawl only paths under /home/, specify the following as a path to crawl:</p>
|
||||
<source><![CDATA[
|
||||
file:/home/.*
|
||||
]]></source>
|
||||
<p>To exclude files with the png extension, specify the following as a path to exclude from crawling:</p>
|
||||
<source><![CDATA[
|
||||
.*\.png$
|
||||
]]></source>
|
||||
<p>You can specify multiple patterns, one per line.</p>
|
||||
<p>Paths are converted to URIs following java.io.File's handling, as shown below:</p>
|
||||
<source><![CDATA[
|
||||
/home/taro -> file:/home/taro
|
||||
c:\memo.txt -> file:/c:/memo.txt
|
||||
\\server\memo.txt -> file:////server/memo.txt
|
||||
]]></source>
|
||||
</subsection>
|
||||
<subsection name='Depth'>
|
||||
<p>Specifies the depth of the directory hierarchy to crawl.</p>
|
||||
</subsection>
|
||||
<subsection name='Maximum access'>
|
||||
<p>You can specify the maximum number of documents to retrieve in a crawl.</p>
|
||||
</subsection>
|
||||
<subsection name='Number of threads'>
|
||||
<p>Specifies the number of threads used for crawling. A value of 5 means 5 threads crawl at the same time.</p>
|
||||
</subsection>
|
||||
<subsection name='Interval'>
|
||||
<p>The time interval between document retrievals. With one thread and a value of 5000, a document is retrieved every 5 seconds.</p>
|
||||
<p>With 5 threads and an interval of 1000 milliseconds, up to 5 documents are retrieved per second.</p>
|
||||
</subsection>
|
||||
<subsection name='Boost value'>
|
||||
<p>You can weight the URLs of this crawl setting in searches. Use this when you want them ranked above other results. The default is 1; higher values rank higher in the search results. To rank them above everything else, specify a sufficiently large value such as 10,000.</p>
|
||||
<p>The value must be an integer greater than 0. It is used as the boost value when adding documents to Solr.</p>
|
||||
</subsection>
|
||||
<subsection name='Browser type'>
|
||||
<p>Register the browser types for the crawled documents. If you select only PC, the documents do not appear in search results on mobile devices. Use this if you want documents shown only on specific mobile devices.</p>
|
||||
</subsection>
|
||||
<subsection name='Role'>
|
||||
<p>You can make documents appear in search results only for particular user roles. Roles must be set up beforehand. This is useful, for example, when you want to filter search results per user on a system that requires login, such as a portal server.</p>
|
||||
</subsection>
|
||||
<subsection name='Label'>
|
||||
<p>You can attach labels to the search results. If labels are set, you can restrict a search to a label on the search screen.</p>
|
||||
</subsection>
|
||||
<subsection name='State'>
|
||||
<p>Set to Enabled to include this setting when crawling. Set to Disabled to skip crawling temporarily.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,12 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Management UI Guide</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Fess 4.0 management UI Guide'>
|
||||
<p>This section describes the Fess 4.0 management UI.</p>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,29 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Setting a label</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Setting a label'>
|
||||
<p>This section describes label settings. Labels classify the documents that appear in search results and are selected in the crawl settings. If labels are registered, a label drop-down box is shown to the right of the search box.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Label in the menu.</p>
|
||||
<img alt='List of labels' src='/images/ja/4.0/labelType-1.png'/>
|
||||
<img alt='Setting a label' src='/images/ja/4.0/labelType-2.png'/><!-- TODO -->
|
||||
</subsection>
|
||||
<subsection name='Display name'>
|
||||
<p>Specifies the name displayed in the label drop-down on the search screen.</p>
|
||||
</subsection>
|
||||
<subsection name='Value'>
|
||||
<p>Specifies the identifier used to classify documents. This value is sent to Solr and must be alphanumeric.</p>
|
||||
</subsection>
|
||||
<subsection name='Role'>
|
||||
<p>Specifies the roles that can view the label.</p>
|
||||
</subsection>
|
||||
<subsection name='Display order'>
|
||||
<p>Specifies the display order of the labels.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,19 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Log file download</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Log file download'>
|
||||
<p>This section describes how to download the log files output by Fess.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Log File in the menu.</p>
|
||||
<img alt='Log files' src='/images/ja/4.0/log-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Download'>
|
||||
<p>You can download a log file by clicking its file name.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,23 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Duplicate host settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Duplicate host settings'>
|
||||
<p>This section describes duplicate host settings. Use them when the same content is crawled under different host names and you want those hosts treated as one. For example, use this if you want www.example.com and example.com treated as the same site.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Duplicate Host in the menu.</p>
|
||||
<img alt='A list of the duplicate host' src='/images/ja/4.0/overlappingHost-1.png'/>
|
||||
<img alt='Duplicate host settings' src='/images/ja/4.0/overlappingHost-2.png'/><!-- TODO -->
|
||||
</subsection>
|
||||
<subsection name='Canonical name'>
|
||||
<p>Specify the canonical host name. Duplicate host names are replaced with the canonical host name.</p>
|
||||
</subsection>
|
||||
<subsection name='Duplicate names'>
|
||||
<p>Specify the duplicated host names, that is, the host names to be replaced.</p>
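<p>For example, to treat example.com as a duplicate of www.example.com, as in the scenario above, you would register:</p>
<source><![CDATA[
Canonical name: www.example.com
Duplicate name: example.com
]]></source>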
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,26 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Path mapping settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Path mapping settings'>
|
||||
<p>This section describes path mapping settings. Use path mapping when you want to replace the links shown in search results.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Path Mapping in the menu.</p>
|
||||
<img alt='List of path mapping' src='/images/ja/4.0/pathMapping-1.png'/>
|
||||
<img alt='Path mapping settings' src='/images/ja/4.0/pathMapping-2.png'/><!-- TODO -->
|
||||
</subsection>
|
||||
<subsection name='Path mapping'>
|
||||
<p>Path mapping replaces the part of a path that matches the specified regular expression with the replacement string. When crawling a local file system, the links in search results may not be usable as-is; in such cases path mapping lets you control the links. You can specify multiple path mappings.</p>
|
||||
</subsection>
|
||||
<subsection name='Regular expressions'>
|
||||
<p>Specifies the pattern to replace, written as a <a href='http://java.sun.com/javase/ja/6/docs/ja/api/java/util/regex/Pattern.html'>Java 6 regular expression</a>.</p>
|
||||
</subsection>
|
||||
<subsection name='Replacement character'>
|
||||
<p>Specifies the string that replaces the matched part.</p>
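<p>For example (the paths below are illustrative), to turn crawled local file links under file:/home/www/html/ into the corresponding public web URLs, you could set the pair as follows. The capture group (.*) is referenced as $1 in the replacement:</p>
<source><![CDATA[
Regular expression: file:/home/www/html/(.*)
Replacement:        http://www.example.com/$1
]]></source>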
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,26 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Setting a request header</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Setting a request header'>
|
||||
<p>This section describes request headers. The request header feature adds the specified header to the requests issued when crawling documents. This is useful, for example, with authentication systems that log you in automatically when a certain header value is present.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Request Header in the menu.</p>
|
||||
<img alt='A list of request headers' src='/images/ja/4.0/requestHeader-1.png'/>
|
||||
<img alt='Setting a request header' src='/images/ja/4.0/requestHeader-2.png'/><!-- TODO -->
|
||||
</subsection>
|
||||
<subsection name='The name'>
|
||||
<p>Specifies the request header name to append to the request.</p>
|
||||
</subsection>
|
||||
<subsection name='Value'>
|
||||
<p>Specifies the request header value to append to the request.</p>
|
||||
</subsection>
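<p>For example (the name and value below are illustrative, not from the original settings), to send a fixed cookie with every crawl request for a system that logs users in by session cookie, you might register:</p>
<source><![CDATA[
Name:  Cookie
Value: JSESSIONID=0123456789ABCDEF
]]></source>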
|
||||
<subsection name='Web name'>
|
||||
<p>Select the Web crawl setting name to add the request header to. The header is appended only to requests made by the selected crawl settings.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,23 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Settings for a role</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Settings for a role'>
|
||||
<p>This section describes role settings. Roles, selected in the crawl settings, classify the documents that appear in search results. For how to use them, see <a href='../config/role-setting.html'>Settings for a role</a>.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Role in the menu.</p>
|
||||
<img alt='The list of roles' src='/images/ja/4.0/roleType-1.png'/>
|
||||
<img alt='Settings for a role' src='/images/ja/4.0/roleType-2.png'/><!-- TODO -->
|
||||
</subsection>
|
||||
<subsection name='Display name'>
|
||||
<p>Specifies the name that appears in the list.</p>
|
||||
</subsection>
|
||||
<subsection name='Value'>
|
||||
<p>Specifies the identifier used to classify documents. This value is sent to Solr and must be alphanumeric.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,19 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Search</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Search'>
|
||||
<p>This section describes the administrative search.</p>
|
||||
<subsection name='How to display'>
|
||||
<p>After logging in with an administrator account, click Search in the menu.</p>
|
||||
<img alt='Administrative search' src='/images/ja/4.0/search-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Search list'>
|
||||
<p>You can search with the criteria you specify. Unlike the regular search screen, the administrative search does not implicitly add role and browser conditions. You can also remove a particular document from the index from the search results.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,19 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Search log settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Search log'>
|
||||
<p>This section describes the search log. When users search on the search screen, the searches are logged. The search term and date are recorded. The URLs users clicked in the search results can also be recorded.</p>
|
||||
<subsection name='How to display'>
|
||||
<p>After logging in with an administrator account, click Search Log in the menu.</p>
|
||||
<img alt='Search log' src='/images/ja/4.0/searchLog-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Search log list'>
|
||||
<p>Search terms and dates are listed. Click an entry to review its details, such as the clicked URLs.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,19 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Statistics</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Statistics'>
|
||||
<p>This section describes statistics. You can view statistics on the search log and click log.</p>
|
||||
<subsection name='How to display'>
|
||||
<p>After logging in with an administrator account, click Statistics in the menu.</p>
|
||||
<img alt='Statistics' src='/images/ja/4.0/stats-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Statistics'>
|
||||
<p>Select the report type and target to view the statistics. Results are displayed in order according to the specified criteria.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,31 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>System settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='System settings'>
|
||||
<p>This page describes the Solr-related settings registered in Fess. Solr servers are registered as groups defined in a configuration file.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Solr in the menu.</p>
|
||||
<img alt='System settings' src='/images/ja/4.0/system-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Process state'>
|
||||
<p>The update server is shown as running while documents are being added. The crawl process displays its session ID while running. When no crawl is running, you can shut down the Fess server safely. If you shut down Fess while a crawl is running, the process will not terminate until the crawl finishes.</p>
|
||||
</subsection>
|
||||
<subsection name='Search and update servers'>
|
||||
<p>Displays the server group names used for searching and for updating.</p>
|
||||
</subsection>
|
||||
<subsection name='The status of the server'>
|
||||
<p>When a server becomes unavailable, its status changes to disabled. For example, if a Solr server is inaccessible, its status changes to disabled. After the server recovers, enabling it makes it available again.</p>
|
||||
</subsection>
|
||||
<subsection name='Actions on the Solr server'>
|
||||
<p>You can issue commit and optimize operations on the index for a server group. You can also remove the documents of a specific session ID, or remove specific documents by specifying their URL.</p>
|
||||
</subsection>
|
||||
<subsection name='Documents added'>
|
||||
<p>The number of documents registered per session is shown. Click a session name to view the list of results.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,28 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>System information</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='System information'>
|
||||
<p>Here you can check current property information, such as the system's environment variables.</p>
|
||||
<subsection name='How to display'>
|
||||
<p>After logging in with an administrator account, click System Information in the menu.</p>
|
||||
<img alt='System information' src='/images/ja/4.0/systemInfo-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Environment variables'>
|
||||
<p>Lists the environment variables of the server.</p>
|
||||
</subsection>
|
||||
<subsection name='System Properties'>
|
||||
<p>Lists the system properties available in Fess.</p>
|
||||
</subsection>
|
||||
<subsection name='Fess property'>
|
||||
<p>Displays the Fess configuration properties.</p>
|
||||
</subsection>
|
||||
<subsection name='For bug reports'>
|
||||
<p>This is a list of properties to attach when reporting a bug. Only values that contain no personal information are extracted.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,44 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Web authentication settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Web authentication settings'>
|
||||
<p>This page describes the Web authentication settings used when crawling Web sites that require authentication. Fess supports crawling with BASIC authentication and DIGEST authentication.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Web Authentication in the menu.</p>
|
||||
<img alt='Configuring Web authentication' src='/images/ja/4.0/webAuthentication-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Host name'>
|
||||
<p>Specifies the host name of the site that requires authentication. If left blank, the setting applies to any host name in the Web crawl settings.</p>
|
||||
</subsection>
|
||||
<subsection name='Port'>
|
||||
<p>Specifies the port of the site that requires authentication. Specify -1 to apply to all ports.</p>
|
||||
</subsection>
|
||||
<subsection name='Realm'>
|
||||
<p>Specifies the realm name of the site that requires authentication. If left blank, the setting applies to any realm name.</p>
|
||||
</subsection>
|
||||
<subsection name='Authentication methods'>
|
||||
<p>Select the authentication method. You can use BASIC authentication, DIGEST authentication or NTLM authentication.</p>
|
||||
</subsection>
|
||||
<subsection name='User name'>
|
||||
<p>Specifies the user name for authentication.</p>
|
||||
</subsection>
|
||||
<subsection name='Password'>
|
||||
<p>Specifies the password for the authentication site.</p>
|
||||
</subsection>
|
||||
<subsection name='Parameter'>
|
||||
<p>Sets additional parameters required to log in to the authentication site. For NTLM authentication, you can set the workstation and domain values, written as follows.</p>
|
||||
<source><![CDATA[
|
||||
workstation=HOGE
|
||||
domain=FUGA
|
||||
]]></source>
|
||||
</subsection>
|
||||
<subsection name='Web name'>
|
||||
<p>Selects the name of the Web crawl setting to which the above authentication settings apply. The Web crawl setting must be registered in advance.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,99 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Web crawl settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Web crawl settings'>
|
||||
<p>This page describes the settings for Web crawling.</p>
|
||||
<p>If you want to index more than 100000 documents with Fess, we recommend splitting them so that each crawl setting targets tens of thousands of documents. Indexing performance degrades when a single crawl setting targets more than 100000 documents.</p>
|
||||
<subsection name='How to set up'>
|
||||
<p>After logging in with an administrator account, click Web in the menu.</p>
|
||||
<img alt='Web crawl settings' src='/images/ja/4.0/webCrawlingConfig-1.png'/>
|
||||
</subsection>
|
||||
<subsection name='Setting name'>
|
||||
<p>The name that appears on the list page.</p>
|
||||
</subsection>
|
||||
<subsection name='Specify a URL'>
|
||||
<p>You can specify multiple URLs. Each URL must start with http: or https:. For example:</p>
|
||||
<source><![CDATA[
|
||||
http://localhost/
|
||||
http://localhost:8080/
|
||||
]]></source>
|
||||
<p>Specify them as shown above.</p>
|
||||
</subsection>
|
||||
<subsection name='URL filtering'>
|
||||
<p>By specifying regular expressions, you can include or exclude specific URL patterns from crawling and from search results.</p>
|
||||
<table>
|
||||
<tbody>
|
||||
<tr>
|
||||
<th>URL to crawl</th>
|
||||
<td>URLs matching the specified regular expressions are crawled.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>Excluded from the crawl URL</th>
|
||||
<td>URLs matching the specified regular expressions are not crawled. This takes precedence over the URLs to crawl.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>To search for URL</th>
|
||||
<td>URLs matching the specified regular expressions are included in search results. If a URL also matches an exclusion pattern, the exclusion takes precedence.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>To exclude from the search URL</th>
|
||||
<td>URLs matching the specified regular expressions are excluded from search results. They are still crawled, because excluding them from the crawl would prevent their links from being followed; they are only hidden at search time.</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<p>For example, to crawl only URLs under http://localhost/, specify the following as the URL to crawl:</p>
|
||||
<source><![CDATA[
|
||||
http://localhost/.*
|
||||
]]></source>
|
||||
<p>To exclude URLs with the png extension, specify the following as an exclusion pattern:</p>
|
||||
<source><![CDATA[
|
||||
.*\.png$
|
||||
]]></source>
|
||||
<p>You can specify multiple patterns, one per line.</p>
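<p>As an illustration (the paths here are hypothetical), a setting that crawls only a docs directory while skipping image files could combine the fields like this:</p>
<source><![CDATA[
URL to crawl:                http://localhost/docs/.*
Excluded from the crawl URL: .*\.(png|jpg|gif)$
]]></source>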
|
||||
</subsection>
|
||||
<subsection name='Depth'>
|
||||
<p>Specifies how many levels of links to follow from the starting URLs during the crawl.</p>
|
||||
</subsection>
|
||||
<subsection name='Maximum access'>
|
||||
<p>Specifies the maximum number of documents to retrieve in a crawl.</p>
|
||||
</subsection>
|
||||
<subsection name='User agent'>
|
||||
<p>You can specify the user agent to use when crawling.</p>
|
||||
</subsection>
|
||||
<subsection name='Number of threads'>
|
||||
<p>Specifies the number of crawler threads. A value of 5 means five threads crawl the website at the same time.</p>
|
||||
</subsection>
|
||||
<subsection name='Interval'>
|
||||
<p>The interval (in milliseconds) between document retrievals. With one thread and a value of 5000, a document is retrieved every 5 seconds.</p>
|
||||
<p>With 5 threads and an interval of 1000 milliseconds, up to 5 documents are retrieved per second. Set an appropriate value so that crawling does not overload the Web server.</p>
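<p>The relationship between threads and interval can be sketched as a quick calculation (this helper is illustrative only, not part of Fess):</p>

```python
# Each crawler thread fetches at most one document per interval.
def docs_per_second(threads, interval_ms):
    """Approximate crawl throughput for the given settings."""
    return threads * 1000.0 / interval_ms

print(docs_per_second(1, 5000))  # one thread, 5000 ms -> 0.2 docs/sec
print(docs_per_second(5, 1000))  # five threads, 1000 ms -> 5.0 docs/sec
```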
|
||||
</subsection>
|
||||
<subsection name='Boost value'>
|
||||
<p>You can weight the URLs in this crawl setting for searching. Use this when you want documents from this setting to rank above others in the search results. The default is 1. Higher values are displayed higher in the search results. To rank these results above all others unconditionally, specify a sufficiently large value such as 10000.</p>
|
||||
<p>The value must be an integer greater than 0. It is used as the boost value when documents are added to Solr.</p>
|
||||
</subsection>
|
||||
<subsection name='Browser type'>
|
||||
<p>Crawled documents are registered under the selected browser types. If you select only PC, for example, the documents will not appear in search results on mobile devices. You can also target only specific mobile devices.</p>
|
||||
</subsection>
|
||||
<subsection name='Role'>
|
||||
<p>You can restrict documents so that they appear in search results only for users with a particular role. The role must be registered in advance. This is useful, for example, when you want to separate search results per user in a system that requires login, such as a portal server.</p>
|
||||
</subsection>
|
||||
<subsection name='Label'>
|
||||
<p>You can attach a label to the search results. When labels are enabled on the search screen, users can restrict a search to a specific label.</p>
|
||||
</subsection>
|
||||
<subsection name='State'>
|
||||
<p>Documents are crawled at crawl time only when this is set to Enabled. Set it to Disabled to skip crawling temporarily.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
<section name='Other'>
|
||||
<subsection name='Sitemap'>
|
||||
<p>Fess crawls sitemap files defined in the URLs to crawl, following the sitemap specification at <a href='http://www.sitemaps.org/'>http://www.sitemaps.org/</a>. Supported formats are XML Sitemaps, XML Sitemap Index, and plain text (one URL per line).</p>
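<p>A minimal XML sitemap in the sitemaps.org format looks like this (the URL is a hypothetical example):</p>
<source><![CDATA[
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://localhost/</loc>
  </url>
</urlset>
]]></source>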
|
||||
<p>Specify the sitemap URL as a URL to crawl. Because a sitemap is an XML or text file, Fess cannot distinguish it from an ordinary file when crawling the URL. Therefore, by default, URLs whose file names match sitemap.*.xml, sitemap.*.gz, or sitemap.*txt are handled as sitemaps (this can be customized in webapps/fess/WEB-INF/classes/s2robot_rule.dicon).</p>
|
||||
<p>When a sitemap file is crawled, the URLs listed in it are crawled in the next crawl, in the same way as links found in HTML files.</p>
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,35 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>The desktop search settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Desktop search'>
|
||||
<p>
|
||||
With the increased security awareness of recent browser environments, local files (for example, c:\hoge.txt) can no longer be opened directly from links in Web pages.
|
||||
Copying a link from the search results and pasting it to reopen the file is poor usability.
|
||||
To address this, Fess provides a desktop search feature.</p>
|
||||
</section>
|
||||
<section name='Setting'>
|
||||
<p>
|
||||
The desktop search feature is disabled by default.
|
||||
Enable it with the following settings.</p>
|
||||
<p>First, edit bin/setenv.bat and change java.awt.headless from true to false.</p>
|
||||
<source><![CDATA[
|
||||
... -Djava.awt.headless=false ...
|
||||
]]></source>
|
||||
<p>Then add the following to webapps/fess/WEB-INF/conf/crawler.properties.</p>
|
||||
<source><![CDATA[
|
||||
search.desktop=true
|
||||
]]></source>
|
||||
<p>Start Fess after completing the settings above. Basic usage remains the same.</p>
|
||||
</section>
|
||||
<section name='Usage precautions'>
|
||||
<ul>
|
||||
<li>Keep Fess inaccessible from outside (for example, do not expose port 8080).</li>
|
||||
<li>Because java.awt.headless is set to false, image size conversion for mobile devices is not available.</li>
|
||||
</ul>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,28 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Crawled file size settings</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='File size settings'>
|
||||
<p>You can specify a file size limit for documents crawled by Fess. By default, HTML files are handled up to 2.5 MB and other files up to 10 MB. To change these limits, edit webapps/fess/WEB-INF/classes/s2robot_contentlength.dicon. The standard s2robot_contentlength.dicon is as follows.</p>
|
||||
<source><![CDATA[
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE components PUBLIC "-//SEASAR//DTD S2Container 2.4//EN"
|
||||
"http://www.seasar.org/dtd/components24.dtd">
|
||||
<components>
|
||||
<component name="contentLengthHelper" class="org.seasar.robot.helper.ContentLengthHelper" instance="singleton" >
|
||||
<property name="defaultMaxLength">10485760L</property><!-- 10M -->
|
||||
<initMethod name="addMaxLength">
|
||||
<arg>"text/html"</arg>
|
||||
<arg>2621440L</arg><!-- 2.5M -->
|
||||
</initMethod>
|
||||
</component>
|
||||
</components>
|
||||
]]></source>
|
||||
<p>To change the default limit, change the value of defaultMaxLength. The limit can also be specified per content type; the example above sets the maximum size handled for HTML files (text/html).</p>
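<p>For example, to also limit PDF files to 5 MB, you could add another initMethod to the component shown above, following the same pattern as text/html (application/pdf and the 5 MB value are assumed here for illustration):</p>
<source><![CDATA[
<initMethod name="addMaxLength">
  <arg>"application/pdf"</arg>
  <arg>5242880L</arg><!-- 5M -->
</initMethod>
]]></source>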
|
||||
<p>When increasing the maximum file size, also review the amount of heap memory in use. See <a href='memory-config.html'>the memory-related settings</a> for how to configure it.</p>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
|
@ -1,13 +0,0 @@
|
|||
<?xml version='1.0' encoding='UTF-8'?>
|
||||
<document>
|
||||
<properties>
|
||||
<title>Index backup and restore</title>
|
||||
<author>Shinsuke Sugaya</author>
|
||||
</properties>
|
||||
<body>
|
||||
<section name='Index data backup and restore'>
|
||||
<p>The index data is managed by Solr. You can back it up from the Fess administration screen, but this may not be possible when the index data grows to several gigabytes in size.</p>
|
||||
<p>If you need to back up the index data, stop Fess and then back up the solr/core1/data directory. To restore, copy the backed-up index data back into solr/core1/data.</p>
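<p>A sketch of the procedure with shell commands (the backup destination /backup/fess-index is an example path):</p>
<source><![CDATA[
$ ./bin/shutdown.sh
$ cp -r solr/core1/data /backup/fess-index    # back up the index
# to restore, stop Fess again and copy the data back:
$ cp -r /backup/fess-index/. solr/core1/data
$ ./bin/startup.sh
]]></source>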
|
||||
</section>
|
||||
</body>
|
||||
</document>
|