Indexing is not a very fast process: the speed ranges from 30 Kb to 250 Kb per second, depending on index size and computer power. The indexer should not be started too often; how often to run it depends on how frequently the web site is updated. For static sites, a single run of the indexer is enough.
During the indexing process, three files are created:
The indexer can also create a statistics file, stats.log, which can be processed right after the server has been indexed in order to store the information in a database.
Two indexing modes are available: HTTP mode and local drive mode.
To start the indexer, run indexer(.exe) with the following options:
Example for Windows:
C:\indexer.exe localhost
or
C:\indexer.exe --config=D:\www\search.conf disk
Example for Unix/Linux:
./indexer name_of_task
or
./indexer --config=/home/www/search.conf disk
All indexer settings are stored in the 'search.conf' file. The file has the following structure:
[Job name_of_task]
[Index]
Parameter1 Value1
Parameter2 Value2
Parameter3 Value3
[Index]
Parameter1 Value1
Parameter2 Value2
Parameter3 Value3
For each action, parameters and their values are set one per line. A parameter and its value are separated by spaces or tabs.
You may use single-line comments in the configuration file. Each comment starts with the "#" symbol.
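The structure described above can be read with a few lines of code. The following is only a sketch of one possible reader, not the indexer's own parser: the function name parse_search_conf is hypothetical, and it assumes that a comment occupies a whole line (so a "#" inside a URL value is left alone) and that a repeated parameter overrides the earlier value.

```python
def parse_search_conf(text):
    """Parse search.conf-style text into {job_name: [action_dict, ...]}."""
    jobs = {}
    current_job = None
    current_action = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines and whole-line "#" comments are skipped.
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            header = line[1:-1].split(None, 1)
            if header and header[0] == "Job":
                # "[Job name_of_task]" opens a new task.
                current_job = header[1] if len(header) > 1 else ""
                jobs[current_job] = []
                current_action = None
            else:
                # Any other bracketed line (e.g. "[Index]") opens an action.
                if current_job is None:
                    raise ValueError("action found before any [Job ...] line")
                current_action = {"action": header[0] if header else ""}
                jobs[current_job].append(current_action)
            continue
        if current_action is None:
            raise ValueError("parameter found outside an action: " + line)
        # Parameter and value are separated by spaces or tabs.
        parts = line.split(None, 1)
        current_action[parts[0]] = parts[1] if len(parts) > 1 else ""
    return jobs


sample = """# demo configuration
[Job localhost]
[Index]
URL http://www.novgorod.ru/frisbee/
MaxFiles\t50
"""
print(parse_search_conf(sample))
```

All values are kept as strings; converting MaxFiles to a number or splitting comma-separated lists is left to the caller.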
URL url
Address starting with 'http://...' in HTTP-mode, or local path in local drive mode.
Example:
For HTTP:
URL http://www.novgorod.ru/frisbee/
For disk (Windows):
URL c:/pub/home/frisbee/
For disk (Unix):
URL /pub/home/frisbee/
Extensions ext1,ext2,ext3
Sets a list of extensions of files to be indexed. Can be used in local drive mode only, and is ignored in HTTP indexing mode. Extensions are separated by "," (comma).
Example:
Extensions htm,html,shtml,shtm
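As an illustration of how a comma-separated extension list can be applied to file names in local drive mode, here is a small sketch. The function names are hypothetical, and the case-insensitive comparison is an assumption; the documentation does not say how the indexer treats letter case.

```python
import os


def parse_extensions(value):
    """Split an 'Extensions' value such as 'htm,html' into a set."""
    return {ext.strip().lower() for ext in value.split(",") if ext.strip()}


def has_allowed_extension(path, extensions):
    """Return True if the file extension of 'path' is in the allowed set."""
    # Assumption: matching ignores letter case.
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return ext in extensions


allowed = parse_extensions("htm,html,shtml,shtm")
print(has_allowed_extension("/pub/home/frisbee/index.html", allowed))
print(has_allowed_extension("/pub/home/frisbee/logo.png", allowed))
```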
Type typ
Sets the type of the search index:
Default value: Normal
Example:
Type Strict
Path path
Specifies the working directory. Index files and a log file are saved to this directory.
Example:
Path c:\www\novgorod
or
Path /home/www/novgorod
CharSet cset
Sets how the character encoding of the files to be indexed is identified. The values may be:
Example:
CharSet ByHTTPHeader
MaxFiles num
Sets the maximum number of files to be indexed, 10000 by default. Choose the value carefully, because many servers contain huge numbers of links, for example http://news.novgorod.ru/
Example:
MaxFiles 50
Statistic stat
Sets the way reports are saved. Reports are generated at the end of the Index action and are saved to the file stats.log. Available options:
Statistics are saved to the file stats.log.
Example:
Statistic Append
Exclude excl1,excl2,excl3
Sets a list of words to be excluded. Addresses containing at least one of the excluded words are not added to the indexing queue. Words are separated by "," (comma).
Example:
Exclude editpost.php?,reply.php?,admin/
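The exclusion rule amounts to a substring test against each address. The sketch below shows the idea; the function names are hypothetical, and the case-sensitive comparison is an assumption, since the documentation does not specify it.

```python
def parse_exclude(value):
    """Split an 'Exclude' value into a list of non-empty words."""
    return [word for word in (part.strip() for part in value.split(",")) if word]


def is_excluded(url, excluded_words):
    """Return True if the address contains at least one excluded word."""
    # Assumption: plain, case-sensitive substring matching.
    return any(word in url for word in excluded_words)


words = parse_exclude("editpost.php?,reply.php?,admin/")
print(is_excluded("http://www.example.com/forum/reply.php?id=3", words))
print(is_excluded("http://www.example.com/news/index.html", words))
```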
AddOption opt
Sets indexing method. Can be used in HTTP indexing mode only. The following values are available:
Example:
AddOption SubPages
StopWordsFile file
Sets the name of the file in which stop-words are stored.
Example:
StopWordsFile stop.txt
Language lang
Sets the language. If this parameter is specified, an 'Accept-Language' field is included in the HTTP header. This setting may affect document content on some sites.
Example:
Language ru
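To show what including 'Accept-Language' in the HTTP header means in practice, the following sketch builds a request object with Python's standard urllib. It illustrates the header only and is not the indexer's own code; the function name build_request is hypothetical, and no network connection is made.

```python
import urllib.request


def build_request(url, language=None):
    """Create a request, adding Accept-Language only when a language is set."""
    request = urllib.request.Request(url)
    if language:
        request.add_header("Accept-Language", language)
    return request


request = build_request("http://www.novgorod.ru/frisbee/", "ru")
# urllib stores header names capitalized as "Accept-language".
print(request.header_items())
```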
AFrom path
Sets the substring that will be replaced in the URL by the string specified in the ATo parameter.
Example:
AFrom /home/dir/mysite/
ATo http://search.codenet.ru/
ATo url
Sets the substring that will replace AFrom in the URL. Used together with AFrom.
Example:
AFrom http://127.0.0.1/
ATo http://www.codenet.ru/
or
AFrom c:/documents/www/www.codenet.ru/
ATo http://www.codenet.ru/
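The effect of the AFrom and ATo pair can be pictured as a simple string substitution on each stored address. This is a sketch under the assumption that only the first occurrence of the AFrom substring is replaced; the function name rewrite_address is hypothetical.

```python
def rewrite_address(address, a_from, a_to):
    """Replace the AFrom substring in an address with the ATo string."""
    # Assumption: only the first occurrence is replaced.
    return address.replace(a_from, a_to, 1)


local_path = "c:/documents/www/www.codenet.ru/docs/index.html"
print(rewrite_address(local_path,
                      "c:/documents/www/www.codenet.ru/",
                      "http://www.codenet.ru/"))
```

With the values from the last example above, a file indexed from the local drive is shown in search results under its public address.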
StartWord word
Sets the starting word. The page description is composed of the words that follow the starting word, which makes it possible to exclude menus and the like from the description. The starting word is obligatory.
Example:
StartWord about
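The idea of composing a description from the words that follow the starting word can be sketched as follows. Several details here are assumptions, not documented behavior: the match ignores letter case, the starting word itself is left out, the description falls back to the beginning of the text when the word is absent, and max_words is a hypothetical limit.

```python
def describe(page_text, start_word, max_words=20):
    """Build a description from the words after the first start_word."""
    words = page_text.split()
    lowered = [word.lower() for word in words]
    try:
        # Skip everything up to and including the starting word.
        start = lowered.index(start_word.lower()) + 1
    except ValueError:
        # Assumption: without the starting word, use the beginning of the text.
        start = 0
    return " ".join(words[start:start + max_words])


text = "Home News Contacts about Frisbee club of Novgorod"
print(describe(text, "about", max_words=3))
```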
MetaDescription yesno
Sets the page description method. The description can be displayed in search results with the help of the special symbol %E. Available values are "Yes" and "No"; the default is "No". If "Yes" is used, the system attempts to get the description from the '<META name="description...' tag. If the tag cannot be found or the value is "No", the description is composed of the first words in the document (see StartWord).
Example:
MetaDescription Yes
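The fallback logic described above (use the META description when allowed and present, otherwise the first words of the document) could look like the sketch below. It uses Python's standard html.parser; the names page_description and max_words are hypothetical, and the plain text of the page is assumed to be available separately.

```python
from html.parser import HTMLParser


class MetaDescriptionFinder(HTMLParser):
    """Remember the content of the first <META name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag != "meta" or self.description is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "description":
            self.description = attributes.get("content", "")


def page_description(html, plain_text, use_meta, max_words=20):
    """Return the META description if allowed and found, else first words."""
    if use_meta:
        finder = MetaDescriptionFinder()
        finder.feed(html)
        if finder.description:
            return finder.description
    return " ".join(plain_text.split()[:max_words])


html = '<html><head><META name="description" content="Frisbee club"></head></html>'
print(page_description(html, "Menu Home News", use_meta=True))
print(page_description(html, "Menu Home News", use_meta=False))
```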
MetaRobots yesno
If the parameter has the value "No", the '<META name="robots"...' tag is ignored; otherwise the tag is analysed for the presence of NOINDEX, NOFOLLOW, and NONE. More details can be found in the section Use of "Robots" META-tags. The default value is "Yes".
Example:
MetaRobots No
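A sketch of how the content of a robots META tag can be turned into two decisions, whether to index the page and whether to follow its links. It relies on the usual convention that NONE is equivalent to NOINDEX plus NOFOLLOW; the function name and the meta_robots_enabled flag (standing in for the MetaRobots setting) are hypothetical.

```python
def robots_meta_flags(content, meta_robots_enabled=True):
    """Return (may_index, may_follow) for a robots META 'content' value."""
    if not meta_robots_enabled:
        # MetaRobots No: the tag is ignored entirely.
        return True, True
    tokens = {token.strip().upper() for token in content.split(",")}
    may_index = not ({"NOINDEX", "NONE"} & tokens)
    may_follow = not ({"NOFOLLOW", "NONE"} & tokens)
    return may_index, may_follow


print(robots_meta_flags("noindex, follow"))
print(robots_meta_flags("NONE"))
print(robots_meta_flags("noindex", meta_robots_enabled=False))
```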
UseRobotsTxt <yesno>
If set to "Yes", indexing rules are taken from the file 'robots.txt' stored in the web server's root directory. The default value is "No". More information about working with 'robots.txt' is available in the section robots.txt - Exclusions Standard for Robots. The robot's name is "CNSearch".
Example:
UseRobotsTxt yes
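To check in advance how a 'robots.txt' file will treat a robot named "CNSearch", the rules can be tested with Python's standard urllib.robotparser. This only illustrates the exclusion standard and is not the indexer's own implementation; the rules and the example.com addresses are made up for the example.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt_text):
    """Parse robots.txt text (already downloaded) into a rule checker."""
    parser = RobotFileParser()
    parser.parse(robots_txt_text.splitlines())
    return parser


rules = """User-agent: CNSearch
Disallow: /admin/
"""
checker = build_robots_checker(rules)
print(checker.can_fetch("CNSearch", "http://www.example.com/admin/login.html"))
print(checker.can_fetch("CNSearch", "http://www.example.com/news/index.html"))
```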
Starting with version 0.91, the indexer can work through a proxy server. Four new directives were added: ProxyServer, ProxyPort, ProxyLogin, and ProxyPassword.
ProxyServer server
Specifies proxy-server. The indexer connects directly by default. Works with ProxyPort.
Example:
ProxyServer proxy.domain.ru
ProxyPort port
Sets proxy port. Works with ProxyServer.
Example:
ProxyPort 8080
ProxyLogin login
Sets the proxy login. Used only if the proxy server requires authorization. Works with ProxyPassword.
Example:
ProxyLogin alex
ProxyPassword password
Sets the proxy password. Used only if the proxy server requires authorization. Works with ProxyLogin.
Example:
ProxyPassword qwerty
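The four proxy directives together describe one proxy endpoint. The sketch below shows how the same four values could be combined into a proxy address for Python's standard urllib; it is an illustration of the concept, not a description of how the indexer connects. The function names are hypothetical, and no connection is opened.

```python
import urllib.request
from urllib.parse import quote


def build_proxy_url(server, port, login=None, password=None):
    """Combine server, port and optional credentials into a proxy URL."""
    credentials = ""
    if login:
        # Credentials are only added when the proxy requires authorization.
        credentials = f"{quote(login, safe='')}:{quote(password or '', safe='')}@"
    return f"http://{credentials}{server}:{port}"


def build_opener_via_proxy(server, port, login=None, password=None):
    """Return a urllib opener that sends HTTP requests through the proxy."""
    proxy_url = build_proxy_url(server, port, login, password)
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


print(build_proxy_url("proxy.domain.ru", 8080, "alex", "qwerty"))
```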
To distinguish between morphological forms, you need to create the file 'lang.cns' and save it in the directory where the index files are stored (or will be created). We do not include the file 'lang.cns' in this distribution because of its size (16 Mb).
If the file 'lang.cns' is not found, searching and indexing are performed without taking morphology into account.
We have developed a special utility that builds 'lang.cns' from ispell dictionaries. You can find the necessary dictionaries at http://fmg-www.cs.ucla.edu/geoff/ispell-dictionaries.html.
An ispell dictionary consists of two files: a list of words (lang.dict) and a set of word formation rules (lang.aff). These files may have other names in the downloaded archives; you will have to rename them to 'lang.dict' and 'lang.aff'.
ATTENTION! If you have built the index with morphology taken into account, you must also search with morphology taken into account, using the same dictionary.
Starting with version 1.3, CNSearch Pro can skip frequently used words (articles, pronouns, prepositions) during indexing to increase search speed and reduce the volume of information stored in the search index. These words are called 'stop-words'.
Stop-words are defined at the indexing stage, using a special file that contains one stop-word per line. For example:
- file: stopwords.txt ---------------
a
an
is
the
this
-------------------------------------
The name of the file containing stop-words is specified in the StopWordsFile option of the Indexer configuration file, for example:
StopWordsFile stopwords.txt
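The one-word-per-line format is easy to load and apply. The sketch below reads such a list and splits a phrase into the words that are kept and the words that are dropped; the function names are hypothetical, and the case-insensitive comparison is an assumption.

```python
def load_stop_words(lines):
    """Build a set of stop-words from lines holding one word each."""
    return {line.strip().lower() for line in lines if line.strip()}


def split_query(query, stop_words):
    """Return (kept_words, ignored_words) for a search phrase."""
    kept, ignored = [], []
    for word in query.split():
        # Assumption: stop-words are matched without regard to letter case.
        if word.lower() in stop_words:
            ignored.append(word)
        else:
            kept.append(word)
    return kept, ignored


stop_words = load_stop_words(["a\n", "an\n", "is\n", "the\n", "this\n"])
print(split_query("this is the frisbee club", stop_words))
```

With a real file, the lines can be supplied by passing an open file object, for example load_stop_words(open("stopwords.txt", encoding="utf-8")).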
To let your visitors know which words from their search phrase have been ignored, the words may be listed with the help of the special symbol "%P", as shown in the picture:
The phrase "Stop Words" may be replaced with another one (for example, when translating into another language) by changing the StopWords parameter in the Frontend configuration file.