Table of Contents
waisindex - Indexes files
waisindex [ -d index_filename ] [ -a ] [ -r ]
[ -mem mbytes ] [ -register ] [ -export ] [ -e [ file ] ] [ -l log_level ] [ -pos | -nopos ] [ -nopairs | -pairs ]
[ -nocat ] [ -T type ] [ -t type ] [ -contents | -nocontents ] filename filename ...
waisindex creates an index of the words in files so that they can be searched quickly
(see waissearch). The index takes about as much disk space as the original
text. It also creates a new source structure named index_filename.src if
- -d index_filename
- This is the base filename for the index
files. Therefore if /usr/local/foo is specified, then the index files will
be called /usr/local/foo.dct etc.
The index should be stored on the local file system of the machine running
waisindex. It works over NFS, but it is much slower.
- -a
- Append this index
to an existing one. Useful for incremental additions or updates. This will
only add onto an index: if a file has changed, it will get reindexed,
but the old entries will not be purged. Therefore, to save space, it is
a good idea to reindex the whole set of files periodically.
- -mem mbytes
- How much main memory to use during indexing. This
variable has a large effect on how fast indexing is done.
- -register
- Register this database with the directory of servers. You are encouraged
to register databases, but only ones that will be consistently running.
The directory of servers is available to anyone who is on the Internet
or can phone in.
- -export
- This causes the resulting source description file
to include the host name and TCP port for use by the clients. Otherwise the
file contains no connection information, and is expected to be used only
for local searches.
- -e [ filename ]
- Redirect error output to filename, if
supplied, or to /dev/null. Error output defaults to stderr, unless -s is
selected, in which case it defaults to /dev/null.
- -l log_level
- Set the logging
level. Currently only levels 0, 1, 5 and 10 are meaningful: Level 0 means
log nothing (silent). Level 1 logs only errors and warnings (messages of
HIGH priority), level 5 logs messages of MEDIUM priority (like indexing
filename info). Level 10 logs everything.
- -pos (-nopos)
- Include (don’t include, the
default) word position information in the index. This will increase the
index size, but will allow search engines to do proximity searches.
- -nopairs (-pairs)
- Don’t build (build, the default) word pairs from consecutive capitalized
words.
- -nocat
- Inhibits the creation of a catalog. This is useful for databases
with a large number of documents, as the catalog contains 3 lines per document.
- -contents (-nocontents)
- Include (exclude) the contents of the file in (from) the
index. The filename and header will still be indexed. The default is type dependent.
- -T type
- Sets the TYPE of the document to "type".
- -t type
- This is the format
of files that are handled by waisindex. It is easy to parse a different
format, but that has to be done by changing the source (ircfiles.c). To
find out the list of currently known types, execute the waisindex command
with no arguments and it will list them.
- filename filename...
- These are the
files that will be indexed according to the arguments above. To ensure the
files are registered in the filename table correctly, it is advised that
these be full paths (beginning with a /). If the database is to be used
from a machine other than the machine on which the index is created, this
should be a machine-independent path.
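The options above can be combined as in the following Python sketch, which only assembles a command line and does not run waisindex. The helper name build_waisindex_argv and the sample path /data/docs/a.txt are hypothetical; the flags and the one_line type are taken from this page, and the option order simply follows the synopsis.

```python
import os

# Log levels this page describes as meaningful.
MEANINGFUL_LOG_LEVELS = {0, 1, 5, 10}


def build_waisindex_argv(index_base, files, doc_type=None, append=False,
                         mem_mbytes=None, log_level=None, export=False):
    """Assemble a waisindex command line as an argv list (nothing is executed)."""
    argv = ["waisindex", "-d", index_base]
    if append:
        argv.append("-a")        # add onto an existing index
    if mem_mbytes is not None:
        argv += ["-mem", str(mem_mbytes)]
    if export:
        argv.append("-export")   # put host name and TCP port in the .src file
    if log_level is not None:
        if log_level not in MEANINGFUL_LOG_LEVELS:
            raise ValueError("log level should be one of 0, 1, 5, 10")
        argv += ["-l", str(log_level)]
    if doc_type is not None:
        argv += ["-t", doc_type]
    # Full paths are advised so files are registered correctly in the filename table.
    argv += [os.path.abspath(f) for f in files]
    return argv


# Hypothetical incremental update of the index based at /usr/local/foo.
argv = build_waisindex_argv("/usr/local/foo", ["/data/docs/a.txt"],
                            doc_type="one_line", append=True,
                            mem_mbytes=8, log_level=5)
print(" ".join(argv))
```

Dropping append=True gives the command for the periodic full reindex that the -a description recommends.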
Wide Area Information Servers Concepts
by Brewster Kahle.
The diagnostics produced by waisindex
are meant to be self-explanatory.
waisindex temporarily takes twice the space
it needs for an index.
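A rough pre-flight check for this can be sketched in Python. It assumes, following the description above, that a finished index is about the size of the original text and that indexing briefly needs twice that; the function names are hypothetical and the result is an estimate, not a guarantee.

```python
import os
import shutil


def text_size_bytes(files):
    """Total size in bytes of the files to be indexed."""
    return sum(os.path.getsize(f) for f in files)


def peak_index_bytes(text_bytes):
    # Assumption: index ~ size of the original text, and indexing
    # temporarily needs twice the space of the finished index.
    return 2 * text_bytes


def has_room(index_dir, files):
    """True if index_dir's file system has room for the estimated peak usage."""
    free = shutil.disk_usage(index_dir).free
    return free >= peak_index_bytes(text_size_bytes(files))
```

For example, has_room("/usr/local", files) could be checked before starting a long indexing run.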
Due to some compile time constants the document
table is limited to 16 Megabytes. This limits the indexer to databases
with headlines that add up to less than 16 megabytes (since that is the principal
component of the table). This is typically a problem for database types
where a record is essentially a headline (one_line, archie).
See the note
in ir/README in the wais distribution for more detail.
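Whether a one_line style input is likely to hit this limit can be estimated with a short Python sketch. It assumes each non-empty line becomes one headline and reads "16 Megabytes" as 16 * 1024 * 1024 bytes; both are assumptions, and the real document table holds other per-document data as well, so treat the result as a lower bound.

```python
# Assumption: "16 Megabytes" interpreted as 16 * 1024 * 1024 bytes.
DOC_TABLE_LIMIT = 16 * 1024 * 1024


def headline_bytes(lines):
    """Sum of headline lengths, treating each non-empty line as one headline."""
    return sum(len(line.rstrip("\n").encode("utf-8"))
               for line in lines if line.strip())


def fits_document_table(lines, limit=DOC_TABLE_LIMIT):
    """True if the summed headlines stay below the assumed table limit."""
    return headline_bytes(lines) < limit


sample = ["first record\n", "second record\n", "\n"]
print(headline_bytes(sample), fits_document_table(sample))
```

Passing an open file object as lines lets the same check run over a large input without loading it into memory.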