Indexing is always performed by the recollindex program, which can be started either from the command line or from the menu in the recoll GUI program. When started from the GUI, the indexing will run on the same configuration recoll was started on. When started from the command line, recollindex will use the RECOLL_CONFDIR variable or accept a -c confdir option to specify a non-default configuration directory.
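As a minimal sketch of the command-line case, the helper below passes a configuration directory to recollindex with -c. The function name and the directory used in the usage note are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: run recollindex against a non-default
# configuration directory given as the first argument.
index_with_confdir() {
    confdir=$1
    # Alternative with the same intent, using the environment instead:
    #   RECOLL_CONFDIR="$confdir" recollindex
    recollindex -c "$confdir"
}
```

It could be called as index_with_confdir "$HOME/.recoll-work", assuming such a configuration directory exists.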
If the recoll program finds no index when it starts, it will automatically start indexing (except if canceled).
The recollindex indexing process can be interrupted by sending an interrupt (Ctrl-C, SIGINT) or terminate (SIGTERM) signal. Some time may elapse before the process exits, because it needs to properly flush and close the index. This can also be done from a menu entry in the recoll GUI.
After such an interruption, the index will be somewhat inconsistent because some operations which are normally performed at the end of the indexing pass will have been skipped (for example, the stemming and spelling databases will be nonexistent or out of date). You just need to restart indexing at a later time to restore consistency. The indexing will restart at the interruption point (the full file tree will be traversed, but files that were indexed before the interruption and for which the index is still up to date will not need to be reindexed).
recollindex has a number of other options which are described in its man page. Only a few will be described here.
Option -z will reset the index when starting. This is almost the same as destroying the index files (the nuance is that the Xapian format version will not be changed).
Option -Z will force the update of all documents without resetting the index first. This will not have the "clean start" aspect of -z, but the index will remain available for querying while it is rebuilt, which can be a significant advantage if it is very big (some installations need days for a full index rebuild).
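The choice between the two rebuild options can be sketched as a small wrapper. The function name and the "clean"/"online" mode names are hypothetical; only the -z and -Z options come from the text above.

```shell
# Hypothetical helper: pick a full-rebuild strategy by name.
rebuild_index() {
    mode=$1
    case $mode in
        clean)  recollindex -z ;;  # reset the index when starting
        online) recollindex -Z ;;  # update all documents, index stays queryable
        *)
            echo "usage: rebuild_index clean|online" >&2
            return 2
            ;;
    esac
}
```

For a very large index, rebuild_index online would be the variant that keeps searches working during the rebuild.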
The -i and -f options may also be of special interest. -i allows indexing an explicit list of files (given as command line parameters or read on stdin). -f tells recollindex to ignore file selection parameters from the configuration. Together, these options allow building a custom file selection process for some area of the file system, by adding the top directory to the skippedPaths list and using an appropriate file selection method to build the file list to be fed to recollindex -if.
Trivial example:
find . -name indexable.txt -print | recollindex -if
recollindex -i will not descend into subdirectories specified as parameters, but just add them as index entries. It is up to the external file selection method to build the complete file list.
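Because directories are not descended into, an external selection method would typically emit regular files only. The following is a sketch under that assumption: the function name and its arguments are hypothetical, and it presumes the top directory has already been added to skippedPaths as described above.

```shell
# Hypothetical helper: feed recollindex the regular files under a
# directory whose names match a shell pattern.
index_matching() {
    topdir=$1
    pattern=$2
    # -type f keeps directories out of the list, since recollindex -i
    # would only add them as index entries without descending.
    find "$topdir" -type f -name "$pattern" -print | recollindex -if
}
```

For example, index_matching /some/area '*.txt' would submit every regular .txt file below /some/area (a placeholder path).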
The most common way to set up indexing is to have a cron task execute it every night. For example the following crontab entry would do it every day at 3:30AM (supposing recollindex is in your PATH):
30 3 * * * recollindex > /some/tmp/dir/recolltrace 2>&1
Or, using anacron:
1 15 recollindex su mylogin -c "recollindex > /tmp/rcltraceme 2>&1"
As of version 1.17 the Recoll GUI has dialogs to manage crontab entries for recollindex. You can reach them from the GUI menus. They only work with the good old cron, and do not give access to all features of cron scheduling.
The usual command to edit your crontab is crontab -e (which will usually start the vi editor to edit the file). You may have more sophisticated tools available on your system.
Please be aware that there may be differences between your usual interactive command line environment and the one seen by crontab commands. The PATH variable in particular may be of concern. Please check the crontab manual pages about possible issues.
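One way to sidestep the PATH issue is to have cron call a small wrapper that sets PATH itself. This is only a sketch: the function name, the log file argument and the directory list are assumptions, and the directories should be adjusted to wherever recollindex is installed on your system.

```shell
# Hypothetical wrapper for use from cron: set PATH explicitly, then
# run the indexer and capture its output in a log file.
run_nightly_index() {
    logfile=$1
    # Assumed directory list; cron often provides only a minimal PATH.
    PATH=/usr/local/bin:/usr/bin:/bin
    export PATH
    recollindex > "$logfile" 2>&1
}
```

Placed in a script that ends with a call such as run_nightly_index /some/tmp/dir/recolltrace, it could replace the bare recollindex command in the crontab entry.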