This chapter describes the generic path searching mechanism Kpathsea provides. For information about searching for particular file types (e.g., TeX fonts), see the next chapter.
A search path is a colon-separated list of path elements, which are directory names with some extra frills. A search path can come from (a combination of) many sources; see below. To look up a file `foo' along a path `.:/dir', Kpathsea checks each element of the path in turn: first `./foo', then `/dir/foo', (typically) returning the first one that exists.
The "colon" and "slash" mentioned here aren't necessarily `:' and `/' on non-Unix systems. Kpathsea tries to adapt to other operating systems' conventions.
To check a path element e, Kpathsea first sees if a prebuilt database (see below) applies to e, i.e., if the database is in a directory that is a prefix of e. If so, the path specification is matched against the contents of the database.
If the database does not exist, does not apply to this path element, or contains no matches, the filesystem is searched. Kpathsea constructs the list of directories that correspond to this path element, and then checks in them for the file being searched for. (To help speed future lookups of files in the same directory, the directory in which a file is found is floated to the top of the directory list.)
Each path element is checked in turn: first the database, then the disk. Once a match is found, the searching stops and the result is returned. This avoids possibly-expensive processing of path specifications that are never needed on a particular run.
Although the simplest and most common path element is a directory name, Kpathsea supports additional features in search paths: layers of default values, environment variable names, config file values, users' home directories, and recursive subdirectory searching. Thus, we say that Kpathsea expands a path element, meaning getting rid of all the magic specifications and getting down to the basic directory name or names. This process is described in the sections below. It happens in the same order as the sections.
Exception to the above: If the filename being searched for is absolute or explicitly relative, i.e., starts with `/' or `./' or `../', Kpathsea simply checks if that file exists; it is not looked for along any paths.
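The basic lookup, including the exception for absolute and explicitly relative names, can be sketched in Python as follows. This is only an illustration, not Kpathsea's code: the function name `find_along_path` is hypothetical, each element is treated as a plain directory (no expansion, no database), and the Unix separators `:' and `/' are assumed.

```python
import os

def find_along_path(name, path):
    """Return the first existing match for NAME along the colon-separated PATH, or None."""
    # Absolute or explicitly relative names are checked directly, never along the path.
    if name.startswith(("/", "./", "../")):
        return name if os.path.exists(name) else None
    for element in path.split(":"):
        # Assumption: an empty element stands for the current directory.
        candidate = os.path.join(element or ".", name)
        if os.path.exists(candidate):
            return candidate  # stop at the first match
    return None
```

For instance, `find_along_path("foo", ".:/dir")` tries `./foo' and then `/dir/foo'.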
A search path can come from many sources. In priority order (meaning Kpathsea will use whichever it finds first):
In any case, once the path specification to use is determined, its evaluation is independent of its source. These sources may also be combined via default expansion. See the next section.
You can see each of these values for a given search path by using the debugging options of Kpathsea or your program. See section 3. Debugging.
As mentioned above, Kpathsea reads runtime configuration files named `texmf.cnf' for search path definitions. The path used to search for them is constructed in the usual way, as described above (except that configuration files cannot be used to define the path, naturally; also, an `ls-R' database is not used to search for them, for technical reasons).
The environment variable used is `TEXMFCNF'.
Kpathsea reads all `texmf.cnf' files in the search path, not just the first one found; it uses the first definition of each variable encountered. Thus, with the (default) search path of `.:$TEXMF', values from `./texmf.cnf' override those from `$TEXMF/texmf.cnf'.
Here is the format for `texmf.cnf' files:
variable [. progname] [=] value
where the `=' and surrounding whitespace are optional.
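If `.progname' is present, the definition applies only if the program that is running is named (i.e., the last component of argv[0] is) progname. This allows (for example) different flavors of TeX to have different search paths.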
[...] sed and other processing done on `texmf.cnf' at build time.)
[...] make, unlike most everything else).
Here is the fragment from the distributed file illustrating most of these points:
% TeX input files -- i.e., anything to be found by \input or \openin [...]
latex209_inputs = .:$TEXMF/tex/latex209//:$TEXMF/tex//
latex2e_inputs = .:$TEXMF/tex/latex2e//:$TEXMF/tex//
TEXINPUTS = .:$TEXMF/tex//
TEXINPUTS.latex209 = $latex209_inputs
TEXINPUTS.latex2e = $latex2e_inputs
TEXINPUTS.latex = $latex2e_inputs
This format has obvious similarities to Bourne shell scripts: change the comment character to `#', disallow spaces around the `=', and get rid of the `.program' convention, and it could be run through the shell. But there seemed little advantage to doing this, since all the information would have to be passed back (with `echo''s, presumably) to Kpathsea and parsed there anyway, since the `sh' process couldn't affect its parent's environment.
The implementation of all this is in `kpathsea/cnf.c'.
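As a rough illustration of the format (and not a transcription of `kpathsea/cnf.c'), the following Python sketch reads the contents of one or more `texmf.cnf' files and keeps the first definition of each variable. The name `read_cnf` is hypothetical. Several points are assumptions: `%' starts a comment (as in the fragment above), the `.progname' qualifier follows the variable name without intervening whitespace, and a matching `.progname' definition takes precedence over an unqualified one. Values are stored unexpanded.

```python
import re

# variable[.progname] [=] value
_DEFINITION = re.compile(r"^([^\s=.]+)(?:\.([^\s=]+))?\s*=?\s*(.*)$")

def read_cnf(texts, progname):
    """Collect definitions from cnf file contents, given in search-path order."""
    plain, qualified = {}, {}
    for text in texts:
        for line in text.splitlines():
            line = line.split("%", 1)[0].strip()  # drop comment and whitespace
            if not line:
                continue
            match = _DEFINITION.match(line)
            if match is None:
                continue  # not a definition; skipped in this sketch
            name, prog, value = match.group(1), match.group(2), match.group(3).strip()
            if prog is None:
                plain.setdefault(name, value)      # first definition wins
            elif prog == progname:
                qualified.setdefault(name, value)  # only for the running program
    # Assumption: qualified definitions override unqualified ones.
    return {**plain, **qualified}
```

With the fragment above as input, a program named `latex209' would get `$latex209_inputs' for `TEXINPUTS', while a program named `tex' would get `.:$TEXMF/tex//'.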
If the highest-priority search path (in the list in the previous section) contains an extra colon (i.e., leading, trailing, or doubled), Kpathsea inserts the next-highest-priority search path that is set at that point. If that search path has an extra colon, the same happens with the next-highest. (An extra colon in the compile-time default value has unpredictable results, and may cause the program to crash, so installers beware.)
For example, given
setenv TEXINPUTS /home/karl:
and a `TEXINPUTS' value from `texmf.cnf' of
.:$TEXMF//tex
then the final value used for searching will be:
/home/karl:.:$TEXMF//tex
You can trace this by debugging "paths" (see section 3. Debugging).
Minor technical point: Since it would be useless to insert the default value in more than one place, Kpathsea changes only one extra `:' and leaves any others in place (where they will eventually be effectively equivalent to `.', i.e., the current directory). It checks first for a leading `:', then a trailing `:', then a doubled `:'.
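A single step of default expansion might look like this in Python. It is a sketch under the rules just stated, with a hypothetical function name `expand_default`; it handles one extra colon only and leaves any others untouched.

```python
def expand_default(path, fallback):
    """Insert FALLBACK at the first extra colon in PATH, if there is one."""
    if path.startswith(":"):      # leading colon is checked first
        return fallback + path
    if path.endswith(":"):        # then a trailing colon
        return path + fallback
    doubled = path.find("::")     # then a doubled colon
    if doubled != -1:
        return path[:doubled + 1] + fallback + path[doubled + 1:]
    return path                   # no extra colon: nothing is inserted
```

Applied to the example above, `expand_default("/home/karl:", ".:$TEXMF//tex")` gives `/home/karl:.:$TEXMF//tex'.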
`$foo' or `${foo}' in a path element is replaced by (1) the value of an environment variable `foo' (if it is set); (2) the value of `foo' from `texmf.cnf' (if any such exists); (3) the empty string.
If the character after the `$' is alphanumeric or `_', the variable name consists of all consecutive such characters. If the character after the `$' is a `{', the variable name consists of everything up to the next `}' (braces are not balanced!). Otherwise, Kpathsea gives a warning and ignores the `$' and its following character.
Remember to quote the `$''s and braces as necessary for your shell.
Shell variable values cannot be seen by Kpathsea.
For example, given
setenv TEXMF /home/tex
setenv TEXINPUTS .:$TEXMF:${TEXMF}new
the final `TEXINPUTS' path is the three directories:
.:/home/tex:/home/texnew
You can trace this by debugging "paths" (see section 3. Debugging).
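The variable expansion rules can be sketched in Python as follows. The function name `expand_vars` and its `env` and `cnf` dictionary parameters are hypothetical stand-ins for the environment and the `texmf.cnf' definitions. Two points are assumptions not stated above: an unterminated `${' takes the rest of the element as the name, and substituted values are not expanded again.

```python
import warnings

def _is_name_char(ch):
    return ch == "_" or (ch.isascii() and ch.isalnum())

def expand_vars(element, env, cnf):
    """Replace $foo and ${foo} in ELEMENT using ENV, then CNF, then ''."""
    out = []
    i, n = 0, len(element)
    while i < n:
        ch = element[i]
        if ch != "$":
            out.append(ch)
            i += 1
            continue
        nxt = element[i + 1] if i + 1 < n else ""
        if nxt and _is_name_char(nxt):
            # The name is the run of consecutive alphanumerics and underscores.
            j = i + 1
            while j < n and _is_name_char(element[j]):
                j += 1
            name = element[i + 1:j]
        elif nxt == "{":
            # Everything up to the next '}' (braces are not balanced).
            close = element.find("}", i + 2)
            if close == -1:
                close = n  # assumption: unterminated brace runs to the end
            name = element[i + 2:close]
            j = min(close + 1, n)
        else:
            warnings.warn("ignoring '$' and the character after it")
            i += 2
            continue
        out.append(env[name] if name in env else cnf.get(name, ""))
        i = j
    return "".join(out)
```

With `env = {"TEXMF": "/home/tex"}`, expanding `.:$TEXMF:${TEXMF}new' gives `.:/home/tex:/home/texnew', matching the example above.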
A leading `~' or `~user' in a path element is replaced by the current or user's home directory, respectively.
If user is invalid, or the home directory cannot be determined, Kpathsea uses `.' instead.
For example,
setenv TEXINPUTS ~/mymacros:
will prepend a directory `mymacros' in your home directory to the default path.
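A minimal Python sketch of tilde expansion follows. It leans on the standard library's `os.path.expanduser`, which leaves the string unchanged when the user is unknown or the home directory cannot be determined; the fallback to `.' is then applied by hand. The function name `expand_tilde` is hypothetical, and this is not how Kpathsea itself is implemented.

```python
import os

def expand_tilde(element):
    """Expand a leading ~ or ~user in ELEMENT, falling back to '.'."""
    if not element.startswith("~"):
        return element
    expanded = os.path.expanduser(element)
    if expanded != element:
        return expanded
    # Invalid user, or home directory unknown: use '.' in place of the ~ part.
    _, sep, rest = element.partition("/")
    return "." + sep + rest
```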
A `//' in a path element following a directory d is replaced by all subdirectories of d: first those subdirectories directly under d, then the subsubdirectories under those, and so on. At each level, the order in which the directories are searched is unspecified. (It's "directory order", and definitely not alphabetical.)
If you specify any filename components after the `//', only subdirectories which contain those components are included. For example, `/a//b' would expand into directories `/a/1/b', `/a/2/b', `/a/1/1/b', and so on, but not `/a/b/c' or `/a/1'.
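The expansion order (direct subdirectories first, then the next level, and so on) is a breadth-first traversal, sketched here in Python. The names `subdirs_breadth_first` and `expand_double_slash` are hypothetical. The sketch handles a single `//' per element, does not follow symbolic links (to avoid cycles), and, following the wording above, does not include d itself; all three are simplifying assumptions.

```python
import os
from collections import deque

def subdirs_breadth_first(root):
    """List all directories below ROOT, level by level, in directory order."""
    result = []
    queue = deque([root])
    while queue:
        directory = queue.popleft()
        try:
            with os.scandir(directory) as entries:
                children = [e.path for e in entries
                            if e.is_dir(follow_symlinks=False)]
        except OSError:
            continue  # unreadable directory: skip it
        result.extend(children)
        queue.extend(children)
    return result

def expand_double_slash(element):
    """Expand one '//' in ELEMENT into a list of existing directories."""
    head, sep, tail = element.partition("//")
    if not sep:
        return [element]
    dirs = subdirs_breadth_first(head or "/")
    tail = tail.strip("/")
    if not tail:
        return dirs
    # Keep only subdirectories that contain the trailing components.
    return [os.path.join(d, tail) for d in dirs
            if os.path.isdir(os.path.join(d, tail))]
```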
I should mention one related implementation trick, which I stole from GNU find. Matthew Farwell `<dylan@ibmpcug.co.uk>' suggested it, and David MacKenzie `<djm@gnu.ai.mit.edu>' implemented it (as far as I know).
The trick is that in every real Unix implementation (as opposed to the POSIX specification), a directory which contains no subdirectories will have exactly two links (namely, one for `.' and one for `..'). That is to say, the `st_nlink' field in the `stat' structure will be two. Thus, we don't have to stat everything in the bottom-level (leaf) directories--we can just check `st_nlink', notice it's two, and do no more work.
But if you have a directory that contains one subdirectory and five hundred files, `st_nlink' will be 3, and Kpathsea has to stat every one of those 501 entries. Therein lies slowness.
You can disable the trick by undefining `UNIX_ST_LINK' in `kpathsea/config.h'. (It is undefined by default except under Unix.)
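The link-count shortcut can be illustrated with a short Python sketch. The function name `list_subdirs` is hypothetical, and this is not Kpathsea's C code. Only a count of exactly two is treated as proof of a leaf directory; any other value falls through to a full scan, so filesystems that do not keep Unix-style link counts still get correct (if slower) results.

```python
import os

def list_subdirs(directory):
    """Return the immediate subdirectories of DIRECTORY."""
    if os.stat(directory).st_nlink == 2:
        # Only '.' and '..' link here, so there are no subdirectories:
        # the entries need not be examined at all.
        return []
    subdirs = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # Every entry has to be checked (possibly with a stat call).
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
    return subdirs
```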
Unfortunately, in some cases files in leaf directories are `stat''d: if the path specification is, say, `$TEXMF/fonts//pk//', then files in a subdirectory `.../pk', even if it is a leaf, are checked. The reason cannot be explained without reference to the implementation, so read `kpathsea/elt-dirs.c' (search for `may descend') if you are curious. (And if you can find a way to solve the problem, please let me know.)
Kpathsea goes to some lengths to minimize disk accesses for searches (see section 4.6 Subdirectory expansion). Nevertheless, at installations with enough directories, doing a linear search of each possible directory for a given file can take an excessively long time ("excessive" depending on the speed of the disk, whether it's NFS-mounted, how patient you are, etc.). In practice, the union of font directories from the Dvips(k) and Dviljk distributions is large enough for searching to be noticeably slow on typical machines these days.
Therefore, Kpathsea can use an externally-built "database" that maps files to directories, thus avoiding the need to exhaustively search the disk. By fiat, you must name the file `ls-R', and put it at the root of the TeX installation hierarchy (`$TEXMF' by default). Kpathsea does variable expansion on the `$TEXMF', naturally, so you can use different `ls-R''s for different trees, if you are testing new ones. However, one and only one `ls-R' is read; it is not searched for along any paths.
You can build `ls-R' with the command
ls -R /your/root/dir >ls-R
if your `ls' produces the right output format (see the section below). GNU `ls', for example, outputs in this format. It is probably best to do this via `cron', so changes in the installed files will be automatically reflected (albeit with some delay) in the database.
If your system uses symbolic links, the command `ls -LR' will be more reliable than plain `ls -R'. The former follows the symbolic links to the real files, which is what Kpathsea needs.
Kpathsea warns you if it finds an `ls-R' file, but the file does not contain any usable entries. The usual culprit is using just `ls -R' to generate the `ls-R' file instead of `ls -R /your/dir'. Kpathsea looks for lines starting with `/', to improve reliability with unusual filenames (specifically, those ending with a `:').
Because the database may be out-of-date for a particular run (e.g., if a font was just built with `MakeTeXPK'), if a file is not found in the database, by default Kpathsea goes ahead and searches the disk. If a particular path element begins with `!!', however, only the database will be searched for that element, never the disk. If the database does not exist, nothing will be searched. Because this can greatly surprise users ("I see the font `foo.tfm' when I do an `ls'; why can't Dvips find it?"), I do not recommend using this feature.
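The interplay of database, disk, and `!!' for a single path element can be sketched in Python. Everything here is illustrative: `search_element` is a hypothetical name, `db` is assumed to be a dictionary mapping a filename to the list of directories containing it (or `None` if no database exists), `db_root` is the directory holding `ls-R', and the element is assumed to be an already-expanded directory.

```python
import os

def search_element(element, name, db, db_root):
    """Look for NAME in one path element: database first, then the disk."""
    db_only = element.startswith("!!")
    directory = element[2:] if db_only else element
    # The database applies only if its directory is a prefix of the element.
    if db is not None and directory.startswith(db_root):
        if directory in db.get(name, []):
            return os.path.join(directory, name)
    if db_only:
        return None  # '!!': never fall back to the disk
    candidate = os.path.join(directory, name)
    return candidate if os.path.exists(candidate) else None
```

A file created after the database was built is found through the disk fallback for a plain element, but is reported as missing when the element starts with `!!'.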
The "database" read by Kpathsea is a line-oriented file of plain text. The format is that generated by GNU (and perhaps other) `ls' programs given the `-R' option, as follows.
For example, here's the first few lines of `ls-R' on my system:
bibtex
dvips
fonts
ini
ls-R
mf
tex

/usr/local/lib/texmf/bibtex:
bib
bst
doc

/usr/local/lib/texmf/bibtex/bib:
asi.bib
bibshare
btxdoc.bib
On my system, `ls-R' is about 30K bytes.
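Reading such a file into a filename-to-directories map could look like the following Python sketch. The function name `parse_ls_r` is hypothetical. Following the remark that Kpathsea looks for lines starting with `/', a line is taken as a directory header only if it starts with `/' and ends with `:'; ignoring entries that appear before the first header is an assumption of this sketch.

```python
def parse_ls_r(text):
    """Map each filename in ls-R TEXT to the list of directories containing it."""
    db = {}
    current = None
    for line in text.splitlines():
        if line.startswith("/") and line.endswith(":"):
            current = line[:-1]  # new directory header, minus the colon
        elif line and current is not None:
            db.setdefault(line, []).append(current)
        # Blank lines, and entries before the first header, are skipped.
    return db
```

For the listing above, `asi.bib' maps to `/usr/local/lib/texmf/bibtex/bib' and `bst' maps to `/usr/local/lib/texmf/bibtex'.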