\input texinfo
@comment %**start of header
@setfilename id.info
@settitle ID database utilities
@comment %**end of header

@include version.texi

@c Define new indices for filenames, commands and options.
@defcodeindex fl
@defcodeindex cm
@defcodeindex op

@c Put everything in one index (arbitrarily chosen to be the concept index).
@syncodeindex fl cp
@syncodeindex fn cp
@syncodeindex ky cp
@syncodeindex op cp
@syncodeindex pg cp
@syncodeindex vr cp

@ifinfo
@set Francois Franc,ois
@end ifinfo
@tex
@set Francois Fran\noexpand\ptexc cois
@end tex

@ifinfo
@format
START-INFO-DIR-ENTRY
* ID database: (id).		Identifier database utilities.
* aid: (id)aid invocation::			Matching strings.
* eid: (id)eid invocation::			Invoking an editor on matches.
* fid: (id)fid invocation::			Listing a file's identifiers.
* gid: (id)gid invocation::			Listing all matching lines.
* idx: (id)idx invocation::			Testing mkid scanners.
* iid: (id)iid invocation::			Interactive complex queries.
* lid: (id)lid invocation::			Matching patterns.
* mkid: (id)mkid invocation::			Creating an ID database.
* pid: (id)pid invocation::			Looking up filenames.
END-INFO-DIR-ENTRY
@end format
@end ifinfo

@ifinfo
This file documents the @code{mkid} identifier database utilities.

Copyright (C) 1991, 1995 Tom Horsley.

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

@ignore
Permission is granted to process this file through TeX and print the
results, provided the printed document carries copying permission
notice identical to this one except for the removal of this paragraph
(this paragraph not being relevant to the printed manual).

@end ignore
Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the entire
resulting derived work is distributed under the terms of a permission
notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that this permission notice may be stated in a translation.
@end ifinfo

@titlepage
@title ID database utilities
@subtitle Programs for simple, fast, high-capacity cross-referencing 
@subtitle for version @value{VERSION}
@author Tom Horsley

@page
@vskip 0pt plus 1filll
Copyright @copyright{} 1991, 1995 Tom Horsley.

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the entire
resulting derived work is distributed under the terms of a permission
notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that this permission notice may be stated in a translation.
@end titlepage


@ifinfo
@node Top
@top ID database utilities

This manual documents version @value{VERSION} of the ID database
utilities.

@menu
* Introduction::                Overview of the tools, and authors.
* mkid invocation::             Creating an ID database.
* Common query arguments::      Common lookup options and search patterns.
* gid invocation::              Listing all matching lines.
* Looking up identifiers::      lid, aid, eid, and fid.
* pid invocation::              Looking up filenames.
* iid invocation::              Interactive and complex queries.
* Index::                       General index.
@end menu
@end ifinfo


@node Introduction
@chapter Introduction

@cindex overview
@cindex introduction

@cindex ID database, definition of
An @dfn{ID database} is a binary file containing a list of filenames, a
list of identifiers, and a matrix indicating which identifiers appear in
which files.  With this database and some tools to manipulate it
(described in this manual), a host of tasks become simpler and faster.
For example, you can list all files containing a particular
@code{#include} throughout a huge source hierarchy, search for all the
memos containing references to a project, or automatically invoke an
editor on all files containing references to some function.  Anyone with
a large software project to maintain, or a large set of text files to
organize, can benefit from an ID database.

Although the ID utilities are most commonly used with identifiers,
numeric constants are also stored in the database, and can be searched
for in the same way (independent of radix, if desired).

There are a number of programs in the ID family:

@table @code

@item mkid
scans files for identifiers and numeric constants and builds the ID
database file.

@item gid
lists all lines that match given patterns.

@item lid
lists the filenames containing identifiers that match given patterns.

@item aid
lists the filenames containing identifiers that contain given strings,
independent of case.

@item eid
invokes an editor on each file containing identifiers that match given
patterns.

@item fid
lists all identifiers recorded in the database for given files, or
identifiers common to two files.

@item pid
matches the filenames in the database, rather than the identifiers.

@item iid
interactively supports more complex queries, such as intersection and
union.

@item idx
helps with testing of new @code{mkid} scanners.

@end table

@cindex bugs, reporting
Please report bugs to @samp{gkm@@magilla.cichlid.com}.  Remember to
include the version number, machine architecture, input files, and any
other information needed to reproduce the bug: your input, what you
expected, what you got, and why it is wrong.  Diffs are welcome, but
please include a description of the problem as well, since this is
sometimes difficult to infer.  @xref{Bugs, , , gcc, GNU CC}.

@menu
* Past and future::       How the ID tools came about, and where they're going.
@end menu


@node Past and future
@section Past and future

@cindex history

@pindex look @r{and @code{mkid} 1}
@cindex McGary, Greg
Greg McGary conceived of the ideas behind @code{mkid} when he began hacking the
Unix kernel in 1984.  He needed a navigation tool to help him find his
way around the expansive, unfamiliar landscape.  The first @code{mkid}-like
tools were shell scripts, and produced an ASCII database that looked much
like the output of @code{lid} with no arguments.  It took over an hour
on a VAX 11/750 to build a database for a 4.1BSD-ish kernel.  Lookups
were done with the system utility @code{look}, modified to handle very
long lines.

In 1986, Greg rewrote @code{mkid}, @code{lid}, @code{fid} and @code{idx}
in C to improve performance.  Database-build times were shortened by an
order of magnitude.  The @code{mkid} tools were first posted to
@samp{comp.sources.unix} in September 1987.

@cindex Horsley, Tom
@cindex Scofield, Doug
@cindex Leonard, Bill
@cindex Berry, Karl
Over the next few years, several versions diverged from the original
source.  Tom Horsley at Harris Computer Systems Division stepped forward
to take over maintenance and integrated some of the fixes from divergent
versions.  He also wrote the @code{iid} program.  A first release of
@code{mkid} @w{version 2} was posted to @file{alt.sources} near the end
of 1990.  At that time, Tom wrote this Texinfo manual with the
encouragement of the net community.  (Tom especially thanks Doug Scofield
and Bill Leonard, whom he dragooned into helping proofread and
edit; they found several problems in the initial version.)  Karl Berry
revamped the manual for Texinfo style, indexing, and organization in
1995.

@pindex cscope
@pindex grep
@cindex future
In January 1995, Greg McGary reemerged as the primary maintainer and
launched development of @code{mkid} version 3, whose primary new feature
is an efficient algorithm for building databases that is linear in both
time and space over the size of the input text.  (The old algorithm was
quadratic in space and therefore choked on very large source trees.)
The code is released under the GNU General Public License, and might become a
part of the GNU system.  @code{mkid} 3 is an interim release, since
several significant enhancements are still in the works: an optional
coupling with GNU @code{grep}, so that @code{grep} can use an ID
database for hints; a @code{cscope} work-alike query interface;
incremental update of the ID database; and an automatic file-tree walker
so you need not explicitly supply every filename argument to the
@code{mkid} program.


@node mkid invocation
@chapter @code{mkid}: Creating ID databases

@pindex mkid
@cindex creating databases
@cindex databases, creating

@pindex cron
The @code{mkid} program builds an ID database.  To do this, it must scan
each file you tell it to include in the database.  This takes some time,
but once the work is done the query programs run very rapidly.  (You can
run @code{mkid} as a @code{cron} job to regularly update your
databases.)

The @code{mkid} program knows how to extract identifiers from various
types of files.  For example, it can recognize and skip over comments
and string constants in a C program.

@cindex numbers, in databases
Identifiers are not the only thing included in the database.  Numbers
are also recognized and included in the database indexed by their binary
value.  This feature allows you to find uses of constants without regard
to the radix used to specify them, since the same number can frequently
be written in many different ways (for instance, @samp{47}, @samp{0x2f},
@samp{057} in C).
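
As an illustration only, the following Python sketch reduces C integer
constants to their numeric value so that different spellings of the same
number land in one entry.  It is not @code{mkid}'s code; the names
@code{c_int_value} and @code{index_numbers} are hypothetical, and only
plain decimal, octal, and hexadecimal literals (with optional @samp{U}
and @samp{L} suffixes) are handled.

@example
```python
from collections import defaultdict


def c_int_value(literal):
    """Return the value of a C integer constant (decimal, octal, or hex).

    Hypothetical helper: trailing U/L suffixes are ignored, and any other
    non-numeric text raises ValueError.
    """
    text = literal.lower().rstrip("ul")
    if text.startswith("0x"):
        return int(text[2:], 16)      # hexadecimal, e.g. 0x2f
    if len(text) > 1 and text.startswith("0"):
        return int(text[1:], 8)       # octal, e.g. 057
    return int(text, 10)              # decimal, e.g. 47


def index_numbers(literals):
    """Group literal spellings under their common numeric value."""
    index = defaultdict(set)
    for literal in literals:
        index[c_int_value(literal)].add(literal)
    return index


if __name__ == "__main__":
    index = index_numbers(["47", "0x2f", "057", "255", "0377"])
    print(sorted(index[47]))   # ['057', '0x2f', '47']
```
@end example

A real scanner must also decide which tokens are numbers in the first
place; that step is left out of the sketch.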

Throughout this manual, what is said about identifiers applies equally to
numbers.  Mentioning both every time would quickly become clumsy, so just
keep in mind that numbers are stored in the database alongside
identifiers.

@cindex ID file format
@cindex architecture-independence
@cindex sharing ID files
The ID files that @code{mkid} creates are architecture- and
byte-order-independent; you can share them at will across systems.

@menu
* mkid options::                Command-line options to mkid.
* Scanners::                    Built-in and defining your own.
* mkid examples::               Examples of mkid usage.
@end menu


@node mkid options
@section @code{mkid} options

@cindex options for @code{mkid}
@pindex mkid @r{options}

By default, @code{mkid} scans the files you specify and writes the
database to a file named @file{ID} in the current directory.

@example
mkid [-v] [-S@var{scanarg}] [-a@var{argfile}] [-] [-f@var{idfile}] @c
@var{files}@dots{}
@end example

The program accepts the following options.

@table @samp

@item -v
@opindex -v
@cindex statistics
Verbose.  @code{mkid} reports each file as it scans it, and indicates
which scanner it is using.  It also summarizes some statistics about the
database at the end.

@item -S@var{scanarg}
@opindex -S@var{scanarg}
Specify options regarding @code{mkid}'s scanners.  @xref{Scanner option
formats}.

@item -a@var{argfile}
@opindex -a@var{argfile}
Read additional command line arguments from @var{argfile}.  This is
typically used to specify lists of filenames longer than will fit on a
command line; some systems have severe limitations on the total length
of a command line.

@item -
@opindex -
Read additional command line arguments from standard input.

@item -f@var{idfile}
@opindex -f@var{idfile}
Write the database to the file @var{idfile}, instead of @file{ID}.  The
database stores filenames relative to the directory containing the
database, so if you move the database to a different directory after
creating it, the query tools may have trouble finding files (the
@samp{-r} and @samp{-c} query options can help; @pxref{Query options}).

@c @item -u
@c @opindex -u
@c The @code{-u} option updates an existing database by rescanning any
@c files that have changed since the database was written.  Unfortunately
@c you cannot incrementally add new files to a database.
@c Greg is reimplementing this ...

@end table

The remaining arguments @var{files} are the files to be scanned and
included in the database.  If no files are given at all (either on the
command line or via @samp{-a} or @samp{-}), @code{mkid} does nothing.


@node Scanners
@section Scanners

@cindex scanners

To determine which identifiers to extract from a file and store in the
database, @code{mkid} calls a @dfn{scanner}; we say a scanner
@dfn{recognizes} a particular language.  Scanners for several languages
are built into @code{mkid}; you can add your own scanners as well, as
explained in the sections below.

@cindex suffixes of filenames
@code{mkid} determines which scanner to use for a particular file by
looking at the suffix of the filename.  This @dfn{suffix} is everything
after and including the last @samp{.} in a filename; for example, the
suffix of @file{foo.c} is @file{.c}.  @code{mkid} has a built-in list of
bindings from some suffixes to corresponding scanners; for example,
@file{.c} files are (not surprisingly) scanned by the predefined C
language scanner.

@findex .default @r{scanner}
If @code{mkid} cannot determine what scanner to use for a particular
file, either because the file has no suffix (e.g., @file{foo}) or
because @code{mkid} has no binding for the file's suffix (e.g.,
@file{foo.bar}), it uses the scanner bound to the @samp{.default}
suffix.  By default, this is the plain text scanner (@pxref{Plain text
scanner}), but you can change this with the @samp{-S} option, as
explained below.
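
The suffix rule and the @samp{.default} fallback can be sketched in a few
lines of Python.  The binding table is a hypothetical excerpt based on the
defaults mentioned in this manual (@code{mkid -S?=?} lists the real one),
and taking the suffix from the last component of a path is an assumption
of the sketch.

@example
```python
import os

# Hypothetical excerpt of mkid's suffix bindings; run "mkid -S?=?" to see
# the real table on your system.
BINDINGS = dict([
    (".c", "c"), (".h", "c"), (".y", "c"), (".l", "c"),
    (".default", "text"),
])


def suffix_of(filename):
    """Return everything after and including the last '.', or ''."""
    base = os.path.basename(filename)   # assumption: ignore directory names
    dot = base.rfind(".")
    return base[dot:] if dot >= 0 else ""


def scanner_for(filename, bindings=BINDINGS):
    """Return the scanner bound to the file's suffix, else the .default one."""
    return bindings.get(suffix_of(filename), bindings[".default"])


if __name__ == "__main__":
    for name in ["foo.c", "parse.y", "foo", "foo.bar"]:
        print(name, "->", scanner_for(name))
```
@end example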

@menu
* Scanner option formats::      Overview of the -S option.
* Predefined scanners::         The C, plain text, and assembler scanners.
* Defining new scanners::       Either in source code or at runtime with -S.
* idx invocation::              Testing mkid scanners.
@end menu


@node Scanner option formats
@subsection Scanner option formats

@cindex scanner options
@opindex -S @r{scanner option}

With the @samp{-S} option, you can change which language scanner to use
for which files, give language-specific options, and get some limited
online help about scanner options.

Here are the different forms of the @samp{-S} option:

@table @samp

@item -S.@var{suffix}=@var{scanner}
@opindex -S.
Use @var{scanner} for a file with the given @samp{.@var{suffix}}.  For
example, @samp{-S.yacc=c} tells @code{mkid} to use the @samp{c} language
scanner for all files ending in @samp{.yacc}.

@item -S.@var{suffix}=?
Display which scanner is used for the given @samp{.@var{suffix}}.

@item -S?=@var{scanner}
@opindex -S?
Display which suffixes @var{scanner} is used for.

@item -S?=?
Display the scanner binding for every known suffix.

@item -S@var{scanner}+@var{arg}
@itemx -S@var{scanner}-@var{arg}
Each scanner accepts certain scanner-dependent arguments.  These options
all have one of these forms.  @xref{Predefined scanners}.

@item -S@var{scanner}?
Display the scanner-specific options accepted by @var{scanner}.

@item -S@var{new-scanner}/@var{old-scanner}/@var{filter-command}
Define @var{new-scanner} in terms of @var{old-scanner} and
@var{filter-command}.  @xref{Defining scanners with options}.

@end table


@node Predefined scanners
@subsection Predefined scanners

@cindex predefined scanners
@cindex scanners, predefined

@code{mkid} has built-in scanners for several languages; you
can get the list by running @code{mkid -S?=?}.
The supported languages are documented
below@footnote{This is not strictly true: @samp{vhil} is a supported
language, but it is an obsolete and arcane dialect of C and should be
ignored.}.

@menu
* C scanner::                   For the C programming language.
* Plain text scanner::          For documents or other non-source code.
* Assembler scanner::           For assembly language.
@end menu


@node C scanner
@subsubsection C scanner

@cindex C scanner, predefined
@flindex .[chly] @r{files, scanning}

The C scanner is the most commonly used.  Files with the usual @file{.c}
and @file{.h} suffixes, and the @file{.y} (yacc) and @file{.l} (lex)
suffixes, are processed with this scanner (by default).

Scanner-specific options:

@table @samp

@item -Sc-s@var{character}
@kindex $ @r{in identifiers}
@opindex -Sc-s
Allow the specified @var{character} in identifiers. For example, if you
use @samp{$} in identifiers, you'll want to use @samp{-Sc-s$}.

@item -Sc+u
@opindex -Sc+u
Strip leading underscores from identifiers.  You might want to do this in
peculiar circumstances, such as when parsing the output of
@code{nm} or some other system utility.

@item -Sc-u
@opindex -Sc-u
Don't strip leading underscores from identifiers; this is the default.

@end table


@node Plain text scanner
@subsubsection Plain text scanner

@cindex plain text scanner

The plain text scanner is intended for scanning most non-source-code
files.  This is typically the scanner used when adding custom scanners
via @samp{-S} (@pxref{Defining scanners with options}).

@c @code{mkid} predefines a troff scanner in terms of the plain text
@c scanner and
@c the @code{deroff} utility. 
@c A compressed man page
@c scanner runs @code{pcat} piped into @code{col -b}, and a @TeX{} scanner
@c runs @code{detex}.

Scanner-specific options:

@table @samp

@item -Stext+a@var{character}
@opindex -Stext+a
Include @var{character} in identifiers.  By default, letters (a--z and
A--Z) and underscore are included.

@item -Stext-a@var{character}
@opindex -Stext-a
Exclude @var{character} from identifiers.

@item -Stext+s@var{character}
@opindex -Stext+s
@cindex squeezing characters from identifiers
Squeeze @var{character} from identifiers, i.e., do not terminate an
identifier when @var{character} is seen.  By default, the characters
@samp{'}, @samp{-}, and @samp{.} are squeezed out of identifiers.  For
example, the input @samp{fred's} leads to the identifier @samp{freds}.

@item -Stext-s@var{character}
Do not squeeze @var{character}.

@end table
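
A simplified Python model of the splitting rules above may make the
squeeze behavior clearer.  The function @code{text_identifiers} is
hypothetical: @code{allowed} plays the role of the characters added or
removed with @samp{+a} and @samp{-a}, @code{squeezed} those set with
@samp{+s} and @samp{-s}, and digits are simply treated as separators,
which the real scanner may not do.

@example
```python
import string

# Defaults as described above: letters and underscore form identifiers,
# and the apostrophe, hyphen, and period are squeezed out.  Digits are
# treated as plain separators in this simplified sketch.
DEFAULT_ALLOWED = set(string.ascii_letters + "_")
DEFAULT_SQUEEZED = set("'-.")


def text_identifiers(text, allowed=DEFAULT_ALLOWED, squeezed=DEFAULT_SQUEEZED):
    """Split TEXT into identifiers following the plain text scanner rules."""
    identifiers = []
    current = []
    for ch in text:
        if ch in allowed:
            current.append(ch)
        elif ch in squeezed and current:
            continue                  # drop the character, keep the identifier
        elif current:
            identifiers.append("".join(current))
            current = []
    if current:
        identifiers.append("".join(current))
    return identifiers


if __name__ == "__main__":
    print(text_identifiers("fred's e-mail, sent 3 Jan."))
    # ['freds', 'email', 'sent', 'Jan']
```
@end example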


@node Assembler scanner
@subsubsection Assembler scanner

@cindex assembler scanner

Since assembly languages come in several flavors, this scanner has a
number of options:

@table @samp

@item -Sasm-c@var{character}
@opindex -Sasm-c
@cindex comments in assembler
Define @var{character} as starting a comment that extends to the end of
the input line; no default.  In many assemblers this is @samp{;} or
@samp{#}.

@item -Sasm+u
@itemx -Sasm-u
@opindex -Sasm+u
Strip (@samp{+u}) or do not strip (@samp{-u}) leading underscores from
identifiers.  The default is to strip them.

@item -Sasm+a@var{character}
@opindex -Sasm+a
Allow @var{character} in identifiers.

@item -Sasm-a@var{character}
Allow @var{character} in identifiers, but if an identifier contains
@var{character}, ignore it. This is useful to ignore temporary labels,
which can be generated in great profusion; these often contain @samp{.}
or @samp{@@}.

@item -Sasm+p
@itemx -Sasm-p
@opindex -Sasm+p
Recognize (@samp{+p}) or do not recognize (@samp{-p}) C preprocessor
directives in assembler source. The default is to recognize them.

@item -Sasm+C
@itemx -Sasm-C
@opindex -Sasm+C
Skip over (@samp{+C}) or do not skip over (@samp{-C}) C style comments
in assembler source.  The default is to skip them.

@end table


@node Defining new scanners
@subsection Defining new scanners

@cindex scanners, adding new

You can add new scanners to @code{mkid} in two ways: modify the source
code and recompile, or at runtime via the @samp{-S} option.  Each method
has its advantages and disadvantages, as explained below.

If you create a new scanner that would be of use to others, please
consider sending it back to the maintainer,
@samp{gkm@@magilla.cichlid.com}, for inclusion in future releases of
@code{mkid}.

@menu
* Defining scanners in source code::
* Defining scanners with options::
@end menu


@node Defining scanners in source code
@subsubsection Defining scanners in source code

@flindex scanners.c
@cindex scanners, defining in source code

@vindex languages_0
@vindex suffixes_0
To add a new scanner in source code, you should add a new section to the
file @file{scanners.c}.  Copy one of the existing scanners (most likely
either C or plain text), and modify as necessary.  Also add the new
scanner to the @code{languages_0} and @code{suffixes_0} tables near the
beginning of the file.

This is not a terribly difficult programming task, but it requires
recompiling and installing the new version of @code{mkid}, which may be
inconvenient.

This method leads to scanners that operate much more quickly than ones
that depend on external programs.  It is also likely the easiest way
to define scanners for new programming languages.


@node Defining scanners with options
@subsubsection Defining scanners with options

@cindex scanners, defining with options

You can use the @samp{-S} option on the command line to define a new
language scanner:

@example
-S@var{new-scanner}/@var{existing-scanner}/@var{filter}
@end example

@noindent
Here, @var{new-scanner} is the name of the new scanner being defined,
@var{existing-scanner} is the name of an existing scanner, and
@var{filter} is a shell command or pipeline.

The new scanner works by passing the input file to @var{filter}, and
then arranging for the result to be passed through
@var{existing-scanner}. Typically, @var{existing-scanner} is @samp{text}.

Somewhere within @var{filter}, the string @samp{%s} should occur.  This
@samp{%s} is replaced by the name of the source file being scanned.
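
Roughly, the filter step might look like the Python sketch below.  It
assumes that @samp{%s} is replaced textually and that the resulting
command is run by the shell; @code{run_filter} is a hypothetical name, and
in @code{mkid} the captured output would then be fed to
@var{existing-scanner}.

@example
```python
import subprocess


def run_filter(filter_command, path):
    """Run FILTER_COMMAND with %s replaced by PATH and return its output.

    Sketch only: the substitution is purely textual, so filenames that
    contain spaces or shell metacharacters are not protected here.
    """
    command = filter_command.replace("%s", path)
    result = subprocess.run(command, shell=True, check=True,
                            capture_output=True, text=True)
    return result.stdout
```
@end example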

@cindex Texinfo, scanning example of
For example, @code{mkid} has no built-in scanner for Texinfo files (like
this one).  In indexing a Texinfo file, you most likely would want
to ignore the Texinfo @@-commands. Here's one way to specify a new
scanner to do this:

@example
'-Stexinfo/text/sed s,@@[a-z]*,,g %s'
@end example

This defines a new language scanner, @samp{texinfo}, in terms of a
@code{sed} command that strips out Texinfo directives (an @samp{@@}
character followed by letters).  Once the directives are stripped, the
remaining text is run through the plain text scanner.  To use the new
scanner, also bind a suffix to it, for example with @samp{-S.texi=texinfo}
(@pxref{Scanner option formats}).

This is a minimal example; to do a complete job, you would need to
completely delete some lines, such as those beginning with @code{@@end}
or @code{@@node}.


@node idx invocation
@subsection @code{idx}: Testing @code{mkid} scanners

@code{idx} prints the identifiers found in the files you specify to
standard output. This is useful in debugging new @code{mkid} scanners
(@pxref{Scanners}). Synopsis:

@example
idx [-S@var{scanarg}] @var{files}@dots{}
@end example

@code{idx} accepts the same @samp{-S} options as @code{mkid}.
@xref{Scanner option formats}.

The name ``idx'' stands for ``ID eXtract''.  The name may change in
future releases, since this is such an infrequently used program.


@node mkid examples
@section @code{mkid} examples

@cindex examples of @code{mkid}

The simplest example of @code{mkid} is something like:

@example
mkid *.[chy]
@end example

This will build an ID database indexing identifiers and numbers in
all the @file{.c}, @file{.h}, and @file{.y} files in the current
directory.  Because @code{mkid} already knows how to scan files with
those suffixes, no additional options are needed.

@cindex man pages, compressed
@cindex compressed files, building ID from
Here's a more complex example.  Suppose you want to build a database
indexing the contents of all the @code{man} pages, and further suppose
that your system is using @code{gzip} (@pxref{Top, , , gzip, Gzip}) to
store compressed @code{cat} versions of the @code{man} pages in the
directory @file{/usr/catman}.  The @code{gzip} program creates files
with a @file{.gz} suffix, so you must tell @code{mkid} how to scan
@file{.gz} files.  Here are the commands to do the job:

@example
cd /usr/catman
find . -name \*.gz -print | mkid '-Sman/text/gzip <%s' -S.gz=man -
@end example

@noindent Explanation:

@enumerate

@item
We first @code{cd} to @file{/usr/catman} so the ID database
will store the correct relative filenames.

@item
The @code{find} command prints the names of all @file{.gz} files under
the current directory.  @xref{find invocation, , , sh-utils, GNU shell
utilities}.

@item
This list is piped to @code{mkid}; the @code{-} option (at the end of
the line) tells @code{mkid} to read arguments (in this case, as is
typical, the list of filenames) from standard input.  @xref{mkid options}.

@item
The @samp{-Sman/text/gzip @dots{}} defines a new language @samp{man} in
terms of the @code{gzip} program and @code{mkid}'s existing text
scanner.  @xref{Defining scanners with options}.

@item
The @samp{-S.gz=man} tells @code{mkid} to treat all @file{.gz} files as
this new language @code{man}.  @xref{Scanner option formats}.

@end enumerate

As a further complication, @code{cat} pages typically contain
underlining and backspace sequences, which will confuse @code{mkid}.  To
handle this, the @code{gzip} command becomes a pipeline, like this:

@example
find . -name \*.gz -print | mkid '-Sman/text/gzip <%s | col -b' -S.gz=man -
@end example
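
The @code{col -b} stage matters because formatted @code{man} pages express
bold and underlined text as a character, a backspace, and another
character.  The Python approximation below (the name
@code{strip_overstrike} is hypothetical) only discards each character that
is immediately followed by a backspace; the real @code{col} program
handles far more cases.

@example
```python
import re

# A character immediately followed by a backspace is overwritten by the
# next character, so deleting every such pair keeps what was typed last.
OVERSTRIKE = re.compile(".\x08")


def strip_overstrike(text):
    """Remove character-backspace pairs from formatted man page text."""
    previous = None
    while text != previous:           # repeat until nothing changes
        previous = text
        text = OVERSTRIKE.sub("", text)
    return text
```
@end example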


@node Common query arguments
@chapter Common query arguments

@cindex common query arguments

The ID query tools share certain options and a common regular
expression syntax.  These are described in the sections below rather
than repeated for each tool.

@menu
* Query options::               -f -r -c -ew -kg -n -doxa -m -F -u.
* Patterns::                    Regular expression syntax for searches.
* Examples: Query examples.     Some common uses.
@end menu


@node Query options
@section Query options

@cindex query options, common
@cindex common query options

The ID query tools (@emph{not} @code{mkid}) share certain command line
options.  Not all of these options are recognized by all programs, but
if an option is used by more than one program, it is described below.
The description of each program gives the options that program uses.

@table @samp

@item -f@var{idfile}
@opindex -f@var{idfile}
@cindex database name, specifying
@cindex parent directories, searched for ID
Read the database from @var{idfile}, in the current directory or in any
directory above the current directory.  The default database name is
@file{ID}.  Searching parent directories lets you have a single ID
database at the root of a large source tree and then use the query tools
from anywhere within that tree.

@item -r@var{directory}
@opindex -r@var{directory}
Find files relative to @var{directory}, instead of the directory in
which the ID database was found.  This is useful if the ID database was
moved after its creation.

@item -c
@opindex -c
Equivalent to @code{-r`pwd`}, i.e., find files relative to the current
directory, instead of the directory in which the ID database was found.

@item -e
@itemx -w
@opindex -e
@opindex -w
@cindex regular expressions, forcing evaluation as
@cindex strings, forcing evaluation as
@cindex constant strings, forcing evaluation as
@samp{-e} forces pattern arguments to be treated as regular expressions,
and @samp{-w} forces pattern arguments to be treated as constant
strings.  By default, the query tools guess whether each pattern is a
regular expression or a constant string by looking for special characters.
@xref{Patterns}.

@item -k
@itemx -g
@opindex -k
@opindex -g
@cindex brace notation in filename lists
@cindex shell brace notation in filename lists
@samp{-k} suppresses use of shell brace notation in the output.  By
default, the query tools that generate lists of filenames attempt to
compress the lists using the usual shell brace notation, e.g.,
@file{@{foo,bar@}.c} to mean @file{foo.c} and @file{bar.c}.  (Suppressing
the notation is useful if you use @code{ksh} or the original (not GNU)
@code{sh} and want to feed the list of names to another command, since
those shells do not support brace notation; the name of the @samp{-k}
option comes from the @code{k} in @code{ksh}.)

@samp{-g} turns on use of brace notation; this is only needed if the
query tools were compiled with @samp{-k} as the default behavior.

@item -n
@opindex -n
@cindex suppressing matching identifier
Suppress the matching identifier before each list of filenames that the
query tools output by default. This is useful if you want a list of just
the names to feed to another command.

@item -d
@itemx -o
@itemx -x
@itemx -a
@opindex -d
@opindex -o
@opindex -x
@opindex -a
@cindex radix of numeric matches, specifying
@cindex numeric matches, specifying radix of
These options may be used in any combination to specify the radix of
numeric matches.  @samp{-d} allows matching on decimal numbers,
@samp{-o} on octal numbers, and @samp{-x} on hexadecimal numbers.  The
@samp{-a} option is equivalent to specifying all three; this is the
default.

@item -m
@opindex -m
@cindex multiple lines, merging
Merge multiple lines of output into a single line.  If your query
matches more than one identifier, the default is to generate a separate
line of output for each matching identifier.

@item -F-
@itemx -F@var{n}
@itemx -F-@var{m}
@itemx -F@var{n}-@var{m}
@opindex -F
@cindex single matches, showing
Show identifiers matching at least @var{n} and at most @var{m} times.
@samp{-F-} is equivalent to @samp{-F1}, i.e., find identifiers that
appear only once in the database.  (This is useful to locate identifiers
that are defined but never used, or used once and never defined.)

@item -u@var{number}
@opindex -u
@cindex conflicting identifiers, finding
List identifiers that conflict in the first @var{number} characters.
This could be useful in porting programs to brain-dead computers that
refuse to support long identifiers, but your best long term option is to
set such computers on fire.

@end table
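
The parent-directory search described for @samp{-f} can be modeled with a
short Python loop.  This is an illustration of the documented behavior,
not the query tools' source; @code{find_id_file} is a hypothetical name.

@example
```python
import os


def find_id_file(name="ID", start="."):
    """Return the path of NAME in START or its nearest ancestor, else None."""
    directory = os.path.abspath(start)
    while True:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
        parent = os.path.dirname(directory)
        if parent == directory:       # reached the filesystem root
            return None
        directory = parent
```
@end example

In this sketch the search stops at the first match, so a nearer
@file{ID} file shadows one higher up the tree.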


@node Patterns
@section Patterns

@cindex patterns
@cindex regular expression syntax

@dfn{Patterns}, also called @dfn{regular expressions}, allow you to
match many different identifiers in a single query.

The same regular expression syntax is recognized by all the query tools
that handle regular expressions.  The exact syntax depends on how the ID
tools were compiled, but the following constructs should always be
supported:

@table @samp

@item .
Match any single character.

@item [@var{chars}]
Match any of the characters specified within the brackets.  You can
match any characters @emph{except} the ones in brackets by typing
@samp{^} as the first character.  A range of characters can be specified
using @samp{-}.  For example, @samp{[abc]} and @samp{[a-c]} both match
@samp{a}, @samp{b}, or @samp{c}, and @samp{[^abc]} matches anything
@emph{except} @samp{a}, @samp{b}, or @samp{c}.

@item *
Match the previous construct zero or more times.

@item ^
@itemx $
@samp{^} (@samp{$}) at the beginning (end) of a pattern anchors the
match to the first (last) character of the identifier.

@end table

The query programs use either the @code{regex}/@code{regcmp} or
@code{re_comp}/@code{re_exec} functions, depending on which are
available in the library on your system.  These do not always support
the exact same regular expression syntax, so consult your local
@code{man} pages to find out.
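
To experiment with the constructs above, the following Python sketch uses
the standard @code{re} module as a stand-in for the system regex library;
for these four constructs the syntax agrees.  The set of characters that
@code{looks_like_regex} treats as special, echoing the guess described
for @samp{-e} and @samp{-w}, is an assumption, and both function names are
hypothetical.

@example
```python
import re

# Assumed set of characters that mark a pattern as a regular expression.
REGEX_CHARS = set(".[]*^$\\")


def looks_like_regex(pattern):
    """Guess whether PATTERN is meant as a regular expression."""
    return any(ch in REGEX_CHARS for ch in pattern)


def regex_matches(pattern, identifiers):
    """Return the identifiers that PATTERN matches anywhere within."""
    regex = re.compile(pattern)
    return [ident for ident in identifiers if regex.search(ident)]


if __name__ == "__main__":
    identifiers = ["FILE", "id_FILE", "FILENAME", "bdevsw"]
    print(regex_matches("^FILE$", identifiers))   # ['FILE']
    print(regex_matches("^[^F]", identifiers))    # ['id_FILE', 'bdevsw']
```
@end example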


@node Query examples
@section Query examples

@cindex examples, queries
@cindex query examples
Here are some examples of the options described in the previous
sections.

To restrict searches to exact matches, use @samp{^@dots{}$}. For example:

@example
prompt$ gid '^FILE$'
ansi2knr.c:144: @{	FILE *in, *out;
ansi2knr.c:315:     FILE *out;
fid.c:38: FILE *id_FILE;
filenames.c:576: FILE *
@dots{}
@end example

To show identifiers not unique in the first 16 characters:

@example
prompt$ lid -u16
RE_CONTEXT_INDEP_ANCHORS regex.c
RE_CONTEXT_INDEP_OPS regex.c
RE_SYNTAX_POSIX_BASIC regex.c
RE_SYNTAX_POSIX_EXTENDED regex.c
@dots{}
@end example
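
The @samp{-u} check amounts to grouping identifiers by their first
@var{number} characters and reporting every group with more than one
member.  A minimal Python sketch, using the hypothetical name
@code{prefix_conflicts}:

@example
```python
from collections import defaultdict


def prefix_conflicts(identifiers, length):
    """Return groups of distinct identifiers sharing their first LENGTH chars."""
    groups = defaultdict(set)
    for ident in identifiers:
        groups[ident[:length]].add(ident)
    return [sorted(names) for _, names in sorted(groups.items())
            if len(names) > 1]


if __name__ == "__main__":
    print(prefix_conflicts(["RE_SYNTAX_POSIX_BASIC", "RE_SYNTAX_POSIX_EXTENDED",
                            "RE_SYNTAX_EMACS", "main"], 16))
    # [['RE_SYNTAX_POSIX_BASIC', 'RE_SYNTAX_POSIX_EXTENDED']]
```
@end example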

@cindex numeric searches
Numbers are searched for numerically rather than textually. For example:

@example
prompt$ lid 0xff
0377           @{lid,regex@}.c
0xff           @{bitops,fid,lid,mkid@}.c
255            regex.c
@end example

On the other hand, you can restrict a numeric search to a particular
radix if you want:

@example
prompt$ lid -x 0xff
0xff           @{bitops,fid,lid,mkid@}.c
@end example
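One way to picture numeric matching: convert each C integer literal to its value, compare the values, and optionally keep only literals written in one radix. Here the @code{radix=16} argument plays the role of @samp{-x}. This Python sketch handles only plain hexadecimal, octal and decimal literals, with no @samp{U} or @samp{L} suffixes. The function names are hypothetical, and the sketch does not reproduce how @code{mkid} stores numbers.

@example
def c_literal_value(text):
    """Return (radix, value) for a plain C integer literal."""
    lower = text.lower()
    if lower.startswith("0x"):
        return 16, int(lower[2:], 16)
    if len(lower) > 1 and lower.startswith("0"):
        return 8, int(lower[1:], 8)
    return 10, int(lower, 10)


def numeric_matches(query, literals, radix=None):
    """Return the literals equal in value to query, optionally in one radix."""
    _, wanted = c_literal_value(query)
    found = []
    for lit in literals:
        lit_radix, value = c_literal_value(lit)
        if value == wanted and (radix is None or lit_radix == radix):
            found.append(lit)
    return found


LITERALS = ["0377", "0xff", "255", "0x10", "16"]
print(numeric_matches("0xff", LITERALS))            # ['0377', '0xff', '255']
print(numeric_matches("0xff", LITERALS, radix=16))  # ['0xff']
@end example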

Filenames in the output are always adjusted to be correct for the
current working directory. For example:

@example
prompt$ lid bdevsw
bdevsw         sys/conf.h  cf/conf.c  io/bio.c  os/@{fio,main,prf,sys3@}.c
prompt$ cd io
prompt$ lid bdevsw
bdevsw         ../sys/conf.h  ../cf/conf.c  bio.c  ../os/@{fio,main,prf,sys3@}.c
@end example
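Conceptually, the adjustment joins each stored name to the directory holding the database, then makes the result relative to the current directory. The Python sketch below uses the @code{posixpath} module. It assumes the database stores names relative to its own directory, and @file{/usr/src/kernel} is a made-up location.

@example
import posixpath


def adjust(stored_name, db_dir, cwd):
    """Rewrite a database-relative filename relative to the directory cwd."""
    # Assumption: stored names are relative to db_dir, the directory
    # containing the ID file; both directories are absolute paths.
    absolute = posixpath.normpath(posixpath.join(db_dir, stored_name))
    return posixpath.relpath(absolute, cwd)


DB_DIR = "/usr/src/kernel"   # hypothetical location of the ID file
for name in ["sys/conf.h", "io/bio.c", "os/fio.c"]:
    print(adjust(name, DB_DIR, DB_DIR + "/io"))
# ../sys/conf.h
# bio.c
# ../os/fio.c
@end example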


@node gid invocation
@chapter @code{gid}: Listing matching lines

Synopsis:

@example
gid [-f@var{file}] [-u@var{n}] [-r@var{dir}] [-doxasc] [@var{pattern}@dots{}]
@end example

@code{gid} finds the identifiers in the database that match the
specified @var{pattern}s, then searches for all occurrences of those
identifiers, but only in the files the database lists as containing
them.  In a large source tree, this saves an enormous amount of time
compared to searching every source file.

With no @var{pattern} arguments, @code{gid} prints every line of every
source file.

The name ``gid'' stands for ``grep for identifiers'', @code{grep} being
the standard utility to search regular files.

@xref{Common query arguments}, for a description of the command-line
options and @var{pattern} arguments.

@code{gid} uses the standard GNU output format for identifying source lines:

@example
@var{filename}:@var{linenum}: @var{text}
@end example

Here is an example:

@example
prompt$ gid FILE
ansi2knr.c:144: @{	FILE *in, *out;
ansi2knr.c:315:     FILE *out;
fid.c:38: FILE *id_FILE;
@dots{}
@end example
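The two-step search can be sketched in Python. The sketch first finds the matching identifiers, then scans only the files listed for them, printing lines in the @var{filename}:@var{linenum}: @var{text} format. The in-memory @code{index} (a dict from identifier to files) is a hypothetical stand-in for the ID database, and @code{gid_sketch} is not the real program.

@example
import re


def gid_sketch(pattern, index, read_lines):
    """Return gid-style 'file:line: text' strings for one pattern.

    index is a hypothetical dict mapping each identifier to the files
    that contain it; read_lines(name) returns a source file's lines.
    """
    regex = re.compile(pattern)
    hits = sorted(ident for ident in index if regex.search(ident))
    if not hits:
        return []
    # Only the files listed for a matching identifier are scanned.
    files = sorted(set(name for ident in hits for name in index[ident]))
    # Match whole identifiers, so FILE does not match inside IDFILE.
    word = re.compile(r"\b(?:" + "|".join(re.escape(h) for h in hits) + r")\b")
    out = []
    for name in files:
        for num, text in enumerate(read_lines(name), start=1):
            if word.search(text):
                out.append("%s:%d: %s" % (name, num, text))
    return out


SOURCES = dict([("a.c", ["FILE *in;", "int IDFILE;"]),
                ("b.c", ["char *s;"])])
INDEX = dict([("FILE", ["a.c"]), ("IDFILE", ["a.c"]), ("s", ["b.c"])])
print(gid_sketch("^FILE$", INDEX, SOURCES.__getitem__))
# ['a.c:1: FILE *in;']
@end example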

@menu
* GNU Emacs gid interface::     Using next-error with gid.
@end menu


@node GNU Emacs gid interface
@section GNU Emacs @code{gid} interface

@cindex Emacs interface to @code{gid}
@flindex gid.el @r{interface to Emacs}

@vindex load-path
The @code{mkid} source distribution comes with a file @file{gid.el},
which defines a GNU Emacs interface to @code{gid}.  To install it, put
@file{gid.el} somewhere that Emacs will find it (i.e., in your
@code{load-path}) and put

@example
(autoload 'gid "gid" nil t)
@end example

@noindent in one of Emacs' initialization files, e.g., @file{~/.emacs}.
You will then be able to use @kbd{M-x gid} to run the command.

@findex gid @r{Emacs function}
The @code{gid} function prompts you with the word around point.  If you
want to search for something else, simply delete the line and type the
pattern of interest.

@flindex *compilation* @r{Emacs buffer}
The function then runs the @code{gid} program in a @samp{*compilation*}
buffer, so the normal @code{next-error} function can be used to visit
all the places the identifier is found (@pxref{Compilation,,, emacs, The
GNU Emacs Manual}).


@node Looking up identifiers
@chapter Looking up identifiers

These commands look up identifiers in the ID database and operate on the
files containing matches.

@menu
* lid invocation::              Matching patterns.
* aid invocation::              Matching strings.
* eid invocation::              Invoking an editor on matches.
* fid invocation::              Listing a file's identifiers.
@end menu


@node lid invocation
@section @code{lid}: Matching patterns

@pindex lid

Synopsis:

@example
lid [-f@var{file}] [-u@var{n}] [-r@var{dir}] [-mewdoxaskgnc] @c
@var{pattern}@dots{}
@end example

@code{lid} searches the database for identifiers matching the given
@var{pattern} arguments and prints the names of the files that match
each @var{pattern}.  With no @var{pattern}s, @code{lid} lists every
entry in the database.

The name ``lid'' stands for ``lookup identifier''.

@xref{Common query arguments}, for a description of the command-line
options and @var{pattern} arguments.

By default, each line of output consists of an identifier and all the
files containing that identifier.

Here is an example showing a search for a single identifier (omitting
some output to keep lines short):

@example
prompt$ lid FILE
FILE           extern.h @{fid,gets0,getsFF,idx,init,lid,mkid,@dots{}@}.c
@end example

This example shows a regular expression search:

@example
prompt$ lid 'FILE$'
AF_FILE        mkid.c
AF_IDFILE      mkid.c
FILE           extern.h @{fid,gets0,getsFF,idx,init,lid,mkid,@dots{}@}.c
IDFILE         id.h @{fid,lid,mkid@}.c
IdFILE         @{fid,lid@}.c
@dots{}
@end example

@noindent As you can see, when a regular expression is used, it is
possible to get more than one line of output.  To merge multiple lines
into one, use @samp{-m}:

@example
prompt$ lid -m ^get
^get           extern.h @{bitsvec,fid,gets0,getsFF,getscan,idx,lid,@dots{}@}.c
@end example
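The difference between the default output and @samp{-m} can be modelled as follows. By default there is one line per matching identifier. With @samp{-m}, there is a single line labelled by the pattern, listing the union of all their files. The @code{index} dict and @code{lid_lines} function are hypothetical; this is an illustration, not @code{lid}'s code.

@example
import re


def lid_lines(pattern, index, merge=False):
    """Return (label, files) pairs as lid would print them.

    index is a hypothetical dict mapping each identifier to its files.
    Without merge there is one pair per matching identifier; with merge
    (like -m) the pattern labels a single pair listing every file.
    """
    regex = re.compile(pattern)
    hits = sorted(ident for ident in index if regex.search(ident))
    if not merge:
        return [(ident, sorted(index[ident])) for ident in hits]
    if not hits:
        return []
    merged = sorted(set(name for ident in hits for name in index[ident]))
    return [(pattern, merged)]


INDEX = dict([("getc", ["a.c"]), ("gets", ["b.c", "a.c"]), ("putc", ["c.c"])])
print(lid_lines("^get", INDEX))
# [('getc', ['a.c']), ('gets', ['a.c', 'b.c'])]
print(lid_lines("^get", INDEX, merge=True))
# [('^get', ['a.c', 'b.c'])]
@end example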


@node aid invocation
@section @code{aid}: Matching strings

@pindex aid

Synopsis:

@example
aid [-f@var{file}] [-u@var{n}] [-r@var{dir}] [-mewdoxaskgnc] @c
@var{string}@dots{}
@end example

@cindex case-insensitive searching
@cindex string searching
@code{aid} searches the database for identifiers containing the given
@var{string} arguments.  The search is case-insensitive.

@flindex whatis
The name ``aid'' stands for ``apropos identifier'', @code{apropos}
being a command that does a similar search of the @code{whatis} database
of @code{man} descriptions.

For example, @samp{aid get} matches the identifiers @code{fgets},
@code{GETLINE}, and @code{getchar}.
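The matching rule amounts to a case-insensitive substring test. This minimal Python sketch assumes the argument is taken as a plain string rather than a regular expression. The function name @code{aid_matches} is hypothetical.

@example
def aid_matches(string, identifiers):
    """Return the identifiers containing string, ignoring case."""
    needle = string.lower()
    return [ident for ident in identifiers if needle in ident.lower()]


print(aid_matches("get", ["fgets", "GETLINE", "getchar", "putc"]))
# ['fgets', 'GETLINE', 'getchar']
@end example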

The default output format is the same as @code{lid}; see the previous
section.

@xref{Common query arguments}, for a description of the command-line
options and @var{pattern} arguments.


@node eid invocation
@section @code{eid}: Invoking an editor on matches

@pindex eid

Synopsis:

@example
eid [-f@var{file}] [-u@var{n}] [-r@var{dir}] [-doxasc] [@var{pattern}]@dots{}
@end example

@code{eid} runs the usual search (@pxref{lid invocation}) on the given
arguments, shows you the output, and then asks:

@example
Edit? [y1-9^S/nq] 
@end example

@noindent
You can respond with:

@table @samp
@item y
Edit all files listed.

@item 1@dots{}9
Start editing at file number @math{@var{n} + 1}, where @var{n} is the
digit you type; that is, skip the first @var{n} files.

@item /@var{string} @r{or} @kbd{CTRL-S}@var{string}
Start editing at the first filename containing @var{string}.

@item n
Go on to the next @var{pattern}, i.e., edit nothing for this one.

@item q
Quit @code{eid}.

@end table
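The replies above can be modelled as a function that picks which of the listed files to edit. For example, replying @samp{2} to the list @file{ansi2knr.c}, @file{fid.c}, @file{getopt.c} starts at @file{getopt.c}. This Python sketch is hypothetical. In particular, returning nothing when no filename contains the search string is an assumption, not documented @code{eid} behavior.

@example
def files_to_edit(answer, filenames):
    """Return the files to edit for one reply to the Edit? prompt."""
    if answer == "y":
        return list(filenames)
    if len(answer) == 1 and answer in "123456789":
        # A digit n skips the first n files.
        return list(filenames[int(answer):])
    if answer[:1] in ("/", "\x13"):          # '/' or CTRL-S
        wanted = answer[1:]
        for i, name in enumerate(filenames):
            if wanted in name:
                return list(filenames[i:])
        return []                            # assumption: no match, no edit
    return []                                # 'n' and 'q' edit nothing


FILES = ["ansi2knr.c", "fid.c", "getopt.c", "getopt1.c", "iid.c"]
print(files_to_edit("2", FILES)[0])      # getopt.c
print(files_to_edit("/iid", FILES))      # ['iid.c']
@end example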

@code{eid} invokes the editor defined by the @samp{EDITOR} environment
variable to edit a file.  If this editor can accept an initial search
argument on the command line, @code{eid} can move automatically to the
location of the match, via the environment variables below.

@xref{Common query arguments}, for a description of the command-line
options and @var{pattern} arguments.

Here are the environment variables relevant to @code{eid}:

@table @samp

@item EDITOR
@vindex EDITOR
The name of the editor program to invoke.

@item EIDARG
@vindex EIDARG
@cindex search for identifier, initial
The argument to pass to the editor to search for the matching
identifier.  For @code{vi}, this should be @samp{+/%s/}.

@item EIDLDEL
@vindex EIDLDEL
@cindex left delimiter editor argument
@cindex beginning-of-word editor argument
A regular expression to force a match at the beginning of a word (``left
delimiter'').  @code{eid} inserts this in front of the matching identifier
when composing the search argument.  For @code{vi}, this should be
@samp{\<}.

@item EIDRDEL
@vindex EIDRDEL
@cindex right delimiter editor argument
@cindex end-of-word editor argument
The end-of-word regular expression.  For @code{vi}, this should be
@samp{\>}.

@end table
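This is roughly how the variables might combine into an editor command line. The identifier is wrapped in the left and right delimiters, the result is substituted into @samp{EIDARG}, and the filenames are appended. This Python sketch assumes @samp{EIDARG} is a printf-style format with a single @samp{%s}. The function @code{editor_command} is hypothetical.

@example
import os


def editor_command(identifier, filenames, env=None):
    """Build an argument list for editing filenames, eid-style (a sketch)."""
    env = os.environ if env is None else env
    argv = [env["EDITOR"]]
    if "EIDARG" in env:
        # Wrap the identifier in the word delimiters, e.g. \<FILE\> for vi.
        search = env.get("EIDLDEL", "") + identifier + env.get("EIDRDEL", "")
        # Assumption: EIDARG is a printf-style format with a single %s.
        argv.append(env["EIDARG"] % search)
    return argv + list(filenames)


ENV = dict(EDITOR="vi", EIDARG="+/%s/", EIDLDEL="\\<", EIDRDEL="\\>")
print(" ".join(editor_command("FILE", ["getopt.c"], ENV)))
# vi +/\<FILE\>/ getopt.c
@end example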

For Emacs users, the interface in @code{gid.el} is probably preferable
to @code{eid}.  @xref{GNU Emacs gid interface}.


Here is an example:

@example
prompt$ eid FILE \^print
FILE           @{ansi2knr,fid,filenames,idfile,idx,iid,lid,misc,@dots{}@}.c
Edit? [y1-9^S/nq] n
^print         @{ansi2knr,fid,getopt,getopt1,iid,lid,mkid,regex,scanners@}.c
Edit? [y1-9^S/nq] 2
@end example

@noindent This will start editing at @file{getopt.c}.


@node fid invocation
@section @code{fid}: Listing a file's identifiers

@pindex fid
@cindex identifiers in a file

@code{fid} lists the identifiers found in a given file.  Synopsis:

@example
fid [-f@var{dbfile}] @var{file1} [@var{file2}]
@end example

@table @samp

@item -f@var{dbfile}
Read the database from @var{dbfile} instead of @file{ID}.

@item @var{file1}
List all the identifiers contained in @var{file1}.

@item @var{file2}
With a second file argument, list only the identifiers both files have
in common.

@end table

The output is simply one identifier (or number) per line.
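Since the database maps identifiers to files, @code{fid} is conceptually an inverse lookup. With two files it keeps only the identifiers listed for both. In this Python sketch, the @code{index} dict and the @code{fid_sketch} function are hypothetical stand-ins, not the real database format.

@example
def fid_sketch(index, file1, file2=None):
    """List identifiers in file1, or only those file1 shares with file2.

    index is a hypothetical dict mapping each identifier to its files.
    """
    found = [ident for ident, files in index.items() if file1 in files]
    if file2 is not None:
        found = [ident for ident in found if file2 in index[ident]]
    return sorted(found)


INDEX = dict([("FILE", ["fid.c", "lid.c"]), ("main", ["fid.c", "mkid.c"]),
              ("IdFILE", ["fid.c", "lid.c"])])
print(fid_sketch(INDEX, "fid.c"))           # ['FILE', 'IdFILE', 'main']
print(fid_sketch(INDEX, "fid.c", "lid.c"))  # ['FILE', 'IdFILE']
@end example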


@node pid invocation
@chapter @code{pid}: Looking up filenames

@pindex pid
@cindex filenames, matching
@cindex matching filenames

@code{pid} matches the filenames stored in the ID database, rather than
the identifiers.  Synopsis:

@example
pid [-f@var{dbfile}] [-r@var{dir}] [-ebkgnc] @var{wildcard}@dots{}
@end example

By default, the @var{wildcard} patterns are treated as shell globbing
patterns, rather than the regular expressions the other utilities
accept.  @xref{Wildcard patterns}, for details.

Besides the standard options given in the synopsis (@pxref{Query
options}), @code{pid} accepts the following:

@table @samp

@item -e
@opindex -e
Do the usual regular expression matching (@pxref{Patterns}), instead
of shell wildcard matching.

@item -b
@opindex -b
@cindex basename match
Match the basenames of the files in the database.  For example,
@samp{pid -b foo} will match the stored filename @file{dir/foo}, but not
@file{foo/file}.

@end table

For example, the command:

@example
pid \*.c
@end example

@noindent lists all the @file{.c} files in the database.  (The @samp{\}
here protects the @samp{*} from being expanded by the shell.)

@menu
* Wildcard patterns::           Shell-style globbing patterns.
@end menu


@node Wildcard patterns
@section Wildcard patterns

@cindex globbing patterns
@cindex shell wildcard patterns
@cindex wildcard patterns

@code{pid} does simplified shell wildcard matching (unless the @samp{-e}
option is specified), rather than the regular expression matching done
by the other utilities.  Here is a description of wildcard matching,
also called @dfn{globbing}:

@itemize

@item
@kindex * @r{in globbing}
@samp{*} matches zero or more characters.

@item
@kindex ? @r{in globbing}
@samp{?} matches any single character.

@item
@kindex \ @r{in globbing}
@samp{\} forces the next character to be taken literally.

@item
@kindex [@dots{}] @r{in globbing}
@samp{[@var{chars}]} matches any single character listed in @var{chars}.

@item
@kindex [!@dots{}] @r{in globbing}
@samp{[!@var{chars}]} matches any character @emph{not} listed in @var{chars}.

@end itemize

Most shells treat @samp{/} and leading @samp{.} characters specially:
@samp{*} and @samp{?} do not match them.  @code{pid} does not do this; it
simply matches each filename in the database against the wildcard pattern.
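The rules above can be sketched by translating a wildcard pattern into a Python regular expression. Python's own @code{fnmatch} module is close, but it has no backslash escape. This sketch follows the description above: @samp{*} also matches @samp{/} and leading dots, and the characters inside brackets are taken literally, so ranges are not handled. The @code{basename} flag mimics @samp{-b}. The function names are hypothetical, and this is not @code{pid}'s own code.

@example
import posixpath
import re


def glob_to_regex(pattern):
    """Translate a pid-style wildcard pattern into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "*":
            out.append(".*")                   # any run, '/' and '.' included
        elif c == "?":
            out.append(".")                    # any single character
        elif c == "\\" and i + 1 < len(pattern):
            i += 1
            out.append(re.escape(pattern[i]))  # next character is literal
        elif c == "[":
            end = pattern.find("]", i + 1)
            chars = pattern[i + 1:end] if end != -1 else ""
            negate = chars.startswith("!")
            members = chars[1:] if negate else chars
            if members:
                out.append("[" + ("^" if negate else "")
                           + "".join(re.escape(m) for m in members) + "]")
                i = end
            else:
                out.append(re.escape(c))       # unterminated or empty: literal
        else:
            out.append(re.escape(c))
        i += 1
    return re.compile("".join(out), re.DOTALL)


def pid_match(pattern, filenames, basename=False):
    """Return the stored filenames matched by pattern (like pid, or pid -b)."""
    regex = glob_to_regex(pattern)
    return [name for name in filenames
            if regex.fullmatch(posixpath.basename(name) if basename else name)]


NAMES = ["lid.c", "src/mkid.c", "id.h", ".hidden.c", "a*b"]
print(pid_match("*.c", NAMES))      # ['lid.c', 'src/mkid.c', '.hidden.c']
print(pid_match("[!.]*.c", NAMES))  # ['lid.c', 'src/mkid.c']
@end example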


@node iid invocation
@chapter @code{iid}: Complex interactive queries

@pindex iid
@cindex interactive queries
@cindex complex queries

@code{iid} is an interactive query utility for ID databases.  It
operates by running another query program (@code{lid} by default,
@code{aid} if @samp{-a} is specified) and manipulating the sets of
filenames returned by these queries.

@menu
* iid command line options::    Command-line options.
* iid query expressions::       Operands to the commands.
* iid commands::		Printing matching filenames, etc.
@end menu


@node iid command line options
@section @code{iid} command line options

@cindex options for @code{iid}
@pindex iid @r{options}

@code{iid} recognizes the following options (the standard query options
described in @ref{Query options} are inapplicable):

@table @samp

@item -a
@opindex -a
@pindex aid @r{used for @code{iid} searches}
Use @code{aid} for searches, instead of @code{lid}.

@item -c@var{command}
@opindex -c
Execute @var{command} and exit, instead of prompting for interactive
commands.

@item -H
@opindex -H
@cindex help for @code{iid}
Print a usage message and exit successfully.  The @code{help} command
inside @code{iid} gives more information.  @xref{iid commands}.

@end table


@node iid query expressions
@section @code{iid} query expressions

@cindex queries for @code{iid}
@pindex iid @r{query expressions}

An @code{iid} @dfn{query expression} generates a set of filenames or
manipulates existing sets.  These expressions are operands to some of
the @code{iid} commands (see the next section), not commands themselves.

Here are the possible constructs, highest precedence first:

@table @samp

@item s@var{set-number}
Refer to a set previously created by a query operation.  During each
@code{iid} session, every query generates a different set number, so
any previously generated set may be used as part of any new query by
reference to its set number.

@item @var{pattern}
@code{iid} treats any non-keyword input (i.e., anything not in this
table) as an identifier to be searched for in the database.  It is
passed to the search program (@code{lid} by default, @code{aid} if the
@code{-a} option was specified).  The result of this operation is a set
of filenames, and it is assigned a unique set number.

@item lid @var{identifier-list}
@cmindex lid @r{iid operator}
Invoke the @code{lid} program on @var{identifier-list} and construct a
new set from the result.

@item aid @var{identifier-list}
@cmindex aid @r{iid operator}
Like @code{lid}, but use the @code{aid} program.

@item match @var{wildcards}
@cmindex match @r{iid operator}
Invoke the @code{pid} program on @var{wildcards}, therefore matching on
the filenames in the database instead of the identifiers.  The resulting
set contains the filenames that match the specified patterns.  @xref{pid
invocation}.

@item not @var{expr}
@cmindex not @r{iid operator}
The result is those filenames in the database that are not in
@var{expr}.

@item @var{expr1} and @var{expr2}
@cmindex and @r{iid operator}
The result is the intersection of the sets @var{expr1} and @var{expr2},
i.e., only those filenames contained in both.

@item @var{expr1} or @var{expr2}
@cmindex or @r{iid operator}
The result is the union of the sets @var{expr1} and @var{expr2}, i.e.,
all the filenames contained in either or both.

@end table

Operator names are recognized independent of case, so @code{AND},
@code{and}, and @code{aNd} are all the same as far as @code{iid} is
concerned.

To pass a keyword as an operand, you must enclose it in double quotes:
the command @samp{lid "lid"} generates the set of all files containing
identifiers that match @samp{lid}.

Patterns containing shell metacharacters (such as @samp{*} or @samp{?})
must also be properly quoted, since the query commands are run by
invoking them with the shell.
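The set operators and their precedence can be illustrated with a small recursive-descent evaluator over Python sets. This is a simplified sketch. A bare word is looked up exactly in a hypothetical @code{index} dict instead of being passed to @code{lid} or @code{aid}, and the @code{lid}, @code{aid} and @code{match} operators are omitted. Parentheses follow the grouping shown in the commented-out grammar in the Texinfo source. Keywords are case-insensitive, and a quoted keyword is treated as an operand.

@example
import re


def evaluate(query, index, saved):
    """Evaluate a simplified iid query expression to a set of filenames.

    index is a hypothetical dict mapping identifiers to their files, and
    saved lists earlier sets, so s1 is saved[0].  Precedence, highest
    first: operands and parentheses, not, and, or.
    """
    tokens = re.findall(r'"[^"]*"|[()]|[^\s()]+', query)
    universe = set(name for files in index.values() for name in files)
    pos = 0

    def peek():
        return tokens[pos].lower() if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def operand():
        word = take()
        if word == "(":
            result = or_expr()
            take()                                # the closing ')'
            return result
        if re.fullmatch(r"[sS]\d+", word):
            return set(saved[int(word[1:]) - 1])  # an earlier set
        word = word.strip('"')                    # quoted keyword is an operand
        return set(index.get(word, ()))

    def not_expr():
        if peek() == "not":
            take()
            return universe - not_expr()          # files not in the operand
        return operand()

    def and_expr():
        result = not_expr()
        while peek() == "and":
            take()
            result = result & not_expr()          # intersection
        return result

    def or_expr():
        result = and_expr()
        while peek() == "or":
            take()
            result = result | and_expr()          # union
        return result

    return or_expr()


INDEX = dict([("FILE", ["a.c", "b.c"]), ("getc", ["b.c", "c.c"]),
              ("and", ["d.c"])])
print(sorted(evaluate("not FILE and getc", INDEX, [])))  # ['c.c']
@end example

Because @samp{not} binds tighter than @samp{and}, the query in the example means @samp{(not FILE) and getc}.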

@c Summary of query expression syntax:
@c 
@c A @var{query} is:
@c @example
@c    <set number>
@c    <identifier>
@c    lid <identifier list>
@c    aid <identifier list>
@c    match <wildcard list>
@c    <query> or <query>
@c    <query> and <query>
@c    not <query>
@c    ( <query> )
@c @end example


@node iid commands
@section @code{iid} commands

@cindex commands for @code{iid}
@pindex iid @r{commands}

This section describes the interactive commands that @code{iid}
recognizes.  The database query expressions you can pass to the
@samp{ss} and @samp{files} commands are described in the previous
section.

Some commands output a @dfn{summary line} for sets. These lines show the
set number, the number of filenames in the set, and the command that
generated it.

@table @samp

@item ss @var{query}
@cmindex ss iid @r{command}
Build the set(s) of filenames resulting from the query expression
@var{query}.  The output is a summary line for each set.

@item files @var{query}
@itemx f @var{query}
@cmindex files iid @r{command}
@cmindex f iid @r{command}
Evaluate the query expression @var{query} as in @code{ss}, but output
the full list of matching filenames instead of a summary.

@item sets
@cmindex sets iid @r{command}
Output a summary line for each extant set.

@item show @var{set}
@itemx p @var{set}
@cmindex show iid @r{command}
@cmindex p iid @r{command}
@vindex PAGER
@pindex emacsclient
Pass the filenames in set number @var{set} to the program named in
the @code{PAGER} environment variable.  Typically, this is a
page-at-a-time display program like @code{less} or @code{more}.  If you
use Emacs, you might want to set @samp{PAGER} to @code{emacsclient}
(@pxref{Emacs Server,,, emacs, The GNU Emacs Manual}).

@item @r{anything else}
@cindex shell commands in @code{iid}
When @code{iid} does not recognize the first word on an input line as a
builtin @code{iid} command, it runs the input as a shell command, which
is expected to write a list of filenames to standard output;
@code{iid} gathers those filenames into a new set as usual.

Any set numbers that appear in the input are expanded into the lists of
filenames they represent prior to running the command.

@item !@var{shell-command}
@cmindex ! iid @r{command}
@cindex shell escape
Expand any set numbers appearing in @var{shell-command} into the
filenames they represent, and pass the result to @file{/bin/sh}.  The
command's output is not interpreted; that is, it is not gathered into a set.

@item begin @var{directory}
@itemx b @var{directory}
@cmindex begin iid @r{command}
@cmindex b iid @r{command}
Begin a new @code{iid} session in a different directory (which
presumably contains a different database).  It deletes all the sets
created so far and switches to the specified directory.  It is
equivalent to exiting @code{iid}, changing directories in the shell, and
running @code{iid} again.

@item help
@itemx h
@itemx ?
@cmindex help iid @r{command}
@cmindex h iid @r{command}
@cmindex ? iid @r{command}
Display a short help file using the program named in @samp{PAGER}.

@item quit
@itemx q
@itemx off
@cmindex quit iid @r{command}
@cmindex q iid @r{command}
@cmindex off iid @r{command}
Quit @code{iid}. An end-of-file character (usually @kbd{CTRL-D}) also exits.

@end table
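Expanding set numbers before a shell command runs can be sketched as a regular-expression substitution. This Python sketch rests on three assumptions. A set reference is a whole word made of @samp{s} and digits. The filenames are joined with spaces. Each name is quoted with @code{shlex.quote}, which is a safety choice of the sketch rather than documented @code{iid} behavior. The @code{saved} list and @code{expand_sets} function are hypothetical.

@example
import re
import shlex


def expand_sets(command, saved):
    """Replace set references such as s2 with the filenames they hold.

    saved is a hypothetical list of earlier sets, so s1 is saved[0].
    """
    def files_for(match):
        names = sorted(saved[int(match.group(1)) - 1])
        # Quoting is a safety choice of this sketch, not documented behavior.
        return " ".join(shlex.quote(name) for name in names)

    # Assumption: a set reference is a whole word made of 's' and digits.
    return re.sub(r"\bs(\d+)\b", files_for, command)


SAVED = [set(["lid.c", "aid.c"]), set(["my file.c"])]
print(expand_sets("wc -l s1", SAVED))   # wc -l aid.c lid.c
print(expand_sets("ls s2", SAVED))      # ls 'my file.c'
@end example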


@node Index
@unnumbered Index

@printindex cp

@contents
@bye