\input texinfo
@comment %**start of header
@setfilename id.info
@settitle ID database utilities
@comment %**end of header

@include version.texi

@c Define new indices for filenames, commands and options.
@defcodeindex fl
@defcodeindex cm
@defcodeindex op

@c Put everything in one index (arbitrarily chosen to be the concept index).
@syncodeindex fl cp
@syncodeindex fn cp
@syncodeindex ky cp
@syncodeindex op cp
@syncodeindex pg cp
@syncodeindex vr cp

@ifinfo
@set Francois Franc,ois
@end ifinfo
@tex
@set Francois Fran\noexpand\ptexc cois
@end tex

@ifinfo
@format
START-INFO-DIR-ENTRY
* ID database: (id).            Identifier database utilities.
* aid: (id)aid invocation::     Matching strings.
* eid: (id)eid invocation::     Invoking an editor on matches.
* fid: (id)fid invocation::     Listing a file's identifiers.
* gid: (id)gid invocation::     Listing all matching lines.
* idx: (id)idx invocation::     Testing mkid scanners.
* iid: (id)iid invocation::     Interactive complex queries.
* lid: (id)lid invocation::     Matching patterns.
* mkid: (id)mkid invocation::   Creating an ID database.
* pid: (id)pid invocation::     Looking up filenames.
END-INFO-DIR-ENTRY
@end format
@end ifinfo

@ifinfo
This file documents the @code{mkid} identifier database utilities.

Copyright (C) 1991, 1995 Tom Horsley.

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

@ignore
Permission is granted to process this file through TeX and print the
results, provided the printed document carries copying permission
notice identical to this one except for the removal of this paragraph
(this paragraph not being relevant to the printed manual).

@end ignore
Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.
Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that this permission notice may be stated in a translation.
@end ifinfo

@titlepage
@title ID database utilities
@subtitle Programs for simple, fast, high-capacity cross-referencing
@subtitle for version @value{VERSION}
@author Tom Horsley

@page
@vskip 0pt plus 1filll
Copyright @copyright{} 1991, 1995 Tom Horsley.

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that this permission notice may be stated in a translation.
@end titlepage

@ifinfo
@node Top
@top ID database utilities

This manual documents version @value{VERSION} of the ID database
utilities.

@menu
* Introduction::                Overview of the tools, and authors.
* mkid invocation::             Creating an ID database.
* Common query arguments::      Common lookup options and search patterns.
* gid invocation::              Listing all matching lines.
* Looking up identifiers::      lid, aid, eid, and fid.
* pid invocation::              Looking up filenames.
* iid invocation::              Interactive and complex queries.
* Index::                       General index.
@end menu
@end ifinfo

@node Introduction
@chapter Introduction

@cindex overview
@cindex introduction

@cindex ID database, definition of
An @dfn{ID database} is a binary file containing a list of filenames, a
list of identifiers, and a matrix indicating which identifiers appear in
which files.
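As a rough mental model only (the real ID file is a binary format whose
layout is not described here), the three parts of the database can be
pictured with the following Python sketch.  The function names
@code{build_index} and @code{files_containing}, the identifier regular
expression, and the sample file contents are all illustrative
assumptions; none of them come from the ID utilities themselves.

```python
import re


def build_index(sources):
    """Build (filenames, identifiers, matrix) from a mapping of name -> text.

    This is a toy model of the three parts of an ID database, not the
    actual on-disk format.
    """
    filenames = sorted(sources)
    # One set of identifier-like tokens per file (assumed token syntax).
    found = [set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", sources[name]))
             for name in filenames]
    identifiers = sorted(set().union(*found))
    # matrix[i][j] is True when identifiers[i] appears in filenames[j].
    matrix = [[ident in tokens for tokens in found] for ident in identifiers]
    return filenames, identifiers, matrix


def files_containing(index, ident):
    """Return the filenames whose matrix row marks ident as present."""
    filenames, identifiers, matrix = index
    if ident not in identifiers:
        return []
    row = matrix[identifiers.index(ident)]
    return [name for name, hit in zip(filenames, row) if hit]


# Hypothetical sample input, invented for this sketch.
sources = dict([
    ("lid.c", "FILE *id_FILE; int main(void)"),
    ("fid.c", "FILE *id_FILE;"),
    ("regex.c", "int re_comp(void)"),
])
index = build_index(sources)
print(files_containing(index, "FILE"))
```

Answering ``which files mention this identifier'' then only requires
reading one row of the matrix, which is why lookups do not need to touch
the source files at all.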
With this database and some tools to manipulate it (described in this
manual), a host of tasks become simpler and faster.  For example, you
can list all files containing a particular @code{#include} throughout a
huge source hierarchy, search for all the memos containing references to
a project, or automatically invoke an editor on all files containing
references to some function.  Anyone with a large software project to
maintain, or a large set of text files to organize, can benefit from an
ID database.

Although the ID utilities are most commonly used with identifiers,
numeric constants are also stored in the database, and can be searched
for in the same way (independent of radix, if desired).

There are a number of programs in the ID family:

@table @code
@item mkid
scans files for identifiers and numeric constants and builds the ID
database file.

@item gid
lists all lines that match given patterns.

@item lid
lists the filenames containing identifiers that match given patterns.

@item aid
lists the filenames containing identifiers that contain given strings,
independent of case.

@item eid
invokes an editor on each file containing identifiers that match given
patterns.

@item fid
lists all identifiers recorded in the database for given files, or
identifiers common to two files.

@item pid
matches the filenames in the database, rather than the identifiers.

@item iid
interactively supports more complex queries, such as intersection and
union.

@item idx
helps with testing of new @code{mkid} scanners.

@end table

@cindex bugs, reporting
Please report bugs to @samp{gkm@@magilla.cichlid.com}.  Remember to
include the version number, machine architecture, input files, and any
other information needed to reproduce the bug: your input, what you
expected, what you got, and why it is wrong.  Diffs are welcome, but
please include a description of the problem as well, since this is
sometimes difficult to infer.  @xref{Bugs, , , gcc, GNU CC}.
@menu
* Past and future::             How the ID tools came about, and where they're going.
@end menu

@node Past and future
@section Past and future

@cindex history

@pindex look @r{and @code{mkid} 1}
@cindex McGary, Greg
Greg McGary conceived of the ideas behind @code{mkid} when he began
hacking the Unix kernel in 1984.  He needed a navigation tool to help
him find his way around the expansive, unfamiliar landscape.  The first
@code{mkid}-like tools were shell scripts, and produced an ASCII
database that looks much like the output of @code{lid} with no
arguments.  It took over an hour on a VAX 11/750 to build a database for
a 4.1BSD-ish kernel.  Lookups were done with the system utility
@code{look}, modified to handle very long lines.

In 1986, Greg rewrote @code{mkid}, @code{lid}, @code{fid} and @code{idx}
in C to improve performance.  Database-build times were shortened by an
order of magnitude.  The @code{mkid} tools were first posted to
@samp{comp.sources.unix} in September 1987.

@cindex Horsley, Tom
@cindex Scofield, Doug
@cindex Leonard, Bill
@cindex Berry, Karl
Over the next few years, several versions diverged from the original
source.  Tom Horsley at Harris Computer Systems Division stepped forward
to take over maintenance and integrated some of the fixes from divergent
versions.  He also wrote the @code{iid} program.  A first release of
@code{mkid} @w{version 2} was posted to @samp{alt.sources} near the end
of 1990.  At that time, Tom wrote this Texinfo manual with the
encouragement of the net community.  (Tom especially thanks Doug
Scofield and Bill Leonard whom he dragooned into helping poorfraed and
edit---they found several problems in the initial version.)  Karl Berry
revamped the manual for Texinfo style, indexing, and organization in
1995.
@pindex cscope
@pindex grep
@cindex future
In January 1995, Greg McGary reemerged as the primary maintainer and
launched development of @code{mkid} version 3, whose primary new feature
is an efficient algorithm for building databases that is linear in both
time and space over the size of the input text.  (The old algorithm was
quadratic in space and therefore choked on very large source trees.)
The code is released under the GNU General Public License, and might
become a part of the GNU system.  @code{mkid} 3 is an interim release,
since several significant enhancements are still in the works: an
optional coupling with GNU @code{grep}, so that @code{grep} can use an
ID database for hints; a @code{cscope} work-alike query interface;
incremental update of the ID database; and an automatic file-tree walker
so you need not explicitly supply every filename argument to the
@code{mkid} program.

@node mkid invocation
@chapter @code{mkid}: Creating ID databases

@pindex mkid
@cindex creating databases
@cindex databases, creating

@pindex cron
The @code{mkid} program builds an ID database.  To do this, it must scan
each file you tell it to include in the database.  This takes some time,
but once the work is done the query programs run very rapidly.  (You can
run @code{mkid} as a @code{cron} job to regularly update your
databases.)

The @code{mkid} program knows how to extract identifiers from various
types of files.  For example, it can recognize and skip over comments
and string constants in a C program.

@cindex numbers, in databases
Identifiers are not the only thing included in the database.  Numbers
are also recognized and included in the database indexed by their binary
value.  This feature allows you to find uses of constants without regard
to the radix used to specify them, since the same number can frequently
be written in many different ways (for instance, @samp{47},
@samp{0x2f}, @samp{057} in C).
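To make the idea of indexing numbers by value concrete, here is a small
Python sketch.  It is not @code{mkid}'s scanner code: the function names
@code{c_number_value} and @code{index_numbers} are invented for
illustration, and the sketch assumes plain C integer literals without
suffixes such as @samp{L} or @samp{u}.

```python
def c_number_value(text):
    """Return the integer value of a plain C-style integer literal.

    Assumption: no type suffixes and no floating-point forms.
    """
    lowered = text.lower()
    if lowered.startswith("0x"):
        return int(lowered[2:], 16)      # hexadecimal, e.g. 0x2f
    if lowered.startswith("0") and len(lowered) > 1:
        return int(lowered[1:], 8)       # octal, e.g. 057
    return int(lowered, 10)              # decimal, e.g. 47


def index_numbers(tokens):
    """Group the spellings of each number under its numeric value."""
    by_value = dict()
    for token in tokens:
        by_value.setdefault(c_number_value(token), []).append(token)
    return by_value


table = index_numbers(["47", "0x2f", "057", "255", "0xff", "0377"])
print(table[47])
print(table[255])
```

Because every spelling ends up under one key, a lookup for any one
spelling can report all of the others as well.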
All the places in this document which mention identifiers should really
mention both identifiers and numbers, but that gets fairly clumsy after
a while, so you just need to keep in mind that numbers are included in
the database as well as identifiers.

@cindex ID file format
@cindex architecture-independence
@cindex sharing ID files
The ID files that @code{mkid} creates are architecture- and
byte-order-independent; you can share them at will across systems.

@menu
* mkid options::                Command-line options to mkid.
* Scanners::                    Built-in and defining your own.
* mkid examples::               Examples of mkid usage.
@end menu

@node mkid options
@section @code{mkid} options

@cindex options for @code{mkid}
@pindex mkid @r{options}

By default, @code{mkid} scans the files you specify and writes the
database to a file named @file{ID} in the current directory.

@example
mkid [-v] [-S@var{scanarg}] [-a@var{argfile}] [-] [-f@var{idfile}] @c
@var{files}@dots{}
@end example

The program accepts the following options.

@table @samp
@item -v
@opindex -v
@cindex statistics
Verbose.  @code{mkid} tells you as it scans each file and indicates
which scanner it is using.  It also summarizes some statistics about the
database at the end.

@item -S@var{scanarg}
@opindex -S@var{scanarg}
Specify options regarding @code{mkid}'s scanners.  @xref{Scanner option
formats}.

@item -a@var{argfile}
@opindex -a@var{argfile}
Read additional command line arguments from @var{argfile}.  This is
typically used to specify lists of filenames longer than will fit on a
command line; some systems have severe limitations on the total length
of a command line.

@item -
@opindex -
Read additional command line arguments from standard input.

@item -f@var{idfile}
Write the database to the file @var{idfile}, instead of @file{ID}.  The
database stores filenames relative to the directory containing the
database, so if you move the database to a different directory after
creating it, you may have trouble finding files.
@c @item -u
@c @opindex -u
@c The @code{-u} option updates an existing database by rescanning any
@c files that have changed since the database was written.  Unfortunately
@c you cannot incrementally add new files to a database.
@c Greg is reimplementing this ...

@end table

The remaining arguments @var{files} are the files to be scanned and
included in the database.  If no files are given at all (either on the
command line or via @samp{-a} or @samp{-}), @code{mkid} does nothing.

@node Scanners
@section Scanners

@cindex scanners
To determine which identifiers to extract from a file and store in the
database, @code{mkid} calls a @dfn{scanner}; we say a scanner
@dfn{recognizes} a particular language.  Scanners for several languages
are built into @code{mkid}; you can add your own scanners as well, as
explained in the sections below.

@cindex suffixes of filenames
@code{mkid} determines which scanner to use for a particular file by
looking at the suffix of the filename.  This @dfn{suffix} is everything
after and including the last @samp{.} in a filename; for example, the
suffix of @file{foo.c} is @file{.c}.  @code{mkid} has a built-in list of
bindings from some suffixes to corresponding scanners; for example,
@file{.c} files are (not surprisingly) scanned by the predefined C
language scanner.

@findex .default @r{scanner}
If @code{mkid} cannot determine what scanner to use for a particular
file, either because the file has no suffix (e.g., @file{foo}) or
because @code{mkid} has no binding for the file's suffix (e.g.,
@file{foo.bar}), it uses the scanner bound to the @samp{.default}
suffix.  By default, this is the plain text scanner (@pxref{Plain text
scanner}), but you can change this with the @samp{-S} option, as
explained below.

@menu
* Scanner option formats::      Overview of the -S option.
* Predefined scanners::         The C, plain text, and assembler scanners.
* Defining new scanners::       Either in source code or at runtime with -S.
* idx invocation::              Testing mkid scanners.
@end menu

@node Scanner option formats
@subsection Scanner option formats

@cindex scanner options
@opindex -S @r{scanner option}
With the @samp{-S} option, you can change which language scanner to use
for which files, give language-specific options, and get some limited
online help about scanner options.

Here are the different forms of the @samp{-S} option:

@table @samp
@item -S.@var{suffix}=@var{scanner}
@opindex -S.
Use @var{scanner} for a file with the given @samp{.@var{suffix}}.  For
example, @samp{-S.yacc=c} tells @code{mkid} to use the @samp{c} language
scanner for all files ending in @samp{.yacc}.

@item -S.@var{suffix}=?
Display which scanner is used for the given @samp{.@var{suffix}}.

@item -S?=@var{scanner}
@opindex -S?
Display which suffixes @var{scanner} is used for.

@item -S?=?
Display the scanner binding for every known suffix.

@item -S@var{scanner}+@var{arg}
@itemx -S@var{scanner}-@var{arg}
Each scanner accepts certain scanner-dependent arguments.  These options
all have one of these forms.  @xref{Predefined scanners}.

@item -S@var{scanner}?
Display the scanner-specific options accepted by @var{scanner}.

@item -S@var{new-scanner}/@var{old-scanner}/@var{filter-command}
Define @var{new-scanner} in terms of @var{old-scanner} and
@var{filter-command}.  @xref{Defining scanners with options}.

@end table

@node Predefined scanners
@subsection Predefined scanners

@cindex predefined scanners
@cindex scanners, predefined
@code{mkid} has built-in scanners for several types of languages; you
can get the list by running @code{mkid -S?=?}.  The supported languages
are documented below@footnote{This is not strictly true: @samp{vhil} is
a supported language, but it is an obsolete and arcane dialect of C and
should be ignored.}.

@menu
* C scanner::                   For the C programming language.
* Plain text scanner::          For documents or other non-source code.
* Assembler scanner::           For assembly language.
@end menu

@node C scanner
@subsubsection C scanner

@cindex C scanner, predefined
@flindex .[chly] @r{files, scanning}
The C scanner is the most commonly used.  Files with the usual
@file{.c} and @file{.h} suffixes, and the @file{.y} (yacc) and
@file{.l} (lex) suffixes, are processed with this scanner (by default).

Scanner-specific options:

@table @samp
@item -Sc-s@var{character}
@kindex $ @r{in identifiers}
@opindex -Sc-s
Allow the specified @var{character} in identifiers.  For example, if you
use @samp{$} in identifiers, you'll want to use @samp{-Sc-s$}.

@item -Sc+u
@opindex -Sc+u
Strip leading underscores from identifiers.  You might want to do this
in peculiar circumstances, such as trying to parse the output from
@code{nm} or some other system utility.

@item -Sc-u
@opindex -Sc-u
Don't strip leading underscores from identifiers; this is the default.

@end table

@node Plain text scanner
@subsubsection Plain text scanner

@cindex plain text scanner
The plain text scanner is intended for scanning most non-source-code
files.  This is typically the scanner used when adding custom scanners
via @samp{-S} (@pxref{Defining scanners with options}).

@c @code{mkid} predefines a troff scanner in terms of the plain text
@c scanner and
@c the @code{deroff} utility.
@c A compressed man page
@c scanner runs @code{pcat} piped into @code{col -b}, and a @TeX{} scanner
@c runs @code{detex}.

Scanner-specific options:

@table @samp
@item -Stext+a@var{character}
@opindex -Stext+a
Include @var{character} in identifiers.  By default, letters (a--z and
A--Z) and underscore are included.

@item -Stext-a@var{character}
@opindex -Stext-a
Exclude @var{character} from identifiers.

@item -Stext+s@var{character}
@opindex -Stext+s
@cindex squeezing characters from identifiers
Squeeze @var{character} from identifiers, i.e., do not terminate an
identifier when @var{character} is seen.  By default, the characters
@samp{'}, @samp{-}, and @samp{.} are squeezed out of identifiers.
For example, the input @samp{fred's} leads to the identifier
@samp{freds}.

@item -Stext-s@var{character}
Do not squeeze @var{character}.

@end table

@node Assembler scanner
@subsubsection Assembler scanner

@cindex assembler scanner
Since assembly languages come in several flavors, this scanner has a
number of options:

@table @samp
@item -Sasm-c@var{character}
@opindex -Sasm-c
@cindex comments in assembler
Define @var{character} as starting a comment that extends to the end of
the input line; no default.  In many assemblers this is @samp{;} or
@samp{#}.

@item -Sasm+u
@itemx -Sasm-u
@opindex -Sasm+u
Strip (@samp{+u}) or do not strip (@samp{-u}) leading underscores from
identifiers.  The default is to strip them.

@item -Sasm+a@var{character}
@opindex -Sasm+a
Allow @var{character} in identifiers.

@item -Sasm-a@var{character}
Allow @var{character} in identifiers, but if an identifier contains
@var{character}, ignore that identifier.  This is useful to ignore
temporary labels, which can be generated in great profusion; these often
contain @samp{.} or @samp{@@}.

@item -Sasm+p
@itemx -Sasm-p
@opindex -Sasm+p
Recognize (@samp{+p}) or do not recognize (@samp{-p}) C preprocessor
directives in assembler source.  The default is to recognize them.

@item -Sasm+C
@itemx -Sasm-C
@opindex -Sasm+C
Skip over (@samp{+C}) or do not skip over (@samp{-C}) C style comments
in assembler source.  The default is to skip them.

@end table

@node Defining new scanners
@subsection Defining new scanners

@cindex scanners, adding new
You can add new scanners to @code{mkid} in two ways: modify the source
code and recompile, or at runtime via the @samp{-S} option.  Each has
its advantages and disadvantages, as explained below.

If you create a new scanner that would be of use to others, please
consider sending it back to the maintainer,
@samp{gkm@@magilla.cichlid.com}, for inclusion in future releases of
@code{mkid}.
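The runtime method can be pictured as function composition: a filter
command preprocesses the file, and an existing scanner reads the
filter's output.  The Python sketch below illustrates that idea only; it
is not how @code{mkid} is implemented.  The names
@code{define_filter_scanner} and @code{text_scanner}, the toy
letters-and-underscore token rule, and the @code{tr} filter are all
assumptions made for the example, and it presumes a POSIX shell.

```python
import os
import re
import shlex
import subprocess
import tempfile


def text_scanner(text):
    """Toy stand-in for a plain text scanner: letters and underscore only."""
    return sorted(set(re.findall(r"[A-Za-z_]+", text)))


def define_filter_scanner(existing_scanner, filter_command):
    """Return a scanner that runs filter_command, then existing_scanner.

    The substring %s in filter_command is replaced by the (shell-quoted)
    name of the file being scanned.
    """
    def scanner(path):
        command = filter_command.replace("%s", shlex.quote(path))
        result = subprocess.run(command, shell=True, capture_output=True,
                                text=True, check=True)
        return existing_scanner(result.stdout)
    return scanner


# Hypothetical "upper" language: fold everything to upper case, then
# scan as text.  The filter is any shell command or pipeline.
upper_scanner = define_filter_scanner(text_scanner, "tr a-z A-Z < %s")

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("hello world, hello mkid\n")
    sample_path = handle.name
try:
    tokens = upper_scanner(sample_path)
finally:
    os.remove(sample_path)
print(tokens)
```

The sketch also shows the trade-off discussed in the following sections:
a filter-based scanner needs no recompilation, but it starts an external
process for every file scanned.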
@menu
* Defining scanners in source code::
* Defining scanners with options::
@end menu

@node Defining scanners in source code
@subsubsection Defining scanners in source code

@flindex scanners.c
@cindex scanners, defining in source code

@vindex languages_0
@vindex suffixes_0
To add a new scanner in source code, you should add a new section to the
file @file{scanners.c}.  Copy one of the existing scanners (most likely
either C or plain text), and modify as necessary.  Also add the new
scanner to the @code{languages_0} and @code{suffixes_0} tables near the
beginning of the file.

This is not a terribly difficult programming task, but it requires
recompiling and installing the new version of @code{mkid}, which may be
inconvenient.

This method leads to scanners which operate much more quickly than ones
that depend on external programs.  It is also likely the easiest way to
define scanners for new programming languages.

@node Defining scanners with options
@subsubsection Defining scanners with options

@cindex scanners, defining with options
You can use the @samp{-S} option on the command line to define a new
language scanner:

@example
-S@var{new-scanner}/@var{existing-scanner}/@var{filter}
@end example

@noindent
Here, @var{new-scanner} is the name of the new scanner being defined,
@var{existing-scanner} is the name of an existing scanner, and
@var{filter} is a shell command or pipeline.

The new scanner works by passing the input file to @var{filter}, and
then arranging for the result to be passed through
@var{existing-scanner}.  Typically, @var{existing-scanner} is
@samp{text}.

Somewhere within @var{filter}, the string @samp{%s} should occur.  This
@samp{%s} is replaced by the name of the source file being scanned.

@cindex Texinfo, scanning example of
For example, @code{mkid} has no built-in scanner for Texinfo files (like
this one).  In indexing a Texinfo file, you most likely would want to
ignore the Texinfo @@-commands.
Here's one way to specify a new scanner to do this:

@example
-Stexinfo/text/sed s,@@[a-z]*,,g %s
@end example

This defines a new language scanner (@samp{texinfo}) defined in terms of
a @code{sed} command to strip out Texinfo directives (an @samp{@@}
character followed by letters).  Once the directives are stripped, the
remaining text is run through the plain text scanner.

This is a minimal example; to do a complete job, you would need to
completely delete some lines, such as those beginning with @code{@@end}
or @code{@@node}.

@node idx invocation
@subsection @code{idx}: Testing @code{mkid} scanners

@code{idx} prints the identifiers found in the files you specify to
standard output.  This is useful in debugging new @code{mkid} scanners
(@pxref{Scanners}).  Synopsis:

@example
idx [-S@var{scanarg}] @var{files}@dots{}
@end example

@code{idx} accepts the same @samp{-S} options as @code{mkid}.
@xref{Scanner option formats}.

The name ``idx'' stands for ``ID eXtract''.  The name may change in
future releases, since this is such an infrequently used program.

@node mkid examples
@section @code{mkid} examples

@cindex examples of @code{mkid}

The simplest example of @code{mkid} is something like:

@example
mkid *.[chy]
@end example

This will build an ID database indexing identifiers and numbers in all
the @file{.c}, @file{.h}, and @file{.y} files in the current directory.
Because @code{mkid} already knows how to scan files with those suffixes,
no additional options are needed.

@cindex man pages, compressed
@cindex compressed files, building ID from
Here's a more complex example.  Suppose you want to build a database
indexing the contents of all the @code{man} pages, and further suppose
that your system is using @code{gzip} (@pxref{Top, , , gzip, Gzip}) to
store compressed @code{cat} versions of the @code{man} pages in the
directory @file{/usr/catman}.  The @code{gzip} program creates files
with a @code{.gz} suffix, so you must tell @code{mkid} how to scan
@file{.gz} files.
Here are the commands to do the job:

@example
cd /usr/catman
find . -name \*.gz -print | mkid '-Sman/text/gzip -dc <%s' -S.gz=man -
@end example

@noindent
Explanation:

@enumerate
@item
We first @code{cd} to @file{/usr/catman} so the ID database will store
the correct relative filenames.

@item
The @code{find} command prints the names of all @file{.gz} files under
the current directory.  @xref{find invocation, , , sh-utils, GNU shell
utilities}.

@item
This list is piped to @code{mkid}; the @code{-} option (at the end of
the line) tells @code{mkid} to read arguments (in this case, as is
typical, the list of filenames) from standard input.  @xref{mkid
options}.

@item
The @samp{-Sman/text/gzip @dots{}} defines a new language @samp{man} in
terms of the @code{gzip} program (run with @samp{-dc} so that it
decompresses to standard output) and @code{mkid}'s existing text
scanner.  @xref{Defining scanners with options}.

@item
The @samp{-S.gz=man} tells @code{mkid} to treat all @file{.gz} files as
this new language @code{man}.  @xref{Scanner option formats}.

@end enumerate

As a further complication, @code{cat} pages typically contain
underlining and backspace sequences, which will confuse @code{mkid}.  To
handle this, the @code{gzip} command becomes a pipeline, like this:

@example
mkid '-Sman/text/gzip -dc <%s | col -b' -S.gz=man -
@end example

@node Common query arguments
@chapter Common query arguments

@cindex common query arguments

Certain options, and regular expression syntax, are shared by the ID
query tools.  So we describe those things in the sections below, instead
of repeating the description for each tool.

@menu
* Query options::               -f -r -c -ew -kg -n -doxa -m -F -u.
* Patterns::                    Regular expression syntax for searches.
* Examples: Query examples.     Some common uses.
@end menu

@node Query options
@section Query options

@cindex query options, common
@cindex common query options

The ID query tools (@emph{not} @code{mkid}) share certain command line
options.
Not all of these options are recognized by all programs, but if an
option is used by more than one program, it is described below.  The
description of each program gives the options that program uses.

@table @samp
@item -f@var{idfile}
@opindex -f@var{idfile}
@cindex database name, specifying
@cindex parent directories, searched for ID
Read the database from @var{idfile}, in the current directory or in any
directory above the current directory.  The default database name is
@file{ID}.  Searching parent directories lets you have a single ID
database at the root of a large source tree and then use the query tools
from anywhere within that tree.

@item -r@var{directory}
@opindex -r@var{directory}
Find files relative to @var{directory}, instead of the directory in
which the ID database was found.  This is useful if the ID database was
moved after its creation.

@item -c
@opindex -c
Equivalent to @code{-r`pwd`}, i.e., find files relative to the current
directory, instead of the directory in which the ID database was found.

@item -e
@itemx -w
@opindex -e
@opindex -w
@cindex regular expressions, forcing evaluation as
@cindex strings, forcing evaluation as
@cindex constant strings, forcing evaluation as
@samp{-e} forces pattern arguments to be treated as regular expressions,
and @samp{-w} forces pattern arguments to be treated as constant
strings.  By default, the query tools guess whether a pattern is a
regular expression or a constant string by looking for special
characters.  @xref{Patterns}.

@item -k
@itemx -g
@opindex -k
@opindex -g
@cindex brace notation in filename lists
@cindex shell brace notation in filename lists
@samp{-k} suppresses use of shell brace notation in the output.  By
default, the query tools that generate lists of filenames attempt to
compress the lists using the usual shell brace notation, e.g.,
@file{@{foo,bar@}.c} to mean @file{foo.c} and @file{bar.c}.
(This is useful if you use @code{ksh} or the original (not GNU)
@code{sh} and want to feed the list of names to another command, since
those shells do not support this brace notation; the name of the
@code{-k} option comes from the @code{k} in @code{ksh}.)  @samp{-g}
turns on use of brace notation; this is only needed if the query tools
were compiled with @samp{-k} as the default behavior.

@item -n
@opindex -n
@cindex suppressing matching identifier
Suppress the matching identifier before each list of filenames that the
query tools output by default.  This is useful if you want a list of
just the names to feed to another command.

@item -d
@itemx -o
@itemx -x
@itemx -a
@opindex -d
@opindex -o
@opindex -x
@opindex -a
@cindex radix of numeric matches, specifying
@cindex numeric matches, specifying radix of
These options may be used in any combination to specify the radix of
numeric matches.  @samp{-d} allows matching on decimal numbers,
@samp{-o} on octal numbers, and @samp{-x} on hexadecimal numbers.  The
@code{-a} option is equivalent to specifying all three; this is the
default.

@item -m
@opindex -m
@cindex multiple lines, merging
Merge multiple lines of output into a single line.  If your query
matches more than one identifier, the default is to generate a separate
line of output for each matching identifier.

@item -F-
@itemx -F@var{n}
@itemx -F-@var{m}
@itemx -F@var{n}-@var{m}
@opindex -F
@cindex single matches, showing
Show identifiers matching at least @var{n} and at most @var{m} times.
@samp{-F-} is equivalent to @samp{-F1}, i.e., find identifiers that
appear only once in the database.  (This is useful to locate identifiers
that are defined but never used, or used once and never defined.)

@item -u@var{number}
@opindex -u
@cindex conflicting identifiers, finding
List identifiers that conflict in the first @var{number} characters.
This could be useful in porting programs to brain-dead computers that
refuse to support long identifiers, but your best long term option is to
set such computers on fire.

@end table

@node Patterns
@section Patterns

@cindex patterns
@cindex regular expression syntax

@dfn{Patterns}, also called @dfn{regular expressions}, allow you to
match many different identifiers in a single query.

The same regular expression syntax is recognized by all the query tools
that handle regular expressions.  The exact syntax depends on how the ID
tools were compiled, but the following constructs should always be
supported:

@table @samp
@item .
Match any single character.

@item [@var{chars}]
Match any of the characters specified within the brackets.  You can
match any characters @emph{except} the ones in brackets by typing
@samp{^} as the first character.  A range of characters can be specified
using @samp{-}.  For example, @samp{[abc]} and @samp{[a-c]} both match
@samp{a}, @samp{b}, or @samp{c}, and @samp{[^abc]} matches anything
@emph{except} @samp{a}, @samp{b}, or @samp{c}.

@item *
Match the previous construct zero or more times.

@item ^
@itemx $
@samp{^} (@samp{$}) at the beginning (end) of a pattern anchors the
match to the first (last) character of the identifier.

@end table

The query programs use either the @code{regex}/@code{regcmp} or
@code{re_comp}/@code{re_exec} functions, depending on which are
available in the library on your system.  These do not always support
the exact same regular expression syntax, so consult your local
@code{man} pages to find out.

@node Query examples
@section Query examples

@cindex examples, queries
@cindex query examples
Here are some examples of the options described in the previous
sections.

To restrict searches to exact matches, use @samp{^@dots{}$}.
For example:

@example
prompt$ gid '^FILE$'
ansi2knr.c:144: @{ FILE *in, *out;
ansi2knr.c:315: FILE *out;
fid.c:38: FILE *id_FILE;
filenames.c:576: FILE *
@dots{}
@end example

To show identifiers not unique in the first 16 characters:

@example
prompt$ lid -u16
RE_CONTEXT_INDEP_ANCHORS regex.c
RE_CONTEXT_INDEP_OPS regex.c
RE_SYNTAX_POSIX_BASIC regex.c
RE_SYNTAX_POSIX_EXTENDED regex.c
@dots{}
@end example

@cindex numeric searches
Numbers are searched for numerically rather than textually.  For
example:

@example
prompt$ lid 0xff
0377 @{lid,regex@}.c
0xff @{bitops,fid,lid,mkid@}.c
255 regex.c
@end example

On the other hand, you can restrict a numeric search to a particular
radix if you want:

@example
prompt$ lid -x 0xff
0xff @{bitops,fid,lid,mkid@}.c
@end example

Filenames in the output are always adjusted to be correct for the
current working directory.  For example:

@example
prompt$ lid bdevsw
bdevsw sys/conf.h cf/conf.c io/bio.c os/@{fio,main,prf,sys3@}.c
prompt$ cd io
prompt$ lid bdevsw
bdevsw ../sys/conf.h ../cf/conf.c bio.c ../os/@{fio,main,prf,sys3@}.c
@end example

@node gid invocation
@chapter @code{gid}: Listing matching lines

Synopsis:

@example
gid [-f@var{file}] [-u@var{n}] [-r@var{dir}] [-doxasc] [@var{pattern}@dots{}]
@end example

@code{gid} finds the identifiers in the database that match the
specified @var{pattern}s, then searches for all occurrences of those
identifiers, in only the files containing matches.  In a large source
tree, this saves an enormous amount of time (compared to searching every
source file).

With no @var{pattern} arguments, @code{gid} prints every line of every
source file.

The name ``gid'' stands for ``grep for identifiers'', @code{grep} being
the standard utility to search regular files.

@xref{Common query arguments}, for a description of the command-line
options and @var{pattern} arguments.
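The two-stage lookup described above (first consult the index, then
search only the files it names) can be sketched in Python as follows.
This is an illustration of the idea, not @code{gid}'s implementation:
the function name @code{gid_like}, the in-memory stand-ins for the
database and the source files, and the whole-word matching rule are all
assumptions made for the example.

```python
import re


def gid_like(index, file_lines, pattern):
    """Return 'file:line: text' strings for identifiers matching pattern.

    index      -- maps identifier -> list of filenames (stand-in for the ID file)
    file_lines -- maps filename -> list of source lines (stand-in for the disk)
    """
    matcher = re.compile(pattern)
    # Stage 1: pick matching identifiers from the index alone.
    hits = [ident for ident in sorted(index) if matcher.search(ident)]
    files = sorted(set(name for ident in hits for name in index[ident]))
    # Stage 2: search only the files the index points at.
    words = [re.compile(r"\b" + re.escape(ident) + r"\b") for ident in hits]
    output = []
    for name in files:
        for number, line in enumerate(file_lines[name], start=1):
            if any(word.search(line) for word in words):
                output.append("%s:%d: %s" % (name, number, line))
    return output


# Hypothetical data, invented for this sketch.
index = dict(FILE=["fid.c"], id_FILE=["fid.c"], main=["lid.c"])
file_lines = dict([
    ("fid.c", ["#include <stdio.h>", "FILE *id_FILE;"]),
    ("lid.c", ["int main(void)"]),
])
for entry in gid_like(index, file_lines, "^FILE$"):
    print(entry)
```

In this toy run @file{lid.c} is never examined, because the index shows
that it contains no matching identifier; that is the source of the time
saving in a large tree.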
@code{gid} uses the standard GNU output format for identifying source
lines:

@example
@var{filename}:@var{linenum}: @var{text}
@end example

Here is an example:

@example
prompt$ gid FILE
ansi2knr.c:144: @{ FILE *in, *out;
ansi2knr.c:315: FILE *out;
fid.c:38: FILE *id_FILE;
@dots{}
@end example

@menu
* GNU Emacs gid interface::     Using next-error with gid.
@end menu


@node GNU Emacs gid interface
@section GNU Emacs @code{gid} interface

@cindex Emacs interface to @code{gid}
@flindex gid.el @r{interface to Emacs}

@vindex load-path
The @code{mkid} source distribution comes with a file @file{gid.el},
which defines a GNU Emacs interface to @code{gid}.  To install it, put
@file{gid.el} somewhere that Emacs will find it (i.e., in your
@code{load-path}) and put

@example
(autoload 'gid "gid" nil t)
@end example

@noindent
in one of Emacs' initialization files, e.g., @file{~/.emacs}.  You will
then be able to use @kbd{M-x gid} to run the command.

@findex gid @r{Emacs function}
The @code{gid} function prompts you with the word around point.  If you
want to search for something else, simply delete the line and type the
pattern of interest.

@flindex *compilation* @r{Emacs buffer}
The function then runs the @code{gid} program in a @samp{*compilation*}
buffer, so the normal @code{next-error} function can be used to visit
all the places the identifier is found
(@pxref{Compilation,,, emacs, The GNU Emacs Manual}).


@node Looking up identifiers
@chapter Looking up identifiers

These commands look up identifiers in the ID database and operate on the
files containing matches.

@menu
* lid invocation::              Matching patterns.
* aid invocation::              Matching strings.
* eid invocation::              Invoking an editor on matches.
* fid invocation::              Listing a file's identifiers.
@end menu


@node lid invocation
@section @code{lid}: Matching patterns

@pindex lid

Synopsis:

@example
lid [-f@var{file}] [-u@var{n}] [-r@var{dir}] [-mewdoxaskgnc] @c
@var{pattern}@dots{}
@end example

@code{lid} searches the database for identifiers matching the given
@var{pattern} arguments and prints the names of the files containing
identifiers that match each @var{pattern}.  With no @var{pattern}s,
@code{lid} lists every entry in the database.

The name ``lid'' stands for ``lookup identifier''.

@xref{Common query arguments}, for a description of the command-line
options and @var{pattern} arguments.

By default, each line of output consists of an identifier and all the
files containing that identifier.

Here is an example showing a search for a single identifier (omitting
some output to keep lines short):

@example
prompt$ lid FILE
FILE extern.h @{fid,gets0,getsFF,idx,init,lid,mkid,@dots{}@}.c
@end example

This example shows a regular expression search:

@example
prompt$ lid 'FILE$'
AF_FILE mkid.c
AF_IDFILE mkid.c
FILE extern.h @{fid,gets0,getsFF,idx,init,lid,mkid,@dots{}@}.c
IDFILE id.h @{fid,lid,mkid@}.c
IdFILE @{fid,lid@}.c
@dots{}
@end example

@noindent
As you can see, when a regular expression is used, it is possible to get
more than one line of output.  To merge multiple lines into one, use
@samp{-m}:

@example
prompt$ lid -m ^get
^get extern.h @{bitsvec,fid,gets0,getsFF,getscan,idx,lid,@dots{}@}.c
@end example


@node aid invocation
@section @code{aid}: Matching strings

@pindex aid

Synopsis:

@example
aid [-f@var{file}] [-u@var{n}] [-r@var{dir}] [-mewdoxaskgnc] @c
@var{string}@dots{}
@end example

@cindex case-insensitive searching
@cindex string searching
@code{aid} searches the database for identifiers containing the given
@var{string} arguments.  The search is case-insensitive.

@flindex whatis
The name ``aid'' stands for ``apropos identifier'', @code{apropos}
being a command that does a similar search of the @code{whatis} database
of @code{man} descriptions.
For example, @samp{aid get} matches the identifiers @code{fgets},
@code{GETLINE}, and @code{getchar}.

The default output format is the same as @code{lid}; see the previous
section.

@xref{Common query arguments}, for a description of the command-line
options and @var{pattern} arguments.


@node eid invocation
@section @code{eid}: Invoking an editor on matches

@pindex eid

Synopsis:

@example
eid [-f@var{file}] [-u@var{n}] [-r@var{dir}] [-doxasc] [@var{pattern}]@dots{}
@end example

@code{eid} runs the usual search (@pxref{lid invocation}) on the given
arguments, shows you the output, and then asks:

@example
Edit? [y1-9^S/nq]
@end example

@noindent
You can respond with:

@table @samp
@item y
Edit all files listed.

@item 1@dots{}9
Start editing at file number @math{@var{n} + 1}, where @var{n} is the
digit typed.

@item /@var{string} @r{or} @kbd{CTRL-S}@var{string}
Start editing at the first filename containing @var{string}.

@item n
Go on to the next @var{pattern}, i.e., edit nothing for this one.

@item q
Quit @code{eid}.

@end table

@code{eid} invokes the editor defined by the @samp{EDITOR} environment
variable to edit a file.  If this editor can accept an initial search
argument on the command line, @code{eid} can move automatically to the
location of the match, via the environment variables below.

@xref{Common query arguments}, for a description of the command-line
options and @var{pattern} arguments.

Here are the environment variables relevant to @code{eid}:

@table @samp
@item EDITOR
@vindex EDITOR
The name of the editor program to invoke.

@item EIDARG
@vindex EIDARG
@cindex search for identifier, initial
The argument to pass to the editor to search for the matching
identifier.  For @code{vi}, this should be @samp{+/%s/}.

@item EIDLDEL
@vindex EIDLDEL
@cindex left delimiter editor argument
@cindex beginning-of-word editor argument
A regular expression to force a match at the beginning of a word (``left
delimiter'').
@code{eid} inserts this in front of the matching identifier when
composing the search argument.  For @code{vi}, this should be
@samp{\<}.

@item EIDRDEL
@vindex EIDRDEL
@cindex right delimiter editor argument
@cindex end-of-word editor argument
The end-of-word regular expression.  For @code{vi}, this should be
@samp{\>}.

@end table

For Emacs users, the interface in @file{gid.el} is probably preferable
to @code{eid}.  @xref{GNU Emacs gid interface}.

Here is an example:

@example
prompt$ eid FILE \^print
FILE @{ansi2knr,fid,filenames,idfile,idx,iid,lid,misc,@dots{}@}.c
Edit? [y1-9^S/nq] n
^print @{ansi2knr,fid,getopt,getopt1,iid,lid,mkid,regex,scanners@}.c
Edit? [y1-9^S/nq] 2
@end example

@noindent
This will start editing at @file{getopt.c}.


@node fid invocation
@section @code{fid}: Listing a file's identifiers

@pindex fid
@cindex identifiers in a file

@code{fid} lists the identifiers found in a given file.  Synopsis:

@example
fid [-f@var{dbfile}] @var{file1} [@var{file2}]
@end example

@table @samp
@item -f@var{dbfile}
Read the database from @var{dbfile} instead of @file{ID}.

@item @var{file1}
List all the identifiers contained in @var{file1}.

@item @var{file2}
With a second file argument, list only the identifiers both files have
in common.

@end table

The output is simply one identifier (or number) per line.


@node pid invocation
@chapter @code{pid}: Looking up filenames

@pindex pid
@cindex filenames, matching
@cindex matching filenames

@code{pid} matches the filenames stored in the ID database, rather than
the identifiers.  Synopsis:

@example
pid [-f@var{dbfile}] [-r@var{dir}] [-ebkgnc] @var{wildcard}@dots{}
@end example

By default, the @var{wildcard} patterns are treated as shell globbing
patterns, rather than the regular expressions the other utilities
accept.  @xref{Wildcard patterns}, for details.
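
As a rough illustration of the difference, the following sketch tests
one filename against a wildcard pattern and against a regular
expression, using only the shell's @code{case} statement and
@code{grep}.  It does not invoke @code{pid}; the variable @code{name}
and its value are hypothetical.

@example
# A made-up filename of the kind stored in a database.
name=dir/foo.c

# Wildcard (glob): '*' matches any run of characters and '.' is an
# ordinary character.
case $name in
  *.c) echo "glob match: $name" ;;
esac

# Regular expression: '.' matches any character unless escaped, and
# '$' anchors the match at the end.
if printf '%s\n' "$name" | grep -q '\.c$'; then
  echo "regex match: $name"
fi
@end example

@noindent
Both tests succeed for this name.  The two notations differ mainly in
what @samp{*} and @samp{.} mean, which is why a pattern written for one
rarely works unchanged for the other.
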
Besides the standard options given in the synopsis
(@pxref{Query options}), @code{pid} accepts the following:

@table @samp
@item -e
@opindex -e
Do the usual regular expression matching (@pxref{Patterns}), instead
of shell wildcard matching.

@item -b
@opindex -b
@cindex basename match
Match the basenames of the files in the database.  For example,
@samp{pid -b foo} will match the stored filename @file{dir/foo}, but not
@file{foo/file}.

@end table

For example, the command:

@example
pid \*.c
@end example

@noindent
lists all the @file{.c} files in the database.  (The @samp{\} here
protects the @samp{*} from being expanded by the shell.)

@menu
* Wildcard patterns::           Shell-style globbing patterns.
@end menu


@node Wildcard patterns
@section Wildcard patterns

@cindex globbing patterns
@cindex shell wildcard patterns
@cindex wildcard patterns

@code{pid} does simplified shell wildcard matching (unless the
@samp{-e} option is specified), rather than the regular expression
matching done by the other utilities.  Here is a description of wildcard
matching, also called @dfn{globbing}:

@itemize
@item
@kindex * @r{in globbing}
@samp{*} matches zero or more characters.

@item
@kindex ? @r{in globbing}
@samp{?} matches any single character.

@item
@kindex \ @r{in globbing}
@samp{\} forces the next character to be taken literally.

@item
@kindex [@dots{}] @r{in globbing}
@samp{[@var{chars}]} matches any single character listed in @var{chars}.

@item
@kindex [!@dots{}] @r{in globbing}
@samp{[!@var{chars}]} matches any single character @emph{not} listed in
@var{chars}.

@end itemize

Most shells treat @samp{/} and leading @samp{.} characters specially.
@code{pid} does not do this.  It simply matches the filename in the
database against the wildcard pattern.


@node iid invocation
@chapter @code{iid}: Complex interactive queries

@pindex iid
@cindex interactive queries
@cindex complex queries

@code{iid} is an interactive query utility for ID databases.
It operates by running another query program (@code{lid} by default,
@code{aid} if @samp{-a} is specified) and manipulating the sets of
filenames returned by these queries.

@menu
* iid command line options::    Command-line options.
* iid query expressions::       Operands to the commands.
* iid commands::                Printing matching filenames, etc.
@end menu


@node iid command line options
@section @code{iid} command line options

@cindex options for @code{iid}
@pindex iid @r{options}

@code{iid} recognizes the following options (the standard query options
described in @ref{Query options} are inapplicable):

@table @samp
@item -a
@opindex -a
@pindex aid @r{used for @code{iid} searches}
Use @code{aid} for searches, instead of @code{lid}.

@item -c@var{command}
@opindex -c
Execute @var{command} and exit, instead of prompting for interactive
commands.

@item -H
@opindex -H
@cindex help for @code{iid}
Print a usage message and exit successfully.  The @code{help} command
inside @code{iid} gives more information.  @xref{iid commands}.

@end table


@node iid query expressions
@section @code{iid} query expressions

@cindex queries for @code{iid}
@pindex iid @r{query expressions}

An @code{iid} @dfn{query expression} generates a set of filenames or
manipulates existing sets.  These expressions are operands to some of
the @code{iid} commands (@pxref{iid commands}), not commands themselves.

Here are the possible constructs, highest precedence first:

@table @samp
@item s@var{set-number}
Refer to a set previously created by a query operation.  During each
@code{iid} session, every query generates a different set number, so any
previously generated set may be used as part of any new query by
reference to its set number.

@item @var{pattern}
@code{iid} treats any non-keyword input (i.e., anything not in this
table) as an identifier to be searched for in the database.  It is
passed to the search program (@code{lid} by default, @code{aid} if the
@samp{-a} option was specified).
The result of this operation is a set of filenames, and it is assigned a
unique set number.

@item lid @var{identifier-list}
@cmindex lid @r{iid operator}
Invoke the @code{lid} program on @var{identifier-list} and construct a
new set from the result.

@item aid @var{identifier-list}
@cmindex aid @r{iid operator}
Like @code{lid}, but use the @code{aid} program.

@item match @var{wildcards}
@cmindex match @r{iid operator}
Invoke the @code{pid} program on @var{wildcards}, therefore matching on
the filenames in the database instead of the identifiers.  The resulting
set contains the filenames that match the specified patterns.
@xref{pid invocation}.

@item not @var{expr}
@cmindex not @r{iid operator}
The result is those filenames in the database that are not in
@var{expr}.

@item @var{expr1} and @var{expr2}
@cmindex and @r{iid operator}
The result is the intersection of the sets @var{expr1} and @var{expr2},
i.e., only those filenames contained in both.

@item @var{expr1} or @var{expr2}
@cmindex or @r{iid operator}
The result is the union of the sets @var{expr1} and @var{expr2}, i.e.,
all the filenames contained in either or both.

@end table

Operator names are recognized independent of case, so @code{AND},
@code{and}, and @code{aNd} are all the same as far as @code{iid} is
concerned.

To pass a keyword as an operand, you must enclose it in double quotes:
the command @samp{lid "lid"} generates the set of all filenames matching
the string @samp{lid}.

Patterns containing shell metacharacters (such as @samp{*} or @samp{?})
must also be properly quoted, since the query commands are run by
invoking them with the shell.

@c Summary of query expression syntax:
@c
@c A @var{query} is:
@c @example
@c
@c
@c lid
@c aid
@c match
@c or
@c and
@c not
@c ( )
@c @end example


@node iid commands
@section @code{iid} commands

@cindex commands for @code{iid}
@pindex iid @r{commands}

This section describes the interactive commands that @code{iid}
recognizes.
The database query expressions you can pass to the @samp{ss} and
@samp{files} commands are described in the previous section.

Some commands output a @dfn{summary line} for sets.  These lines show
the set number, the number of filenames in the set, and the command that
generated it.

@table @samp
@item ss @var{query}
@cmindex ss iid @r{command}
Build the set(s) of filenames resulting from the query expression
@var{query}.  The output is a summary line for each set.

@item files @var{query}
@itemx f @var{query}
@cmindex files iid @r{command}
@cmindex f iid @r{command}
Evaluate the query expression @var{query} as in @code{ss}, but output
the full list of matching filenames instead of a summary.

@item sets
@cmindex sets iid @r{command}
Output a summary line for each extant set.

@item show @var{set}
@itemx p @var{set}
@cmindex show iid @r{command}
@cmindex p iid @r{command}
@vindex PAGER
@pindex emacsclient
Pass the filenames in set number @var{set} to the program named in the
@samp{PAGER} environment variable.  Typically, this is a page-at-a-time
display program like @code{less} or @code{more}.  If you use Emacs, you
might want to set @samp{PAGER} to @code{emacsclient}
(@pxref{Emacs Server,,, emacs, The GNU Emacs Manual}).

@item @r{anything else}
@cindex shell commands in @code{iid}
When @code{iid} does not recognize the first word on an input line as a
builtin @code{iid} command, it assumes the input is a shell command
which will write a list of filenames to standard output, which it
gathers into a set as usual.  Any set numbers that appear in the input
are expanded into the lists of filenames they represent prior to running
the command.

@item !@var{shell-command}
@cmindex ! iid @r{command}
@cindex shell escape
Expand set numbers appearing in @var{shell-command} into the filenames
they represent, and pass the result to @file{/bin/sh}.  The output is
not interpreted.

@item begin @var{directory}
@itemx b @var{directory}
@cmindex begin iid @r{command}
@cmindex b iid @r{command}
Begin a new @code{iid} session in a different directory (which
presumably contains a different database).  It deletes all the sets
created so far and switches to the specified directory.  It is
equivalent to exiting @code{iid}, changing directories in the shell, and
running @code{iid} again.

@item help
@itemx h
@itemx ?
@cmindex help iid @r{command}
@cmindex h iid @r{command}
@cmindex ? iid @r{command}
Display a short help file using the program named in @samp{PAGER}.

@item quit
@itemx q
@itemx off
@cmindex quit iid @r{command}
@cmindex q iid @r{command}
@cmindex off iid @r{command}
Quit @code{iid}.  An end-of-file character (usually @kbd{CTRL-D}) also
exits.

@end table


@node Index
@unnumbered Index

@printindex cp

@contents
@bye