Diffstat (limited to 'doc/id-utils.texi')
-rw-r--r-- | doc/id-utils.texi | 1378 |
1 files changed, 1378 insertions, 0 deletions
diff --git a/doc/id-utils.texi b/doc/id-utils.texi new file mode 100644 index 0000000..9cc7dd4 --- /dev/null +++ b/doc/id-utils.texi @@ -0,0 +1,1378 @@ +\input texinfo +@comment %**start of header +@setfilename id-utils.info +@settitle ID database utilities +@comment %**end of header + +@include version.texi + +@c Define new indices for filenames, commands and options. +@defcodeindex fl +@defcodeindex cm +@defcodeindex op + +@c Put everything in one index (arbitrarily chosen to be the concept index). +@syncodeindex fl cp +@syncodeindex fn cp +@syncodeindex ky cp +@syncodeindex op cp +@syncodeindex pg cp +@syncodeindex vr cp + +@ifinfo +@format +START-INFO-DIR-ENTRY +* ID database: (id). Identifier database utilities. +* aid: (id)aid invocation. Matching strings. +* eid: (id)eid invocation. Invoking an editor on matches. +* fid: (id)fid invocation. Listing a file's identifiers. +* gid: (id)gid invocation. Listing all matching lines. +* idx: (id)idx invocation. Testing mkid scanners. +* lid: (id)lid invocation. Matching patterns. +* mkid: (id)mkid invocation. Creating an ID database. +* pid: (id)pid invocation. Looking up filenames. +END-INFO-DIR-ENTRY +@end format +@end ifinfo + +@ifinfo +This file documents the @code{mkid} identifier database utilities. + +Copyright (C) 1991, 1995 Tom Horsley. + +Permission is granted to make and distribute verbatim copies of +this manual provided the copyright notice and this permission notice +are preserved on all copies. + +@ignore +Permission is granted to process this file through TeX and print the +results, provided the printed document carries copying permission +notice identical to this one except for the removal of this paragraph +(this paragraph not being relevant to the printed manual). 
+ +@end ignore +Permission is granted to copy and distribute modified versions of this +manual under the conditions for verbatim copying, provided that the entire +resulting derived work is distributed under the terms of a permission +notice identical to this one. + +Permission is granted to copy and distribute translations of this manual +into another language, under the above conditions for modified versions, +except that this permission notice may be stated in a translation. +@end ifinfo + +@titlepage +@title ID database utilities +@subtitle Programs for simple, fast, high-capacity cross-referencing +@subtitle for version @value{VERSION} +@author Tom Horsley +@author Greg McGary + +@page +@vskip 0pt plus 1filll +Copyright @copyright{} 1991, 1995 Tom Horsley. + +Permission is granted to make and distribute verbatim copies of +this manual provided the copyright notice and this permission notice +are preserved on all copies. + +Permission is granted to copy and distribute modified versions of this +manual under the conditions for verbatim copying, provided that the entire +resulting derived work is distributed under the terms of a permission +notice identical to this one. + +Permission is granted to copy and distribute translations of this manual +into another language, under the above conditions for modified versions, +except that this permission notice may be stated in a translation. +@end titlepage + + +@ifinfo +@node Top +@top ID database utilities + +This manual documents version @value{VERSION} of the ID database +utilities. + +@menu +* Introduction:: Overview of the tools, and authors. +* mkid invocation:: Creating an ID database. +* Common query arguments:: Common lookup options and search patterns. +* gid invocation:: Listing all matching lines. +* Looking up identifiers:: lid, aid, eid, and fid. +* pid invocation:: Looking up filenames. +* Index:: General index. 
+@end menu +@end ifinfo + + +@node Introduction +@chapter Introduction + +@cindex overview +@cindex introduction + +@cindex ID database, definition of +An @dfn{ID database} is a binary file containing a list of filenames, a +list of identifiers, and a matrix indicating which identifiers appear in +which files. With this database and some tools to manipulate it +(described in this manual), a host of tasks become simpler and faster. +For example, you can list all files containing a particular +@code{#include} throughout a huge source hierarchy, search for all the +memos containing references to a project, or automatically invoke an +editor on all files containing references to some function. Anyone with +a large software project to maintain, or a large set of text files to +organize, can benefit from an ID database. + +Although the ID utilities are most commonly used with identifiers, +numeric constants are also stored in the database, and can be searched +for in the same way (independent of radix, if desired). + +There are a number of programs in the ID family: + +@table @code + +@item mkid +scans files for identifiers and numeric constants and builds the ID +database file. + +@item gid +lists all lines that match given patterns. + +@item lid +lists the filenames containing identifiers that match given patterns. + +@item aid +lists the filenames containing identifiers that contain given strings, +independent of case. + +@item eid +invokes an editor on each file containing identifiers that match given +patterns. + +@item fid +lists all identifiers recorded in the database for given files, or +identifiers common to two files. + +@item pid +matches the filenames in the database, rather than the identifiers. + +@item idx +helps with testing of new @code{mkid} scanners. + +@end table + +@cindex bugs, reporting +Please report bugs to @samp{gkm@@magilla.cichlid.com}. 
Remember to +include the version number, machine architecture, input files, and any +other information needed to reproduce the bug: your input, what you +expected, what you got, and why it is wrong. Diffs are welcome, but +please include a description of the problem as well, since this is +sometimes difficult to infer. @xref{Bugs, , , gcc, GNU CC}. + +@menu +* Past and future:: How the ID tools came about, and where they're going. +@end menu + + +@node Past and future +@section Past and future + +@cindex history + +@pindex look @r{and @code{mkid} 1} +@cindex McGary, Greg +Greg McGary conceived of the ideas behind mkid when he began hacking the +Unix kernel in 1984. He needed a navigation tool to help him find his +way around the expansive, unfamiliar landscape. The first @code{mkid}-like +tools were shell scripts, and produced an ASCII database that looks much +like the output of @code{lid} with no arguments. It took over an hour +on a VAX 11/750 to build a database for a 4.1BSD-ish kernel. Lookups +were done with the system utility @code{look}, modified to handle very +long lines. + +In 1986, Greg rewrote @code{mkid}, @code{lid}, @code{fid} and @code{idx} +in C to improve performance. Database-build times were shortened by an +order of magnitude. The @code{mkid} tools were first posted to +@samp{comp.sources.unix} in September 1987. + +@cindex Horsley, Tom +@cindex Scofield, Doug +@cindex Leonard, Bill +@cindex Berry, Karl +Over the next few years, several versions diverged from the original +source. Tom Horsley at Harris Computer Systems Division stepped forward +to take over maintenance and integrated some of the fixes from divergent +versions. A first release of +@code{mkid} @w{version 2} was posted to @file{alt.sources} near the end +of 1990. At that time, Tom wrote this Texinfo manual with the +encouragement of the net community. 
(Tom especially thanks Doug Scofield +and Bill Leonard whom he dragooned into helping poorfraed and +edit---they found several problems in the initial version.) Karl Berry +revamped the manual for Texinfo style, indexing, and organization in +1995. + +@pindex cscope +@pindex grep +@cindex future +In January 1995, Greg McGary reemerged as the primary maintainer and +launched development of @code{mkid} version 3, whose primary new feature +is an efficient algorithm for building databases that is linear in both +time and space over the size of the input text. (The old algorithm was +quadratic in space and therefore choked on very large source trees.) +The code is released under the GNU General Public License, and might become a +part of the GNU system. @code{mkid} 3 is an interim release, since +several significant enhancements are still in the works: an optional +coupling with GNU @code{grep}, so that @code{grep} can use an ID +database for hints; a @code{cscope} work-alike query interface; +incremental update of the ID database; and an automatic file-tree walker +so you need not explicitly supply every filename argument to the +@code{mkid} program. + + +@node mkid invocation +@chapter @code{mkid}: Creating ID databases + +@pindex mkid +@cindex creating databases +@cindex databases, creating + +@pindex cron +The @code{mkid} program builds an ID database. To do this, it must scan +each file you tell it to include in the database. This takes some time, +but once the work is done the query programs run very rapidly. (You can +run @code{mkid} as a @code{cron} job to regularly update your +databases.) + +The @code{mkid} program knows how to extract identifiers from various +types of files. For example, it can recognize and skip over comments +and string constants in a C program. + +@cindex numbers, in databases +Identifiers are not the only thing included in the database. Numbers +are also recognized and included in the database indexed by their binary +value. 
This feature allows you to find uses of constants without regard +to the radix used to specify them, since the same number can frequently +be written in many different ways (for instance, @samp{47}, @samp{0x2f}, +@samp{057} in C). + +All the places in this document which mention identifiers should really +mention both identifiers and numbers, but that gets fairly clumsy after +a while, so you just need to keep in mind that numbers are included in +the database as well as identifiers. + +@cindex ID file format +@cindex architecture-independence +@cindex sharing ID files +The ID files that @code{mkid} creates are architecture- and +byte-order-independent; you can share them at will across systems. + +@menu +* mkid options:: Command-line options to mkid. +* Scanners:: Built-in and defining your own. +* mkid examples:: Examples of mkid usage. +@end menu + + +@node mkid options +@section @code{mkid} options + +@cindex options for @code{mkid} +@pindex mkid @r{options} + +By default, @code{mkid} scans the files you specify and writes the +database to a file named @file{ID} in the current directory. + +@example +mkid [-v] [-S@var{scanarg}] [-a@var{argfile}] [-] [-f@var{idfile}] @c +@var{files}@dots{} +@end example + +The program accepts the following options. + +@table @samp + +@item -v +@opindex -v +@cindex statistics +Verbose. @code{mkid} tells you as it scans each file and indicates +which scanner it is using. It also summarizes some statistics about the +database at the end. + +@item -S@var{scanarg} +@opindex -S@var{scanarg} +Specify options regarding @code{mkid}'s scanners. @xref{Scanner option +formats}. + +@item -a@var{argfile} +@opindex -a@var{argfile} +Read additional command line arguments from @var{argfile}. This is +typically used to specify lists of filenames longer than will fit on a +command line; some systems have severe limitations on the total length +of a command line. + +@item - +@opindex - +Read additional command line arguments from standard input. 
+ +@item -f@var{idfile} +Write the database to the file @var{idfile}, instead of @file{ID}. The +database stores filenames relative to the directory containing the +database, so if you move the database to a different directory after +creating it, you may have trouble finding files. + +@c @item -u +@c @opindex -u +@c The @code{-u} option updates an existing database by rescanning any +@c files that have changed since the database was written. Unfortunately +@c you cannot incrementally add new files to a database. +@c Greg is reimplementing this ... + +@end table + +The remaining arguments @var{files} are the files to be scanned and +included in the database. If no files are given at all (either on +command line or via @samp{-a} or @samp{-}), @code{mkid} does nothing. + + +@node Scanners +@section Scanners + +@cindex scanners + +To determine which identifiers to extract from a file and store in the +database, @code{mkid} calls a @dfn{scanner}; we say a scanner +@dfn{recognizes} a particular language. Scanners for several languages +are built-in to @code{mkid}; you can add your own scanners as well, as +explained in the sections below. + +@cindex suffixes of filenames +@code{mkid} determines which scanner to use for a particular file by +looking at the suffix of the filename. This @dfn{suffix} is everything +after and including the last @samp{.} in a filename; for example, the +suffix of @file{foo.c} is @file{.c}. @code{mkid} has a built-in list of +bindings from some suffixes to corresponding scanners; for example, +@file{.c} files are (not surprisingly) scanned by the predefined C +language scanner. + +@findex .default @r{scanner} +If @code{mkid} cannot determine what scanner to use for a particular +file, either because the file has no suffix (e.g., @file{foo}) or +because @code{mkid} has no binding for the file's suffix (e.g., +@file{foo.bar}), it uses the scanner bound to the @samp{.default} +suffix. 
By default, this is the plain text scanner (@pxref{Plain text +scanner}), but you can change this with the @samp{-S} option, as +explained below. + +@menu +* Scanner option formats:: Overview of the -S option. +* Predefined scanners:: The C, plain text, and assembler scanners. +* Defining new scanners:: Either in source code or at runtime with -S. +* idx invocation:: Testing mkid scanners. +@end menu + + +@node Scanner option formats +@subsection Scanner option formats + +@cindex scanner options +@opindex -S @r{scanner option} + +With the @samp{-S} option, you can change which language scanner to use +for which files, give language-specific options, and get some limited +online help about scanner options. + +Here are the different forms of the @samp{-S} option: + +@table @samp + +@item -S.@var{suffix}=@var{scanner} +@opindex -S. +Use @var{scanner} for a file with the given @samp{.@var{suffix}}. For +example, @samp{-S.yacc=c} tells @code{mkid} to use the @samp{c} language +scanner for all files ending in @samp{.yacc}. + +@item -S.@var{suffix}=? +Display which scanner is used for the given @samp{.@var{suffix}}. + +@item -S?=@var{scanner} +@opindex -S? +Display which suffixes @var{scanner} is used for. + +@item -S?=? +Display the scanner binding for every known suffix. + +@item -S@var{scanner}+@var{arg} +@itemx -S@var{scanner}-@var{arg} +Each scanner accepts certain scanner-dependent arguments. These options +all have one of these forms. @xref{Predefined scanners}. + +@item -S@var{scanner}? +Display the scanner-specific options accepted by @var{scanner}. + +@item -S@var{new-scanner}/@var{old-scanner}/@var{filter-command} +Define @var{new-scanner} in terms of @var{old-scanner} and +@var{filter-command}. @xref{Defining scanners with options}. 
+ +@end table + + +@node Predefined scanners +@subsection Predefined scanners + +@cindex predefined scanners +@cindex scanners, predefined + +@code{mkid} has built-in scanners for several types of languages; you +can get the list by running @code{mkid -S?=?}. +The supported languages are documented +below@footnote{This is not strictly true: @samp{vhil} is a supported +language, but it is an obsolete and arcane dialect of C and should be +ignored.}. + +@menu +* C scanner:: For the C programming language. +* Plain text scanner:: For documents or other non-source code. +* Assembler scanner:: For assembly language. +@end menu + + +@node C scanner +@subsubsection C scanner + +@cindex C scanner, predefined +@flindex .[chly] @r{files, scanning} + +The C scanner is the most commonly used. Files with the usual @file{.c} +and @file{.h} suffixes, and the @file{.y} (yacc) and @file{.l} (lex) +suffixes, are processed with this scanner (by default). + +Scanner-specific options: + +@table @samp + +@item -Sc-s@var{character} +@kindex $ @r{in identifiers} +@opindex -Sc-s +Allow the specified @var{character} in identifiers. For example, if you +use @samp{$} in identifiers, you'll want to use @samp{-Sc-s$}. + +@item -Sc+u +@opindex -Sc+u +Strip leading underscores from identifiers. You might want to do this in +peculiar circumstances, such as trying to parse the output from +@code{nm} or some other system utility. + +@item -Sc-u +@opindex -Sc-u +Don't strip leading underscores from identifiers; this is the default. + +@end table + + +@node Plain text scanner +@subsubsection Plain text scanner + +@cindex plain text scanner + +The plain text scanner is intended for scanning most non-source-code +files. This is typically the scanner used when adding custom scanners +via @samp{-S} (@pxref{Defining scanners with options}). + +@c @code{mkid} predefines a troff scanner in terms of the plain text +@c scanner and +@c the @code{deroff} utility. 
+@c A compressed man page +@c scanner runs @code{pcat} piped into @code{col -b}, and a @TeX{} scanner +@c runs @code{detex}. + +Scanner-specific options: + +@table @samp + +@item -Stext+a@var{character} +@opindex -Stext+a +Include @var{character} in identifiers. By default, letters (a--z and +A--Z) and underscore are included. + +@item -Stext-a@var{character} +@opindex -Stext-a +Exclude @var{character} from identifiers. + +@item -Stext+s@var{character} +@opindex -Stext+s +@cindex squeezing characters from identifiers +Squeeze @var{character} from identifiers, i.e., do not terminate an +identifier when @var{character} is seen. By default, the characters +@samp{'}, @samp{-}, and @samp{.} are squeezed out of identifiers. For +example, the input @samp{fred's} leads to the identifier @samp{freds}. + +@item -Stext-s@var{character} +Do not squeeze @var{character}. + +@end table + + +@node Assembler scanner +@subsubsection Assembler scanner + +@cindex assembler scanner + +Since assembly languages come in several flavors, this scanner has a +number of options: + +@table @samp + +@item -Sasm-c@var{character} +@opindex -Sasm-c +@cindex comments in assembler +Define @var{character} as starting a comment that extends to the end of +the input line; no default. In many assemblers this is @samp{;} or +@samp{#}. + +@item -Sasm+u +@itemx -Sasm-u +@opindex -Sasm+u +Strip (@samp{+u}) or do not strip (@samp{-u}) leading underscores from +identifiers. The default is to strip them. + +@item -Sasm+a@var{character} +@opindex -Sasm+a +Allow @var{character} in identifiers. + +@item -Sasm-a@var{character} +Allow @var{character} in identifiers, but if an identifier contains +@var{character}, ignore it. This is useful to ignore temporary labels, +which can be generated in great profusion; these often contain @samp{.} +or @samp{@@}. + +@item -Sasm+p +@itemx -Sasm-p +@opindex -Sasm+p +Recognize (@samp{+p}) or do not recognize (@samp{-p}) C preprocessor +directives in assembler source. 
The default is to recognize them. + +@item -Sasm+C +@itemx -Sasm-C +@opindex -Sasm+C +Skip over (@samp{+C}) or do not skip over (@samp{-C}) C style comments +in assembler source. The default is to skip them. + +@end table + + +@node Defining new scanners +@subsection Defining new scanners + +@cindex scanners, adding new + +You can add new scanners to @code{mkid} in two ways: modify the source +code and recompile, or at runtime via the @samp{-S} option. Each has +its advantages and disadvantages, as explained below. + +If you create a new scanner that would be of use to others, please +consider sending it back to the maintainer, +@samp{gkm@@magilla.cichlid.com}, for inclusion in future releases of +@code{mkid}. + +@menu +* Defining scanners in source code:: +* Defining scanners with options:: +@end menu + + +@node Defining scanners in source code +@subsubsection Defining scanners in source code + +@flindex scanners.c +@cindex scanners, defining in source code + +@vindex languages_0 +@vindex suffixes_0 +To add a new scanner in source code, you should add a new section to the +file @file{scanners.c}. Copy one of the existing scanners (most likely +either C or plain text), and modify as necessary. Also add the new +scanner to the @code{languages_0} and @code{suffixes_0} tables near the +beginning of the file. + +This is not a terribly difficult programming task, but it requires +recompiling and installing the new version of @code{mkid}, which may be +inconvenient. + +This method leads to scanners which operate much more quickly than ones +that depend on external programs. It is also likely the easiest way +to define scanners for new programming languages. 
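Before writing or modifying a scanner, it can help to prototype the identifier list you expect it to produce. The shell sketch below only approximates the documented defaults of the plain text scanner (letters and underscore form identifiers; the characters ', - and . are squeezed out). The function name text_idents is hypothetical, numbers are ignored for simplicity, and none of this is mkid's own code.

```shell
#!/bin/sh
# Hypothetical helper, not part of mkid: approximate the identifiers the
# plain text scanner would extract from the file named in $1.
text_idents() {
    # 1. Squeeze out ' . and - so that "fred's" becomes "freds".
    # 2. Replace each run of characters other than letters and
    #    underscore with one newline (one candidate per line).
    # 3. Drop empty lines, then sort and remove duplicates.
    # LC_ALL=C keeps the A-Z ranges and the sort order predictable.
    LC_ALL=C tr -d "'.-" < "$1" |
        LC_ALL=C tr -cs 'A-Za-z_' '\n' |
        sed '/^$/d' |
        LC_ALL=C sort -u
}
```

Comparing this list with what idx prints for the same file is one way to spot where a scanner under development departs from the behavior you intended.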
+ + +@node Defining scanners with options +@subsubsection Defining scanners with options + +@cindex scanners, defining with options + +You can use the @samp{-S} option on the command line to define a new +language scanner: + +@example +-S@var{new-scanner}/@var{existing-scanner}/@var{filter} +@end example + +@noindent +Here, @var{new-scanner} is the name of the new scanner being defined, +@var{existing-scanner} is the name of an existing scanner, and +@var{filter} is a shell command or pipeline. + +The new scanner works by passing the input file to @var{filter}, and +then arranging for the result to be passed through +@var{existing-scanner}. Typically, @var{existing-scanner} is @samp{text}. + +Somewhere within @var{filter}, the string @samp{%s} should occur. This +@samp{%s} is replaced by the name of the source file being scanned. + +@cindex Texinfo, scanning example of +For example, @code{mkid} has no built-in scanner for Texinfo files (like +this one). In indexing a Texinfo file, you most likely would want +to ignore the Texinfo @@-commands. Here's one way to specify a new +scanner to do this: + +@example +-Stexinfo/text/sed s,@@[a-z]*,,g %s +@end example + +This defines a new language scanner (@samp{texinfo}) in terms of +a @code{sed} command to strip out Texinfo directives (an @samp{@@} +character followed by letters). Once the directives are stripped, the +remaining text is run through the plain text scanner. + +This is a minimal example; to do a complete job, you would need to +completely delete some lines, such as those beginning with @code{@@end} +or @code{@@node}. + + +@node idx invocation +@subsection @code{idx}: Testing @code{mkid} scanners + +@code{idx} prints the identifiers found in the files you specify to +standard output. This is useful in debugging new @code{mkid} scanners +(@pxref{Scanners}). Synopsis: + +@example +idx [-S@var{scanarg}] @var{files}@dots{} +@end example + +@code{idx} accepts the same @samp{-S} options as @code{mkid}. 
+@xref{Scanner option formats}. + +The name ``idx'' stands for ``ID eXtract''. The name may change in +future releases, since this is such an infrequently used program. + + +@node mkid examples +@section @code{mkid} examples + +@cindex examples of @code{mkid} + +The simplest example of @code{mkid} is something like: + +@example +mkid *.[chy] +@end example + +This will build an ID database indexing identifiers and numbers in +all the @file{.c}, @file{.h}, and @file{.y} files in the current +directory. Because @code{mkid} already knows how to scan files with +those suffixes, no additional options are needed. + +@cindex man pages, compressed +@cindex compressed files, building ID from +Here's a more complex example. Suppose you want to build a database +indexing the contents of all the @code{man} pages, and further suppose +that your system is using @code{gzip} (@pxref{Top, , , gzip, Gzip}) to +store compressed @code{cat} versions of the @code{man} pages in the +directory @file{/usr/catman}. The @code{gzip} program creates files +with a @code{.gz} suffix, so you must tell @code{mkid} how to scan +@file{.gz} files. Here are the commands to do the job: + +@example +cd /usr/catman +find . -name \*.gz -print | mkid '-Sman/text/gzip -dc <%s' -S.gz=man - +@end example + +@noindent Explanation: + +@enumerate + +@item +We first @code{cd} to @file{/usr/catman} so the ID database +will store the correct relative filenames. + +@item +The @code{find} command prints the names of all @file{.gz} files under +the current directory. @xref{find invocation, , , sh-utils, GNU shell +utilities}. + +@item +This list is piped to @code{mkid}; the @code{-} option (at the end of +the line) tells @code{mkid} to read arguments (in this case, as is +typical, the list of filenames) from standard input. @xref{mkid options}. + +@item +The @samp{-Sman/text/gzip @dots{}} defines a new language @samp{man} in +terms of the @code{gzip} program (run with @samp{-dc} so that it decompresses to standard output) and @code{mkid}'s existing text +scanner. 
@xref{Defining scanners with options}. + +@item +The @samp{-S.gz=man} tells @code{mkid} to treat all @file{.gz} files as +this new language @code{man}. @xref{Scanner option formats}. + +@end enumerate + +As a further complication, @code{cat} pages typically contain +underlining and backspace sequences, which will confuse @code{mkid}. To +handle this, the @code{gzip} command becomes a pipeline, like this: + +@example +mkid '-Sman/text/gzip -dc <%s | col -b' -S.gz=man - +@end example + + +@node Common query arguments +@chapter Common query arguments + +@cindex common query arguments + +Certain options, and regular expression syntax, are shared by the ID +query tools. So we describe those things in the sections below, instead +of repeating the description for each tool. + +@menu +* Query options:: -f -r -c -ew -kg -n -doxa -m -F -u. +* Patterns:: Regular expression syntax for searches. +* Examples: Query examples. Some common uses. +@end menu + + +@node Query options +@section Query options + +@cindex query options, common +@cindex common query options + +The ID query tools (@emph{not} @code{mkid}) share certain command line +options. Not all of these options are recognized by all programs, but +if an option is used by more than one program, it is described below. +The description of each program gives the options that program uses. + +@table @samp + +@item -f@var{idfile} +@opindex -f@var{idfile} +@cindex database name, specifying +@cindex parent directories, searched for ID +Read the database from @var{idfile}, in the current directory or in any +directory above the current directory. The default database name is +@file{ID}. Searching parent directories lets you have a single ID +database at the root of a large source tree and then use the query tools +from anywhere within that tree. + +@item -r@var{directory} +@opindex -r@var{directory} +Find files relative to @var{directory}, instead of the directory in +which the ID database was found. 
This is useful if the ID database was +moved after its creation. + +@item -c +@opindex -c +Equivalent to @code{-r`pwd`}, i.e., find files relative to the current +directory, instead of the directory in which the ID database was found. + +@item -e +@itemx -w +@opindex -e +@opindex -w +@cindex regular expressions, forcing evaluation as +@cindex strings, forcing evaluation as +@cindex constant strings, forcing evaluation as +@samp{-e} forces pattern arguments to be treated as regular expressions, +and @samp{-w} forces pattern arguments to be treated as constant +strings. By default, the query tools guess whether each pattern is a regular +expression or a constant string by looking for special characters. +@xref{Patterns}. + +@item -k +@itemx -g +@opindex -k +@opindex -g +@cindex brace notation in filename lists +@cindex shell brace notation in filename lists +@samp{-k} suppresses use of shell brace notation in the output. By +default, the query tools that generate lists of filenames attempt to +compress the lists using the usual shell brace notation, e.g., +@file{@{foo,bar@}.c} to mean @file{foo.c} and @file{bar.c}. (This is +useful if you use @code{ksh} or the original (not GNU) @code{sh} and +want to feed the list of names to another command, since those shells do +not support this brace notation; the name of the @code{-k} option comes +from the @code{k} in @code{ksh}). + +@samp{-g} turns on use of brace notation; this is only needed if the +query tools were compiled with @samp{-k} as the default behavior. + +@item -n +@opindex -n +@cindex suppressing matching identifier +Suppress the matching identifier before each list of filenames that the +query tools output by default. This is useful if you want a list of just +the names to feed to another command. 
+ +@item -d +@itemx -o +@itemx -x +@itemx -a +@opindex -d +@opindex -o +@opindex -x +@opindex -a +@cindex radix of numeric matches, specifying +@cindex numeric matches, specifying radix of +These options may be used in any combination to specify the radix of +numeric matches. @samp{-d} allows matching on decimal numbers, +@samp{-o} on octal numbers, and @samp{-x} on hexadecimal numbers. The +@code{-a} option is equivalent to specifying all three; this is the +default. + +@item -m +@opindex -m +@cindex multiple lines, merging +Merge multiple lines of output into a single line. If your query +matches more than one identifier, the default is to generate a separate +line of output for each matching identifier. + +@item -F- +@itemx -F@var{n} +@itemx -F-@var{m} +@itemx -F@var{n}-@var{m} +@opindex -F +@cindex single matches, showing +Show identifiers matching at least @var{n} and at most @var{m} times. +@samp{-F-} is equivalent to @samp{-F1}, i.e., find identifiers that +appear only once in the database. (This is useful to locate identifiers +that are defined but never used, or used once and never defined.) + +@item -u@var{number} +@opindex -u +@cindex conflicting identifiers, finding +List identifiers that conflict in the first @var{number} characters. +This could be useful in porting programs to brain-dead computers that +refuse to support long identifiers, but your best long term option is to +set such computers on fire. + +@end table + + +@node Patterns +@section Patterns + +@cindex patterns +@cindex regular expression syntax + +@dfn{Patterns}, also called @dfn{regular expressions}, allow you to +match many different identifiers in a single query. + +The same regular expression syntax is recognized by all the query tools +that handle regular expressions. The exact syntax depends on how the ID +tools were compiled, but the following constructs should always be +supported: + +@table @samp + +@item . 
+Match any single character. + +@item [@var{chars}] +Match any of the characters specified within the brackets. You can +match any characters @emph{except} the ones in brackets by typing +@samp{^} as the first character. A range of characters can be specified +using @samp{-}. For example, @samp{[abc]} and @samp{[a-c]} both match +@samp{a}, @samp{b}, or @samp{c}, and @samp{[^abc]} matches anything +@emph{except} @samp{a}, @samp{b}, or @samp{c}. + +@item * +Match the previous construct zero or more times. + +@item ^ +@itemx $ +@samp{^} (@samp{$}) at the beginning (end) of a pattern anchors the +match to the first (last) character of the identifier. + +@end table + +The query programs use either the @code{regex}/@code{regcmp} or +@code{re_comp}/@code{re_exec} functions, depending on which are +available in the library on your system. These do not always support +the exact same regular expression syntax, so consult your local +@code{man} pages to find out. + + +@node Query examples +@section Query examples + +@cindex examples, queries +@cindex query examples +Here are some examples of the options described in the previous +sections. + +To restrict searches to exact matches, use @samp{^@dots{}$}. For example: + +@example +prompt$ gid '^FILE$' +ansi2knr.c:144: @{ FILE *in, *out; +ansi2knr.c:315: FILE *out; +fid.c:38: FILE *id_FILE; +filenames.c:576: FILE * +@dots{} +@end example + +To show identifiers not unique in the first 16 characters: + +@example +prompt$ lid -u16 +RE_CONTEXT_INDEP_ANCHORS regex.c +RE_CONTEXT_INDEP_OPS regex.c +RE_SYNTAX_POSIX_BASIC regex.c +RE_SYNTAX_POSIX_EXTENDED regex.c +@dots{} +@end example + +@cindex numeric searches +Numbers are searched for numerically rather than textually. 
For example: + +@example +prompt$ lid 0xff +0377 @{lid,regex@}.c +0xff @{bitops,fid,lid,mkid@}.c +255 regex.c +@end example + +You can, however, restrict a numeric search to a particular +radix: + +@example +prompt$ lid -x 0xff +0xff @{bitops,fid,lid,mkid@}.c +@end example + +Filenames in the output are always adjusted to be correct for the +current working directory. For example: + +@example +prompt$ lid bdevsw +bdevsw sys/conf.h cf/conf.c io/bio.c os/@{fio,main,prf,sys3@}.c +prompt$ cd io +prompt$ lid bdevsw +bdevsw ../sys/conf.h ../cf/conf.c bio.c ../os/@{fio,main,prf,sys3@}.c +@end example + + +@node gid invocation +@chapter @code{gid}: Listing matching lines + +Synopsis: + +@example +gid [-f@var{file}] [-u@var{n}] [-r@var{dir}] [-doxasc] [@var{pattern}@dots{}] +@end example + +@code{gid} finds the identifiers in the database that match the +specified @var{pattern}s, then searches for all occurrences of those +identifiers, but only in the files containing matches. In a large source +tree, this saves an enormous amount of time (compared to searching every +source file). + +With no @var{pattern} arguments, @code{gid} prints every line of every +source file. + +The name ``gid'' stands for ``grep for identifiers'', @code{grep} being +the standard utility to search regular files. + +@xref{Common query arguments}, for a description of the command-line +options and @var{pattern} arguments. + +@code{gid} uses the standard GNU output format for identifying source lines: + +@example +@var{filename}:@var{linenum}: @var{text} +@end example + +Here is an example: + +@example +prompt$ gid FILE +ansi2knr.c:144: @{ FILE *in, *out; +ansi2knr.c:315: FILE *out; +fid.c:38: FILE *id_FILE; +@dots{} +@end example + +@menu +* GNU Emacs gid interface:: Using next-error with gid. 
+@end menu + + +@node GNU Emacs gid interface +@section GNU Emacs @code{gid} interface + +@cindex Emacs interface to @code{gid} +@flindex gid.el @r{interface to Emacs} + +@vindex load-path +The @code{mkid} source distribution comes with a file @file{gid.el}, +which defines a GNU Emacs interface to @code{gid}. To install it, put +@file{gid.el} somewhere that Emacs will find it (i.e., in your +@code{load-path}) and put + +@example +(autoload 'gid "gid" nil t) +@end example + +@noindent in one of Emacs' initialization files, e.g., @file{~/.emacs}. +You will then be able to use @kbd{M-x gid} to run the command. + +@findex gid @r{Emacs function} +The @code{gid} function prompts you with the word around point. If you +want to search for something else, simply delete the line and type the +pattern of interest. + +@flindex *compilation* @r{Emacs buffer} +The function then runs the @code{gid} program in a @samp{*compilation*} +buffer, so the normal @code{next-error} function can be used to visit +all the places the identifier is found (@pxref{Compilation,,, emacs, The +GNU Emacs Manual}). + + +@node Looking up identifiers +@chapter Looking up identifiers + +These commands look up identifiers in the ID database and operate on the +files containing matches. + +@menu +* lid invocation:: Matching patterns. +* aid invocation:: Matching strings. +* eid invocation:: Invoking an editor on matches. +* fid invocation:: Listing a file's identifiers. +@end menu + + +@node lid invocation +@section @code{lid}: Matching patterns + +@pindex lid + +Synopsis: + +@example +lid [-f@var{file}] [-u@var{n}] [-r@var{dir}] [-mewdoxaskgnc] @c +@var{pattern}@dots{} +@end example + +@code{lid} searches the database for identifiers matching the given +@var{pattern} arguments and prints the names of the files that match +each @var{pattern}. With no @var{pattern}s, @code{lid} lists every +entry in the database. + +The name ``lid'' stands for ``lookup identifier''. 
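The lookup that @code{lid} performs can be pictured with a small Python sketch. This is only an illustration under stated assumptions: it uses a toy in-memory index (a dictionary from identifier to file names) in place of the real ID database, and the names @code{toy_index} and @code{lid_lookup} are hypothetical, not part of the ID utilities.

```python
import re

# Hypothetical stand-in for the ID database: identifier -> files containing it.
toy_index = {
    "FILE": ["extern.h", "fid.c", "lid.c"],
    "IDFILE": ["id.h", "fid.c"],
    "getchar": ["lid.c"],
}

def lid_lookup(pattern, index):
    """Return {identifier: files} for identifiers matching the regex pattern.

    The match is unanchored, so '^' and '$' are needed to pin it to the
    start or end of an identifier.
    """
    rx = re.compile(pattern)
    return {ident: files
            for ident, files in sorted(index.items())
            if rx.search(ident)}

# One output line per matching identifier, followed by its files.
for ident, files in lid_lookup("FILE$", toy_index).items():
    print(ident, " ".join(files))
```

Because the match is unanchored, the pattern @samp{FILE$} selects both @code{FILE} and @code{IDFILE} in this toy index.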
+ +@xref{Common query arguments}, for a description of the command-line +options and @var{pattern} arguments. + +By default, each line of output consists of an identifier and all the +files containing that identifier. + +Here is an example showing a search for a single identifier (omitting +some output to keep lines short): + +@example +prompt$ lid FILE +FILE extern.h @{fid,gets0,getsFF,idx,init,lid,mkid,@dots{}@}.c +@end example + +This example shows a regular expression search: + +@example +prompt$ lid 'FILE$' +AF_FILE mkid.c +AF_IDFILE mkid.c +FILE extern.h @{fid,gets0,getsFF,idx,init,lid,mkid,@dots{}@}.c +IDFILE id.h @{fid,lid,mkid@}.c +IdFILE @{fid,lid@}.c +@dots{} +@end example + +@noindent As you can see, when a regular expression is used, it is +possible to get more than one line of output. To merge multiple lines +into one, use @samp{-m}: + +@example +prompt$ lid -m ^get +^get extern.h @{bitsvec,fid,gets0,getsFF,getscan,idx,lid,@dots{}@}.c +@end example + + +@node aid invocation +@section @code{aid}: Matching strings + +@pindex aid + +Synopsis: + +@example +aid [-f@var{file}] [-u@var{n}] [-r@var{dir}] [-mewdoxaskgnc] @c +@var{string}@dots{} +@end example + +@cindex case-insensitive searching +@cindex string searching +@code{aid} searches the database for identifiers containing the given +@var{string} arguments. The search is case-insensitive. + +@flindex whatis +The name ``aid'' stands for ``apropos identifier'', @code{apropos} +being a command that does a similar search of the @code{whatis} database +of @code{man} descriptions. + +For example, @samp{aid get} matches the identifiers @code{fgets}, +@code{GETLINE}, and @code{getchar}. + +The default output format is the same as @code{lid}; see the previous +section. + +@xref{Common query arguments}, for a description of the command-line +options and @var{pattern} arguments. 
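As a rough illustration of the @code{aid} matching rule only, the following Python sketch applies a case-insensitive substring test to a hypothetical list of identifiers. The function name @code{aid_match} and the sample list are assumptions made for this example; the real @code{aid} reads its identifiers from the ID database.

```python
def aid_match(string, identifiers):
    """Return the identifiers that contain `string`, ignoring case."""
    needle = string.lower()
    return [ident for ident in identifiers if needle in ident.lower()]

# Hypothetical identifiers standing in for the contents of an ID database.
sample = ["fgets", "GETLINE", "getchar", "FILE", "putc"]

print(aid_match("get", sample))
```

With this sample list, the string @samp{get} selects @code{fgets}, @code{GETLINE}, and @code{getchar}, the same three identifiers named in the text.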
+ + +@node eid invocation +@section @code{eid}: Invoking an editor on matches + +@pindex eid + +Synopsis: + +@example +eid [-f@var{file}] [-u@var{n}] [-r@var{dir}] [-doxasc] [@var{pattern}]@dots{} +@end example + +@code{eid} runs the usual search (@pxref{lid invocation}) on the given +arguments, shows you the output, and then asks: + +@example +Edit? [y1-9^S/nq] +@end example + +@noindent +You can respond with: + +@table @samp +@item y +Edit all files listed. + +@item 1@dots{}9 +Edit all files starting at the @math{@var{n} + 1}'st file. + +@item /@var{string} @r{or} @kbd{CTRL-S}@var{string} +Edit all files whose name contains @var{string}. + +@item n +Go on to the next @var{pattern}, i.e., edit no files for this one. + +@item q +Quit @code{eid}. + +@end table + +@code{eid} invokes an editor once per @var{pattern}; all the specified +files are given to the editor for you to edit simultaneously. + +@code{eid} invokes the editor defined by the @samp{EDITOR} environment +variable. If the editor can accept an initial search argument on the +command line, @code{eid} moves automatically to the location of the +match, via the environment variables below. + +@xref{Common query arguments}, for a description of the command-line +options and @var{pattern} arguments. + +Here are the environment variables relevant to @code{eid}: + +@table @samp + +@item EDITOR +@vindex EDITOR +The name of the editor program to invoke. + +@item EIDARG +@vindex EIDARG +@cindex search for identifier, initial +The argument to pass to the editor to search for the matching +identifier. For @code{vi}, this should be @samp{+/%s/}. + +@item EIDLDEL +@vindex EIDLDEL +@cindex left delimiter editor argument +@cindex beginning-of-word editor argument +A regular expression to force a match at the beginning of a word (``left +delimiter''). @code{eid} inserts this in front of the matching identifier +when composing the search argument. For @code{vi}, this should be +@samp{\<}. 
+ +@item EIDRDEL +@vindex EIDRDEL +@cindex right delimiter editor argument +@cindex end-of-word editor argument +The end-of-word regular expression. For @code{vi}, this should be +@samp{\>}. + +@end table + +For Emacs users, the interface in @file{gid.el} is probably preferable +to @code{eid}. @xref{GNU Emacs gid interface}. + + +Here is an example: + +@example +prompt$ eid FILE \^print +FILE @{ansi2knr,fid,filenames,idfile,idx,lid,misc,@dots{}@}.c +Edit? [y1-9^S/nq] n +^print @{ansi2knr,fid,getopt,getopt1,lid,mkid,regex,scanners@}.c +Edit? [y1-9^S/nq] 2 +@end example + +@noindent This will start editing at @file{getopt.c}. + + +@node fid invocation +@section @code{fid}: Listing a file's identifiers + +@pindex fid +@cindex identifiers in a file + +@code{fid} lists the identifiers found in a given file. Synopsis: + +@example +fid [-f@var{dbfile}] @var{file1} [@var{file2}] +@end example + +@table @samp + +@item -f@var{dbfile} +Read the database from @var{dbfile} instead of @file{ID}. + +@item @var{file1} +List all the identifiers contained in @var{file1}. + +@item @var{file2} +With a second file argument, list only the identifiers both files have +in common. + +@end table + +The output is simply one identifier (or number) per line. + + +@node pid invocation +@chapter @code{pid}: Looking up filenames + +@pindex pid +@cindex filenames, matching +@cindex matching filenames + +@code{pid} matches the filenames stored in the ID database, rather than +the identifiers. Synopsis: + +@example +pid [-f@var{dbfile}] [-r@var{dir}] [-ebkgnc] @var{wildcard}@dots{} +@end example + +By default, the @var{wildcard} patterns are treated as shell globbing +patterns, rather than the regular expressions the other utilities +accept. See the section below for details. 
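To make the distinction between the two pattern styles concrete, here is a hedged Python sketch that runs a globbing pattern and a regular expression over a hypothetical list of stored filenames. Python's @code{fnmatch} module is used purely as a stand-in for the matcher in @code{pid}; it is not the code @code{pid} uses, and the helper names and sample paths are invented for this example.

```python
import fnmatch
import re

def glob_lookup(wildcard, filenames):
    """Filenames that match a shell-style wildcard over the whole name."""
    return [name for name in filenames
            if fnmatch.fnmatchcase(name, wildcard)]

def regex_lookup(pattern, filenames):
    """Filenames that match a regular expression.

    Assumption: the match is unanchored, as it is for identifier lookups.
    """
    rx = re.compile(pattern)
    return [name for name in filenames if rx.search(name)]

# Hypothetical filenames standing in for those stored in an ID database.
stored = ["lid.c", "mkid.c", "id.h", "doc/id-utils.texi"]

print(glob_lookup("*.c", stored))      # wildcard style
print(regex_lookup(r"\.c$", stored))   # regular expression style
```

Both calls select the same two @file{.c} files here, but the patterns are spelled differently: in the wildcard, @samp{*} stands for any run of characters and @samp{.} is literal, while the regular expression has to escape the @samp{.} and anchor the match with @samp{$}.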
+ +Besides the standard options given in the synopsis (@pxref{Query +options}), @code{pid} accepts the following: + +@table @samp + +@item -e +@opindex -e +Do the usual regular expression matching (@pxref{Patterns}), instead +of shell wildcard matching. + +@item -b +@opindex -b +@cindex basename match +Match the basenames of the files in the database. For example, +@samp{pid -b foo} will match the stored filename @file{dir/foo}, but not +@file{foo/file}. + +@end table + +For example, the command: + +@example +pid \*.c +@end example + +@noindent lists all the @file{.c} files in the database. (The @samp{\} +here protects the @samp{*} from being expanded by the shell.) + +@menu +* Wildcard patterns:: Shell-style globbing patterns. +@end menu + + +@node Wildcard patterns +@section Wildcard patterns + +@cindex globbing patterns +@cindex shell wildcard patterns +@cindex wildcard patterns + +@code{pid} does simplified shell wildcard matching (unless the @samp{-e} +option is specified), rather than the regular expression matching done +by the other utilities. Here is a description of wildcard matching, +also called @dfn{globbing}: + +@itemize @bullet + +@item +@kindex * @r{in globbing} +@samp{*} matches zero or more characters. + +@item +@kindex ? @r{in globbing} +@samp{?} matches any single character. + +@item +@kindex \ @r{in globbing} +@samp{\} forces the next character to be taken literally. + +@item +@kindex [@dots{}] @r{in globbing} +@samp{[@var{chars}]} matches any single character listed in @var{chars}. + +@item +@kindex [!@dots{}] @r{in globbing} +@samp{[!@var{chars}]} matches any single character @emph{not} listed in @var{chars}. + +@end itemize + +Most shells treat @samp{/} and leading @samp{.} characters +specially. @code{pid} does not do this. It simply matches the filename +in the database against the wildcard pattern. + + +@node Index +@unnumbered Index + +@printindex cp + +@contents +@bye