Diffstat (limited to 'doc/id-utils.texi')
-rw-r--r-- | doc/id-utils.texi | 1921 |
1 files changed, 905 insertions, 1016 deletions
diff --git a/doc/id-utils.texi b/doc/id-utils.texi
index 9cc7dd4..bda4734 100644
--- a/doc/id-utils.texi
+++ b/doc/id-utils.texi
@@ -6,7 +6,7 @@

 @include version.texi

-@c Define new indices for filenames, commands and options.
+@c Define new indices for file names, commands and options.
 @defcodeindex fl
 @defcodeindex cm
 @defcodeindex op
@@ -22,23 +22,20 @@
 @ifinfo
 @format
 START-INFO-DIR-ENTRY
-* ID database: (id). Identifier database utilities.
-* aid: (id)aid invocation. Matching strings.
-* eid: (id)eid invocation. Invoking an editor on matches.
-* fid: (id)fid invocation. Listing a file's identifiers.
-* gid: (id)gid invocation. Listing all matching lines.
-* idx: (id)idx invocation. Testing mkid scanners.
-* lid: (id)lid invocation. Matching patterns.
-* mkid: (id)mkid invocation. Creating an ID database.
-* pid: (id)pid invocation. Looking up filenames.
+* ID database: (id-utils). Identifier database utilities.
+* mkid: (id-utils)mkid invocation. Creating an ID database.
+* lid: (id-utils)lid invocation. Matching words and patterns.
+* fid: (id-utils)fid invocation. Listing a file's tokens.
+* fnid: (id-utils)fnid invocation. Looking up file names.
+* xtokid: (id-utils)xtokid invocation. Testing mkid scanners.
 END-INFO-DIR-ENTRY
 @end format
 @end ifinfo

 @ifinfo
-This file documents the @code{mkid} identifier database utilities.
+This file documents the @file{id-utils} database utilities.

-Copyright (C) 1991, 1995 Tom Horsley.
+Copyright (C) 1996 Free Software Foundation, Inc.

 Permission is granted to make and distribute verbatim copies of
 this manual provided the copyright notice and this permission notice
@@ -63,970 +60,955 @@ except that this permission notice may be stated in a translation.
 @titlepage
 @title ID database utilities
-@subtitle Programs for simple, fast, high-capacity cross-referencing
+@subtitle Programs for simple, fast, high-capacity cross-referencing
 @subtitle for version @value{VERSION}
-@author Tom Horsley
 @author Greg McGary
-
-@page
-@vskip 0pt plus 1filll
-Copyright @copyright{} 1991, 1995 Tom Horsley.
-
-Permission is granted to make and distribute verbatim copies of
-this manual provided the copyright notice and this permission notice
-are preserved on all copies.
-
-Permission is granted to copy and distribute modified versions of this
-manual under the conditions for verbatim copying, provided that the entire
-resulting derived work is distributed under the terms of a permission
-notice identical to this one.
-
-Permission is granted to copy and distribute translations of this manual
-into another language, under the above conditions for modified versions,
-except that this permission notice may be stated in a translation.
+@author Tom Horsley
 @end titlepage
-
 @ifinfo
+@c ************* gkm *********************************************************
 @node Top
-@top ID database utilities
+@top ID utilities

-This manual documents version @value{VERSION} of the ID database
-utilities.
+This manual documents version @value{VERSION} of the ID utilities.

 @menu
-* Introduction:: Overview of the tools, and authors.
+* Introduction:: Overview of the tools with tutorial.
+* Quick start:: Quick start procedure.
+* Common options:: Common command-line options.
 * mkid invocation:: Creating an ID database.
-* Common query arguments:: Common lookup options and search patterns.
-* gid invocation:: Listing all matching lines.
-* Looking up identifiers:: lid, aid, eid, and fid.
-* pid invocation:: Looking up filenames.
+* lid invocation:: Querying an ID database by token.
+* fid invocation:: Listing a file's tokens.
+* fnid invocation:: Looking up file names.
+* xtokid invocation:: Testing language scanners.
+* Past and Future:: History and future directions.
 * Index:: General index.
 @end menu
 @end ifinfo
-
+@c ************* gkm *********************************************************
 @node Introduction
 @chapter Introduction
 @cindex overview
 @cindex introduction
-
 @cindex ID database, definition of
-An @dfn{ID database} is a binary file containing a list of filenames, a
-list of identifiers, and a matrix indicating which identifiers appear in
-which files. With this database and some tools to manipulate it
-(described in this manual), a host of tasks become simpler and faster.
-For example, you can list all files containing a particular
-@code{#include} throughout a huge source hierarchy, search for all the
-memos containing references to a project, or automatically invoke an
-editor on all files containing references to some function. Anyone with
-a large software project to maintain, or a large set of text files to
-organize, can benefit from an ID database.

-Although the ID utilities are most commonly used with identifiers,
-numeric constants are also stored in the database, and can be searched
-for in the same way (independent of radix, if desired).
+An @dfn{ID database} is a binary file containing a list of file names, a
+list of tokens, and a sparse matrix indicating which tokens
+appear in which files.

-There are a number of programs in the ID family:
+With this database and some tools to query it (described in this
+manual), many text-searching tasks become simpler and faster. For
+example, you can list all files that reference a particular
+@code{#include} file throughout a huge source hierarchy, search for all
+the memos containing references to a project, or automatically invoke an
+editor on all files containing references to some function or variable.
+Anyone with a large software project to maintain, or a large set of text
+files to organize, can benefit from the ID utilities.
-@table @code
+Although the name `ID' is short for `identifier', the ID utilities
+handle more than just identifiers; they also treat other kinds of
+tokens, most notably numeric constants, and the contents of certain
+character strings. Thus, this manual will use the word @dfn{token} as a
+term that is inclusive of identifiers, numbers and strings.

-@item mkid
-scans files for identifiers and numeric constants and builds the ID
-database file.
+There are several programs in the ID utilities family:

-@item gid
-lists all lines that match given patterns.
+@table @file
+
+@item mkid
+scans files for tokens and builds the ID database file.

 @item lid
-lists the filenames containing identifiers that match given patterns.
+queries the ID database for tokens, then reports matching file names or
+matching lines.

-@item aid
-lists the filenames containing identifiers that contain given strings,
-independent of case.
+@item fid
+lists all tokens recorded in the database for given files, or
+tokens common to two files.

-@item eid
-invokes an editor on each file containing identifiers that match given
-patterns.
+@item fnid
+matches the file names in the database, rather than the tokens.

-@item fid
-lists all identifiers recorded in the database for given files, or
-identifiers common to two files.
+@item xtokid
+extracts raw tokens---helps with testing of new @file{mkid} scanners.
+
+@end table
+
+In addition, the ID utilities have historically provided several query
+programs which are specializations of @file{lid}:
+
+@table @file
+
+@item gid
+(alias for @samp{lid -R grep})
+lists all lines containing the requested pattern.

-@item pid
-matches the filenames in the database, rather than the identifiers.
+@item eid
+(alias for @samp{lid -R edit})
+invokes an editor on all files containing the requested pattern, and
+if possible, initiates a text search for that pattern.

-@item idx
-helps with testing of new @code{mkid} scanners.
+@item aid
+(alias for @samp{lid -ils}) treats the requested pattern
+as a case-insensitive literal substring.
 @end table

 @cindex bugs, reporting
-Please report bugs to @samp{gkm@@magilla.cichlid.com}. Remember to
+Please report bugs to @samp{bug-gnu-utils@@gnu.ai.mit.edu}. Remember to
 include the version number, machine architecture, input files, and any
 other information needed to reproduce the bug: your input, what you
 expected, what you got, and why it is wrong. Diffs are welcome, but
 please include a description of the problem as well, since this is
 sometimes difficult to infer. @xref{Bugs, , , gcc, GNU CC}.

-@menu
-* Past and future:: How the ID tools came about, and where they're going.
-@end menu
+@c ************* gkm *********************************************************
+@node Quick start
+@chapter Quick Start Procedure
+@enumerate

-@node Past and future
-@section Past and future
+@item Unpack the distribution.

-@cindex history
+@item Type @samp{./configure}.

-@pindex look @r{and @code{mkid} 1}
-@cindex McGary, Greg
-Greg McGary conceived of the ideas behind mkid when he began hacking the
-Unix kernel in 1984. He needed a navigation tool to help him find his
-way around the expansive, unfamiliar landscape. The first @code{mkid}-like
-tools were shell scripts, and produced an ASCII database that looks much
-like the output of @code{lid} with no arguments. It took over an hour
-on a VAX 11/750 to build a database for a 4.1BSD-ish kernel. Lookups
-were done with the system utility @code{look}, modified to handle very
-long lines.
-
-In 1986, Greg rewrote @code{mkid}, @code{lid}, @code{fid} and @code{idx}
-in C to improve performance. Database-build times were shortened by an
-order of magnitude. The @code{mkid} tools were first posted to
-@samp{comp.sources.unix} in September 1987.
+@item Type @samp{make}.

-@cindex Horsley, Tom
-@cindex Scofield, Doug
-@cindex Leonard, Bill
-@cindex Berry, Karl
-Over the next few years, several versions diverged from the original
-source. Tom Horsley at Harris Computer Systems Division stepped forward
-to take over maintenance and integrated some of the fixes from divergent
-versions. A first release of
-@code{mkid} @w{version 2} was posted to @file{alt.sources} near the end
-of 1990. At that time, Tom wrote this Texinfo manual with the
-encouragement the net community. (Tom especially thanks Doug Scofield
-and Bill Leonard whom he dragooned into helping poorfraed and
-edit---they found several problems in the initial version.) Karl Berry
-revamped the manual for Texinfo style, indexing, and organization in
-1995.
+@item Type @samp{make install} as a user with the appropriate privileges
+(e.g., @samp{bin} or perhaps even @samp{root}).

-@pindex cscope
-@pindex grep
-@cindex future
-In January 1995, Greg McGary reemerged as the primary maintaner and
-launched development of @code{mkid} version 3, whose primary new feature
-is an efficient algorithm for building databases that is linear in both
-time and space over the size of the input text. (The old algorithm was
-quadratic in space and therefore choked on very large source trees.)
-The code is released under the GNU Public License, and might become a
-part of the GNU system. @code{mkid} 3 is an interim release, since
-several significant enhancements are still in the works: an optional
-coupling with GNU @code{grep}, so that @code{grep} can use an ID
-database for hints; a @code{cscope} work-alike query interface;
-incremental update of the ID database; and an automatic file-tree walker
-so you need not explicitly supply every filename argument to the
-@code{mkid} program.
+@item Type @samp{cd /usr/include; mkid} to build an ID database covering
+all of the system header files.

+@item Type @samp{lid FILE}, then @samp{gid strtok}, then @samp{aid stdout}.

-@node mkid invocation
-@chapter @code{mkid}: Creating ID databases
+@end enumerate

-@pindex mkid
-@cindex creating databases
-@cindex databases, creating
+You have just built, installed and used the most common commands of the
+GNU ID utilities. If you ever need help remembering which system header
+files contain a particular declaration, or reference a particular symbol,
+you'll want to keep the ID file you built in @file{/usr/include} for
+later use. If your working directory is elsewhere at the time, simply
+provide the @samp{-f /usr/include/ID} option to @file{lid} (@pxref{Reading
+options}).

-@pindex cron
-The @code{mkid} program builds an ID database. To do this, it must scan
-each file you tell it to include in the database. This takes some time,
-but once the work is done the query programs run very rapidly. (You can
-run @code{mkid} as a @code{cron} job to regularly update your
-databases.)
-
-The @code{mkid} program knows how to extract identifiers from various
-types of files. For example, it can recognize and skip over comments
-and string constants in a C program.
-
-@cindex numbers, in databases
-Identifiers are not the only thing included in the database. Numbers
-are also recognized and included in the database indexed by their binary
-value. This feature allows you to find uses of constants without regard
-to the radix used to specify them, since the same number can frequently
-be written in many different ways (for instance, @samp{47}, @samp{0x2f},
-@samp{057} in C).
-
-All the places in this document which mention identifiers should really
-mention both identifiers and numbers, but that gets fairly clumsy after
-a while, so you just need to keep in mind that numbers are included in
-the database as well as identifiers.
+@c ************* gkm *********************************************************
+@node Common options
+@chapter Common command-line options

-@cindex ID file format
-@cindex architecture-independence
-@cindex sharing ID files
-The ID files that @code{mkid} creates are architecture- and
-byte-order-independent; you can share them at will across systems.
+@cindex common command-line options
+
+Certain options, and regular expression syntax, are shared by various
+groupings of the ID utilities. We describe these in the sections below,
+rather than repeating them for each program.

 @menu
-* mkid options:: Command-line options to mkid.
-* Scanners:: Built-in and defining your own.
-* mkid examples:: Examples of mkid usage.
+* Universal options:: Options common to all programs.
+* Extraction options:: Options for programs that extract tokens from source files.
+* Walker options:: Options for programs that walk file and directory trees.
+* Reading options:: Options for programs that read ID databases.
+* Writing options:: Options for programs that write ID databases.
+* File listing options:: Options for programs that list file names.
 @end menu
-
-@node mkid options
-@section @code{mkid} options
-
-@cindex options for @code{mkid}
-@pindex mkid @r{options}
-
-By default, @code{mkid} scans the files you specify and writes the
-database to a file named @file{ID} in the current directory.
-
-@example
-mkid [-v] [-S@var{scanarg}] [-a@var{argfile}] [-] [-f@var{idfile}] @c
-@var{files}@dots{}
-@end example
-
-The program accepts the following options.
+@c ************* gkm *********************************************************
+@node Universal options
+@section Options Common to All Programs

 @table @samp
-@item -v
-@opindex -v
-@cindex statistics
-Verbose. @code{mkid} tells you as it scans each file and indicates
-which scanner it is using. It also summarizes some statistics about the
-database at the end.
-
-@item -S@var{scanarg}
-@opindex -S@var{scanarg}
-Specify options regarding @code{mkid}'s scanners. @xref{Scanner option
-formats}.
-
-@item -a@var{argfile}
-@opindex -a@var{argfile}
-Read additional command line arguments from @var{argfile}. This is
-typically used to specify lists of filenames longer than will fit on a
-command line; some systems have severe limitations on the total length
-of a command line.
-
-@item -
-@opindex -
-Read additional command line arguments from standard input.
-
-@item -f@var{idfile}
-Write the database to the file @var{idfile}, instead of @file{ID}. The
-database stores filenames relative to the directory containing the
-database, so if you move the database to a different directory after
-creating it, you may have trouble finding files.
-
-@c @item -u
-@c @opindex -u
-@c The @code{-u} option updates an existing database by rescanning any
-@c files that have changed since the database was written. Unfortunately
-@c you cannot incrementally add new files to a database.
-@c Greg is reimplementing this ...
-
-@end table
+@item --help
+@opindex --help
+@cindex help, online
+Print a usage message listing all available options, then exit successfully.

-The remaining arguments @var{files} are the files to be scanned and
-included in the database. If no files are given at all (either on
-command line or via @samp{-a} or @samp{-}), @code{mkid} does nothing.
+@item --version
+@opindex --version
+@cindex version number, finding
+Print the version number, then exit successfully.
+@end table

-@node Scanners
-@section Scanners
+@c ************* gkm *********************************************************
+@node Reading options
+@section Options for Programs that Read ID Databases

-@cindex scanners
+@table @samp

-To determine which identifiers to extract from a file and store in the
-database, @code{mkid} calls a @dfn{scanner}; we say a scanner
-@dfn{recognizes} a particular language. Scanners for several languages
-are built-in to @code{mkid}; you can add your own scanners as well, as
-explained in the sections below.
-
-@cindex suffixes of filenames
-@code{mkid} determines which scanner to use for a particular file by
-looking at the suffix of the filename. This @dfn{suffix} is everything
-after and including the last @samp{.} in a filename; for example, the
-suffix of @file{foo.c} is @file{.c}. @code{mkid} has a built-in list of
-bindings from some suffixes to corresponding scanners; for example,
-@file{.c} files are (not surprisingly) scanned by the predefined C
-language scanner.
-
-@findex .default @r{scanner}
-If @code{mkid} cannot determine what scanner to use for a particular
-file, either because the file has no suffix (e.g., @file{foo}) or
-because @code{mkid} has no binding for the file's suffix (e.g.,
-@file{foo.bar}), it uses the scanner bound to the @samp{.default}
-suffix. By default, this is the plain text scanner (@pxref{Plain text
-scanner}), but you can change this with the @samp{-S} option, as
-explained below.
+@item -f @var{filename}
+@itemx --file=@var{filename}
+@opindex -f
+@opindex --file
+@cindex ID database file name

-@menu
-* Scanner option formats:: Overview of the -S option.
-* Predefined scanners:: The C, plain text, and assembler scanners.
-* Defining new scanners:: Either in source code or at runtime with -S.
-* idx invocation:: Testing mkid scanners.
-@end menu
+@var{Filename} is the ID database to read when processing queries. At
+present, only a single @samp{--file} option is processed, but in future
+releases, more than one ID database may be named on the command line.

+@item $IDPATH
+@cindex ID database file name

-@node Scanner option formats
-@subsection Scanner option formats
+@samp{IDPATH} is an environment variable that contains a
+colon-separated list of ID database names. If this variable is present,
+and no @samp{--file} options are presented on the command line, the ID
+databases named in @samp{IDPATH} are implied.@footnote{At present, this
+feature is not fully implemented, since only the first of a list of ID
+database names is processed.}

-@cindex scanner options
-@opindex -S @r{scanner option}
+@end table

-With the @samp{-S} option, you can change which language scanner to use
-for which files, give language-specific options, and get some limited
-online help about scanner options.
+If no ID databases are specified either on the command line or via the
+@samp{IDPATH} environment variable, then the ID utilities search for a
+file named @file{ID} in the current working directory, and then in
+successive parent directories.

-Here are the different forms of the @samp{-S} option:
+@c ************* gkm *********************************************************
+@node Writing options
+@section Options for Programs that Write ID Databases

 @table @samp
-@item -S.@var{suffix}=@var{scanner}
-@opindex -S.
-Use @var{scanner} for a file with the given @samp{.@var{suffix}}. For
-example, @samp{-S.yacc=c} tells @code{mkid} to use the @samp{c} language
-scanner for all files ending in @samp{.yacc}.
-
-@item -S.@var{suffix}=?
-Display which scanner is used for the given @samp{.@var{suffix}}.
-
-@item -S?=@var{scanner}
-@opindex -S?
-Display which suffixes @var{scanner} is used for.
-
-@item -S?=?
-Display the scanner binding for every known suffix.
+@item -o @var{filename}
+@itemx --output=@var{filename}
+@opindex -o
+@opindex --output
+@cindex ID database file name

-@item -S@var{scanner}+@var{arg}
-@itemx -S@var{scanner}-@var{arg}
-Each scanner accepts certain scanner-dependent arguments. These options
-all have one of these forms. @xref{Predefined scanners}.
+The @samp{--output} option names the file in which to write a new ID
+database. If no @samp{--output} (or @samp{--file}) option is present,
+an output file named @file{ID} is implied.
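A minimal C sketch of the database lookup order described in the Reading options text: an explicit --file name wins; otherwise the first entry of IDPATH is used (the footnote notes that later entries are ignored); otherwise a file named ID is sought in the current directory and then in each parent directory. This is an illustration, not the id-utils source. The function names, the fixed buffer sizes and the caller-supplied exists() predicate are assumptions made so that the logic can be tested without touching the file system. A real program might pass a wrapper around access(2) instead.

```c
#include <stdio.h>
#include <string.h>

#define ID_PATH_MAX 4096   /* hypothetical path limit for this sketch */
#define ID_MAX_DEPTH 64    /* hypothetical cap on directory levels */

/* Copy the first colon-separated entry of IDPATH into OUT.
   Returns 1 if a non-empty entry fits, 0 otherwise.  Later entries
   are ignored, mirroring the partial support noted in the manual. */
static int first_idpath_entry(const char *idpath, char *out, size_t size)
{
    if (idpath == NULL || size == 0)
        return 0;
    size_t len = strcspn(idpath, ":");
    if (len == 0 || len >= size)
        return 0;
    memcpy(out, idpath, len);
    out[len] = '\0';
    return 1;
}

/* Write "<dir>/ID" for START and each parent up to "/" into CANDS.
   START must be an absolute path with no trailing slash (or "/").
   Returns the number of candidates written, at most MAX. */
static size_t id_search_candidates(const char *start,
                                   char cands[][ID_PATH_MAX], size_t max)
{
    char dir[ID_PATH_MAX];
    size_t n = 0;

    if (strlen(start) >= sizeof dir)
        return 0;
    strcpy(dir, start);
    while (n < max) {
        int is_root = strcmp(dir, "/") == 0;
        snprintf(cands[n++], ID_PATH_MAX, "%s%sID", dir, is_root ? "" : "/");
        if (is_root)
            break;
        char *slash = strrchr(dir, '/');
        if (slash == NULL)        /* relative path: stop climbing */
            break;
        if (slash == dir)
            dir[1] = '\0';        /* the parent is the root directory */
        else
            *slash = '\0';        /* drop the last path component */
    }
    return n;
}

/* Choose the database name: --file, else IDPATH, else the first
   existing candidate found while climbing from CWD.  EXISTS stands in
   for a real existence check. */
static int resolve_id_file(const char *file_opt, const char *idpath,
                           const char *cwd, int (*exists)(const char *),
                           char *out, size_t size)
{
    static char cands[ID_MAX_DEPTH][ID_PATH_MAX];

    if (file_opt != NULL) {
        if (strlen(file_opt) >= size)
            return 0;
        strcpy(out, file_opt);
        return 1;
    }
    if (first_idpath_entry(idpath, out, size))
        return 1;
    size_t n = id_search_candidates(cwd, cands, ID_MAX_DEPTH);
    for (size_t i = 0; i < n; i++) {
        if (exists(cands[i]) && strlen(cands[i]) < size) {
            strcpy(out, cands[i]);
            return 1;
        }
    }
    return 0;
}
```

With IDPATH unset and a database at /home/u/ID, a query started in /home/u/src would try /home/u/src/ID first and then settle on /home/u/ID.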

-@item -S@var{scanner}?
-Display the scanner-specific options accepted by @var{scanner}.
+@item -f @var{filename}
+@itemx --file=@var{filename}
+@opindex -f
+@opindex --file
+@cindex ID database file name

-@item -S@var{new-scanner}/@var{old-scanner}/@var{filter-command}
-Define @var{new-scanner} in terms of @var{old-scanner} and
-@var{filter-command}. @xref{Defining scanners with options}.
+This is a synonym for @samp{--output}.
 @end table

+@c ************* gkm *********************************************************
+@node Walker options
+@section Options for Programs that Walk File and Directory Trees

-@node Predefined scanners
-@subsection Predefined scanners
+The programs @file{mkid} and @file{xtokid} accept the names of files and
+directories on the command line. Files are scanned if there is a
+scanner available and enabled for the file's source language.
+Directories are recursively descended, searching for files whose names
+match the rules listed in the @emph{language map} file (@pxref{Language
+map}).

-@cindex predefined scanners
-@cindex scanners, predefined
+The following option controls the file tree walker:

-@code{mkid} has built-in scanners for several types of languages; you
-can get the list by running @code{mkid -S?=?}.
-The supported languages are documented
-below@footnote{This is not strictly true: @samp{vhil} is a supported
-language, but it is an obsolete and arcane dialect of C and should be
-ignored.}.
-
-@menu
-* C scanner:: For the C programming language.
-* Plain text scanner:: For documents or other non-source code.
-* Assembler scanner:: For assembly language.
-@end menu
+@table @samp
+@item -p @var{names}
+@itemx --prune=@var{names}
+@opindex -p
+@opindex --prune
+@cindex file tree pruning
-@cindex C scanner, predefined -@flindex .[chly] @r{files, scanning} +@end table -The C scanner is the most commonly used. Files with the usual @file{.c} -and @file{.h} suffixes, and the @file{.y} (yacc) and @file{.l} (lex) -suffixes, are processed with this scanner (by default). +@c ************* gkm ********************************************************* +@node File listing options +@section Options for Programs that List File Names -Scanner-specific options: +The programs @file{lid} and @file{fnid} can print lists of file names as +the result of queries. The following option controls how these lists +are formatted: @table @samp -@item -Sc-s@var{character} -@kindex $ @r{in identifiers} -@opindex -Sc-s -Allow the specified @var{character} in identifiers. For example, if you -use @samp{$} in identifiers, you'll want to use @samp{-Sc-s$}. - -@item -Sc+u -@opindex -Sc+u -Strip leading underscores from identifiers. You might to do this in -peculiar circumstances, such as trying to parse the output from -@code{nm} or some other system utility. +@item -S @var{style} +@itemx --separator=@var{style} +@opindex -S +@opindex --separator +@cindex file name separator -@item -Sc-u -@opindex -Sc-u -Don't strip leading underscores from identifiers; this is the default. - -@end table +@var{Style} may be one of @samp{braces}, @samp{space} or @samp{newline}. +The @var{style} of @samp{braces} means that file names with common +directory prefix and common suffix are printed using the shell's brace +notation in order to compress the output. For example, +@file{../src/foo.c ../src/bar.c} can be printed in brace notation as +@file{../src/@{foo,bar@}.c}. -@node Plain text scanner -@subsubsection Plain text scanner +The @var{style}s of @samp{space} and @samp{newline} mean that file names +are separated spaces or by newlines, respectively. -@cindex plain text scanner +If the list of files is being printed on a terminal, brace notation is +the default. 
If not, file names are separated by spaces if the +@var{key} is included in the output, and by newlines the @var{key style} +is @samp{none} (@pxref{lid invocation}). -The plain text scanner is intended for scanning most non-source-code -files. This is typically the scanner used when adding custom scanners -via @samp{-S} (@pxref{Defining scanners with options}). +@end table -@c @code{mkid} predefines a troff scanner in terms of the plain text -@c scanner and -@c the @code{deroff} utility. -@c A compressed man page -@c scanner runs @code{pcat} piped into @code{col -b}, and a @TeX{} scanner -@c runs @code{detex}. +@c ************* gkm ********************************************************* +@node Extraction options +@section Options for Programs that Scan Source Files -Scanner-specific options: +@file{mkid} and @file{xtokid} walk file trees, select source files by +name, and extract tokens from source files. They accept the following +options: @table @samp -@item -Stext+a@var{character} -@opindex -Stext+a -Include @var{character} in identifiers. By default, letters (a--z and -A--Z) and underscore are included. - -@item -Stext-a@var{character} -@opindex -Stext-a -Exclude @var{character} from identifiers. +@item -m @var{mapfile} +@itemx --lang-map=@var{mapfile} +@opindex -m +@opindex --lang-map +@cindex language map file -@item -Stext+s@var{character} -@opindex -Stext+s -@cindex squeezing characters from identifiers -Squeeze @var{character} from identifiers, i.e., do not terminate an -identifier when @var{character} is seen. By default, the characters -@samp{'}, @samp{-}, and @samp{.} are squeezed out of identifiers. For -example, the input @samp{fred's} leads to the identifier @samp{freds}. +@var{mapfile} contains rules for determining the source languages from +file names. @xref{Language map} -@item -Stext-s@var{character} -Do not squeeze @var{character}. 
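The braces style can be sketched in C as follows. This is one possible reading of "common directory prefix and common suffix", not the code that lid or fnid actually use: it assumes the shared suffix is a file extension (taken from the last '.'), and it falls back to space separation when the names share neither a directory prefix nor an extension. All function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Append LEN bytes of S to OUT (SIZE bytes), keeping it NUL-terminated.
   Returns 0 if the result would not fit. */
static int append(char *out, size_t size, size_t *used,
                  const char *s, size_t len)
{
    if (*used + len >= size)
        return 0;
    memcpy(out + *used, s, len);
    *used += len;
    out[*used] = '\0';
    return 1;
}

/* Length of the directory prefix (through the last '/') that all
   N names share. */
static size_t common_dir_prefix(const char **names, size_t n)
{
    size_t len = strlen(names[0]);
    for (size_t i = 1; i < n; i++) {
        size_t j = 0;
        while (j < len && names[i][j] == names[0][j])
            j++;
        len = j;
    }
    while (len > 0 && names[0][len - 1] != '/')
        len--;
    return len;
}

/* Extension (from the last '.') that every name ends with after its
   first PRE bytes, or "" if the names disagree. */
static const char *common_extension(const char **names, size_t n, size_t pre)
{
    const char *ext = strrchr(names[0] + pre, '.');
    if (ext == NULL || strchr(ext, '/') != NULL)
        return "";
    for (size_t i = 1; i < n; i++) {
        const char *e = strrchr(names[i] + pre, '.');
        if (e == NULL || strcmp(e, ext) != 0)
            return "";
    }
    return ext;
}

/* Format N names into OUT, e.g. "../src/{foo,bar}.c".
   Returns the length written, or 0 if OUT is too small. */
static size_t brace_join(const char **names, size_t n, char *out, size_t size)
{
    size_t used = 0;
    if (n == 0 || size == 0)
        return 0;
    out[0] = '\0';

    size_t pre = common_dir_prefix(names, n);
    const char *ext = common_extension(names, n, pre);
    size_t ext_len = strlen(ext);

    if (n == 1 || (pre == 0 && ext_len == 0)) {
        /* Nothing worth factoring out: separate with spaces. */
        for (size_t i = 0; i < n; i++) {
            if (i > 0 && !append(out, size, &used, " ", 1))
                return 0;
            if (!append(out, size, &used, names[i], strlen(names[i])))
                return 0;
        }
        return used;
    }

    if (!append(out, size, &used, names[0], pre) ||
        !append(out, size, &used, "{", 1))
        return 0;
    for (size_t i = 0; i < n; i++) {
        const char *mid = names[i] + pre;   /* ends with ext by construction */
        if (i > 0 && !append(out, size, &used, ",", 1))
            return 0;
        if (!append(out, size, &used, mid, strlen(mid) - ext_len))
            return 0;
    }
    if (!append(out, size, &used, "}", 1) ||
        !append(out, size, &used, ext, ext_len))
        return 0;
    return used;
}
```

Given ../src/foo.c and ../src/bar.c, brace_join() produces ../src/{foo,bar}.c, matching the example in the text. The names a.c and b.h share nothing, so they come out as a.c b.h.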
+@item -i @var{languages}
+@itemx --include=@var{languages}
+@opindex -i
+@opindex --include
+@cindex include languages

-@end table
+The @samp{--include} option names @var{languages} whose source files
+should be scanned and incorporated into the ID database. By default,
+all languages known to the ID utilities are enabled.

+@item -x @var{languages}
+@itemx --exclude=@var{languages}
+@opindex -x
+@opindex --exclude
+@cindex exclude languages
+
+The @samp{--exclude} option names @var{languages} whose source files
+should @emph{not} be scanned. The default list of excluded languages is
+empty. Note that only one of @samp{--include} or @samp{--exclude} may
+be specified on the command line for a single run.
+
+@item -l @var{language}:@var{options}
+@itemx --lang-option=@var{language}:@var{options}
+@opindex -l
+@opindex --lang-option
+@cindex language-specific option
+
+Language-specific scanners also accept options. @var{Language} denotes
+the desired scanner, and @var{options} are the command-line options that
+should be passed through to it. For example, to pass the @samp{-x
+--coke-bottle} options to the scanner for the language @var{swizzle},
+pass this: @samp{-l swizzle:"-x --coke-bottle"}, or this:
+@samp{--lang-option=swizzle:"-x --coke-bottle"}, or this: @samp{-l
+swizzle:-x -l swizzle:--coke-bottle}. Use the @samp{--help} option to
+see the command-line option summary for

-@node Assembler scanner
-@subsubsection Assembler scanner
+@end table

-@cindex assembler scanner
+@cindex scanners

-Since assembly languages come in several flavors, this scanner has a
-number of options:
+To determine which tokens to extract from a file and store in the
+database, @file{mkid} calls a @dfn{scanner}; we say a scanner
+@dfn{recognizes} a particular language. Scanners for several languages
+are built into @file{mkid}; you can add your own scanners as well, as
+explained in @ref{Defining scanners}.

-@table @samp
+The ID utilities determine which scanner to use for a particular file by
+consulting the language-map file. Scanners for several languages are
+already built into the ID utilities. You can see which languages have built-in
+scanners, and examine their language-specific options by invoking
+@samp{mkid --help} or @samp{xtokid --help}.

-@item -Sasm-c@var{character}
-@opindex -Sasm-c
-@cindex comments in assembler
-Define @var{character} as starting a comment that extends to the end of
-the input line; no default. In many assemblers this is @samp{;} or
-@samp{#}.
-
-@item -Sasm+u
-@itemx -Sasm-u
-@opindex -Sasm+u
-Strip (@samp{+u}) or do not strip (@samp{-u}) leading underscores from
-identifiers. The default is to strip them.
-
-@item -Sasm+a@var{character}
-@opindex -Sasm+a
-Allow @var{character} in identifiers.
-
-@item -Sasm-a@var{character}
-Allow @var{character} in identifiers, but if an identifier contains
-@var{character}, ignore it. This is useful to ignore temporary labels,
-which can be generated in great profusion; these often contain @samp{.}
-or @samp{@@}.
-
-@item -Sasm+p
-@itemx -Sasm-p
-@opindex -Sasm+p
-Recognize (@samp{+p}) or do not recognize (@samp{-p}) C preprocessor
-directives in assembler source. The default is to recognize them.
-
-@item -Sasm+C
-@itemx -Sasm-C
-@opindex -Sasm+C
-Skip over (@samp{+C}) or do not skip over (@samp{-C}) C style comments
-in assembler source. The default is to skip them.
+@menu
+* Language map:: Mapping file names to source languages.
+* C/C++ scanner:: For the C and C++ programming languages.
+* Assembler scanner:: For assembly language.
+* Text scanner:: For documents or other non-source code.
+* Defining scanners:: Defining new scanners in the source code.
+@end menu

-@end table
+@c ************* gkm *********************************************************
+@node Language map
+@subsection Mapping file names to source languages
+The file @file{id-lang.map}, installed by default in
+@file{$(prefix)/share/id-lang.map}, contains rules for mapping file
+names to source languages. Each rule comprises three parts: a shell
+@var{glob} pattern, a language name, and language-specific scanner
+options.

-@node Defining new scanners
-@subsection Defining new scanners
+The special pattern @samp{**} denotes the default source language. This is
+the language that's assigned to file names that don't match any other
+pattern.

-@cindex scanners, adding new
+The special pattern @samp{***} should be followed by a file name. The
+named file should contain more language-map rules and is included at
+this point.

-You can add new scanners to @code{mkid} in two ways: modify the source
-code and recompile, or at runtime via the @samp{-S} option. Each has
-their advantages and disadvantages, as explained below.
+The order in which rules are presented in a language-map file is
+significant. This order influences the order in which files are
+displayed as the result of queries. For example, the distributed
+language-map file places all rules for C @file{.h} files ahead of
+@file{.c} files, so that in general, declarations will precede
+definitions in query output. The same thing is done for C++ and its
+many different source file name extensions.

-If you create a new scanner that would be of use to others, please
-consider sending it back to the maintainer,
-@samp{gkm@@magilla.cichlid.com}, for inclusion in future releases of
-@code{mkid}.
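The rule semantics described above can be modelled as an ordered table searched from the top, with the ** entry held back as the fallback. The sketch below uses the POSIX fnmatch() function. It rests on assumptions that the text does not state outright: that the first matching rule wins, and that patterns are matched against the file's base name. It also simply skips *** include directives rather than reading the named file. The struct and function names are hypothetical.

```c
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* One rule of a language map: glob, language name, scanner options. */
struct lang_rule {
    const char *pattern;    /* shell glob, "**" (default) or "***" */
    const char *language;   /* e.g. "C", "C++", "IGNORE" */
    const char *options;    /* language-specific scanner options */
};

/* Return the rule that applies to PATH, or NULL if none does.
   Rules are tried in order; the "**" rule is remembered and used only
   when no other pattern matches, wherever it appears in the table. */
static const struct lang_rule *
lang_for_file(const struct lang_rule *rules, size_t n, const char *path)
{
    const char *slash = strrchr(path, '/');
    const char *base = slash != NULL ? slash + 1 : path;
    const struct lang_rule *fallback = NULL;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(rules[i].pattern, "**") == 0) {
            fallback = &rules[i];
            continue;
        }
        if (strcmp(rules[i].pattern, "***") == 0)
            continue;   /* include directive: not handled in this sketch */
        if (fnmatch(rules[i].pattern, base, 0) == 0)
            return &rules[i];
    }
    return fallback;
}
```

With rules copied from the distributed id-lang.map, and under the first-match assumption, s.main.c is caught by the SCCS rule [sp].* before *.c is ever tried, while a name such as README that matches nothing falls through to the ** default.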
+Here is a pared-down version of the @file{id-lang.map} file distributed +with the ID utilities: -@menu -* Defining scanners in source code:: -* Defining scanners with options:: -@end menu +@example +# Default language +** IGNORE # Although this is listed first, + # the default language pattern is + # logically matched last. + +# Backup files +*~ IGNORE +*.bak IGNORE +*.bk[0-9] IGNORE + +# SCCS files +[sp].* IGNORE + +# list header files before code files +*.h C +*.h.in C +*.H C++ +*.hh C++ +*.hpp C++ +*.hxx C++ + +# list C `meta' files next +*.l C +*.lex C +*.y C +*.yacc C + +# list C code files after header files +*.c C +*.C C++ +*.cc C++ +*.cpp C++ +*.cxx C++ + +# list assembly language after C +*.[sS] asm --comment=; +*.asm asm --comment=; + +# [nt]roff +*.[0-9] roff +*.ms roff +*.me roff +*.mm roff + +# TeX and friends +*.tex TeX +*.ltx TeX +*.texi texinfo +*.texinfo texinfo -@node Defining scanners in source code -@subsubsection Defining scanners in source code +@end example -@flindex scanners.c -@cindex scanners, defining in source code +@c ************* gkm ********************************************************* +@node C/C++ scanner +@subsection C/C++ Language Scanner -@vindex languages_0 -@vindex suffixes_0 -To add a new scanner in source code, you should add a new section to the -file @file{scanners.c}. Copy one of the existing scanners (most likely -either C or plain text), and modify as necessary. Also add the new -scanner to the @code{languages_0} and @code{suffixes_0} tables near the -beginning of the file. +@cindex C scanner, predefined -This is not a terribly difficult programming task, but it requires -recompiling and installing the new version of @code{mkid}, which may be -inconvenient. +The C scanner is the most commonly used. 
Files that match the glob
+pattern @file{*.h}, @file{*.c}, as well as @file{yacc} files that match
+@file{*.y} or @file{*.yacc}, and @file{lex} files that match @file{*.l}
+or @file{*.lex}, are processed with this scanner.
 
-This method leads to scanners which operate much more quickly than ones
-that depend on external programmers. It is also likely the easiest way
-to define scanners for new programming languages.
+Scanner-specific options (note: these options are presented
+@emph{without} the required @samp{-l} or @samp{--lang-option=} prefix):
 
+@table @samp
 
-@node Defining scanners with options
-@subsubsection Defining scanners with options
+@item -k @var{character-class}
+@itemx --keep=@var{character-class}
+@opindex -k
+@opindex --keep
+@opindex -l C:-k
+@opindex -l C:--keep
+@opindex --lang-option=C:-k
+@opindex --lang-option=C:--keep
+
+Consider the characters in @var{character-class} as valid constituents of
+identifier names. For example, if you are indexing C code that contains
+@samp{$} in some of its identifiers, you can include these by using
+@samp{--lang-option=C:--keep=$}, or @samp{-l C:"-k $"} (if you don't like
+to type so much).
+
+@item -i @var{character-class}
+@itemx --ignore=@var{character-class}
+@opindex -i
+@opindex --ignore
+@opindex -l C:-i
+@opindex -l C:--ignore
+@opindex --lang-option=C:-i
+@opindex --lang-option=C:--ignore
+
+Consider the characters in @var{character-class} as valid constituents of
+identifier names, but discard all tokens containing these characters.
+For example, if some C code has identifiers containing @samp{$}, but you
+don't want these cluttering up your ID database, use
+@samp{--lang-option=C:--ignore=$}, or the terser equivalent @samp{-l
+C:"-i $"}.
+ +@item -u +@itemx --strip-underscore +@opindex -u +@opindex --strip-underscore +@opindex -l C:-u +@opindex -l C:--strip-underscore +@opindex --lang-option=C:-u +@opindex --lang-option=C:--strip-underscore + +Strip one leading underscore from C identifiers encapsulated as +character strings. This option is useful if you are indexing C code +that contains symbol-table name strings for systems that prepend an +underscore to external symbols. By default, the leading underscore is +retained. -@cindex scanners, defining with options +@end table -You can use the @samp{-S} option on the command line to define a new -language scanner: +@c ************* gkm ********************************************************* +@node Assembler scanner +@subsection Assembly Language Scanner -@example --S@var{new-scanner}/@var{existing-scanner}/@var{filter} -@end example +@cindex assembler scanner +@cindex assembly language scanner -@noindent -Here, @var{new-scanner} is the name of the new scanner being defined, -@var{existing-scanner} is the name of an existing scanner, and -@var{filter} is a shell command or pipeline. +Assembly languages use a variety of commenting conventions, and allow a +variety of special characters to @emph{dirty up} local symbols, +preventing name space conflicts with symbols defined by higher-level +languages. Also, some compilation systems prepend an underscore to +external symbols. The options listed below are designed to address +these differences. -The new scanner works by passing the input file to @var{filter}, and -then arranging for the result to be passed through -@var{existing-scanner}. Typically, @var{existing-scanner} is @samp{text}. +@table @samp -Somewhere within @var{filter}, the string@samp{%s} should occur. This -@samp{%s} is replaced by the name of the source file being scanned. 
+@item -c @var{character-class}
+@itemx --comment=@var{character-class}
+@opindex -c
+@opindex --comment
+@opindex -l asm:-c
+@opindex -l asm:--comment
+@opindex --lang-option=asm:-c
+@opindex --lang-option=asm:--comment
 
-@cindex Texinfo, scanning example of
-For example, @code{mkid} has no built-in scanner for Texinfo files (like
-this one). In indexing a Texinfo file, you most likely would want
-to ignore the Texinfo @@-commands. Here's one way to specify a new
-scanner to do this:
+The characters in @var{character-class} are considered left delimiters
+for comments that extend until the end of the current line.
 
-@example
--S/texinfo/text/sed s,@@[a-z]*,,g %s
-@end example
+@item -k @var{character-class}
+@itemx --keep=@var{character-class}
+@opindex -k
+@opindex --keep
+@opindex -l asm:-k
+@opindex -l asm:--keep
+@opindex --lang-option=asm:-k
+@opindex --lang-option=asm:--keep
+
+Consider the characters of @var{character-class} as valid constituents of
+identifier names. For example, if you are indexing assembly code that
+prepends @samp{.} to assembler directives, and prepends @samp{%} to
+register names, you can keep these characters in the tokens by specifying
+@samp{--lang-option=asm:--keep=.%}, or @samp{-l asm:"-k .%"}.
+
+@item -i @var{character-class}
+@itemx --ignore=@var{character-class}
+@opindex -i
+@opindex --ignore
+@opindex -l asm:-i
+@opindex -l asm:--ignore
+@opindex --lang-option=asm:-i
+@opindex --lang-option=asm:--ignore
+
+Consider the characters of @var{character-class} as valid constituents of
+identifier names, but discard all tokens containing these characters.
+For example, if you don't want to clutter your ID database with
+assembler directives that begin with a leading @samp{.} or with
+assembler labels that contain @samp{@@}, use
+@samp{--lang-option=asm:--ignore=.@@}, or @samp{-l asm:"-i .@@"}.
+
+@item -u
+@itemx --strip-underscore
+@opindex -u
+@opindex --strip-underscore
+@opindex -l asm:-u
+@opindex -l asm:--strip-underscore
+@opindex --lang-option=asm:-u
+@opindex --lang-option=asm:--strip-underscore
+
+Strip one leading underscore from identifiers. This option is useful if
+your compilation system prepends an underscore to external symbols. By
+stripping the underscore, you can canonicalize such names and bring them
+into conformance with the way they are expressed in the C language. By
+default, the leading underscore is retained.
 
-This defines a new language scanner (@samp{texinfo}) defined in terms of
-a @code{sed} command to strip out Texinfo directives (an @samp{@@}
-character followed by letters). Once the directives are stripped, the
-remaining text is run through the plain text scanner.
+@item -n
+@itemx --no-cpp
+@opindex -n
+@opindex --no-cpp
+@opindex -l asm:-n
+@opindex -l asm:--no-cpp
+@opindex --lang-option=asm:-n
+@opindex --lang-option=asm:--no-cpp
 
-This is a minimal example; to do a complete job, you would need to
-completely delete some lines, such as those beginning with @code{@@end}
-or @@node.
+Do not recognize C preprocessor directives. By default, such lines are
+handled in the same way as they are by the C language scanner.
 
+@end table
 
-@node idx invocation
-@subsection @code{idx}: Testing @code{mkid} scanners
+@c ************* gkm *********************************************************
+@node Text scanner
+@subsection Text Scanner
 
-@code{idx} prints the identifiers found in the files you specify to
-standard output. This is useful in debugging new @code{mkid} scanners
-(@pxref{Scanners}). Synopsis:
+@cindex text scanner
 
-@example
-idx [-S@var{scanarg}] @var{files}@dots{}
-@end example
+The plain text scanner is intended for human-language documents, or as the
+scanner of last resort for files that have no scanner that is more
+specific.
It is customizable to the extent that character classes can
+be designated as token constituents or as token delimiters. The default
+token constituents are the alphanumerics; all other characters are
+considered token delimiters.
 
-@code{idx} accepts the same @samp{-S} options as @code{mkid}.
-@xref{Scanner option formats}.
+@table @samp
 
-The name ``idx'' stands for ``ID eXtract''. The name may change in
-future releases, since this is such an infrequently used program.
+@item -i @var{character-class}
+@itemx --include=@var{character-class}
+@opindex -i
+@opindex --include
+@opindex -l text:-i
+@opindex -l text:--include
+@opindex --lang-option=text:-i
+@opindex --lang-option=text:--include
 
+Include characters belonging to @var{character-class} in tokens.
 
-@node mkid examples
-@section @code{mkid} examples
+@item -x @var{character-class}
+@itemx --exclude=@var{character-class}
+@opindex -x
+@opindex --exclude
+@opindex -l text:-x
+@opindex -l text:--exclude
+@opindex --lang-option=text:-x
+@opindex --lang-option=text:--exclude
 
-@cindex examples of @code{mkid}
+Exclude characters belonging to @var{character-class} from tokens, i.e., treat
+them as token delimiters.
 
-The simplest example of @code{mkid} is something like:
+@end table
 
-@example
-mkid *.[chy]
-@end example
+@c ************* gkm *********************************************************
+@node Defining scanners
+@subsection Defining New Scanners in the Source Code
 
-This will build an ID database indexing identifiers and numbers in the
-all the @file{.c}, @file{.h}, and @file{.y} files in the current
-directory. Because @code{mkid} already knows how to scan files with
-those suffixes, no additional options are needed.
-
-@cindex man pages, compressed
-@cindex compressed files, building ID from
-Here's a more complex example.
Suppose you want to build a database
-indexing the contents of all the @code{man} pages, and furthur suppose
-that your system is using @code{gzip} (@pxref{Top, , , gzip, Gzip}) to
-store compressed @code{cat} versions of the @code{man} pages in the
-directory @file{/usr/catman}. The @code{gzip} program creates files
-with a @code{.gz} suffix, so you must tell @code{mkid} how to scan
-@file{.gz} files. Here are the commands to do the job:
+@flindex scanners.c
+@cindex scanners, defining in source code
 
-@example
-cd /usr/catman
-find . -name \*.gz -print | mkid '-Sman/text/gzip <%s' -S.gz=man -
-@end example
+@vindex languages_0
 
-@noindent Explanation:
+To add a new scanner in source code, you should add a new section to the
+file @file{scanners.c}. It might be easiest to clone one of the
+existing scanners and modify it as necessary. For the hypothetical
+language @var{foo}, you must define the functions @code{get_token_foo},
+@code{parse_args_foo}, @code{help_me_foo}, as well as the tables
+@code{long_options_foo} and @code{args_foo}. If your scanner is
+modelled after one of the existing scanners, you'll also need a
+character-attribute table @code{ctype_foo}.
 
-@enumerate
+This is not a terribly difficult programming task, but it requires
+recompiling and installing the new version of @file{mkid} and @file{xtokid}.
+You should use @file{xtokid} to test the operation of the new scanner.
 
-@item
-We first @code{cd} to @file{/usr/catman} so the ID database
-will store the correct relative filenames.
+Once these functions and tables are ready, add function prototypes and
+an entry to the @code{languages_0} table near the beginning of the
+file.
 
-@item
-The @code{find} command prints the names of all @file{.gz} files under
-the current directory. @xref{find invocation, , , sh-utils, GNU shell
-utilities}.
+Be warned that the existing scanners are built for speed, not elegance
+or readability.
You might wish to create a new scanner that's easier to
+read and understand if you don't feel that speed is so important.
 
-@item
-This list is piped to @code{mkid}; the @code{-} option (at the end of
-the line) tells @code{mkid} to read arguments (in this case, as is
-typical, the list of filenames) from standard input. @xref{mkid options}.
+@c ************* gkm *********************************************************
+@node mkid invocation
+@chapter @samp{mkid}: Creating an ID Database
+@cindex creating databases
+@cindex databases, creating
+@cindex ID file format
+@cindex architecture-independence
+@cindex sharing ID files
 
-@item
-The @samp{-Sman/text/gzip @dots{}} defines a new language @samp{man} in
-terms of the @code{gzip} program and @code{mkid}'s existing text
-scanner. @xref{Defining scanners with options}.
+@file{mkid} builds an ID database. It accepts the names of files and/or
+directories on the command line, selects files that have an enabled
+scanner, then extracts and stores tokens from those files. The
+resulting ID database is architecture- and byte-order-independent, so it
+can be shared among all systems.
+
+The primary virtues of @file{mkid} are speed and high capacity. The
+size of the source trees it can index is limited only by available
+system memory. @file{mkid}'s indexing algorithm is very space-efficient
+and exhibits excellent locality-of-reference, and so is capable of
+operating with a working-set size that is only half the size of its
+virtual address space. A typical @sc{UNIX}-like operating system with
+16 megabytes of system memory should be able to build an ID database
+covering approximately 12,000--14,000 source files totalling
+approximately 50--100 megabytes. A 66 MHz 486 computer can build such
+a large ID database in approximately 10--15 minutes.
 
-@item
-The @samp{-S.gz=man} tells @code{mkid} to treat all @file{.gz} files as
-this new language @code{man}.
+@pindex cron +In a future release, @file{mkid} will be able to incrementally update an +ID database much faster than it can build one from scratch. Until this +feature becomes available, it might be a good idea to schedule a +@file{cron} job to regularly update large ID databases during off-hours. + +@file{mkid} writes the ID file, therefore it accepts the @samp{--output} +(and @samp{--file}) options as described in @ref{Writing options}. +@file{mkid} extracts tokens from source files, therefore it accepts the +@samp{--lang-map}, @samp{--include}, @samp{--exclude}, and +@samp{--lang-option} options, as well as the language-specific scanner +options, all of which are described in @ref{Extraction options}. +@file{mkid} walks file trees, therefore it handles file and directory +names on its command line and the @samp{--prune} option as described in +@ref{Walker options}. + +In addition, @file{mkid} accepts the following command-line options: -@end enumerate +@table @samp -As a further complication, @code{cat} pages typically contain -underlining and backspace sequences, which will confuse @code{mkid}. To -handle this, the @code{gzip} command becomes a pipeline, like this: +@item -s +@itemx --statistics +@opindex -s +@opindex --statistics +@cindex statistics -@example -mkid '-Sman/text/gzip <%s | col -b' -S.gz=man - -@end example +@file{mkid} reports statistics about resource usage at the end of its +run. +@item -v +@itemx --verbose +@opindex -v +@opindex --verbose +@cindex @file{mkid} progress -@node Common query arguments -@chapter Common query arguments +@file{mkid} reports statistics about each file as it is scanned, and +about the resource usage of its indexing algorithm at regular intervals. -@cindex common query arguments +@end table -Certain options, and regular expression syntax, are shared by the ID -query tools. So we describe those things in the sections below, instead -of repeating the description for each tool. 
+@c ************* gkm *********************************************************
+@node lid invocation
+@chapter @code{lid}: Querying an ID Database by Token
+
+The @file{lid} program accepts @var{pattern}s on the command line, which
+it matches against the tokens stored in an ID database. The
+interpretation of a @var{pattern} is determined by the makeup of the
+@var{pattern} string itself, or can be overridden by command-line
+options. If a @var{pattern} contains regular expression meta-characters,
+it is used to perform a regular-expression substring search. If no such
+meta-characters are present, @var{pattern} is used to perform a literal
+word search. (By default, all searches are sensitive to alphabetic
+case.) If no @var{pattern} is supplied on the command line, @file{lid}
+lists every entry in the ID database.
+
+@file{lid} reads the ID database; therefore, it accepts the @samp{--file}
+option, and consults the @samp{IDPATH} environment variable, as
+described in @ref{Reading options}. @file{lid} lists file names;
+therefore, it accepts the @samp{--separator} option, as described in
+@ref{File listing options}.
+
+In addition, @code{lid} accepts the following command-line options:
 
-@menu
-* Query options:: -f -r -c -ew -kg -n -doxa -m -F -u.
-* Patterns:: Regular expression syntax for searches.
-* Examples: Query examples. Some common uses.
-@end menu
+@table @samp
 
+@item -i
+@itemx --ignore-case
+@opindex -i
+@opindex --ignore-case
+@cindex alphabetic case, ignoring differences in
+@cindex ignoring differences in alphabetic case
+
+Ignore differences in alphabetic case between the @var{pattern} and
+the tokens in the ID database.
+
+@item -l
+@itemx --literal
+@opindex -l
+@opindex --literal
+
+Match @var{pattern} as a literal string. Use this option if
+@var{pattern} contains regular-expression meta-characters, but you don't
+wish to perform a regular-expression search.
+
+@item -r
+@itemx --regexp
+@opindex -r
+@opindex --regexp
+
+Match @var{pattern} as an @emph{extended} regular expression@footnote{Extended
+regular expressions are the same as those accepted by @file{egrep}.}.
+Use this option if no regular-expression meta-characters are
+present in @var{pattern}, but you wish to force a regular-expression
+search (note: in this case, a @emph{literal substring} search might be
+faster).
+
+@item -w
+@itemx --word
+@opindex -w
+@opindex --word
 
-@node Query options
-@section Query options
+Match @var{pattern} using a word-delimited (non-substring) search. This is the default
+for literal searches.
 
-@cindex query options, common
-@cindex common query options
+@item -s
+@itemx --substring
+@opindex -s
+@opindex --substring
 
-The ID query tools (@emph{not} @code{mkid}) share certain command line
-options. Not all of these options are recognized by all programs, but
-if an option is used by more than one program, it is described below.
-The description of each program gives the options that program uses.
+Match @var{pattern} using a substring (non-word-delimited) search. This
+is the default for regular expression searches.
 
-@table @samp
+@item -k @var{style}
+@itemx --key=@var{style}
+@opindex -k
+@opindex --key
 
-@item -f@var{idfile}
-@opindex -f@var{idfile}
-@cindex database name, specifying
-@cindex parent directories, searched for ID
-Read the database from @var{idfile}, in the current directory or in any
-directory above the current directory. The default database name is
-@file{ID}. Searching parent directories lets you have a single ID
-database at the root of a large source tree and then use the query tools
-from anywhere within that tree.
-
-@item -r@var{directory}
-@opindex -r@var{directory}
-Find files relative to @var{directory}, instead of the directory in
-which the ID database was found. This is useful if the ID database was
-moved after its creation.
- -@item -c -@opindex -c -Equivalent to @code{-r`pwd`}, i.e., find files relative to the current -directory, instead of the directory in which the ID database was found. +@var{Style} can be one of @samp{token}, @samp{pattern} or @samp{none}. +This option controls how the subject of the query is presented. This is +best illustrated by example: -@item -e -@itemx -w -@opindex -e -@opindex -w -@cindex regular expressions, forcing evaluation as -@cindex strings, forcing evaluation as -@cindex constant strings, forcing evaluation as -@samp{-e} forces pattern arguments to be treated as regular expressions, -and @samp{-w} forces pattern arguments to be treated as constant -strings. By default, the query tools guess whether a pattern is regular -expressions or constant strings by looking for special characters. -@xref{Patterns}. - -@item -k -@itemx -g -@opindex -k -@opindex -g -@cindex brace notation in filename lists -@cindex shell brace notation in filename lists -@samp{-k} suppresses use of shell brace notation in the output. By -default, the query tools that generate lists of filenames attempt to -compress the lists using the usual shell brace notation, e.g., -@file{@{foo,bar@}.c} to mean @file{foo.c} and @file{bar.c}. (This is -useful if you use @code{ksh} or the original (not GNU) @code{sh} and -want to feed the list of names to another command, since those shells do -not support this brace notation; the name of the @code{-k} option comes -from the @code{k} in @code{ksh}). - -@samp{-g} turns on use of brace notation; this is only needed if the -query tools were compiled with @samp{-k} as the default behavior. +@example +$ lid --key=token '^dest.' +destaddr libsys/memcpy.c +destination libsys/regex.c +destlst libsys/rx.c +destpos libsys/rx.c +destset libsys/rx.h libsys/rx.c + +$ lid --key=pattern '^dest.' +^dest. libsys/rx.h libsys/@{memcpy,regex,rx@}.c + +$ lid --key=none '^dest.' 
+libsys/rx.h libsys/@{memcpy,regex,rx@}.c
+@end example
 
-@item -n
-@opindex -n
-@cindex suppressing matching identifier
-Suppress the matching identifier before each list of filenames that the
-query tools output by default. This is useful if you want a list of just
-the names to feed to another command.
+When @samp{--key} is either @samp{token} or @samp{pattern}, the first
+column of output is a @var{token} or @var{pattern}, respectively. When
+@samp{--key} is @samp{none}, neither of these is printed, and the file
+name list begins immediately. The default is @samp{token}.
+
+@item -R @var{style}
+@itemx --result=@var{style}
+@opindex -R
+@opindex --result
+
+@var{Style} can be one of @samp{filenames}, @samp{grep}, @samp{edit} or
+@samp{none}. This option controls how the value associated with the
+query's @var{key} is presented. When @var{style} is @samp{filenames}, a
+list of file names is printed (this is the default). When @var{style}
+is @samp{grep}, the lines that match @var{pattern} are printed in the
+same format as @samp{egrep -n}. When @var{style} is @samp{edit}, the
+file names are passed to an editor, and if possible @var{pattern} is
+passed as an initial search string (@pxref{eid invocation}). When
+@var{style} is @samp{none}, the file names are not processed in any way.
+This can be useful if you wish to see what tokens match a @var{pattern},
+but don't care about where they reside.
 
 @item -d
 @itemx -o
 @itemx -x
-@itemx -a
 @opindex -d
 @opindex -o
 @opindex -x
-@opindex -a
 @cindex radix of numeric matches, specifying
 @cindex numeric matches, specifying radix of
+
 These options may be used in any combination to specify the radix of
 numeric matches. @samp{-d} allows matching on decimal numbers,
-@samp{-o} on octal numbers, and @samp{-x} on hexadecimal numbers. The
-@code{-a} option is equivalent to specifying all three; this is the
-default. Any combination of these options may be used.
+@samp{-o} on octal numbers, and @samp{-x} on hexadecimal numbers.
Any +combination of these options may be used. The default is to match all +three radixes. -@item -m -@opindex -m -@cindex multiple lines, merging -Merge multiple lines of output into a single line. If your query -matches more than one identifier, the default is to generate a separate -line of output for each matching identifier. - -@itemx -F- -@itemx -F@var{n} -@itemx -F-@var{m} -@itemx -F@var{n}-@var{m} +@item -F @var{range} +@itemx --frequency=@var{range} @opindex -F +@opindex --frequency @cindex single matches, showing -Show identifiers matching at least @var{n} and at most @var{m} times. -@samp{-F-} is equivalent to @samp{-F1}, i.e., find identifiers that -appear only once in the database. (This is useful to locate identifiers -that are defined but never used, or used once and never defined.) - -@item -u@var{number} -@opindex -u -@cindex conflicting identifiers, finding -List identifiers that conflict in the first @var{number} characters. -This could be in useful porting programs to brain-dead computers that -refuse to support long identifiers, but your best long term option is to -set such computers on fire. - -@end table - -@node Patterns -@section Patterns - -@cindex patterns -@cindex regular expression syntax - -@dfn{Patterns}, also called @dfn{regular expressions}, allow you to -match many different identifiers in a single query. - -The same regular expression syntax is recognized by all the query tools -that handle regular expressions. The exact syntax depends on how the ID -tools were compiled, but the following constructs should always be -supported: - -@table @samp - -@item . -Match any single character. - -@item [@var{chars}] -Match any of the characters specified within the brackets. You can -match any characters @emph{except} the ones in brackets by typing -@samp{^} as the first character. A range of characters can be specified -using @samp{-}. 
For example, @samp{[abc]} and @samp{[a-c]} both match
-@samp{a}, @samp{b}, or @samp{c}, and @samp{[^abc]} matches anything
-@emph{except} @samp{a}, @samp{b}, or @samp{c}.
-
-@item *
-Match the previous construct zero or more times.
+Match tokens whose occurrence count falls in @var{range}. @var{Range}
+may be expressed as a single number @var{n}, or as a range
+@var{n}@code{..}@var{m}. Either limit of the range may be omitted (e.g.,
+@code{..}@var{m} or @var{n}@code{..}). If the lower limit @var{n} is
+omitted, it defaults to @code{1}. If the upper limit is omitted, it
+defaults in the present implementation to @code{65535}, the maximum
+value of an unsigned 16-bit integer.
+
+Particularly useful queries are @samp{lid -F1}, which helps locate
+identifiers that are defined but never used, or are used but never
+defined. Similarly, @samp{lid -F2} can help find functions that possess
+a prototype declaration and a definition, but are never called.
+
+@item -a @var{number}
+@itemx --ambiguous=@var{number}
+@opindex -a
+@opindex --ambiguous
+@cindex ambiguous identifier names, finding
 
-@item ^
-@itemx $
-@samp{^} (@samp{$}) at the beginning (end) of a pattern anchors the
-match to the first (last) character of the identifier.
+List identifiers (not numbers) that are ambiguous for the first
+@var{number} characters. This feature might be useful when porting
+programs to ancient pea-brained compilers that don't support long
+identifier names. However, the best long-term option is to set such
+systems on fire.
 
 @end table
 
-The query programs use either the @code{regex}/@code{regcmp} or
-@code{re_comp}/@code{re_exec} functions, depending on which are
-available in the library on your system. These do not always support
-the exact same regular expression syntax, so consult your local
-@code{man} pages to find out.
- - -@node Query examples -@section Query examples - -@cindex examples, queries -@cindex query examples -Here are some examples of the options described in the previous -sections. - -To restrict searches to exact matches, use @samp{^@dots{}$}. For example: - -@example -prompt$ gid '^FILE$' -ansi2knr.c:144: @{ FILE *in, *out; -ansi2knr.c:315: FILE *out; -fid.c:38: FILE *id_FILE; -filenames.c:576: FILE * -@dots{} -@end example - -To show identifiers not unique in the first 16 characters: - -@example -prompt$ lid -u16 -RE_CONTEXT_INDEP_ANCHORS regex.c -RE_CONTEXT_INDEP_OPS regex.c -RE_SYNTAX_POSIX_BASIC regex.c -RE_SYNTAX_POSIX_EXTENDED regex.c -@dots{} -@end example - -@cindex numeric searches -Numbers are searched for numerically rather than textually. For example: - -@example -prompt$ lid 0xff -0377 @{lid,regex@}.c -0xff @{bitops,fid,lid,mkid@}.c -255 regex.c -@end example - -On the other hand, you can restrict a numeric search to a particular -radix if you want: - -@example -laurie$ lid -x 0xff -0xff @{bitops,fid,lid,mkid@}.c -@end example - -Filenames in the output are always adjusted to be correct for the -correct working directory. For example: - -@example -prompt$ lid bdevsw -bdevsw sys/conf.h cf/conf.c io/bio.c os/@{fio,main,prf,sys3@}.c -prompt$ cd io -prompt$ lid bdevsw -bdevsw ../sys/conf.h ../cf/conf.c bio.c ../os/@{fio,main,prf,sys3@}.c -@end example - - -@node gid invocation -@chapter @code{gid}: Listing matching lines - -Synopsis: - -@example -gid [-f@var{file}] [-u@var{n}] [-r@var{dir}] [-doxasc] [@var{pattern}@dots{}] -@end example - -@code{gid} finds the identifiers in the database that match the -specified @var{pattern}s, then searches for all occurrences of those -identifiers, in only the files containing matches. In a large source -tree, this saves an enormous amount of time (compared to searching every -source file). - -With no @var{pattern} arguments, @code{gid} prints every line of every -source file. 
- -The name ``gid'' stands for ``grep for identifiers'', @code{grep} being -the standard utility to search regular files. +@menu +* lid aliases:: Aliases for specialized lid queries +* Emacs gid interface:: GNU Emacs query interface +* eid invocation:: Invoking an editor on query results +@end menu -@xref{Common query arguments}, for a description of the command-line -options and @var{pattern} arguments. +@c ************* gkm ********************************************************* +@node lid aliases +@section Aliases for Specialized @file{lid} Queries -@code{gid} uses the standard GNU output format for identifying source lines: +Historically, the ID utilities have provided several query interfaces +which are specializations of @code{lid} (@pxref{lid invocation}). -@example -@var{filename}:@var{linenum}: @var{text} -@end example +@table @file -Here is an example: +@item gid +(alias for @samp{lid -R grep}) +lists all lines containing the requested pattern. -@example -prompt$ gid FILE -ansi2knr.c:144: @{ FILE *in, *out; -ansi2knr.c:315: FILE *out; -fid.c:38: FILE *id_FILE; -@dots{} -@end example +@item eid +(alias for @samp{lid -R edit}) +invokes an editor on all files containing the requested pattern, and +optionally initiates a text search for that pattern. -@menu -* GNU Emacs gid interface:: Using next-error with gid. -@end menu +@item aid +(alias for @samp{lid -ils}) treats the requested pattern +as a case-insensitive literal substring. 
+@end table -@node GNU Emacs gid interface -@section GNU Emacs @code{gid} interface +@c *************************************************************************** +@node Emacs gid interface +@section GNU Emacs query interface @cindex Emacs interface to @code{gid} -@flindex gid.el @r{interface to Emacs} +@flindex id-utils.el @r{interface to Emacs} @vindex load-path -The @code{mkid} source distribution comes with a file @file{gid.el}, +The @code{id-utils} source distribution comes with a file @file{id-utils.el}, which defines a GNU Emacs interface to @code{gid}. To install it, put -@file{gid.el} somewhere that Emacs will find it (i.e., in your +@file{id-utils.el} somewhere that Emacs will find it (i.e., in your @code{load-path}) and put @example @@ -1041,334 +1023,241 @@ The @code{gid} function prompts you with the word around point. If you want to search for something else, simply delete the line and type the pattern of interest. -@flindex *scratch* @r{Emacs buffer} +@flindex *compilation* @r{Emacs buffer} The function then runs the @code{gid} program in a @samp{*compilation*} buffer, so the normal @code{next-error} function can be used to visit all the places the identifier is found (@pxref{Compilation,,, emacs, The GNU Emacs Manual}). - -@node Looking up identifiers -@chapter Looking up identifiers - -These commands look up identifiers in the ID database and operate on the -files containing matches. - -@menu -* lid invocation:: Matching patterns. -* aid invocation:: Matching strings. -* eid invocation:: Invoking an editor on matches. -* fid invocation:: Listing a file's identifiers. -@end menu - - -@node lid invocation -@section @code{lid}: Matching patterns - -@pindex lid - -Synopsis: - -@example -lid [-f@var{file}] [-u@var{n}] [-r@var{dir}] [-mewdoxaskgnc] @c -@var{pattern}@dots{} -@end example - -@code{lid} searches the database for identifiers matching the given -@var{pattern} arguments and prints the names of the files that match -each @var{pattern}. 
With no @var{pattern}s, @code{lid} lists every -entry in the database. - -The name ``lid'' stands for ``lookup identifier''. - -@xref{Common query arguments}, for a description of the command-line -options and @var{pattern} arguments. - -By default, each line of output consists of an identifier and all the -files containing that identifier. - -Here is an example showing a search for a single identifier (omitting -some output to keep lines short): - -@example -prompt$ lid FILE -FILE extern.h @{fid,gets0,getsFF,idx,init,lid,mkid,@dots{}@}.c -@end example - -This example shows a regular expression search: - -@example -prompt$ lid 'FILE$' -AF_FILE mkid.c -AF_IDFILE mkid.c -FILE extern.h @{fid,gets0,getsFF,idx,init,lid,mkid,@dots{}@}.c -IDFILE id.h @{fid,lid,mkid@}.c -IdFILE @{fid,lid@}.c -@dots{} -@end example - -@noindent As you can see, when a regular expression is used, it is -possible to get more than one line of output. To merge multiple lines -into one, use @samp{-m}: - -@example -prompt$ lid -m ^get -^get extern.h @{bitsvec,fid,gets0,getsFF,getscan,idx,lid,@dots{}@}.c -@end example - - -@node aid invocation -@section @code{aid}: Matching strings - -@pindex aid - -Synopsis: - -@example -aid [-f@var{file}] [-u@var{n}] [-r@var{dir}] [-mewdoxaskgnc] @c -@var{string}@dots{} -@end example - -@cindex case-insensitive searching -@cindex string searching -@code{aid} searches the database for identifiers containing the given -@var{string} arguments. The search is case-insensitive. - -@flindex whatis -The name ``aid'' stands for ``apropos identifier'', @code{apropros} -being a command that does a similar search of the @code{whatis} database -of @code{man} descriptions. - -For example, @samp{aid get} matches the identifiers @code{fgets}, -@code{GETLINE}, and @code{getchar}. - -The default output format is the same as @code{lid}; see the previous -section. - -@xref{Common query arguments}, for a description of the command-line -options and @var{pattern} arguments. 
- - +@c ************* gkm ********************************************************* @node eid invocation -@section @code{eid}: Invoking an editor on matches +@section @code{eid}: Invoking an Editor on Query Results @pindex eid -Synopsis: - -@example -eid [-f@var{file}] [-u@var{n}] [-r@var{dir}] [-doxasc] [@var{pattern}]@dots{} -@end example - -@code{eid} runs the usual search (@pxref{lid invocation}) on the given -arguments, shows you the output, and then asks: +@samp{lid -R edit} is an editing interface for the ID utilities that is +most commonly used with @file{vi}. Emacs users should use the interface +defined in @code{id-utils.el} (@pxref{Emacs gid interface}). The ID +utilities include an alias called @file{eid}, and for the sake of +brevity, we'll use this alias for the remainder of this section. +@file{eid} performs a @file{lid}-style query, then asks if you wish to edit +the files. If your query yields more than one line of output, you will +be prompted after each line. This is the prompt you'll see: @example -Edit? [y1-9^S/nq] +Edit? [y1-9^S/nq] @end example @noindent -You can respond with: +You may respond with: @table @samp + @item y Edit all files listed. @item 1@dots{}9 Edit all files starting at the @math{@var{n} + 1}'st file. -@item /@var{string} @r{or} @kbd{CTRL-S}@var{string} -Edit all files whose name contains @var{string}. +@item /@var{regexp} @r{or} @kbd{CTRL-S}@var{regexp} +Search the file list, and begin editing with the first file name +that matches the regular expression @var{regexp}. @item n -Go on to the next @var{pattern}, i.e., edit no files for this one. +Don't edit any files. If another line of query output is pending, +advance to that line, for which another @samp{Edit?} prompt will appear. @item q -Quit @code{eid}. +Quit---don't edit any files, and don't process any more lines of query +output.
@end table -@code{eid} invokes an editor once per @var{pattern}; all the specified -files are given to the editor for you to edit simultaneously. +Here is an example: -@code{eid} invokes the editor defined by the @samp{EDITOR} environment -variable. If the editor can accept an initial search argument on the -command line, @code{eid} moves automatically to the location of the -match, via the environment variables below. +@example +prompt$ eid FILE \^print +FILE @{ansi2knr,fid,filenames,idfile,idx,lid,misc,@dots{}@}.c +Edit? [y1-9^S/nq] n +^print @{ansi2knr,fid,getopt,getopt1,lid,mkid,regex,scanners@}.c +Edit? [y1-9^S/nq] 2 +@end example -@xref{Common query arguments}, for a description of the command-line -options and @var{pattern} arguments. +@noindent This will start editing at @file{getopt}.c. -Here are the environment variables relevant to @code{eid}: +@code{eid} invokes the editor defined by the environment variable +@samp{VISUAL}. If @samp{VISUAL} is undefined, it uses the environment +variable @samp{EDITOR} instead. If @samp{EDITOR} is undefined, it +defaults to @file{vi}. It is possible for @file{eid} to pass the editor +an initial search pattern so that your cursor will immediately alight on +the token of interest. This feature is controlled by the following +environment variables: @table @samp -@item EDITOR -@vindex EDITOR -The name of the editor program to invoke. - @item EIDARG @vindex EIDARG -@cindex search for identifier, initial -The argument to pass to the editor to search for the matching -identifier. For @code{vi}, this should be @samp{+/%s/'}. +@cindex search for token, initial +A printf(3) format string for the editor argument to search for the +matching token. For @code{vi}, this should be @samp{+/%s/}. @item EIDLDEL @vindex EIDLDEL @cindex left delimiter editor argument @cindex beginning-of-word editor argument -A regular expression to force a match at the beginning of a word (``left -delimiter). 
@code{eid} inserts this in front of the matching identifier -when composing the search argument. For @code{vi}, this should be -@samp{\<}. +The regular-expression meta-character(s) for delimiting the beginning of +a word (the `@file{eid} Left DELimiter'). @code{eid} inserts this in +front of the matching token when a word-search is desired. For +@file{vi}, this should be @samp{\<}. @item EIDRDEL @vindex EIDRDEL @cindex right delimiter editor argument @cindex end-of-word editor argument -The end-of-word regular expression. For @code{vi}, this should be -@samp{\>}. +The regular-expression meta-character(s) for delimiting the end of +a word (the `@file{eid} Right DELimiter'). @code{eid} appends this +to the matching token when a word-search is desired. For +@file{vi}, this should be @samp{\>}. @end table -For Emacs users, the interface in @code{gid.el} is probably preferable -to @code{eid}. @xref{GNU Emacs gid interface}. - - -Here is an example: - -@example -prompt$ eid FILE \^print -FILE @{ansi2knr,fid,filenames,idfile,idx,lid,misc,@dots{}@}.c -Edit? [y1-9^S/nq] n -^print @{ansi2knr,fid,getopt,getopt1,lid,mkid,regex,scanners@}.c -Edit? [y1-9^S/nq] 2 -@end example - -@noindent This will start editing at @file{getopt}.c. - - +@c ************* gkm ********************************************************* @node fid invocation -@section @code{fid}: Listing a file's identifiers +@chapter @code{fid}: Listing a file's tokens @pindex fid -@cindex identifiers in a file - -@code{fid} lists the identifiers found in a given file. Synopsis: - -@example -fid [-f@var{dbfile}] @var{file1} [@var{file2}] -@end example - -@table @samp - -@item -f@var{dbfile} -Read the database from @var{dbfile} instead of @file{ID}. - -@item @var{file1} -List all the identifiers contained in @var{file1}. +@cindex tokens in a file +@cindex tokens common to two files -@item @var{file2} -With a second file argument, list only the identifiers both files have -in common.
+@file{fid} prints the tokens found in a given file. If two file names +are passed on the command line, @file{fid} prints the tokens that are +common to both files (i.e., the @emph{set intersection} of the two token +sets). -@end table - -The output is simply one identifier (or number) per line. +@file{fid} reads the ID database, so it accepts the @samp{--file} +option, and consults the @samp{IDPATH} environment variable, as +described in @ref{Reading options}. +If the standard output is attached to a terminal, the printed tokens are +separated by spaces. Otherwise, the tokens are printed one per line. -@node pid invocation -@chapter @code{pid}: Looking up filenames +@c ************* gkm ********************************************************* +@node fnid invocation +@chapter @code{fnid}: Looking up filenames -@pindex pid +@pindex fnid @cindex filenames, matching @cindex matching filenames -@code{pid} matches the filenames stored in the ID database, rather than -the identifiers. Synopsis: +@code{fnid} queries the list of file names stored in the ID database. +It accepts shell @emph{wildcard} patterns on the command line. If no +pattern is supplied, @file{*} is implied. @file{fnid} prints the +file names that match the given patterns. -@example -pid [-f@var{dbfile}] [-r@var{dir}] [-ebkgnc] @var{wildcard}@dots{} -@end example - -By default, the @var{wildcard} patterns are treated as shell globbing -patterns, rather than the regular expressions the other utilities -accept. See the section below for details. - -Besides the standard options given in the synopsis (@pxref{Query -options}), @code{pid} accepts the following: - -@table @samp - -@item -e -@opindex -e -Do the usual regular expression matching (@pxref{Patterns}), instead -of shell wildcard matching. - -@item -b -@opindex -b -@cindex basename match -Match the basenames of the files in the database. For example, -@samp{pid -b foo} will match the stored filename @file{dir/foo}, but not -@file{foo/file}.
- -@end table +@code{fnid} prints file names, and as such accepts the +@samp{--separator} option as described in @ref{File listing options}. For example, the command: @example -pid \*.c +fnid \*.c @end example @noindent lists all the @file{.c} files in the database. (The @samp{\} here protects the @samp{*} from being expanded by the shell.) -@menu -* Wildcard patterns:: Shell-style globbing patterns. -@end menu +@c ************* gkm ********************************************************* +@node xtokid invocation +@chapter @file{xtokid}: Testing Language Scanners +@file{xtokid} accepts the names of files and/or directories on the +command line, then extracts and prints a stream of tokens from those +files for which it has a valid, enabled scanner. This is useful +primarily for debugging new @file{mkid} scanners (@pxref{Defining +scanners}). -@node Wildcard patterns -@section Wildcard patterns +@file{xtokid} extracts tokens from source files, therefore it accepts +the @samp{--lang-map}, @samp{--include}, @samp{--exclude}, and +@samp{--lang-option} options, as well as the language-specific scanner +options, all of which are described in @ref{Extraction options}. +@file{xtokid} walks file trees, therefore it handles file and directory +names on its command line and the @samp{--prune} option as described in +@ref{Walker options}. -@cindex globbing patterns -@cindex shell wildcard patterns -@cindex wildcard wildcard patterns +The name @samp{xtokid} indicates that it is the ``eXtract TOKens ID +utility''. -@code{pid} does simplified shell wildcard matching (unless the @samp{-e} -option is specified), rather than the regular expression matching done -by the other utilities. 
Here is a description of wildcard matching, -also called @dfn{globbing}: +@c ************* gkm ********************************************************* +@node Past and Future +@chapter Past and Future -@itemize @bullet +@cindex history -@item -@kindex * @r{in globbing} -@samp{*} matches zero or more characters. +@pindex look @r{and @file{mkid} 1} +@cindex McGary, Greg +Greg McGary conceived of the ideas behind the ID utilities when he +began working on the Unix kernel in 1984. He needed a navigation tool +to help him find his way around the expansive, unfamiliar landscape. +The first @code{id-utils}-like tools were shell scripts, and produced an +ASCII database that looks much like the output of @samp{lid ".*"}. It +took over an hour on a @sc{vax 11/750} to build a database for a +@sc{4.1bsd}-derived kernel. The first version of @file{lid} used the +@sc{unix} system utility @code{look}, modified to handle very long +lines. + +In 1986, Greg rewrote the shell scripts in C to improve performance. +Build times for the ID file were shortened by an order of magnitude. +The ID utilities were first posted to @samp{comp.sources.unix} in +September 1987 under the name @code{id}. -@item -@kindex ? @r{in globbing} -@samp{?} matches any single character. +@cindex Horsley, Tom +@cindex Scofield, Doug +@cindex Leonard, Bill +@cindex Berry, Karl +Over the next few years, several versions diverged from the original +source. Tom Horsley at Harris Computer Systems Division stepped forward +to take over maintenance and integrated some of the fixes from divergent +versions. A first release of the renamed @file{mkid} @w{version 2} was +posted to @file{alt.sources} near the end of 1990. At that time, Tom +wrote a Texinfo manual with the encouragement of the net community. +(Tom especially thanks Doug Scofield and Bill Leonard whom he dragooned +into helping proofread and edit---they found several problems in the +initial version.)
Karl Berry revamped the manual for Texinfo style, +indexing, and organization in 1995. + +In January 1995, Greg McGary reemerged as the primary maintainer and +launched development of @file{mkid} version 3, whose primary new feature +is an efficient algorithm for building databases that is linear in both +time and space over the size of the input text. (The old algorithm was +quadratic in space, so it was incapable of handling very large source +trees.) For the first time, the code was released under the GNU General Public +License. + +In June 1996, the package was renamed again to @code{id-utils} and was +released for the first time under FSF copyright as part of the GNU +system. All programs had their command-line arguments completely +revised. The @file{mkid} and @file{xtokid} programs also gained a +file-tree walker, so that directory names can be passed on the command +line instead of the names of every individual file. Greg reorganized +and rewrote most of the Texinfo manual to reflect these changes. -@item -@kindex \ @r{in globbing} -@samp{\} forces the next character to be taken literally. +@pindex cscope +@pindex grep +@cindex future +Future releases of @code{id-utils} might include: -@item -@kindex [@dots{}] @r{in globbing} -@samp{[@var{chars}]} matches any single character listed in @var{chars}. +@itemize @bullet -@item -@kindex [!@dots{}] @r{in globbing} -@samp{[!@var{chars}]} matches any character @emph{not} listed in @var{chars}. +@item an optional coupling with GNU @code{grep}, so that @code{grep} can use +an ID database for hints; -@end itemize +@item a @code{cscope} work-alike query interface; -Most shells treat @samp{/} and leading @samp{.} characters -specially. @code{pid} does not do this. It simply matches the filename -in the database against the wildcard pattern. +@item incremental update of the ID database. +@end itemize +@c *************************************************************************** @node Index @unnumbered Index