diff options
Diffstat (limited to 'doc/id-utils.info')
-rw-r--r-- | doc/id-utils.info | 1890 |
1 files changed, 910 insertions, 980 deletions
diff --git a/doc/id-utils.info b/doc/id-utils.info index dcc76e8..abbc871 100644 --- a/doc/id-utils.info +++ b/doc/id-utils.info @@ -2,20 +2,17 @@ This is Info file ../../doc/id-utils.info, produced by Makeinfo-1.63 from the input file ../../doc/id-utils.texi. START-INFO-DIR-ENTRY -* ID database: (id). Identifier database utilities. -* aid: (id)aid invocation. Matching strings. -* eid: (id)eid invocation. Invoking an editor on matches. -* fid: (id)fid invocation. Listing a file's identifiers. -* gid: (id)gid invocation. Listing all matching lines. -* idx: (id)idx invocation. Testing mkid scanners. -* lid: (id)lid invocation. Matching patterns. -* mkid: (id)mkid invocation. Creating an ID database. -* pid: (id)pid invocation. Looking up filenames. +* ID database: (id-utils). Identifier database utilities. +* mkid: (id-utils)mkid invocation. Creating an ID database. +* lid: (id-utils)lid invocation. Matching words and patterns. +* fid: (id-utils)fid invocation. Listing a file's tokens. +* fnid: (id-utils)fnid invocation. Looking up file names. +* xtokid: (id-utils)xtokid invocation. Testing mkid scanners. END-INFO-DIR-ENTRY - This file documents the `mkid' identifier database utilities. + This file documents the `id-utils' database utilities. - Copyright (C) 1991, 1995 Tom Horsley. + Copyright (C) 1996 Free Software Foundation, Inc. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are @@ -34,773 +31,738 @@ translation. File: id-utils.info, Node: Top, Next: Introduction, Up: (dir) -ID database utilities -********************* +ID utilities +************ - This manual documents version 3.0.9 of the ID database utilities. + This manual documents version 3.1 of the ID utilities. * Menu: -* Introduction:: Overview of the tools, and authors. +* Introduction:: Overview of the tools with tutorial. +* Quick start:: Quick start procedure. +* Common options:: Common command-line options. * mkid invocation:: Creating an ID database. -* Common query arguments:: Common lookup options and search patterns. -* gid invocation:: Listing all matching lines. -* Looking up identifiers:: lid, aid, eid, and fid. -* pid invocation:: Looking up filenames. +* lid invocation:: Querying an ID database by token. +* fid invocation:: Listing a file's tokens. +* fnid invocation:: Looking up file names. +* xtokid invocation:: Testing language scanners. +* Past and Future:: History and future directions. * Index:: General index. -File: id-utils.info, Node: Introduction, Next: mkid invocation, Prev: Top, Up: Top +File: id-utils.info, Node: Introduction, Next: Quick start, Prev: Top, Up: Top Introduction ************ - An "ID database" is a binary file containing a list of filenames, a -list of identifiers, and a matrix indicating which identifiers appear in -which files. With this database and some tools to manipulate it -(described in this manual), a host of tasks become simpler and faster. -For example, you can list all files containing a particular `#include' -throughout a huge source hierarchy, search for all the memos containing -references to a project, or automatically invoke an editor on all files -containing references to some function. Anyone with a large software -project to maintain, or a large set of text files to organize, can -benefit from an ID database. + An "ID database" is a binary file containing a list of file names, a +list of tokens, and a sparse matrix indicating which tokens appear in +which files. - Although the ID utilities are most commonly used with identifiers, -numeric constants are also stored in the database, and can be searched -for in the same way (independent of radix, if desired). + With this database and some tools to query it (described in this +manual), many text-searching tasks become simpler and faster. For +example, you can list all files that reference a particular `#include' +file throughout a huge source hierarchy, search for all the memos +containing references to a project, or automatically invoke an editor +on all files containing references to some function or variable. +Anyone with a large software project to maintain, or a large set of text +files to organize, can benefit from the ID utilities. - There are a number of programs in the ID family: + Although the name `ID' is short for `identifier', the ID utilities +handle more than just identifiers; they also treat other kinds of +tokens, most notably numeric constants, and the contents of certain +character strings. Thus, this manual will use the word "token" as a +term that is inclusive of identifiers, numbers and strings. -`mkid' - scans files for identifiers and numeric constants and builds the ID - database file. + There are several programs in the ID utilities family: -`gid' - lists all lines that match given patterns. +`mkid' + scans files for tokens and builds the ID database file. `lid' - lists the filenames containing identifiers that match given - patterns. + queries the ID database for tokens, then reports matching file + names or matching lines. -`aid' - lists the filenames containing identifiers that contain given - strings, independent of case. +`fid' + lists all tokens recorded in the database for given files, or + tokens common to two files. -`eid' - invokes an editor on each file containing identifiers that match - given patterns. +`fnid' + matches the file names in the database, rather than the tokens. -`fid' - lists all identifiers recorded in the database for given files, or - identifiers common to two files. +`xtokid' + extracts raw tokens--helps with testing of new `mkid' scanners. -`pid' - matches the filenames in the database, rather than the identifiers. + In addition, the ID utilities have historically provided several +query programs which are specializations of `lid': + +`gid' + (alias for `lid -R grep') lists all lines containing the requested + pattern. -`idx' - helps with testing of new `mkid' scanners. +`eid' + (alias for `lid -R edit') invokes an editor on all files + containing the requested pattern, and if possible, initiates a + text search for that pattern. + +`aid' + (alias for `lid -ils') treats the requested pattern as a + case-insensitive literal substring. - Please report bugs to `gkm@magilla.cichlid.com'. Remember to + Please report bugs to `bug-gnu-utils@gnu.ai.mit.edu'. Remember to include the version number, machine architecture, input files, and any other information needed to reproduce the bug: your input, what you expected, what you got, and why it is wrong. Diffs are welcome, but please include a description of the problem as well, since this is sometimes difficult to infer. *Note Bugs: (gcc)Bugs. -* Menu: - -* Past and future:: How the ID tools came about, and where they're going. - - -File: id-utils.info, Node: Past and future, Up: Introduction - -Past and future -=============== - - Greg McGary conceived of the ideas behind mkid when he began hacking -the Unix kernel in 1984. He needed a navigation tool to help him find -his way around the expansive, unfamiliar landscape. The first -`mkid'-like tools were shell scripts, and produced an ASCII database -that looks much like the output of `lid' with no arguments. It took -over an hour on a VAX 11/750 to build a database for a 4.1BSD-ish -kernel. Lookups were done with the system utility `look', modified to -handle very long lines. - - In 1986, Greg rewrote `mkid', `lid', `fid' and `idx' in C to improve -performance. Database-build times were shortened by an order of -magnitude. The `mkid' tools were first posted to `comp.sources.unix' -in September 1987. - - Over the next few years, several versions diverged from the original -source. Tom Horsley at Harris Computer Systems Division stepped forward -to take over maintenance and integrated some of the fixes from divergent -versions. A first release of `mkid' version 2 was posted to -`alt.sources' near the end of 1990. At that time, Tom wrote this -Texinfo manual with the encouragement the net community. (Tom -especially thanks Doug Scofield and Bill Leonard whom he dragooned into -helping poorfraed and edit--they found several problems in the initial -version.) Karl Berry revamped the manual for Texinfo style, indexing, -and organization in 1995. - - In January 1995, Greg McGary reemerged as the primary maintaner and -launched development of `mkid' version 3, whose primary new feature is -an efficient algorithm for building databases that is linear in both -time and space over the size of the input text. (The old algorithm was -quadratic in space and therefore choked on very large source trees.) -The code is released under the GNU Public License, and might become a -part of the GNU system. `mkid' 3 is an interim release, since several -significant enhancements are still in the works: an optional coupling -with GNU `grep', so that `grep' can use an ID database for hints; a -`cscope' work-alike query interface; incremental update of the ID -database; and an automatic file-tree walker so you need not explicitly -supply every filename argument to the `mkid' program. - - -File: id-utils.info, Node: mkid invocation, Next: Common query arguments, Prev: Introduction, Up: Top - -`mkid': Creating ID databases -***************************** - - The `mkid' program builds an ID database. To do this, it must scan -each file you tell it to include in the database. This takes some time, -but once the work is done the query programs run very rapidly. (You can -run `mkid' as a `cron' job to regularly update your databases.) - - The `mkid' program knows how to extract identifiers from various -types of files. For example, it can recognize and skip over comments -and string constants in a C program. - - Identifiers are not the only thing included in the database. Numbers -are also recognized and included in the database indexed by their binary -value. This feature allows you to find uses of constants without regard -to the radix used to specify them, since the same number can frequently -be written in many different ways (for instance, `47', `0x2f', `057' in -C). - - All the places in this document which mention identifiers should -really mention both identifiers and numbers, but that gets fairly -clumsy after a while, so you just need to keep in mind that numbers are -included in the database as well as identifiers. - - The ID files that `mkid' creates are architecture- and -byte-order-independent; you can share them at will across systems. - -* Menu: - -* mkid options:: Command-line options to mkid. -* Scanners:: Built-in and defining your own. -* mkid examples:: Examples of mkid usage. - -File: id-utils.info, Node: mkid options, Next: Scanners, Up: mkid invocation +File: id-utils.info, Node: Quick start, Next: Common options, Prev: Introduction, Up: Top -`mkid' options -============== - - By default, `mkid' scans the files you specify and writes the -database to a file named `ID' in the current directory. - - mkid [-v] [-SSCANARG] [-aARGFILE] [-] [-fIDFILE] FILES... +Quick Start Procedure +********************* - The program accepts the following options. + Unpack the distribution. -`-v' - Verbose. `mkid' tells you as it scans each file and indicates - which scanner it is using. It also summarizes some statistics - about the database at the end. + Type `./configure' -`-SSCANARG' - Specify options regarding `mkid''s scanners. *Note Scanner option - formats::. + Type `make' -`-aARGFILE' - Read additional command line arguments from ARGFILE. This is - typically used to specify lists of filenames longer than will fit - on a command line; some systems have severe limitations on the - total length of a command line. + Type `make install' as a user with the appropriate privileges + (e.g., `bin' or perhaps even `root'). -`-' - Read additional command line arguments from standard input. + Type `cd /usr/include; mkid' to build an ID database covering all + of the system header files. -`-fIDFILE' - Write the database to the file IDFILE, instead of `ID'. The - database stores filenames relative to the directory containing the - database, so if you move the database to a different directory - after creating it, you may have trouble finding files. + Type `lid FILE', then `gid strtok', then `aid stdout'. - The remaining arguments FILES are the files to be scanned and -included in the database. If no files are given at all (either on -command line or via `-a' or `-'), `mkid' does nothing. + You have just built, installed and used the most common commands of +the GNU ID utilities. If you ever need help remembering which system +header files contain a particular declaration, or reference a +particular symbol, you'll want to keep the ID file you built in +`/usr/include' for later use. If your working directory is elsewhere +at the time, simply provide the `-f /usr/include' option to `lid' +(*note Reading options::.). -File: id-utils.info, Node: Scanners, Next: mkid examples, Prev: mkid options, Up: mkid invocation +File: id-utils.info, Node: Common options, Next: mkid invocation, Prev: Quick start, Up: Top -Scanners -======== +Common command-line options +*************************** - To determine which identifiers to extract from a file and store in -the database, `mkid' calls a "scanner"; we say a scanner "recognizes" a -particular language. Scanners for several languages are built-in to -`mkid'; you can add your own scanners as well, as explained in the -sections below. - - `mkid' determines which scanner to use for a particular file by -looking at the suffix of the filename. This "suffix" is everything -after and including the last `.' in a filename; for example, the suffix -of `foo.c' is `.c'. `mkid' has a built-in list of bindings from some -suffixes to corresponding scanners; for example, `.c' files are (not -surprisingly) scanned by the predefined C language scanner. - - If `mkid' cannot determine what scanner to use for a particular -file, either because the file has no suffix (e.g., `foo') or because -`mkid' has no binding for the file's suffix (e.g., `foo.bar'), it uses -the scanner bound to the `.default' suffix. By default, this is the -plain text scanner (*note Plain text scanner::.), but you can change -this with the `-S' option, as explained below. + Certain options, and regular expression syntax, are shared by various +groupings of the ID utilities. We describe these in the sections below, +rather than repeating them for each program. * Menu: -* Scanner option formats:: Overview of the -S option. -* Predefined scanners:: The C, plain text, and assembler scanners. -* Defining new scanners:: Either in source code or at runtime with -S. -* idx invocation:: Testing mkid scanners. +* Universal options:: Options common to all programs. +* Extraction options:: Options for programs that extract tokens from source files. +* Walker options:: Options for programs that walk file and directory trees. +* Reading options:: Options for programs that read ID databases. +* Writing options:: Options for programs that write ID databases. +* File listing options:: Options for programs that list file names. -File: id-utils.info, Node: Scanner option formats, Next: Predefined scanners, Up: Scanners - -Scanner option formats ----------------------- - - With the `-S' option, you can change which language scanner to use -for which files, give language-specific options, and get some limited -online help about scanner options. - - Here are the different forms of the `-S' option: +File: id-utils.info, Node: Universal options, Next: Extraction options, Up: Common options -`-S.SUFFIX=SCANNER' - Use SCANNER for a file with the given `.SUFFIX'. For example, - `-S.yacc=c' tells `mkid' to use the `c' language scanner for all - files ending in `.yacc'. +Options Common to All Programs +============================== -`-S.SUFFIX=?' - Display which scanner is used for the given `.SUFFIX'. +`--help' + Print a usage message listing all available options, then exit + successfully. -`-S?=SCANNER' - Display which suffixes SCANNER is used for. - -`-S?=?' - Display the scanner binding for every known suffix. - -`-SSCANNER+ARG' -`-SSCANNER-ARG' - Each scanner accepts certain scanner-dependent arguments. These - options all have one of these forms. *Note Predefined scanners::. - -`-SSCANNER?' - Display the scanner-specific options accepted by SCANNER. - -`-SNEW-SCANNER/OLD-SCANNER/FILTER-COMMAND' - Define NEW-SCANNER in terms of OLD-SCANNER and FILTER-COMMAND. - *Note Defining scanners with options::. +`--version' + Print the version number, then exit successfully. -File: id-utils.info, Node: Predefined scanners, Next: Defining new scanners, Prev: Scanner option formats, Up: Scanners +File: id-utils.info, Node: Reading options, Next: Writing options, Prev: Walker options, Up: Common options -Predefined scanners -------------------- +Options for Programs that Read ID Databases +=========================================== - `mkid' has built-in scanners for several types of languages; you can -get the list by running `mkid -S?=?'. The supported languages are -documented below(1). +`-f FILENAME' +`--file=FILENAME' + FILENAME is the ID database to read when processing queries. At + present, only a single `--file' option is processed, but in future + releases, more than one ID database may be named on the command + line. -* Menu: +`$IDPATH' + `IDPATH' is an environment variable that contains a + colon-separated list of ID database names. If this variable is + present, and no `--file' options are presented on the command + line, the ID databases named in `IDPATH' are implied.(1) -* C scanner:: For the C programming language. -* Plain text scanner:: For documents or other non-source code. -* Assembler scanner:: For assembly language. + If no ID databases are specified either on the command line or via +the `IDPATH' environment variable, then the ID utilities search for a +file named `ID' in the current working directory, and then in +successive parent directories. ---------- Footnotes ---------- - (1) This is not strictly true: `vhil' is a supported language, but -it is an obsolete and arcane dialect of C and should be ignored. + (1) At present, this feature is fully implemented, since only the +first of a list of ID database names is processed. -File: id-utils.info, Node: C scanner, Next: Plain text scanner, Up: Predefined scanners - -C scanner -......... +File: id-utils.info, Node: Writing options, Next: File listing options, Prev: Reading options, Up: Common options - The C scanner is the most commonly used. Files with the usual `.c' -and `.h' suffixes, and the `.y' (yacc) and `.l' (lex) suffixes, are -processed with this scanner (by default). +Options for Programs that Write ID Databases +============================================ - Scanner-specific options: +`-o FILENAME' +`--output=FILENAME' + The `--output' option names the file in which to write a new ID + database. If no `--output' (or `--file') option is present, an + output file named `ID' is implied. -`-Sc-sCHARACTER' - Allow the specified CHARACTER in identifiers. For example, if you - use `$' in identifiers, you'll want to use `-Sc-s$'. - -`-Sc+u' - Strip leading underscores from identifiers. You might to do this in - peculiar circumstances, such as trying to parse the output from - `nm' or some other system utility. - -`-Sc-u' - Don't strip leading underscores from identifiers; this is the - default. +`-f FILENAME' +`--file=FILENAME' + This is a synonym for `--output' -File: id-utils.info, Node: Plain text scanner, Next: Assembler scanner, Prev: C scanner, Up: Predefined scanners - -Plain text scanner -.................. - - The plain text scanner is intended for scanning most non-source-code -files. This is typically the scanner used when adding custom scanners -via `-S' (*note Defining scanners with options::.). +File: id-utils.info, Node: Walker options, Next: Reading options, Prev: Extraction options, Up: Common options - Scanner-specific options: +Options for Programs that Walk File and Directory Trees. +======================================================== -`-Stext+aCHARACTER' - Include CHARACTER in identifiers. By default, letters (a-z and - A-Z) and underscore are included. + The programs `mkid' and `xtokid' accept the names of files and +directories on the command line. Files are scanned if there is a +scanner available and enabled for the file's source language. +Directories are recursively descended, searching for files whose names +match the rules listed in the *language map* file (*note Language +map::.). -`-Stext-aCHARACTER' - Exclude CHARACTER from identifiers. + The following option controls the file tree walker: -`-Stext+sCHARACTER' - Squeeze CHARACTER from identifiers, i.e., do not terminate an - identifier when CHARACTER is seen. By default, the characters - `'', `-', and `.' are squeezed out of identifiers. For example, - the input `fred's' leads to the identifier `freds'. - -`-Stext-sCHARACTER' - Do not squeeze CHARACTER. +`-p NAMES' +`--prune=NAMES' + One or more file or directory names may appear in NAMES. The file + tree walker will stop short at these files and directories and + their contents will not be scanned. -File: id-utils.info, Node: Assembler scanner, Prev: Plain text scanner, Up: Predefined scanners - -Assembler scanner -................. - - Since assembly languages come in several flavors, this scanner has a -number of options: +File: id-utils.info, Node: File listing options, Prev: Writing options, Up: Common options -`-Sasm-cCHARACTER' - Define CHARACTER as starting a comment that extends to the end of - the input line; no default. In many assemblers this is `;' or `#'. +Options for Programs that List File Names +========================================= -`-Sasm+u' -`-Sasm-u' - Strip (`+u') or do not strip (`-u') leading underscores from - identifiers. The default is to strip them. + The programs `lid' and `fnid' can print lists of file names as the +result of queries. The following option controls how these lists are +formatted: -`-Sasm+aCHARACTER' - Allow CHARACTER in identifiers. +`-S STYLE' +`--separator=STYLE' + STYLE may be one of `braces', `space' or `newline'. -`-Sasm-aCHARACTER' - Allow CHARACTER in identifiers, but if an identifier contains - CHARACTER, ignore it. This is useful to ignore temporary labels, - which can be generated in great profusion; these often contain `.' - or `@'. + The STYLE of `braces' means that file names with common directory + prefix and common suffix are printed using the shell's brace + notation in order to compress the output. For example, + `../src/foo.c ../src/bar.c' can be printed in brace notation as + `../src/{foo,bar}.c'. -`-Sasm+p' -`-Sasm-p' - Recognize (`+p') or do not recognize (`-p') C preprocessor - directives in assembler source. The default is to recognize them. + The STYLEs of `space' and `newline' mean that file names are + separated spaces or by newlines, respectively. -`-Sasm+C' -`-Sasm-C' - Skip over (`+C') or do not skip over (`-C') C style comments in - assembler source. The default is to skip them. + If the list of files is being printed on a terminal, brace + notation is the default. If not, file names are separated by + spaces if the KEY is included in the output, and by newlines the + KEY STYLE is `none' (*note lid invocation::.). -File: id-utils.info, Node: Defining new scanners, Next: idx invocation, Prev: Predefined scanners, Up: Scanners - -Defining new scanners ---------------------- - - You can add new scanners to `mkid' in two ways: modify the source -code and recompile, or at runtime via the `-S' option. Each has their -advantages and disadvantages, as explained below. +File: id-utils.info, Node: Extraction options, Next: Walker options, Prev: Universal options, Up: Common options + +Options for Programs that Scan Source Files +=========================================== + + `mkid' and `xtokid' walk file trees, select source files by name, +and extract tokens from source files. They accept the following +options: + +`-m MAPFILE' +`--lang-map=MAPFILE' + MAPFILE contains rules for determining the source languages from + file names. *Note Language map:: + +`-i LANGUAGES' +`--include=LANGUAGES' + The `--include' option names LANGUAGES whose source files should + be scanned and incorporated into the ID database. By default, all + languages known to the ID utilities are enabled. + +`-x LANGUAGES' +`--exclude=LANGUAGES' + The `--exclude' option names LANGUAGES whose source files should + NOT be scanned. The default list of excluded languages is empty. + Note that only one of `--include' or `--exclude' may be specified + on the command line for a single run. + +`-l LANGUAGE:OPTIONS' +`--lang-option=LANGUAGE:OPTIONS' + Language-specific scanners also accept options. LANGUAGE denotes + the desired scanner, and OPTION are the command-line options that + should be passed through to it. For example, to pass the -X + -COKE-BOTTLE options to the scanner for the language SWIZZLE, pass + this: -L SWIZZLE:"-X -COKE-BOTTLE", or this: + -LANG-OPTION=SWIZZLE:"-X -COKE-BOTTLE", or this: -L SWIZZLE-X -L + SWIZZLE:-COKE-BOTTLE. Use the `--help' option to see the + command-line option summary for + + To determine which tokens to extract from a file and store in the +database, `mkid' calls a "scanner"; we say a scanner "recognizes" a +particular language. Scanners for several languages are built-in to +`mkid'; you can add your own scanners as well, as explained in *Note +Defining scanners::. - If you create a new scanner that would be of use to others, please -consider sending it back to the maintainer, `gkm@magilla.cichlid.com', -for inclusion in future releases of `mkid'. + The ID utilities determine which scanner to use for a particular +file by consulting the language-map file. Scanners for several are +already built-in to the ID utilities. You can see which languages have +built-in scanners, and examine their language-specific options by +invoking `mkid --help' or `xtokid --help'. * Menu: -* Defining scanners in source code:: -* Defining scanners with options:: +* Language map:: Mapping file names to source languages. +* C/C++ scanner:: For the C and C++ programming language. +* Assembler scanner:: For assembly language. +* Text scanner:: For documents or other non-source code. +* Defining scanners:: Defining new scanners in the source code. -File: id-utils.info, Node: Defining scanners in source code, Next: Defining scanners with options, Up: Defining new scanners - -Defining scanners in source code -................................ - - To add a new scanner in source code, you should add a new section to -the file `scanners.c'. Copy one of the existing scanners (most likely -either C or plain text), and modify as necessary. Also add the new -scanner to the `languages_0' and `suffixes_0' tables near the beginning -of the file. - - This is not a terribly difficult programming task, but it requires -recompiling and installing the new version of `mkid', which may be -inconvenient. - - This method leads to scanners which operate much more quickly than -ones that depend on external programmers. It is also likely the -easiest way to define scanners for new programming languages. +File: id-utils.info, Node: Language map, Next: C/C++ scanner, Up: Extraction options + +Mapping file names to source languages +-------------------------------------- + + The file `id-lang.map', installed by default in +`$(prefix)/share/id-lang.map', contains rules for mapping file names to +source languages. Each rule comprises three parts: a shell GLOB +pattern, a language name, and language-specific scanner options. + + The special pattern `**' denotes the default source language. This +is the language that's assigned to file names that don't match any other +pattern. + + The special pattern `***' should be followed by a file name. The +named file should contain more language-map rules and is included at +this point. + + The order in which rules are presented in a language-map file is +significant. This order influences the order in which files are +displayed as the result of queries. For example, the distributed +language-map file places all rules for C .H files ahead of .C files, so +that in general, declarations will precede definitions in query output. +The same thing is done for C++ and its many different source file name +extensions. + + Here is a pared-down version of the `id-lang.map' file distributed +with the ID utilities: + + + # Default language + ** IGNORE # Although this is listed first, + # the default language pattern is + # logically matched last. + + # Backup files + *~ IGNORE + *.bak IGNORE + *.bk[0-9] IGNORE + + # SCCS files + [sp].* IGNORE + + # list header files before code files + *.h C + *.h.in C + *.H C++ + *.hh C++ + *.hpp C++ + *.hxx C++ + + # list C `meta' files next + *.l C + *.lex C + *.y C + *.yacc C + + # list C code files after header files + *.c C + *.C C++ + *.cc C++ + *.cpp C++ + *.cxx C++ + + # list assembly language after C + *.[sS] asm --comment=; + *.asm asm --comment=; + + # [nt]roff + *.[0-9] roff + *.ms roff + *.me roff + *.mm roff + + # TeX and friends + *.tex TeX + *.ltx TeX + *.texi texinfo + *.texinfo texinfo -File: id-utils.info, Node: Defining scanners with options, Prev: Defining scanners in source code, Up: Defining new scanners - -Defining scanners with options -.............................. - - You can use the `-S' option on the command line to define a new -language scanner: - - -SNEW-SCANNER/EXISTING-SCANNER/FILTER - -Here, NEW-SCANNER is the name of the new scanner being defined, -EXISTING-SCANNER is the name of an existing scanner, and FILTER is a -shell command or pipeline. +File: id-utils.info, Node: C/C++ scanner, Next: Assembler scanner, Prev: Language map, Up: Extraction options - The new scanner works by passing the input file to FILTER, and then -arranging for the result to be passed through EXISTING-SCANNER. -Typically, EXISTING-SCANNER is `text'. - - Somewhere within FILTER, the string`%s' should occur. This `%s' is -replaced by the name of the source file being scanned. - - For example, `mkid' has no built-in scanner for Texinfo files (like -this one). In indexing a Texinfo file, you most likely would want to -ignore the Texinfo @-commands. Here's one way to specify a new scanner -to do this: - - -S/texinfo/text/sed s,@[a-z]*,,g %s - - This defines a new language scanner (`texinfo') defined in terms of -a `sed' command to strip out Texinfo directives (an `@' character -followed by letters). Once the directives are stripped, the remaining -text is run through the plain text scanner. +C/C++ Language Scanner +---------------------- - This is a minimal example; to do a complete job, you would need to -completely delete some lines, such as those beginning with `@end' or -@node. + The C scanner is the most commonly used. Files that match the glob +pattern `*.h', `*.c', as well as `yacc' files that match `*.y' or +`*.yacc', and `lex' files that match `*.l' or `*.lex', are processed +with this scanner. + + Scanner-specific options (Note, these options are presented WITHOUT +the required `-l' or `--lang-option=' prefix): + +`-k CHARACTER-CLASS' +`--keep=CHARACTER-CLASS' + Consider the characters in CHARACTER-CLASS as valid constituents of + identifier names. For example, if you are indexing C code that + contains `$' in some of its identifiers, you can include these by + using `--lang-option=C:--keep=$', or `-l C:"-k $"' (if you don't + like to type so much). + +`-i CHARACTER-CLASS' +`--ignore=CHARACTER-CLASS' + x mkiConsider the characters in CHARACTER-CLASS as valid + constituents of identifier names, but discard all tokens + containing these characters. For example, if some C code has + identifiers containing `$', but you don't want these cluttering up + your ID database, use `--lang-option=C:--ignore=$', or the terser + equivalent `-l C:"-i $"'. + +`-u' +`--strip-underscore' + Strip one leading underscore from C identifiers encapsulated as + character strings. This option is useful if you are indexing C + code that contains symbol-table name strings for systems that + prepend an underscore to external symbols. By default, the + leading underscore is retained. -File: id-utils.info, Node: idx invocation, Prev: Defining new scanners, Up: Scanners - -`idx': Testing `mkid' scanners ------------------------------- - - `idx' prints the identifiers found in the files you specify to -standard output. This is useful in debugging new `mkid' scanners (*note -Scanners::.). Synopsis: +File: id-utils.info, Node: Assembler scanner, Next: Text scanner, Prev: C/C++ scanner, Up: Extraction options + +Assembly Language Scanner +------------------------- + + Assembly languages use a variety of commenting conventions, and +allow a variety of special characters to *dirty up* local symbols, +preventing name space conflicts with symbols defined by higher-level +languages. Also, some compilation systems prepend an underscore to +external symbols. The options listed below are designed to address +these differences. + +`-c CHARACTER-CLASS' +`--comment=CHARACTER-CLASS' + The characters in CHARACTER-CLASS are considered left delimiters + for comments that extend until the end of the current line. + +`-k CHARACTER-CLASS' +`--keep=CHARACTER-CLASS' + Consider the characters of CHARACTER-CLASS as valid constituents of + identifier names. For example, if you are indexing assembly code + that prepends `.' to assembler directives, and prepends `%' to + register names, you can keep these characters in the tokens by + specifying `--lang-option=asm:--keep=.%', or `-l asm:"-k .%"'. + +`-i CHARACTER-CLASS' +`--ignore=CHARACTER-CLASS' + Consider the characters of CHARACTER-CLASS as valid consituents of + identifier names, but discard all tokens containing these + characters. For example, if you don't want to clutter your ID + database with assembler directives that begin with a leading `.' + or with assembler labels that contain `@', use + `--lang-option=asm:--ignore=.@', or `-l asm:"-i .@"'. + +`-u' +`--strip-underscore' + Strip one leading underscore from identifiers. This option is + useful if your compilation system prepends an underscore to + external symbols. By stripping the underscore, you can + canonicalize such names and bring them into conformance the way + they are expressed in the C language. By default, the leading + underscore is retained. - idx [-SSCANARG] FILES... - - `idx' accepts the same `-S' options as `mkid'. *Note Scanner option -formats::. - - The name "idx" stands for "ID eXtract". The name may change in -future releases, since this is such an infrequently used program. +`-n' +`--no-cpp' + Do not recognize C preprocessor directives. By default, such + lines are handled in the same way as they are by the C language + scanner. -File: id-utils.info, Node: mkid examples, Prev: Scanners, Up: mkid invocation - -`mkid' examples -=============== - - The simplest example of `mkid' is something like: - - mkid *.[chy] - - This will build an ID database indexing identifiers and numbers in -the all the `.c', `.h', and `.y' files in the current directory. -Because `mkid' already knows how to scan files with those suffixes, no -additional options are needed. +File: id-utils.info, Node: Text scanner, Next: Defining scanners, Prev: Assembler scanner, Up: Extraction options - Here's a more complex example. Suppose you want to build a database -indexing the contents of all the `man' pages, and furthur suppose that -your system is using `gzip' (*note Top: (gzip)Top.) to store compressed -`cat' versions of the `man' pages in the directory `/usr/catman'. The -`gzip' program creates files with a `.gz' suffix, so you must tell -`mkid' how to scan `.gz' files. Here are the commands to do the job: +Text Scanner +------------ - cd /usr/catman - find . -name \*.gz -print | mkid '-Sman/text/gzip <%s' -S.gz=man - + The plain text scanner is intended for human-language documents, or +as the scanner of last resort for files that have no scanner that is +more specific. It is customizable to the extent that character classes +can be designated as token constituents or as token delimiters. The +default token constituents are the alpha-numerics; all other characters +are considered token delimiters. -Explanation: +`-i CHARACTER-CLASS' +`--include=CHARACTER-CLASS' + Include characters belonging to CHARACTER-CLASS in tokens. - 1. We first `cd' to `/usr/catman' so the ID database will store the - correct relative filenames. +`-x CHARACTER-CLASS' +`--exclude=CHARACTER-CLASS' + Exclude characters belonging to CHARACTER-CLASS from tokens, i.e., + treat them as token delimiters. - 2. The `find' command prints the names of all `.gz' files under the - current directory. *Note find invocation: (sh-utils)find - invocation. - - 3. This list is piped to `mkid'; the `-' option (at the end of the - line) tells `mkid' to read arguments (in this case, as is typical, - the list of filenames) from standard input. *Note mkid options::. + +File: id-utils.info, Node: Defining scanners, Prev: Text scanner, Up: Extraction options - 4. The `-Sman/text/gzip ...' defines a new language `man' in terms of - the `gzip' program and `mkid''s existing text scanner. *Note - Defining scanners with options::. +Defining New Scanners in the Source Code +---------------------------------------- - 5. The `-S.gz=man' tells `mkid' to treat all `.gz' files as this new - language `man'. *Note Scanner option formats::. + To add a new scanner in source code, you should add a new section to +the file `scanners.c'. It might be easiest to clone one of the +existing scanners and modify it as necessary. For the hypothetical +language FOO, you must define the functions `get_token_foo', +`parse_args_foo', `help_me_foo', as well as the tables +`long_options_foo' and `args_foo'. If your scanner is modelled after +one of the existing scanners, you'll also need a character-attribute +table `ctype_foo'. + This is not a terribly difficult programming task, but it requires +recompiling and installing the new version of `mkid' and `xtokid'. You +should use `xtokid' to test the operation of the new scanner. - As a further complication, `cat' pages typically contain underlining -and backspace sequences, which will confuse `mkid'. To handle this, -the `gzip' command becomes a pipeline, like this: + Once these functions and tables are ready, add function prototypes +and an entry to to the `languages_0' table near the beginning of the +file. - mkid '-Sman/text/gzip <%s | col -b' -S.gz=man - + Be warned that the existing scanners are built for speed, not +elegance or readability. You might wish to create a new scanner that's +easier to read and understand if you don't feel that speed is so +important. -File: id-utils.info, Node: Common query arguments, Next: gid invocation, Prev: mkid invocation, Up: Top - -Common query arguments -********************** - - Certain options, and regular expression syntax, are shared by the ID -query tools. So we describe those things in the sections below, instead -of repeating the description for each tool. +File: id-utils.info, Node: mkid invocation, Next: lid invocation, Prev: Common options, Up: Top + +`mkid': Creating an ID Database +******************************* + + `mkid' builds an ID database. It accepts the names of files and/or +directories on the command line, selects files that have an enabled +scanner, then extracts and stores tokens from those files. The +resulting ID database is architecture- and byte-order-independent so it +can be shared among all systems. + + The primary virtues of `mkid' are speed and high capacity. The size +of the source trees it can index is limited only by available system +memory. `mkid''s indexing algorithm is very space-efficient and +exhibits excellent locality-of-reference, and so is capable of +operating with a working-set size that is only half the size of its +virtual address space. A typical UNIX-like operating system with 16 +megabytes of system memory should be able to build an ID database +covering approximately 12,000-14,000 source files totalling +approximately 50-100 Megabytes. A 66 Mhz 486 computer can build such a +large ID database in approximately 10-15 minutes. + + In a future release, `mkid' will be able to incrementally update an +ID database much faster than it can build one from scratch. Until this +feature becomes available, it might be a good idea to schedule a `cron' +job to regularly update large ID databases during off-hours. + + `mkid' writes the ID file, therefore it accepts the `--output' (and +`--file') options as described in *Note Writing options::. `mkid' +extracts tokens from source files, therefore it accepts the +`--lang-map', `--include', `--exclude', and `--lang-option' options, as +well as the language-specific scanner options, all of which are +described in *Note Extraction options::. `mkid' walks file trees, +therefore it handles file and directory names on its command line and +the `--prune' option as described in *Note Walker options::. + + In addition, `mkid' accepts the following command-line options: + +`-s' +`--statistics' + `mkid' reports statistics about resource usage at the end of its + run. -* Menu: - -* Query options:: -f -r -c -ew -kg -n -doxa -m -F -u. -* Patterns:: Regular expression syntax for searches. -* Examples: Query examples. Some common uses. +`-v' +`--verbose' + `mkid' reports statistics about each file as it is scanned, and + about the resource usage of its indexing algorithm at regular + intervals. -File: id-utils.info, Node: Query options, Next: Patterns, Up: Common query arguments - -Query options -============= - - The ID query tools (*not* `mkid') share certain command line -options. Not all of these options are recognized by all programs, but -if an option is used by more than one program, it is described below. -The description of each program gives the options that program uses. - -`-fIDFILE' - Read the database from IDFILE, in the current directory or in any - directory above the current directory. The default database name - is `ID'. Searching parent directories lets you have a single ID - database at the root of a large source tree and then use the query - tools from anywhere within that tree. - -`-rDIRECTORY' - Find files relative to DIRECTORY, instead of the directory in - which the ID database was found. This is useful if the ID - database was moved after its creation. - -`-c' - Equivalent to `-r`pwd`', i.e., find files relative to the current - directory, instead of the directory in which the ID database was - found. - -`-e' -`-w' - `-e' forces pattern arguments to be treated as regular expressions, - and `-w' forces pattern arguments to be treated as constant - strings. By default, the query tools guess whether a pattern is - regular expressions or constant strings by looking for special - characters. *Note Patterns::. - -`-k' -`-g' - `-k' suppresses use of shell brace notation in the output. By - default, the query tools that generate lists of filenames attempt - to compress the lists using the usual shell brace notation, e.g., - `{foo,bar}.c' to mean `foo.c' and `bar.c'. (This is useful if you - use `ksh' or the original (not GNU) `sh' and want to feed the list - of names to another command, since those shells do not support - this brace notation; the name of the `-k' option comes from the - `k' in `ksh'). - - `-g' turns on use of brace notation; this is only needed if the - query tools were compiled with `-k' as the default behavior. +File: id-utils.info, Node: lid invocation, Next: fid invocation, Prev: mkid invocation, Up: Top + +`lid': Querying an ID Database by Token +*************************************** + + The `lid' program accepts PATTERNS on the command line which it +matches against the tokens stored in an ID database. The +interpretation of a PATTERN is determined by the makeup of the PATTERN +string itself, or can be overridden by command-line options. If a +PATTERN contains regular expression meta-characters, it is used to +perform a regular-expression substring search. If no such +meta-characters are present, PATTERN is used to perform a literal word +search. (By default, all searches are sensitive to alphabetic case.) +If no PATTERN is supplied on the command line, `lid' lists every entry +in the ID database. + + `lid' reads the ID database, therefore it accepts the `--file' +option, and consults the `IDPATH' environment variable, as described in +*Note Reading options::. `lid' lists file names, therefore it accepts +the `--separator' option, as described in *Note File listing options::. + + In addition, `lid' accepts the following command-line options: + +`-i' +`--ignore-case' + Ignoring differences in alphabetic case between the PATTERN and + the tokens in the ID database. + +`-l' +`--literal' + Match PATTERN as a literal string. Use this option if PATTERN + contains regular-expression meta-characters, but you don't wish to + perform a regular-expression search. + +`-r' +`--regexp' + Match PATTERN as an *extended* regular expression(1). Use this + option if no regular-expression expression meta-characters are + present in PATTERN, but you wish to force a regular-expression + search (note: in this case, a *literal substring* search might be + faster). -`-n' - Suppress the matching identifier before each list of filenames - that the query tools output by default. This is useful if you want - a list of just the names to feed to another command. +`-w' +`--word' + Match PATTERN using a word-delimited (non substring) search. This + is the default for literal searches. + +`-s' +`--substring' + Match PATTERN using a substring (non word-delimited) search. This + is the default for regular expression searches. + +`-k STYLE' +`--key=STYLE' + STYLE can be one of `token', `pattern' or `none'. This option + controls how the subject of the query is presented. This is best + illustrated by example: + + $ lid --key=token '^dest.' + destaddr libsys/memcpy.c + destination libsys/regex.c + destlst libsys/rx.c + destpos libsys/rx.c + destset libsys/rx.h libsys/rx.c + + $ lid --key=pattern '^dest.' + ^dest. libsys/rx.h libsys/{memcpy,regex,rx}.c + + $ lid --key=none '^dest.' + libsys/rx.h libsys/{memcpy,regex,rx}.c + + When `--key' is either `token' or `pattern', the first column of + output is a TOKEN or PATTERN, respectively. When `--key' is + `none', neither of these is printed, and the file name list begins + immediately. The default is `token'. + +`-R STYLE' +`--result=STYLE' + STYLE can be one of `filenames', `grep', `edit' or `none'. This + option controls how the value associated with the query's KEY + presented. When STYLE is `filenames', a list of file names is + printed (this is the default). When STYLE is `grep', the lines + that match PATTERN are printed in the same format as `egrep -n'. + When STYLE is `edit', the file names are passed to an editor, and + if possible PATTERN is passed as an initial search string (*note + eid invocation::.). When STYLE is `none', the file names are not + processed in any way. This can be useful if you wish to see what + tokens match a PATTERN, but don't care about where they reside. `-d' `-o' `-x' -`-a' These options may be used in any combination to specify the radix of numeric matches. `-d' allows matching on decimal numbers, `-o' - on octal numbers, and `-x' on hexadecimal numbers. The `-a' - option is equivalent to specifying all three; this is the default. - Any combination of these options may be used. - -`-m' - Merge multiple lines of output into a single line. If your query - matches more than one identifier, the default is to generate a - separate line of output for each matching identifier. - -`-F-' -`-FN' -`-F-M' -`-FN-M' - Show identifiers matching at least N and at most M times. `-F-' - is equivalent to `-F1', i.e., find identifiers that appear only - once in the database. (This is useful to locate identifiers that - are defined but never used, or used once and never defined.) - -`-uNUMBER' - List identifiers that conflict in the first NUMBER characters. - This could be in useful porting programs to brain-dead computers - that refuse to support long identifiers, but your best long term - option is to set such computers on fire. + on octal numbers, and `-x' on hexadecimal numbers. Any + combination of these options may be used. The default is to match + all three radixes. + +`-F RANGE' +`--frequency=RANGE' + Match tokens whose occurrence count falls in RANGE. RANGE may be + expressed as a single number N, or as a range N`..'M. Either + limit of the range may be omitted (e.g., `..'M, or N..`..'). If + the lower limit N is omitted, it defaults to `1'. If the upper + limit is omitted, it defaults in the present implementation to + `65535', the maximum value of an unsigned 16-bit integer. + + Particularly useful queries are `lid -F1', which helps locate + identifiers that are defined but never used, or are used but never + defined. Similarly, `lid -F2' can help find functions that possess + a prototype declaration and a definition, but are never called. + +`-a NUMBER' +`--ambiguous=NUMBER' + List identifiers (not numbers) that are ambiguous for the first + NUMBER characters. This feature might be in useful when porting + programs to ancient pea-brained compilers that don't support long + identifier names. However, the best long-term option is to set + such systems on fire. - -File: id-utils.info, Node: Patterns, Next: Query examples, Prev: Query options, Up: Common query arguments - -Patterns -======== - - "Patterns", also called "regular expressions", allow you to match -many different identifiers in a single query. - - The same regular expression syntax is recognized by all the query -tools that handle regular expressions. The exact syntax depends on how -the ID tools were compiled, but the following constructs should always -be supported: - -`.' - Match any single character. - -`[CHARS]' - Match any of the characters specified within the brackets. You can - match any characters *except* the ones in brackets by typing `^' - as the first character. A range of characters can be specified - using `-'. For example, `[abc]' and `[a-c]' both match `a', `b', - or `c', and `[^abc]' matches anything *except* `a', `b', or `c'. - -`*' - Match the previous construct zero or more times. - -`^' -`$' - `^' (`$') at the beginning (end) of a pattern anchors the match to - the first (last) character of the identifier. - - The query programs use either the `regex'/`regcmp' or -`re_comp'/`re_exec' functions, depending on which are available in the -library on your system. These do not always support the exact same -regular expression syntax, so consult your local `man' pages to find -out. - - -File: id-utils.info, Node: Query examples, Prev: Patterns, Up: Common query arguments - -Query examples -============== - - Here are some examples of the options described in the previous -sections. - - To restrict searches to exact matches, use `^...$'. For example: - - prompt$ gid '^FILE$' - ansi2knr.c:144: { FILE *in, *out; - ansi2knr.c:315: FILE *out; - fid.c:38: FILE *id_FILE; - filenames.c:576: FILE * - ... - - To show identifiers not unique in the first 16 characters: - - prompt$ lid -u16 - RE_CONTEXT_INDEP_ANCHORS regex.c - RE_CONTEXT_INDEP_OPS regex.c - RE_SYNTAX_POSIX_BASIC regex.c - RE_SYNTAX_POSIX_EXTENDED regex.c - ... - - Numbers are searched for numerically rather than textually. For -example: - - prompt$ lid 0xff - 0377 {lid,regex}.c - 0xff {bitops,fid,lid,mkid}.c - 255 regex.c - - On the other hand, you can restrict a numeric search to a particular -radix if you want: +* Menu: - laurie$ lid -x 0xff - 0xff {bitops,fid,lid,mkid}.c +* lid aliases:: Aliases for specialized lid queries +* Emacs gid interface:: GNU Emacs query interface +* eid invocation:: Invoking an editor on query results - Filenames in the output are always adjusted to be correct for the -correct working directory. For example: + ---------- Footnotes ---------- - prompt$ lid bdevsw - bdevsw sys/conf.h cf/conf.c io/bio.c os/{fio,main,prf,sys3}.c - prompt$ cd io - prompt$ lid bdevsw - bdevsw ../sys/conf.h ../cf/conf.c bio.c ../os/{fio,main,prf,sys3}.c + (1) Extended regular expressions are the same as those accepted by +`egrep'. -File: id-utils.info, Node: gid invocation, Next: Looking up identifiers, Prev: Common query arguments, Up: Top - -`gid': Listing matching lines -***************************** - - Synopsis: - - gid [-fFILE] [-uN] [-rDIR] [-doxasc] [PATTERN...] - - `gid' finds the identifiers in the database that match the specified -PATTERNs, then searches for all occurrences of those identifiers, in -only the files containing matches. In a large source tree, this saves -an enormous amount of time (compared to searching every source file). - - With no PATTERN arguments, `gid' prints every line of every source -file. +File: id-utils.info, Node: lid aliases, Next: Emacs gid interface, Up: lid invocation - The name "gid" stands for "grep for identifiers", `grep' being the -standard utility to search regular files. +Aliases for Specialized `lid' Queries +===================================== - *Note Common query arguments::, for a description of the command-line -options and PATTERN arguments. + Historically, the ID utilities have provided several query interfaces +which are specializations of `lid' (*note lid invocation::.). - `gid' uses the standard GNU output format for identifying source -lines: - - FILENAME:LINENUM: TEXT - - Here is an example: - - prompt$ gid FILE - ansi2knr.c:144: { FILE *in, *out; - ansi2knr.c:315: FILE *out; - fid.c:38: FILE *id_FILE; - ... +`gid' + (alias for `lid -R grep') lists all lines containing the requested + pattern. -* Menu: +`eid' + (alias for `lid -R edit') invokes an editor on all files + containing the requested pattern, and optionally initiates a text + search for that pattern. -* GNU Emacs gid interface:: Using next-error with gid. +`aid' + (alias for `lid -ils') treats the requested pattern as a + case-insensitive literal substring. -File: id-utils.info, Node: GNU Emacs gid interface, Up: gid invocation +File: id-utils.info, Node: Emacs gid interface, Next: eid invocation, Prev: lid aliases, Up: lid invocation -GNU Emacs `gid' interface +GNU Emacs query interface ========================= - The `mkid' source distribution comes with a file `gid.el', which -defines a GNU Emacs interface to `gid'. To install it, put `gid.el' -somewhere that Emacs will find it (i.e., in your `load-path') and put + The `id-utils' source distribution comes with a file `id-utils.el', +which defines a GNU Emacs interface to `gid'. To install it, put +`id-utils.el' somewhere that Emacs will find it (i.e., in your +`load-path') and put (autoload 'gid "gid" nil t) @@ -817,108 +779,23 @@ the places the identifier is found (*note Compilation: (emacs)Compilation.). -File: id-utils.info, Node: Looking up identifiers, Next: pid invocation, Prev: gid invocation, Up: Top - -Looking up identifiers -********************** - - These commands look up identifiers in the ID database and operate on -the files containing matches. - -* Menu: - -* lid invocation:: Matching patterns. -* aid invocation:: Matching strings. -* eid invocation:: Invoking an editor on matches. -* fid invocation:: Listing a file's identifiers. - - -File: id-utils.info, Node: lid invocation, Next: aid invocation, Up: Looking up identifiers - -`lid': Matching patterns -======================== - - Synopsis: - - lid [-fFILE] [-uN] [-rDIR] [-mewdoxaskgnc] PATTERN... - - `lid' searches the database for identifiers matching the given -PATTERN arguments and prints the names of the files that match each -PATTERN. With no PATTERNs, `lid' lists every entry in the database. - - The name "lid" stands for "lookup identifier". - - *Note Common query arguments::, for a description of the command-line -options and PATTERN arguments. +File: id-utils.info, Node: eid invocation, Prev: Emacs gid interface, Up: lid invocation - By default, each line of output consists of an identifier and all the -files containing that identifier. +`eid': Invoking an Editor on Query Results +========================================== - Here is an example showing a search for a single identifier (omitting -some output to keep lines short): - - prompt$ lid FILE - FILE extern.h {fid,gets0,getsFF,idx,init,lid,mkid,...}.c - - This example shows a regular expression search: - - prompt$ lid 'FILE$' - AF_FILE mkid.c - AF_IDFILE mkid.c - FILE extern.h {fid,gets0,getsFF,idx,init,lid,mkid,...}.c - IDFILE id.h {fid,lid,mkid}.c - IdFILE {fid,lid}.c - ... - -As you can see, when a regular expression is used, it is possible to -get more than one line of output. To merge multiple lines into one, -use `-m': - - prompt$ lid -m ^get - ^get extern.h {bitsvec,fid,gets0,getsFF,getscan,idx,lid,...}.c - - -File: id-utils.info, Node: aid invocation, Next: eid invocation, Prev: lid invocation, Up: Looking up identifiers - -`aid': Matching strings -======================= - - Synopsis: - - aid [-fFILE] [-uN] [-rDIR] [-mewdoxaskgnc] STRING... - - `aid' searches the database for identifiers containing the given -STRING arguments. The search is case-insensitive. - - The name "aid" stands for "apropos identifier", `apropros' being a -command that does a similar search of the `whatis' database of `man' -descriptions. - - For example, `aid get' matches the identifiers `fgets', `GETLINE', -and `getchar'. - - The default output format is the same as `lid'; see the previous -section. - - *Note Common query arguments::, for a description of the command-line -options and PATTERN arguments. - - -File: id-utils.info, Node: eid invocation, Next: fid invocation, Prev: aid invocation, Up: Looking up identifiers - -`eid': Invoking an editor on matches -==================================== - - Synopsis: - - eid [-fFILE] [-uN] [-rDIR] [-doxasc] [PATTERN]... - - `eid' runs the usual search (*note lid invocation::.) on the given -arguments, shows you the output, and then asks: + `lid -R edit' is an editing interface for the ID utilities that is +most commonly used with `vi'. Emacs users should use the interface +defined in `id-utils.el' (*note Emacs gid interface::.). The ID +utilities include an alias called `eid', and for the sake of brevity, +we'll use this alias for the remainder of this section. `eid' performs +a `lid'-style, then asks if you wish to edit the files. If your query +yields more than one line of output, you will be prompted after each +line. This is the prompt you'll see: Edit? [y1-9^S/nq] -You can respond with: +You may respond with: `y' Edit all files listed. @@ -926,46 +803,17 @@ You can respond with: `1...9' Edit all files starting at the N + 1'st file. -`/STRING or `CTRL-S'STRING' - Edit all files whose name contains STRING. +`/STRING or `CTRL-S'REGEXP' + Search into the file list, and begin editing with the first file + name that matches the regular expression REGEXP. `n' - Go on to the next PATTERN, i.e., edit no files for this one. + Don't edit any files. If another line of query output is pending, + advance to that line, for which another `Edit?' prompt will appear. `q' - Quit `eid'. - - `eid' invokes an editor once per PATTERN; all the specified files -are given to the editor for you to edit simultaneously. - - `eid' invokes the editor defined by the `EDITOR' environment -variable. If the editor can accept an initial search argument on the -command line, `eid' moves automatically to the location of the match, -via the environment variables below. - - *Note Common query arguments::, for a description of the command-line -options and PATTERN arguments. - - Here are the environment variables relevant to `eid': - -`EDITOR' - The name of the editor program to invoke. - -`EIDARG' - The argument to pass to the editor to search for the matching - identifier. For `vi', this should be `+/%s/''. - -`EIDLDEL' - A regular expression to force a match at the beginning of a word - ("left delimiter). `eid' inserts this in front of the matching - identifier when composing the search argument. For `vi', this - should be `\<'. - -`EIDRDEL' - The end-of-word regular expression. For `vi', this should be `\>'. - - For Emacs users, the interface in `gid.el' is probably preferable to -`eid'. *Note GNU Emacs gid interface::. + Quit--don't edit any files, and don't process any more lines of + query output. Here is an example: @@ -977,270 +825,352 @@ options and PATTERN arguments. This will start editing at `getopt'.c. - -File: id-utils.info, Node: fid invocation, Prev: eid invocation, Up: Looking up identifiers - -`fid': Listing a file's identifiers -=================================== + `eid' invokes the editor defined by the environment variable +`VISUAL'. If `VISUAL' is undefined, it uses the environment variable +`EDITOR' instead. If `EDITOR' is undefined, it defaults to `vi'. It +is possible for `eid' to pass the editor an initial search pattern so +that your cursor will immediately alight on the token of interest. +This feature is controlled by the following environment variables: - `fid' lists the identifiers found in a given file. Synopsis: - - fid [-fDBFILE] FILE1 [FILE2] - -`-fDBFILE' - Read the database from DBFILE instead of `ID'. - -`FILE1' - List all the identifiers contained in FILE1. +`EIDARG' + A printf(3) format string for the editor argument to search for the + matching token. For `vi', this should be `+/%s/'. -`FILE2' - With a second file argument, list only the identifiers both files - have in common. +`EIDLDEL' + The regular-expression meta-character(s) for delimiting the + beginning of a word (the ``eid' Left DELimiter'). `eid' inserts + this in front of the matching token when a word-search is desired. + For `vi', this should be `\<'. - The output is simply one identifier (or number) per line. +`EIDRDEL' + The regular-expression meta-character(s) for delimiting the end of + a word (the ``eid' Right DELimiter'). `eid' inserts this in end + of the matching token when a word-search is desired. For `vi', + this should be `\>'. -File: id-utils.info, Node: pid invocation, Next: Index, Prev: Looking up identifiers, Up: Top +File: id-utils.info, Node: fid invocation, Next: fnid invocation, Prev: lid invocation, Up: Top -`pid': Looking up filenames -*************************** +`fid': Listing a file's tokens +****************************** - `pid' matches the filenames stored in the ID database, rather than -the identifiers. Synopsis: + `fid' prints the tokens found in a given file. If two file names +are passed on the command line, `fid' prints the tokens that are common +to both files (i.e., the *set intersection* of the two token sets). - pid [-fDBFILE] [-rDIR] [-ebkgnc] WILDCARD... + `lid' reads the ID database, therefore it accepts the `--file' +option, and consults the `IDPATH' environment variable, as described in +*Note Reading options::. - By default, the WILDCARD patterns are treated as shell globbing -patterns, rather than the regular expressions the other utilities -accept. See the section below for details. + If the standard output is attached to a terminal, the printed tokens +are separated by spaces. Otherwise, the tokens are printed one per +line. - Besides the standard options given in the synopsis (*note Query -options::.), `pid' accepts the following: + +File: id-utils.info, Node: fnid invocation, Next: xtokid invocation, Prev: fid invocation, Up: Top + +`fnid': Looking up filenames +**************************** -`-e' - Do the usual regular expression matching (*note Patterns::.), - instead of shell wildcard matching. + `fnid' queries the list of file names stored in the ID database. It +accepts shell *wildcard* patterns on the command line. If no pattern +is supplied, `*' is implied. `fnid' prints the file names that match +the given patterns. -`-b' - Match the basenames of the files in the database. For example, - `pid -b foo' will match the stored filename `dir/foo', but not - `foo/file'. + `fnid' prints file names, and as such accepts the `--separator' +option as described in *Note File listing options::. For example, the command: - pid \*.c + fnid \*.c lists all the `.c' files in the database. (The `\' here protects the `*' from being expanded by the shell.) -* Menu: + +File: id-utils.info, Node: xtokid invocation, Next: Past and Future, Prev: fnid invocation, Up: Top + +`xtokid': Testing Language Scanners +*********************************** + + `xtokid' accepts the names of files and/or directories on the +command line, then extracts and prints a stream of tokens from those +files for which it has a valid, enabled scanner. This is useful +primarily for debugging new `mkid' scanners (*note Defining +scanners::.). + + `xtokid' extracts tokens from source files, therefore it accepts the +`--lang-map', `--include', `--exclude', and `--lang-option' options, as +well as the language-specific scanner options, all of which are +described in *Note Extraction options::. `xtokid' walks file trees, +therefore it handles file and directory names on its command line and +the `--prune' option as described in *Note Walker options::. -* Wildcard patterns:: Shell-style globbing patterns. + The name `xtokid' indicates that it is the "eXtract TOKens ID +utility". -File: id-utils.info, Node: Wildcard patterns, Up: pid invocation +File: id-utils.info, Node: Past and Future, Next: Index, Prev: xtokid invocation, Up: Top -Wildcard patterns -================= +Past and Future +*************** - `pid' does simplified shell wildcard matching (unless the `-e' -option is specified), rather than the regular expression matching done -by the other utilities. Here is a description of wildcard matching, -also called "globbing": + Greg McGary conceived of the ideas behind the ID utilities when he +began working on the Unix kernel in 1984. He needed a navigation tool +to help him find his way around the expansive, unfamiliar landscape. +The first `id-utils'-like tools were shell scripts, and produced an +ASCII database that looks much like the output of `lid ".*"'. It took +over an hour on a VAX 11/750 to build a database for a 4.1BSD derived +kernel. The first version of `lid' used the UNIX system utility +`look', modified to handle very long lines. - * `*' matches zero or more characters. + In 1986, Greg rewrote the shell scripts in C to improve performance. +Build times for the ID file were shortened by an order of magnitude. +The ID utilities were first posted to `comp.sources.unix' in September +1987 under the name `id'. + + Over the next few years, several versions diverged from the original +source. Tom Horsley at Harris Computer Systems Division stepped forward +to take over maintenance and integrated some of the fixes from divergent +versions. A first release of the renamed `mkid' version 2 was posted +to `alt.sources' near the end of 1990. At that time, Tom wrote a +Texinfo manual with the encouragement the net community. (Tom +especially thanks Doug Scofield and Bill Leonard whom he dragooned into +helping poorfraed and edit--they found several problems in the initial +version.) Karl Berry revamped the manual for Texinfo style, indexing, +and organization in 1995. + + In January 1995, Greg McGary reemerged as the primary maintainer and +launched development of `mkid' version 3, whose primary new feature is +an efficient algorithm for building databases that is linear in both +time and space over the size of the input text. (The old algorithm was +quadratic in space so it was incapable of handling very large source +trees.) For the first time, the code was released under the GNU Public +License. - * `?' matches any single character. + In June 1996, the package was renamed again to `id-utils' and was +released for the first time under FSF copyright as part of the GNU +system. All programs had their command-line arguments completely +revised. The `mkid' and `xtokid' programs also gained a file-tree +walker, so that directory names can be passed on the command line +instead of the names of every individual file. Greg reorganized and +rewrote most of the Texinfo manual to reflect these changes. - * `\' forces the next character to be taken literally. + Future releases of `id-utils' might include: - * `[CHARS]' matches any single character listed in CHARS. + an optional coupling with GNU `grep', so that `grep' can use an ID + database for hints - * `[!CHARS]' matches any character *not* listed in CHARS. + a `cscope' work-alike query interface - Most shells treat `/' and leading `.' characters specially. `pid' -does not do this. It simply matches the filename in the database -against the wildcard pattern. + incremental update of the ID database. -File: id-utils.info, Node: Index, Prev: pid invocation, Up: Top +File: id-utils.info, Node: Index, Prev: Past and Future, Up: Top Index ***** * Menu: -* $ in identifiers: C scanner. -* * in globbing: Wildcard patterns. -* *scratch* Emacs buffer: GNU Emacs gid interface. -* -: mkid options. -* -a: Query options. -* -aARGFILE: mkid options. -* -b: pid invocation. -* -c: Query options. -* -d: Query options. -* -e <1>: pid invocation. -* -e: Query options. -* -F: Query options. -* -fIDFILE: Query options. -* -g: Query options. -* -k: Query options. -* -m: Query options. -* -n: Query options. -* -o: Query options. -* -rDIRECTORY: Query options. -* -S scanner option: Scanner option formats. -* -S.: Scanner option formats. -* -S?: Scanner option formats. -* -SSCANARG: mkid options. -* -Sasm+a: Assembler scanner. -* -Sasm+C: Assembler scanner. -* -Sasm+p: Assembler scanner. -* -Sasm+u: Assembler scanner. -* -Sasm-c: Assembler scanner. -* -Sc+u: C scanner. -* -Sc-s: C scanner. -* -Sc-u: C scanner. -* -Stext+a: Plain text scanner. -* -Stext+s: Plain text scanner. -* -Stext-a: Plain text scanner. -* -u: Query options. -* -v: mkid options. -* -w: Query options. -* -x: Query options. -* .[chly] files, scanning: C scanner. -* .default scanner: Scanners. -* ? in globbing: Wildcard patterns. -* [!...] in globbing: Wildcard patterns. -* [...] in globbing: Wildcard patterns. -* \ in globbing: Wildcard patterns. -* aid: aid invocation. +* *compilation* Emacs buffer: Emacs gid interface. +* -ambiguous: lid invocation. +* -comment: Assembler scanner. +* -exclude <1>: Text scanner. +* -exclude: Extraction options. +* -file <1>: Writing options. +* -file: Reading options. +* -frequency: lid invocation. +* -help: Universal options. +* -ignore <1>: Assembler scanner. +* -ignore: C/C++ scanner. +* -ignore-case: lid invocation. +* -include <1>: Text scanner. +* -include: Extraction options. +* -keep <1>: Assembler scanner. +* -keep: C/C++ scanner. +* -lang-map: Extraction options. +* -lang-option: Extraction options. +* -lang-option=asm:-comment: Assembler scanner. +* -lang-option=asm:-ignore: Assembler scanner. +* -lang-option=asm:-keep: Assembler scanner. +* -lang-option=asm:-no-cpp: Assembler scanner. +* -lang-option=asm:-strip-underscore: Assembler scanner. +* -lang-option=asm:-c: Assembler scanner. +* -lang-option=asm:-i: Assembler scanner. +* -lang-option=asm:-k: Assembler scanner. +* -lang-option=asm:-n: Assembler scanner. +* -lang-option=asm:-u: Assembler scanner. +* -lang-option=C:-ignore: C/C++ scanner. +* -lang-option=C:-keep: C/C++ scanner. +* -lang-option=C:-strip-underscore: C/C++ scanner. +* -lang-option=C:-i: C/C++ scanner. +* -lang-option=C:-k: C/C++ scanner. +* -lang-option=C:-u: C/C++ scanner. +* -lang-option=text:-exclude: Text scanner. +* -lang-option=text:-include: Text scanner. +* -lang-option=text:-i: Text scanner. +* -lang-option=text:-x: Text scanner. +* -literal: lid invocation. +* -no-cpp: Assembler scanner. +* -output: Writing options. +* -prune: Walker options. +* -regexp: lid invocation. +* -result: lid invocation. +* -separator: File listing options. +* -statistics: mkid invocation. +* -strip-underscore <1>: Assembler scanner. +* -strip-underscore: C/C++ scanner. +* -substring: lid invocation. +* -verbose: mkid invocation. +* -version: Universal options. +* -word: lid invocation. +* -a: lid invocation. +* -c: Assembler scanner. +* -d: lid invocation. +* -F: lid invocation. +* -f <1>: Writing options. +* -f: Reading options. +* -i <1>: lid invocation. +* -i <1>: Text scanner. +* -i <1>: Assembler scanner. +* -i <1>: C/C++ scanner. +* -i: Extraction options. +* -k <1>: lid invocation. +* -k <1>: Assembler scanner. +* -k: C/C++ scanner. +* -l <1>: lid invocation. +* -l: Extraction options. +* -l asm:-comment: Assembler scanner. +* -l asm:-ignore: Assembler scanner. +* -l asm:-keep: Assembler scanner. +* -l asm:-no-cpp: Assembler scanner. +* -l asm:-strip-underscore: Assembler scanner. +* -l asm:-c: Assembler scanner. +* -l asm:-i: Assembler scanner. +* -l asm:-k: Assembler scanner. +* -l asm:-n: Assembler scanner. +* -l asm:-u: Assembler scanner. +* -l C:-ignore: C/C++ scanner. +* -l C:-keep: C/C++ scanner. +* -l C:-strip-underscore: C/C++ scanner. +* -l C:-i: C/C++ scanner. +* -l C:-k: C/C++ scanner. +* -l C:-u: C/C++ scanner. +* -l text:-exclude: Text scanner. +* -l text:-include: Text scanner. +* -l text:-i: Text scanner. +* -l text:-x: Text scanner. +* -m: Extraction options. +* -n: Assembler scanner. +* -o <1>: lid invocation. +* -o: Writing options. +* -p: Walker options. +* -r: lid invocation. +* -s <1>: lid invocation. +* -s: mkid invocation. +* -S: File listing options. +* -u <1>: Assembler scanner. +* -u: C/C++ scanner. +* -v: mkid invocation. +* -w: lid invocation. +* -x <1>: lid invocation. +* -x <1>: Text scanner. +* -x: Extraction options. +* mkid progress: mkid invocation. +* alphabetic case, ignoring differences in: lid invocation. +* ambiguous identifier names, finding: lid invocation. * architecture-independence: mkid invocation. * assembler scanner: Assembler scanner. -* basename match: pid invocation. +* assembly language scanner: Assembler scanner. * beginning-of-word editor argument: eid invocation. -* Berry, Karl: Past and future. -* brace notation in filename lists: Query options. +* Berry, Karl: Past and Future. * bugs, reporting: Introduction. -* C scanner, predefined: C scanner. -* case-insensitive searching: aid invocation. -* comments in assembler: Assembler scanner. -* common query arguments: Common query arguments. -* common query options: Query options. -* compressed files, building ID from: mkid examples. -* conflicting identifiers, finding: Query options. -* constant strings, forcing evaluation as: Query options. +* C scanner, predefined: C/C++ scanner. +* common command-line options: Common options. * creating databases: mkid invocation. * cron: mkid invocation. -* cscope: Past and future. -* database name, specifying: Query options. +* cscope: Past and Future. * databases, creating: mkid invocation. -* EDITOR: eid invocation. * eid: eid invocation. * EIDARG: eid invocation. * EIDLDEL: eid invocation. * EIDRDEL: eid invocation. -* Emacs interface to gid: GNU Emacs gid interface. +* Emacs interface to gid: Emacs gid interface. * end-of-word editor argument: eid invocation. -* examples of mkid: mkid examples. -* examples, queries: Query examples. +* exclude languages: Extraction options. * fid: fid invocation. -* filenames, matching: pid invocation. -* future: Past and future. -* gid Emacs function: GNU Emacs gid interface. -* gid.el interface to Emacs: GNU Emacs gid interface. -* globbing patterns: Wildcard patterns. -* grep: Past and future. -* history: Past and future. -* Horsley, Tom: Past and future. +* file name separator: File listing options. +* file tree pruning: Walker options. +* filenames, matching: fnid invocation. +* fnid: fnid invocation. +* future: Past and Future. +* gid Emacs function: Emacs gid interface. +* grep: Past and Future. +* help, online: Universal options. +* history: Past and Future. +* Horsley, Tom: Past and Future. +* ID database file name <1>: Writing options. +* ID database file name: Reading options. * ID database, definition of: Introduction. * ID file format: mkid invocation. -* identifiers in a file: fid invocation. +* id-utils.el interface to Emacs: Emacs gid interface. +* ignoring differences in alphabetic case: lid invocation. +* include languages: Extraction options. * introduction: Introduction. -* languages_0: Defining scanners in source code. +* language map file: Extraction options. +* language-specific option: Extraction options. +* languages_0: Defining scanners. * left delimiter editor argument: eid invocation. -* Leonard, Bill: Past and future. -* lid: lid invocation. -* load-path: GNU Emacs gid interface. -* look and mkid 1: Past and future. -* man pages, compressed: mkid examples. -* matching filenames: pid invocation. -* McGary, Greg: Past and future. -* mkid: mkid invocation. -* mkid options: mkid options. -* multiple lines, merging: Query options. -* numbers, in databases: mkid invocation. -* numeric matches, specifying radix of: Query options. -* numeric searches: Query examples. -* options for mkid: mkid options. +* Leonard, Bill: Past and Future. +* load-path: Emacs gid interface. +* look and mkid 1: Past and Future. +* matching filenames: fnid invocation. +* McGary, Greg: Past and Future. +* numeric matches, specifying radix of: lid invocation. * overview: Introduction. -* parent directories, searched for ID: Query options. -* patterns: Patterns. -* pid: pid invocation. -* plain text scanner: Plain text scanner. -* predefined scanners: Predefined scanners. -* query examples: Query examples. -* query options, common: Query options. -* radix of numeric matches, specifying: Query options. -* regular expression syntax: Patterns. -* regular expressions, forcing evaluation as: Query options. +* radix of numeric matches, specifying: lid invocation. * right delimiter editor argument: eid invocation. -* scanner options: Scanner option formats. -* scanners: Scanners. -* scanners, adding new: Defining new scanners. -* scanners, defining in source code: Defining scanners in source code. -* scanners, defining with options: Defining scanners with options. -* scanners, predefined: Predefined scanners. -* scanners.c: Defining scanners in source code. -* Scofield, Doug: Past and future. -* search for identifier, initial: eid invocation. +* scanners: Extraction options. +* scanners, defining in source code: Defining scanners. +* scanners.c: Defining scanners. +* Scofield, Doug: Past and Future. +* search for token, initial: eid invocation. * sharing ID files: mkid invocation. -* shell brace notation in filename lists: Query options. -* shell wildcard patterns: Wildcard patterns. -* single matches, showing: Query options. -* squeezing characters from identifiers: Plain text scanner. -* statistics: mkid options. -* string searching: aid invocation. -* strings, forcing evaluation as: Query options. -* suffixes of filenames: Scanners. -* suffixes_0: Defining scanners in source code. -* suppressing matching identifier: Query options. -* Texinfo, scanning example of: Defining scanners with options. -* whatis: aid invocation. -* wildcard wildcard patterns: Wildcard patterns. +* single matches, showing: lid invocation. +* statistics: mkid invocation. +* text scanner: Text scanner. +* tokens common to two files: fid invocation. +* tokens in a file: fid invocation. +* version number, finding: Universal options. Tag Table: -Node: Top1540 -Node: Introduction2150 -Node: Past and future4367 -Node: mkid invocation6671 -Node: mkid options8241 -Node: Scanners9659 -Node: Scanner option formats11154 -Node: Predefined scanners12330 -Node: C scanner13033 -Node: Plain text scanner13788 -Node: Assembler scanner14699 -Node: Defining new scanners15828 -Node: Defining scanners in source code16451 -Node: Defining scanners with options17296 -Node: idx invocation18750 -Node: mkid examples19316 -Node: Common query arguments21295 -Node: Query options21843 -Node: Patterns25238 -Node: Query examples26578 -Node: gid invocation27965 -Node: GNU Emacs gid interface29132 -Node: Looking up identifiers29996 -Node: lid invocation30492 -Node: aid invocation31926 -Node: eid invocation32712 -Node: fid invocation34854 -Node: pid invocation35412 -Node: Wildcard patterns36510 -Node: Index37280 +Node: Top1298 +Node: Introduction2051 +Node: Quick start4580 +Node: Common options5505 +Node: Universal options6303 +Node: Reading options6628 +Node: Writing options7745 +Node: Walker options8241 +Node: File listing options9080 +Node: Extraction options10171 +Node: Language map12721 +Node: C/C++ scanner14927 +Node: Assembler scanner16542 +Node: Text scanner18638 +Node: Defining scanners19446 +Node: mkid invocation20668 +Node: lid invocation22949 +Node: lid aliases28334 +Node: Emacs gid interface29012 +Node: eid invocation29929 +Node: fid invocation32513 +Node: fnid invocation33200 +Node: xtokid invocation33875 +Node: Past and Future34814 +Node: Index37506 End Tag Table |