Diffstat (limited to 'doc/gawktexi.in')
-rw-r--r-- | doc/gawktexi.in | 3426 |
1 files changed, 1785 insertions, 1641 deletions
diff --git a/doc/gawktexi.in b/doc/gawktexi.in index 03f58c3c..d378b02d 100644 --- a/doc/gawktexi.in +++ b/doc/gawktexi.in @@ -54,7 +54,7 @@ @c applies to and all the info about who's publishing this edition @c These apply across the board. -@set UPDATE-MONTH July, 2019 +@set UPDATE-MONTH September, 2019 @set VERSION 5.1 @set PATCHLEVEL 0 @@ -1312,6 +1312,7 @@ October 2014 </prefaceinfo> @end docbook +@cindex @command{awk} Several kinds of tasks occur repeatedly when working with text files. You might want to extract certain lines and discard the rest. Or you may need to make changes wherever certain patterns appear, but leave the @@ -1319,6 +1320,7 @@ rest of the file alone. Such jobs are often easy with @command{awk}. The @command{awk} utility interprets a special-purpose programming language that makes it easy to handle simple data-reformatting jobs. +@cindex @command{gawk} The GNU implementation of @command{awk} is called @command{gawk}; if you invoke it with the proper options or environment variables, it is fully compatible with @@ -1332,12 +1334,12 @@ properly written @command{awk} programs should work with @command{gawk}. So most of the time, we don't distinguish between @command{gawk} and other @command{awk} implementations. 
-@cindex @command{awk}, POSIX and, See Also POSIX @command{awk} -@cindex @command{awk}, POSIX and -@cindex POSIX, @command{awk} and -@cindex @command{gawk}, @command{awk} and -@cindex @command{awk}, @command{gawk} and -@cindex @command{awk}, uses for +@cindex @command{awk} @subentry POSIX and @seealso{POSIX @command{awk}} +@cindex @command{awk} @subentry POSIX and +@cindex POSIX @subentry @command{awk} and +@cindex @command{gawk} @subentry @command{awk} and +@cindex @command{awk} @subentry @command{gawk} and +@cindex @command{awk} @subentry uses for Using @command{awk} you can: @itemize @value{BULLET} @@ -1358,9 +1360,9 @@ Experiment with algorithms that you can adapt later to other computer languages @end itemize -@cindex @command{awk}, See Also @command{gawk} -@cindex @command{gawk}, See Also @command{awk} -@cindex @command{gawk}, uses for +@cindex @command{awk} @seealso{@command{gawk}} +@cindex @command{gawk} @seealso{@command{awk}} +@cindex @command{gawk} @subentry uses for In addition, @command{gawk} provides facilities that make it easy to: @@ -1390,7 +1392,7 @@ Unix-based systems. If you are using some other operating system, you still need be familiar with the ideas of I/O redirection and pipes.} as well as basic shell facilities, such as input/output (I/O) redirection and pipes. -@cindex GNU @command{awk}, See @command{gawk} +@cindex GNU @command{awk} @seeentry{@command{gawk}} Implementations of the @command{awk} language are available for many different computing environments. This @value{DOCUMENT}, while describing the @command{awk} language in general, also describes the particular @@ -1439,7 +1441,7 @@ more parts C. Document very well and release. @cindex Aho, Alfred @cindex Weinberger, Peter @cindex Kernighan, Brian -@cindex @command{awk}, history of +@cindex @command{awk} @subentry history of The name @command{awk} comes from the initials of its designers: Alfred V.@: Aho, Peter J.@: Weinberger, and Brian W.@: Kernighan. 
The original version of @command{awk} was written in 1977 at AT&T Bell Laboratories. @@ -1486,7 +1488,7 @@ for a full list of those who have made important contributions to @command{gawk} @node Names @unnumberedsec A Rose by Any Other Name -@cindex @command{awk}, new vs.@: old +@cindex @command{awk} @subentry new vs.@: old The @command{awk} language has evolved over the years. Full details are provided in @ref{Language History}. The language described in this @value{DOCUMENT} @@ -1521,7 +1523,7 @@ specific to the GNU implementation, we use the term @command{gawk}. @node This Manual @unnumberedsec Using This Book -@cindex @command{awk}, terms describing +@cindex @command{awk} @subentry terms describing The term @command{awk} refers to a particular program as well as to the language you use to tell this program what to do. When we need to be careful, we call @@ -1533,8 +1535,8 @@ run the @command{awk} utility. The term ``@command{awk} program'' refers to a program written by you in the @command{awk} programming language. -@cindex @command{gawk}, @command{awk} and -@cindex @command{awk}, @command{gawk} and +@cindex @command{gawk} @subentry @command{awk} and +@cindex @command{awk} @subentry @command{gawk} and @cindex POSIX @command{awk} Primarily, this @value{DOCUMENT} explains the features of @command{awk} as defined in the POSIX standard. It does so in the context of the @@ -1893,7 +1895,7 @@ you illuminate, there's always a smaller but darker one.} @author Brian Kernighan @end quotation -@cindex d.c., See dark corner +@cindex d.c. @seeentry{dark corner} @cindex dark corner Until the POSIX standard (and @cite{@value{TITLE}}), many features of @command{awk} were either poorly documented or not @@ -1913,7 +1915,7 @@ They also appear in the index under the heading ``dark corner.'' But, as noted by the opening quote, any coverage of dark corners is by definition incomplete. -@cindex c.e., See common extensions +@cindex c.e. 
@seeentry{common extensions} Extensions to the standard @command{awk} language that are supported by more than one @command{awk} implementation are marked @ifclear FOR_PRINT @@ -1937,8 +1939,9 @@ Emacs editor. GNU Emacs is the most widely used version of Emacs today. @cindex GNU Project @cindex GPL (General Public License) -@cindex General Public License, See GPL -@cindex documentation, online +@cindex GNU General Public License @seeentry{GPL} +@cindex General Public License @seeentry{GPL} +@cindex documentation @subentry online The GNU@footnote{GNU stands for ``GNU's Not Unix.''} Project is an ongoing effort on the part of the Free Software Foundation to create a complete, freely distributable, POSIX-compliant @@ -1968,9 +1971,9 @@ freely available. The GNU operating system kernel (the HURD), has been released but remains in an early stage of development. -@cindex Linux +@cindex Linux @seeentry{GNU/Linux} @cindex GNU/Linux -@cindex operating systems, BSD-based +@cindex operating systems @subentry BSD-based Until the GNU operating system is more fully developed, you should consider using GNU/Linux, a freely distributable, Unix-like operating system for Intel, @@ -2134,10 +2137,10 @@ convincing me @emph{not} to title this @value{DOCUMENT} @cite{How to Gawk Politely}. Karl Berry helped significantly with the @TeX{} part of Texinfo. -@cindex Hartholz, Marshall -@cindex Hartholz, Elaine -@cindex Schreiber, Bert -@cindex Schreiber, Rita +@cindex Hartholz @subentry Marshall +@cindex Hartholz @subentry Elaine +@cindex Schreiber @subentry Bert +@cindex Schreiber @subentry Rita I would like to thank Marshall and Elaine Hartholz of Seattle and Dr.@: Bert and Rita Schreiber of Detroit for large amounts of quiet vacation time in their homes, which allowed me to make significant progress on @@ -2251,9 +2254,9 @@ and for being a role model to me for close to 30 years! Having him as a reviewer is an exciting privilege. 
It has also been extremely humbling@enddots{} -@cindex Robbins, Miriam -@cindex Robbins, Jean -@cindex Robbins, Harry +@cindex Robbins @subentry Miriam +@cindex Robbins @subentry Jean +@cindex Robbins @subentry Harry @cindex G-d I must thank my wonderful wife, Miriam, for her patience through the many versions of this project, for her proofreading, @@ -2320,7 +2323,7 @@ following chapters: @c @cindex rule, definition of @c @cindex program, definition of @c @cindex basic function of @command{awk} -@cindex @command{awk}, function of +@cindex @command{awk} @subentry function of The basic function of @command{awk} is to search files for lines (or other units of text) that contain certain patterns. When a line matches one @@ -2328,8 +2331,8 @@ of the patterns, @command{awk} performs specified actions on that line. @command{awk} continues to process input lines in this way until it reaches the end of the input files. -@cindex @command{awk}, uses for -@cindex programming languages@comma{} data-driven vs.@: procedural +@cindex @command{awk} @subentry uses for +@cindex programming languages @subentry data-driven vs.@: procedural @cindex @command{awk} programs Programs in @command{awk} are different from programs in most other languages, because @command{awk} programs are @dfn{data driven} (i.e., you describe @@ -2382,7 +2385,7 @@ program looks like this: @node Running gawk @section How to Run @command{awk} Programs -@cindex @command{awk} programs, running +@cindex @command{awk} programs @subentry running There are several ways to run an @command{awk} program. 
If the program is short, it is easiest to include it in the command that runs @command{awk}, like this: @@ -2391,7 +2394,7 @@ like this: awk '@var{program}' @var{input-file1} @var{input-file2} @dots{} @end example -@cindex command line, formats +@cindex command line @subentry formats When the program is long, it is usually more convenient to put it in a file and run it with a command like this: @@ -2440,8 +2443,8 @@ characters. The quotes also cause the shell to treat all of @var{program} as a single argument for @command{awk}, and allow @var{program} to be more than one line long. -@cindex shells, scripts -@cindex @command{awk} programs, running, from shell scripts +@cindex shells @subentry scripts +@cindex @command{awk} programs @subentry running @subentry from shell scripts This format is also useful for running short or medium-sized @command{awk} programs from shell scripts, because it avoids the need for a separate file for the @command{awk} program. A self-contained shell script is more @@ -2459,8 +2462,8 @@ self-contained programs. @subsection Running @command{awk} Without Input Files @cindex standard input -@cindex input, standard -@cindex input files, running @command{awk} without +@cindex input @subentry standard +@cindex input files @subentry running @command{awk} without You can also run @command{awk} without any input files. If you type the following command line: @@ -2474,9 +2477,9 @@ which usually means whatever you type on the keyboard. This continues until you indicate end-of-file by typing @kbd{Ctrl-d}. (On non-POSIX operating systems, the end-of-file character may be different.) 
-@cindex files, input, See input files -@cindex input files, running @command{awk} without -@cindex @command{awk} programs, running, without input files +@cindex files @subentry input @seeentry{input files} +@cindex input files @subentry running @command{awk} without +@cindex @command{awk} programs @subentry running @subentry without input files As an example, the following program prints a friendly piece of advice (from Douglas Adams's @cite{The Hitchhiker's Guide to the Galaxy}), to keep you from worrying about the complexities of computer @@ -2522,9 +2525,9 @@ $ @kbd{awk '@{ print @}'} @node Long @subsection Running Long Programs -@cindex @command{awk} programs, running -@cindex @command{awk} programs, lengthy -@cindex files, @command{awk} programs in +@cindex @command{awk} programs @subentry running +@cindex @command{awk} programs @subentry lengthy +@cindex files @subentry @command{awk} programs in Sometimes @command{awk} programs are very long. In these cases, it is more convenient to put the program into a separate file. In order to tell @command{awk} to use that file for its program, you type: @@ -2534,7 +2537,7 @@ awk -f @var{source-file} @var{input-file1} @var{input-file2} @dots{} @end example @cindex @option{-f} option -@cindex command line, option @option{-f} +@cindex command line @subentry option @option{-f} The @option{-f} instructs the @command{awk} utility to get the @command{awk} program from the file @var{source-file} (@pxref{Options}). Any @value{FN} can be used for @var{source-file}. For example, you @@ -2558,7 +2561,7 @@ does the same thing as this one: awk 'BEGIN @{ print "Don\47t Panic!" @}' @end example -@cindex quoting, in @command{gawk} command lines +@cindex quoting @subentry in @command{gawk} command lines @noindent This was explained earlier (@pxref{Read Terminal}). @@ -2570,8 +2573,8 @@ for programs that are provided on the @command{awk} command line. 
(Also, placing the program in a file allows us to use a literal single quote in the program text, instead of the magic @samp{\47}.) -@cindex single quote (@code{'}) in @command{gawk} command lines -@cindex @code{'} (single quote) in @command{gawk} command lines +@cindex single quote (@code{'}) @subentry in @command{gawk} command lines +@cindex @code{'} (single quote) @subentry in @command{gawk} command lines If you want to clearly identify an @command{awk} program file as such, you can add the extension @file{.awk} to the @value{FN}. This doesn't affect the execution of the @command{awk} program but it does make @@ -2580,9 +2583,9 @@ affect the execution of the @command{awk} program but it does make @node Executable Scripts @subsection Executable @command{awk} Programs @cindex @command{awk} programs -@cindex @code{#} (number sign), @code{#!} (executable scripts) -@cindex Unix, @command{awk} scripts and -@cindex number sign (@code{#}), @code{#!} (executable scripts) +@cindex @code{#} (number sign) @subentry @code{#!} (executable scripts) +@cindex Unix @subentry @command{awk} scripts and +@cindex number sign (@code{#}) @subentry @code{#!} (executable scripts) Once you have learned @command{awk}, you may want to write self-contained @command{awk} scripts, using the @samp{#!} script mechanism. You can do @@ -2618,7 +2621,7 @@ program that users can invoke without their having to know that the program is written in @command{awk}. @sidebar Understanding @samp{#!} -@cindex portability, @code{#!} (executable scripts) +@cindex portability @subentry @code{#!} (executable scripts) @command{awk} is an @dfn{interpreted} language. This means that the @command{awk} utility reads your program and then processes your data @@ -2647,9 +2650,9 @@ treats the rest of the line as a single argument and passes it to @command{awk}. Doing this leads to confusing behavior---most likely a usage diagnostic of some sort from @command{awk}. 
-@cindex @code{ARGC}/@code{ARGV} variables, portability and -@cindex portability, @code{ARGV} variable -@cindex dark corner, @code{ARGV} variable, value of +@cindex @code{ARGC}/@code{ARGV} variables @subentry portability and +@cindex portability @subentry @code{ARGV} variable +@cindex dark corner @subentry @code{ARGV} variable, value of Finally, the value of @code{ARGV[0]} (@pxref{Built-in Variables}) varies depending upon your operating system. @@ -2662,10 +2665,10 @@ to provide your script name. @node Comments @subsection Comments in @command{awk} Programs -@cindex @code{#} (number sign), commenting -@cindex number sign (@code{#}), commenting +@cindex @code{#} (number sign) @subentry commenting +@cindex number sign (@code{#}) @subentry commenting @cindex commenting -@cindex @command{awk} programs, documenting +@cindex @command{awk} programs @subentry documenting A @dfn{comment} is some text that is included in a program for the sake of human readers; it is not really an executable part of the program. Comments @@ -2690,9 +2693,9 @@ programs, but this usually isn't very useful; the purpose of a comment is to help you or another person understand the program when reading it at a later time. -@cindex quoting, for small awk programs -@cindex single quote (@code{'}), vs.@: apostrophe -@cindex @code{'} (single quote), vs.@: apostrophe +@cindex quoting @subentry for small awk programs +@cindex single quote (@code{'}) @subentry vs.@: apostrophe +@cindex @code{'} (single quote) @subentry vs.@: apostrophe @quotation CAUTION As mentioned in @ref{One-shot}, @@ -2748,7 +2751,7 @@ the shell prompt, or writing it as part of a larger shell script: awk '@var{program text}' @var{input-file1} @var{input-file2} @dots{} @end example -@cindex shells, quoting, rules for +@cindex shells @subentry quoting @subentry rules for @cindex Bourne shell, quoting rules for Once you are working with the shell, it is helpful to have a basic knowledge of shell quoting rules. 
The following rules apply only to @@ -2787,10 +2790,10 @@ that character. The shell removes the backslash and passes the quoted character on to the command. @item -@cindex @code{\} (backslash), in shell commands -@cindex backslash (@code{\}), in shell commands -@cindex single quote (@code{'}), in shell commands -@cindex @code{'} (single quote), in shell commands +@cindex @code{\} (backslash) @subentry in shell commands +@cindex backslash (@code{\}) @subentry in shell commands +@cindex single quote (@code{'}) @subentry in shell commands +@cindex @code{'} (single quote) @subentry in shell commands Single quotes protect everything between the opening and closing quotes. The shell does no interpretation of the quoted text, passing it on verbatim to the command. @@ -2800,8 +2803,8 @@ Refer back to for an example of what happens if you try. @item -@cindex double quote (@code{"}), in shell commands -@cindex @code{"} (double quote), in shell commands +@cindex double quote (@code{"}) @subentry in shell commands +@cindex @code{"} (double quote) @subentry in shell commands Double quotes protect most things between the opening and closing quotes. The shell does at least variable and command substitution on the quoted text. Different shells may do additional kinds of processing on double-quoted text. @@ -2829,8 +2832,8 @@ $ @kbd{awk "BEGIN @{ print \"Don't Panic!\" @}"} @print{} Don't Panic! @end example -@cindex single quote (@code{'}), with double quotes -@cindex @code{'} (single quote), with double quotes +@cindex single quote (@code{'}) @subentry with double quotes +@cindex @code{'} (single quote) @subentry with double quotes Note that the single quote is not special within double quotes. 
@item @@ -2844,7 +2847,7 @@ awk -F "" '@var{program}' @var{files} # correct @end example @noindent -@cindex null strings in @command{gawk} arguments, quoting and +@cindex null strings @subentry in @command{gawk} arguments, quoting and Don't use this: @example @@ -2857,7 +2860,7 @@ as the value of @code{FS}, and the first @value{FN} as the text of the program! This results in syntax errors at best, and confusing behavior at worst. @end itemize -@cindex quoting, in @command{gawk} command lines, tricks for +@cindex quoting @subentry in @command{gawk} command lines @subentry tricks for Mixing single and double quotes is difficult. You have to resort to shell quoting tricks, like this: @@ -3019,7 +3022,7 @@ double-quote don't need duplication. @node Sample Data Files @section @value{DDF}s for the Examples -@cindex input files, examples +@cindex input files @subentry examples @cindex @code{mail-list} file Many of the examples in this @value{DOCUMENT} take their input from two sample @value{DF}s. The first, @file{mail-list}, represents a list of peoples' names @@ -3132,21 +3135,21 @@ $ @kbd{awk '/li/ @{ print $0 @}' mail-list} @print{} Samuel 555-3430 samuel.lanceolis@@shu.edu A @end example -@cindex actions, default -@cindex patterns, default +@cindex actions @subentry default +@cindex patterns @subentry default In an @command{awk} rule, either the pattern or the action can be omitted, but not both. If the pattern is omitted, then the action is performed for @emph{every} input line. If the action is omitted, the default action is to print all lines that match the pattern. -@cindex actions, empty +@cindex actions @subentry empty Thus, we could leave out the action (the @code{print} statement and the braces) in the previous example and the result would be the same: @command{awk} prints all lines matching the pattern @samp{li}. By comparison, omitting the @code{print} statement but retaining the braces makes an empty action that does nothing (i.e., no lines are printed). 
-@cindex @command{awk} programs, one-line examples +@cindex @command{awk} programs @subentry one-line examples Many practical @command{awk} programs are just a line or two long. Following is a collection of useful, short programs to get you started. Some of these programs contain constructs that haven't been covered yet. (The description @@ -3351,7 +3354,7 @@ the file was last modified. Its output looks like this: @end example @noindent -@cindex line continuations, with C shell +@cindex line continuations @subentry with C shell The first field contains read-write permissions, the second field contains the number of links to the file, and the third field identifies the file's owner. The fourth field identifies the file's group. @@ -3397,7 +3400,7 @@ awk '/12/ @{ print $0 @} /21/ @{ print $0 @}' mail-list inventory-shipped @end example -@cindex @command{gawk}, newlines in +@cindex @command{gawk} @subentry newlines in However, @command{gawk} ignores newlines after any of the following symbols and keywords: @@ -3414,8 +3417,8 @@ Splitting lines after @samp{?} and @samp{:} is a minor @command{gawk} extension; if @option{--posix} is specified (@pxref{Options}), then this extension is disabled.} -@cindex @code{\} (backslash), continuing lines and -@cindex backslash (@code{\}), continuing lines and +@cindex @code{\} (backslash) @subentry continuing lines and +@cindex backslash (@code{\}) @subentry continuing lines and If you would like to split a single statement into two lines at a point where a newline would terminate it, you can @dfn{continue} it by ending the first line with a backslash character (@samp{\}). The backslash must be @@ -3429,7 +3432,7 @@ awk '/This regular expression is too long, so continue it\ @end example @noindent -@cindex portability, backslash continuation and +@cindex portability @subentry backslash continuation and We have generally not used backslash continuation in our sample programs. 
@command{gawk} places no limit on the length of a line, so backslash continuation is never strictly necessary; @@ -3447,8 +3450,8 @@ lines in the middle of a regular expression or a string. @c solaris 2.7 nawk does not. Solaris /usr/xpg4/bin/awk does though! sigh. @cindex @command{csh} utility -@cindex backslash (@code{\}), continuing lines and, in @command{csh} -@cindex @code{\} (backslash), continuing lines and, in @command{csh} +@cindex backslash (@code{\}) @subentry continuing lines and @subentry in @command{csh} +@cindex @code{\} (backslash) @subentry continuing lines and @subentry in @command{csh} @quotation CAUTION @emph{Backslash continuation does not work as described with the C shell.} It works for @command{awk} programs in files and @@ -3486,9 +3489,9 @@ begin on the same line as the pattern. To have the pattern and action on separate lines, you @emph{must} use backslash continuation; there is no other option. -@cindex backslash (@code{\}), continuing lines and, comments and -@cindex @code{\} (backslash), continuing lines and, comments and -@cindex commenting, backslash continuation and +@cindex backslash (@code{\}) @subentry continuing lines and @subentry comments and +@cindex @code{\} (backslash) @subentry continuing lines and @subentry comments and +@cindex commenting @subentry backslash continuation and Another thing to keep in mind is that backslash continuation and comments do not mix. As soon as @command{awk} sees the @samp{#} that starts a comment, it ignores @emph{everything} on the rest of the @@ -3510,11 +3513,11 @@ next line. However, the backslash-newline combination is never even noticed because it is ``hidden'' inside the comment. Thus, the @code{BEGIN} is noted as a syntax error. 
-@cindex statements, multiple -@cindex @code{;} (semicolon), separating statements in actions -@cindex semicolon (@code{;}), separating statements in actions -@cindex @code{;} (semicolon), separating rules -@cindex semicolon (@code{;}), separating rules +@cindex statements @subentry multiple +@cindex @code{;} (semicolon) @subentry separating statements in actions +@cindex semicolon (@code{;}) @subentry separating statements in actions +@cindex @code{;} (semicolon) @subentry separating rules +@cindex semicolon (@code{;}) @subentry separating rules When @command{awk} statements within one rule are short, you might want to put more than one of them on a line. This is accomplished by separating the statements with a semicolon (@samp{;}). @@ -3557,7 +3560,7 @@ systematically in @ref{Built-in Variables} and in @node When @section When to Use @command{awk} -@cindex @command{awk}, uses for +@cindex @command{awk} @subentry uses for Now that you've seen some of what @command{awk} can do, you might wonder how @command{awk} could be useful for you. By using utility programs, advanced patterns, field separators, arithmetic @@ -3588,7 +3591,7 @@ computer. The original @command{awk}'s capabilities were strained by tasks of such complexity, but modern versions are more capable. -@cindex @command{awk} programs, complex +@cindex @command{awk} programs @subentry complex If you find yourself writing @command{awk} scripts of more than, say, a few hundred lines, you might consider using a different programming language. The shell is good at string and pattern matching; in addition, @@ -3668,10 +3671,10 @@ things in this @value{CHAPTER} that don't interest you right now. 
@node Command Line @section Invoking @command{awk} -@cindex command line, invoking @command{awk} from -@cindex @command{awk}, invoking -@cindex arguments, command-line, invoking @command{awk} -@cindex options, command-line, invoking @command{awk} +@cindex command line @subentry invoking @command{awk} from +@cindex @command{awk} @subentry invoking +@cindex arguments @subentry command-line @subentry invoking @command{awk} +@cindex options @subentry command-line @subentry invoking @command{awk} There are two ways to run @command{awk}---with an explicit program or with one or more program files. Here are templates for both of them; items @@ -3684,12 +3687,12 @@ enclosed in [@dots{}] in these templates are optional: @cindex GNU long options @cindex long options -@cindex options, long +@cindex options @subentry long In addition to traditional one-letter POSIX-style options, @command{gawk} also supports GNU long options. -@cindex dark corner, invoking @command{awk} -@cindex lint checking, empty programs +@cindex dark corner @subentry invoking @command{awk} +@cindex lint checking @subentry empty programs It is possible to invoke @command{awk} with an empty program: @example @@ -3697,7 +3700,7 @@ awk '' datafile1 datafile2 @end example @cindex @option{--lint} option -@cindex dark corner, empty programs +@cindex dark corner @subentry empty programs @noindent Doing so makes little sense, though; @command{awk} exits silently when given an empty program. @@ -3708,10 +3711,10 @@ warning that the program is empty. @node Options @section Command-Line Options -@cindex options, command-line -@cindex command line, options +@cindex options @subentry command-line +@cindex command line @subentry options @cindex GNU long options -@cindex options, long +@cindex options @subentry long Options begin with a dash and consist of a single character. GNU-style long options consist of two dashes and a keyword. @@ -3723,7 +3726,7 @@ by whitespace. 
If a particular option with a value is given more than once, it is the last value that counts. -@cindex POSIX @command{awk}, GNU long options and +@cindex POSIX @command{awk} @subentry GNU long options and Each long option for @command{gawk} has a corresponding POSIX-style short option. The long and short options are @@ -3735,7 +3738,7 @@ The following list describes options mandated by the POSIX standard: @itemx --field-separator @var{fs} @cindex @option{-F} option @cindex @option{--field-separator} option -@cindex @code{FS} variable, @code{--field-separator} option and +@cindex @code{FS} variable @subentry @code{--field-separator} option and Set the @code{FS} variable to @var{fs} (@pxref{Field Separators}). @@ -3743,7 +3746,7 @@ Set the @code{FS} variable to @var{fs} @itemx --file @var{source-file} @cindex @option{-f} option @cindex @option{--file} option -@cindex @command{awk} programs, location of +@cindex @command{awk} programs @subentry location of Read the @command{awk} program source from @var{source-file} instead of in the first nonoption argument. This option may be given multiple times; the @command{awk} @@ -3757,7 +3760,7 @@ at their beginning. @xref{Changing The Namespace}, for more information. @itemx --assign @var{var}=@var{val} @cindex @option{-v} option @cindex @option{--assign} option -@cindex variables, setting +@cindex variables @subentry setting Set the variable @var{var} to the value @var{val} @emph{before} execution of the program begins. Such variable values are available inside the @code{BEGIN} rule @@ -3767,8 +3770,8 @@ The @option{-v} option can only set one variable, but it can be used more than once, setting another variable each time, like this: @samp{awk @w{-v foo=1} @w{-v bar=2} @dots{}}. 
-@cindex predefined variables, @code{-v} option@comma{} setting with -@cindex variables, predefined, @code{-v} option@comma{} setting with +@cindex predefined variables @subentry @code{-v} option, setting with +@cindex variables @subentry predefined @subentry @code{-v} option, setting with @quotation CAUTION Using @option{-v} to set the values of the built-in variables may lead to surprising results. @command{awk} will reset the @@ -3787,15 +3790,15 @@ the abbreviations remain unique. The full list of @command{gawk}-specific options is provided next. @item -- -@cindex command line, options, end of -@cindex options, command-line, end of +@cindex command line @subentry options @subentry end of +@cindex options @subentry command-line @subentry end of Signal the end of the command-line options. The following arguments are not treated as options even if they begin with @samp{-}. This interpretation of @option{--} follows the POSIX argument parsing conventions. -@cindex @code{-} (hyphen), file names beginning with -@cindex hyphen (@code{-}), file names beginning with +@cindex @code{-} (hyphen) @subentry file names beginning with +@cindex hyphen (@code{-}) @subentry file names beginning with This is useful if you have @value{FN}s that start with @samp{-}, or in shell scripts, if you have @value{FN}s that will be specified by the user that could start with @samp{-}. @@ -3826,7 +3829,7 @@ multibyte characters. This option is an easy way to tell @command{gawk}, @itemx @option{--traditional} @cindex @option{-c} option @cindex @option{--traditional} option -@cindex compatibility mode (@command{gawk}), specifying +@cindex compatibility mode (@command{gawk}) @subentry specifying Specify @dfn{compatibility mode}, in which the GNU extensions to the @command{awk} language are disabled, so that @command{gawk} behaves just like BWK @command{awk}. 
@@ -3841,7 +3844,7 @@ Also see @itemx @option{--copyright} @cindex @option{-C} option @cindex @option{--copyright} option -@cindex GPL (General Public License), printing +@cindex GPL (General Public License) @subentry printing Print the short version of the General Public License and then exit. @item @option{-d}[@var{file}] @@ -3850,15 +3853,15 @@ Print the short version of the General Public License and then exit. @cindex @option{--dump-variables} option @cindex dump all variables of a program @cindex @file{awkvars.out} file -@cindex files, @file{awkvars.out} -@cindex variables, global, printing list of +@cindex files @subentry @file{awkvars.out} +@cindex variables @subentry global @subentry printing list of Print a sorted list of global variables, their types, and final values to @var{file}. If no @var{file} is provided, print this list to a file named @file{awkvars.out} in the current directory. No space is allowed between the @option{-d} and @var{file}, if @var{file} is supplied. -@cindex troubleshooting, typographical errors@comma{} global variables +@cindex troubleshooting @subentry typographical errors, global variables Having a list of all global variables is a good way to look for typographical errors in your programs. You would also use this option if you have a large program with a lot of @@ -3871,7 +3874,7 @@ names like @code{i}, @code{j}, etc.) @itemx @option{--debug}[@code{=}@var{file}] @cindex @option{-D} option @cindex @option{--debug} option -@cindex @command{awk} debugging, enabling +@cindex @command{awk} programs @subentry debugging, enabling Enable debugging of @command{awk} programs (@pxref{Debugging}). 
By default, the debugger reads commands interactively from the keyboard @@ -3885,7 +3888,7 @@ No space is allowed between the @option{-D} and @var{file}, if @itemx @option{--source} @var{program-text} @cindex @option{-e} option @cindex @option{--source} option -@cindex source code, mixing +@cindex source code @subentry mixing Provide program source code in the @var{program-text}. This option allows you to mix source code in files with source code that you enter on the command line. @@ -3920,7 +3923,7 @@ for more information. @itemx @option{--exec} @var{file} @cindex @option{-E} option @cindex @option{--exec} option -@cindex @command{awk} programs, location of +@cindex @command{awk} programs @subentry location of @cindex CGI, @command{awk} scripts for Similar to @option{-f}, read @command{awk} program text from @var{file}. There are two differences from @option{-f}: @@ -3957,8 +3960,8 @@ with @samp{#!} scripts (@pxref{Executable Scripts}), like so: @itemx @option{--gen-pot} @cindex @option{-g} option @cindex @option{--gen-pot} option -@cindex portable object files, generating -@cindex files, portable object, generating +@cindex portable object @subentry files @subentry generating +@cindex files @subentry portable object @subentry generating Analyze the source program and generate a GNU @command{gettext} portable object template file on standard output for all string constants that have been marked for translation. @@ -3969,9 +3972,9 @@ for information about this option. @itemx @option{--help} @cindex @option{-h} option @cindex @option{--help} option -@cindex GNU long options, printing list of -@cindex options, printing list of -@cindex printing, list of options +@cindex GNU long options @subentry printing list of +@cindex options @subentry printing list of +@cindex printing @subentry list of options Print a ``usage'' message summarizing the short- and long-style options that @command{gawk} accepts and then exit. 
@@ -3979,7 +3982,7 @@ that @command{gawk} accepts and then exit. @itemx @option{--include} @var{source-file} @cindex @option{-i} option @cindex @option{--include} option -@cindex @command{awk} programs, location of +@cindex @command{awk} programs @subentry location of Read an @command{awk} source library from @var{source-file}. This option is completely equivalent to using the @code{@@include} directive inside your program. It is very similar to the @option{-f} option, @@ -3999,7 +4002,7 @@ at their beginning. @xref{Changing The Namespace}, for more information. @itemx @option{--load} @var{ext} @cindex @option{-l} option @cindex @option{--load} option -@cindex loading, extensions +@cindex loading extensions Load a dynamic extension named @var{ext}. Extensions are stored as system shared libraries. This option searches for the library using the @env{AWKLIBPATH} @@ -4013,7 +4016,7 @@ a shared library. This advanced feature is described in detail in @ref{Dynamic @itemx @option{--lint}[@code{=}@var{value}] @cindex @option{-l} option @cindex @option{--lint} option -@cindex lint checking, issuing warnings +@cindex lint checking @subentry issuing warnings @cindex warnings, issuing Warn about constructs that are dubious or nonportable to other @command{awk} implementations. @@ -4048,9 +4051,9 @@ if @command{gawk} is not compiled to use the GNU MPFR and MP libraries @itemx @option{--non-decimal-data} @cindex @option{-n} option @cindex @option{--non-decimal-data} option -@cindex hexadecimal values@comma{} enabling interpretation of -@cindex octal values@comma{} enabling interpretation of -@cindex troubleshooting, @code{--non-decimal-data} option +@cindex hexadecimal values, enabling interpretation of +@cindex octal values, enabling interpretation of +@cindex troubleshooting @subentry @code{--non-decimal-data} option Enable automatic interpretation of octal and hexadecimal values in input data (@pxref{Nondecimal Data}). 
@@ -4103,7 +4106,7 @@ be used to cancel the effect of an earlier @option{-s} option @itemx @option{--profile}[@code{=}@var{file}] @cindex @option{-p} option @cindex @option{--profile} option -@cindex @command{awk} profiling, enabling +@cindex @command{awk} @subentry profiling, enabling Enable profiling of @command{awk} programs (@pxref{Profiling}). Implies @option{--no-optimize}. @@ -4121,7 +4124,7 @@ in the left margin, and function call counts for each function. @cindex @option{-P} option @cindex @option{--posix} option @cindex POSIX mode -@cindex @command{gawk}, extensions@comma{} disabling +@cindex @command{gawk} @subentry extensions, disabling Operate in strict POSIX mode. This disables all @command{gawk} extensions (just like @option{--traditional}) and disables all extensions not allowed by POSIX. @@ -4134,13 +4137,13 @@ restrictions apply: @itemize @value{BULLET} @cindex newlines -@cindex whitespace, newlines as +@cindex whitespace @subentry newlines as @item Newlines are not allowed after @samp{?} or @samp{:} (@pxref{Conditional Exp}). -@cindex @code{FS} variable, TAB character as +@cindex @code{FS} variable @subentry TAB character as @item Specifying @samp{-Ft} on the command line does not set the value of @code{FS} to be a single TAB character @@ -4155,8 +4158,8 @@ data (@pxref{Locales}). @c @cindex automatic warnings @c @cindex warnings, automatic -@cindex @option{--traditional} option, @code{--posix} option and -@cindex @option{--posix} option, @code{--traditional} option and +@cindex @option{--traditional} option @subentry @code{--posix} option and +@cindex @option{--posix} option @subentry @code{--traditional} option and If you supply both @option{--traditional} and @option{--posix} on the command line, @option{--posix} takes precedence. @command{gawk} issues a warning if both options are supplied. @@ -4165,7 +4168,7 @@ issues a warning if both options are supplied. 
@itemx @option{--re-interval} @cindex @option{-r} option @cindex @option{--re-interval} option -@cindex regular expressions, interval expressions and +@cindex regular expressions @subentry interval expressions and Allow interval expressions (@pxref{Regexp Operators}) in regexps. @@ -4208,27 +4211,34 @@ Warn about constructs that are not available in the original version of @itemx @option{--version} @cindex @option{-V} option @cindex @option{--version} option -@cindex @command{gawk}, versions of, information about@comma{} printing +@cindex @command{gawk} @subentry version of @subentry printing information about Print version information for this particular copy of @command{gawk}. This allows you to determine if your copy of @command{gawk} is up to date with respect to whatever the Free Software Foundation is currently distributing. It is also useful for bug reports (@pxref{Bugs}). + +@cindex @code{-} (hyphen) @subentry @code{--} end of options marker +@cindex hyphen (@code{-}) @subentry @code{--} end of options marker +@item @code{--} +Mark the end of all options. +Any command-line arguments following @code{--} are placed in @code{ARGV}, +even if they start with a minus sign. @end table As long as program text has been supplied, any other options are flagged as invalid with a warning message but are otherwise ignored. -@cindex @option{-F} option, @option{-Ft} sets @code{FS} to TAB +@cindex @option{-F} option @subentry @option{-Ft} sets @code{FS} to TAB In compatibility mode, as a special case, if the value of @var{fs} supplied to the @option{-F} option is @samp{t}, then @code{FS} is set to the TAB character (@code{"\t"}). This is true only for @option{--traditional} and not for @option{--posix} (@pxref{Field Separators}). -@cindex @option{-f} option, multiple uses +@cindex @option{-f} option @subentry multiple uses The @option{-f} option may be used more than once on the command line. 
If it is, @command{awk} reads its program source from all of the named files, as if they had been concatenated together into one big file. This is @@ -4265,7 +4275,8 @@ the command line that follow the program text are entered into the command line looking for options. @cindex @env{POSIXLY_CORRECT} environment variable -@cindex lint checking, @env{POSIXLY_CORRECT} environment variable +@cindex environment variables @subentry @env{POSIXLY_CORRECT} +@cindex lint checking @subentry @env{POSIXLY_CORRECT} environment variable @cindex POSIX mode If the environment variable @env{POSIXLY_CORRECT} exists, then @command{gawk} behaves in strict POSIX mode, exactly as if @@ -4287,7 +4298,7 @@ POSIXLY_CORRECT=true export POSIXLY_CORRECT @end example -@cindex @command{csh} utility, @env{POSIXLY_CORRECT} environment variable +@cindex @command{csh} utility @subentry @env{POSIXLY_CORRECT} environment variable For a C shell-compatible shell,@footnote{Not recommended.} you would add this line to the @file{.login} file in your home directory: @@ -4296,15 +4307,15 @@ you would add this line to the @file{.login} file in your home directory: setenv POSIXLY_CORRECT true @end example -@cindex portability, @env{POSIXLY_CORRECT} environment variable +@cindex portability @subentry @env{POSIXLY_CORRECT} environment variable Having @env{POSIXLY_CORRECT} set is not recommended for daily use, but it is good for testing the portability of your programs to other environments. @node Other Arguments @section Other Command-Line Arguments -@cindex command line, arguments -@cindex arguments, command-line +@cindex command line @subentry arguments +@cindex arguments @subentry command-line Any additional arguments on the command line are normally treated as input files to be processed in the order specified. 
However, an @@ -4326,10 +4337,10 @@ a variable assignment), precede the file name with @samp{./}, like so: awk -f program.awk file1 ./count=1 file2 @end example -@cindex @command{gawk}, @code{ARGIND} variable in -@cindex @code{ARGIND} variable, command-line arguments +@cindex @command{gawk} @subentry @code{ARGIND} variable in +@cindex @code{ARGIND} variable @subentry command-line arguments @cindex @code{ARGV} array, indexing into -@cindex @code{ARGC}/@code{ARGV} variables, command-line arguments +@cindex @code{ARGC}/@code{ARGV} variables @subentry command-line arguments All the command-line arguments are made available to your @command{awk} program in the @code{ARGV} array (@pxref{Built-in Variables}). Command-line options and the program text (if present) are omitted from @code{ARGV}. @@ -4343,7 +4354,7 @@ Changing @code{ARGC} and @code{ARGV} in your @command{awk} program lets you control how @command{awk} processes the input files; this is described in more detail in @ref{ARGC and ARGV}. -@cindex input files, variable assignments and +@cindex input files @subentry variable assignments and @cindex variable assignments and input files The distinction between @value{FN} arguments and variable-assignment arguments is made when @command{awk} is about to open the next input file. @@ -4358,7 +4369,7 @@ variables assigned in this fashion are @emph{not} available inside a (@pxref{BEGIN/END}), because such rules are run before @command{awk} begins scanning the argument list. -@cindex dark corner, escape sequences +@cindex dark corner @subentry escape sequences The variable values given on the command line are processed for escape sequences (@pxref{Escape Sequences}). @value{DARKCORNER} @@ -4379,7 +4390,7 @@ output formats, before scanning the @value{DF}s. It is also useful for controlling state if multiple passes are needed over a @value{DF}. 
For example: -@cindex files, multiple passes over +@cindex files @subentry multiple passes over @example awk 'pass == 1 @{ @var{pass 1 stuff} @} pass == 2 @{ @var{pass 2 stuff} @}' pass=1 mydata pass=2 mydata @@ -4424,7 +4435,7 @@ this @value{FN} itself.) @node Environment Variables @section The Environment Variables @command{gawk} Uses -@cindex environment variables used by @command{gawk} +@cindex environment variables @subentry used by @command{gawk} A number of environment variables influence how @command{gawk} behaves. @@ -4440,9 +4451,10 @@ behaves. @node AWKPATH Variable @subsection The @env{AWKPATH} Environment Variable @cindex @env{AWKPATH} environment variable -@cindex directories, searching for source files -@cindex search paths, for source files -@cindex differences in @command{awk} and @command{gawk}, @env{AWKPATH} environment variable +@cindex environment variables @subentry @env{AWKPATH} +@cindex directories @subentry searching @subentry for source files +@cindex search paths @subentry for source files +@cindex differences in @command{awk} and @command{gawk} @subentry @env{AWKPATH} environment variable @ifinfo The previous @value{SECTION} described how @command{awk} program files can be named on the command line with the @option{-f} option. @@ -4526,9 +4538,10 @@ found, and @command{gawk} no longer needs to use @env{AWKPATH}. 
@node AWKLIBPATH Variable @subsection The @env{AWKLIBPATH} Environment Variable @cindex @env{AWKLIBPATH} environment variable -@cindex directories, searching for loadable extensions -@cindex search paths, for loadable extensions -@cindex differences in @command{awk} and @command{gawk}, @code{AWKLIBPATH} environment variable +@cindex environment variables @subentry @env{AWKLIBPATH} +@cindex directories @subentry searching @subentry for loadable extensions +@cindex search paths @subentry for loadable extensions +@cindex differences in @command{awk} and @command{gawk} @subentry @code{AWKLIBPATH} environment variable The @env{AWKLIBPATH} environment variable is similar to the @env{AWKPATH} variable, but it is used to search for loadable extensions (stored as @@ -4682,9 +4695,11 @@ to @code{EXIT_FAILURE}. This @value{SECTION} describes a feature that is specific to @command{gawk}. -@cindex @code{@@include} directive +@cindex @code{@@} (at-sign) @subentry @code{@@include} directive +@cindex at-sign (@code{@@}) @subentry @code{@@include} directive @cindex file inclusion, @code{@@include} directive @cindex including files, @code{@@include} directive +@cindex @code{@@include} directive @sortas{include directive} The @code{@@include} keyword can be used to read external @command{awk} source files. This gives you the ability to split large @command{awk} source files into smaller, more manageable pieces, and also lets you reuse common @command{awk} @@ -4810,9 +4825,11 @@ at their beginning. @xref{Changing The Namespace}, for more information. This @value{SECTION} describes a feature that is specific to @command{gawk}. 
-@cindex @code{@@load} directive -@cindex loading extensions, @code{@@load} directive -@cindex extensions, loading, @code{@@load} directive +@cindex @code{@@} (at-sign) @subentry @code{@@load} directive +@cindex at-sign (@code{@@}) @subentry @code{@@load} directive +@cindex loading extensions @subentry @code{@@load} directive +@cindex extensions @subentry loadable @subentry loading, @code{@@load} directive +@cindex @code{@@load} directive @sortas{load directive} The @code{@@load} keyword can be used to read external @command{awk} extensions (stored as system shared libraries). This allows you to link in compiled code that may offer superior @@ -4855,8 +4872,8 @@ It also describes the @code{ordchr} extension. @c update this section for each release! -@cindex options, deprecated -@cindex features, deprecated +@cindex options @subentry deprecated +@cindex features @subentry deprecated @cindex obsolete features This @value{SECTION} describes features and/or command-line options from previous releases of @command{gawk} that either are not available in the @@ -4878,7 +4895,7 @@ in case some option becomes obsolete in a future version of @command{gawk}. @node Undocumented @section Undocumented Options and Features @cindex undocumented features -@cindex features, undocumented +@cindex features @subentry undocumented @cindex Skywalker, Luke @cindex Kenobi, Obi-Wan @cindex jedi knights @@ -4888,7 +4905,7 @@ in case some option becomes obsolete in a future version of @command{gawk}. @author Obi-Wan @end quotation -@cindex shells, sea +@cindex shells @subentry sea This @value{SECTION} intentionally left blank. @@ -5072,8 +5089,8 @@ set of strings. Because regular expressions are such a fundamental part of @command{awk} programming, their format and use deserve a separate @value{CHAPTER}. 
-@cindex forward slash (@code{/}) to enclose regular expressions -@cindex @code{/} (forward slash) to enclose regular expressions +@cindex forward slash (@code{/}) @subentry to enclose regular expressions +@cindex @code{/} (forward slash) @subentry to enclose regular expressions A regular expression enclosed in slashes (@samp{/}) is an @command{awk} pattern that matches every input record whose text belongs to that set. @@ -5105,8 +5122,8 @@ regular expressions work, we present more complicated instances. @node Regexp Usage @section How to Use Regular Expressions -@cindex patterns, regular expressions as -@cindex regular expressions, as patterns +@cindex patterns @subentry regexp constants as +@cindex regular expressions @subentry as patterns A regular expression can be used as a pattern by enclosing it in slashes. Then the regular expression is tested against the entire text of each record. (Normally, it only needs @@ -5122,18 +5139,18 @@ $ @kbd{awk '/li/ @{ print $2 @}' mail-list} @print{} 555-3430 @end example -@cindex regular expressions, operators -@cindex operators, string-matching +@cindex regular expressions @subentry operators +@cindex operators @subentry string-matching @c @cindex operators, @code{~} @cindex string-matching operators @cindex @code{~} (tilde), @code{~} operator @cindex tilde (@code{~}), @code{~} operator -@cindex @code{!} (exclamation point), @code{!~} operator -@cindex exclamation point (@code{!}), @code{!~} operator +@cindex @code{!} (exclamation point) @subentry @code{!~} operator +@cindex exclamation point (@code{!}) @subentry @code{!~} operator @c @cindex operators, @code{!~} -@cindex @code{if} statement, use of regexps in -@cindex @code{while} statement, use of regexps in -@cindex @code{do}-@code{while} statement, use of regexps in +@cindex @code{if} statement @subentry use of regexps in +@cindex @code{while} statement @subentry use of regexps in +@cindex @code{do}-@code{while} statement @subentry use of regexps in @c @cindex 
statements, @code{if} @c @cindex statements, @code{while} @c @cindex statements, @code{do} @@ -5191,8 +5208,8 @@ $ @kbd{awk '$1 !~ /J/' inventory-shipped} @end example @cindex regexp constants -@cindex constant regexps -@cindex regular expressions, constants, See regexp constants +@cindex constants @subentry regexp +@cindex regular expressions, constants @seeentry{regexp constants} When a regexp is enclosed in slashes, such as @code{/foo/}, we call it a @dfn{regexp constant}, much like @code{5.27} is a numeric constant and @code{"foo"} is a string constant. @@ -5200,9 +5217,10 @@ a @dfn{regexp constant}, much like @code{5.27} is a numeric constant and @node Escape Sequences @section Escape Sequences -@cindex escape sequences, in strings -@cindex backslash (@code{\}), in escape sequences -@cindex @code{\} (backslash), in escape sequences +@cindex escape sequences +@cindex escape sequences @seealso{backslash} +@cindex backslash (@code{\}) @subentry in escape sequences +@cindex @code{\} (backslash) @subentry in escape sequences Some characters cannot be included literally in string constants (@code{"foo"}) or regexp constants (@code{/foo/}). Instead, they should be represented with @dfn{escape sequences}, @@ -5232,50 +5250,51 @@ all the escape sequences used in @command{awk} and what they represent. Unless noted otherwise, all these escape sequences apply to both string constants and regexp constants: +@cindex ASCII @table @code @item \\ A literal backslash, @samp{\}. @c @cindex @command{awk} language, V.4 version -@cindex @code{\} (backslash), @code{\a} escape sequence -@cindex backslash (@code{\}), @code{\a} escape sequence +@cindex @code{\} (backslash) @subentry @code{\a} escape sequence +@cindex backslash (@code{\}) @subentry @code{\a} escape sequence @item \a The ``alert'' character, @kbd{Ctrl-g}, ASCII code 7 (BEL). (This often makes some sort of audible noise.) 
-@cindex @code{\} (backslash), @code{\b} escape sequence -@cindex backslash (@code{\}), @code{\b} escape sequence +@cindex @code{\} (backslash) @subentry @code{\b} escape sequence +@cindex backslash (@code{\}) @subentry @code{\b} escape sequence @item \b Backspace, @kbd{Ctrl-h}, ASCII code 8 (BS). -@cindex @code{\} (backslash), @code{\f} escape sequence -@cindex backslash (@code{\}), @code{\f} escape sequence +@cindex @code{\} (backslash) @subentry @code{\f} escape sequence +@cindex backslash (@code{\}) @subentry @code{\f} escape sequence @item \f Formfeed, @kbd{Ctrl-l}, ASCII code 12 (FF). -@cindex @code{\} (backslash), @code{\n} escape sequence -@cindex backslash (@code{\}), @code{\n} escape sequence +@cindex @code{\} (backslash) @subentry @code{\n} escape sequence +@cindex backslash (@code{\}) @subentry @code{\n} escape sequence @item \n Newline, @kbd{Ctrl-j}, ASCII code 10 (LF). -@cindex @code{\} (backslash), @code{\r} escape sequence -@cindex backslash (@code{\}), @code{\r} escape sequence +@cindex @code{\} (backslash) @subentry @code{\r} escape sequence +@cindex backslash (@code{\}) @subentry @code{\r} escape sequence @item \r Carriage return, @kbd{Ctrl-m}, ASCII code 13 (CR). -@cindex @code{\} (backslash), @code{\t} escape sequence -@cindex backslash (@code{\}), @code{\t} escape sequence +@cindex @code{\} (backslash) @subentry @code{\t} escape sequence +@cindex backslash (@code{\}) @subentry @code{\t} escape sequence @item \t Horizontal TAB, @kbd{Ctrl-i}, ASCII code 9 (HT). @c @cindex @command{awk} language, V.4 version -@cindex @code{\} (backslash), @code{\v} escape sequence -@cindex backslash (@code{\}), @code{\v} escape sequence +@cindex @code{\} (backslash) @subentry @code{\v} escape sequence +@cindex backslash (@code{\}) @subentry @code{\v} escape sequence @item \v Vertical TAB, @kbd{Ctrl-k}, ASCII code 11 (VT). 
-@cindex @code{\} (backslash), @code{\}@var{nnn} escape sequence -@cindex backslash (@code{\}), @code{\}@var{nnn} escape sequence +@cindex @code{\} (backslash) @subentry @code{\}@var{nnn} escape sequence +@cindex backslash (@code{\}) @subentry @code{\}@var{nnn} escape sequence @item \@var{nnn} The octal value @var{nnn}, where @var{nnn} stands for 1 to 3 digits between @samp{0} and @samp{7}. For example, the code for the ASCII ESC @@ -5283,10 +5302,10 @@ between @samp{0} and @samp{7}. For example, the code for the ASCII ESC @c @cindex @command{awk} language, V.4 version @c @cindex @command{awk} language, POSIX version -@cindex @code{\} (backslash), @code{\x} escape sequence -@cindex backslash (@code{\}), @code{\x} escape sequence -@cindex common extensions, @code{\x} escape sequence -@cindex extensions, common@comma{} @code{\x} escape sequence +@cindex @code{\} (backslash) @subentry @code{\x} escape sequence +@cindex backslash (@code{\}) @subentry @code{\x} escape sequence +@cindex common extensions @subentry @code{\x} escape sequence +@cindex extensions @subentry common @subentry @code{\x} escape sequence @item \x@var{hh}@dots{} The hexadecimal value @var{hh}, where @var{hh} stands for a sequence of hexadecimal digits (@samp{0}--@samp{9}, and either @samp{A}--@samp{F} @@ -5307,8 +5326,8 @@ As of @value{PVERSION} 4.2, only two digits are processed. @end quotation -@cindex @code{\} (backslash), @code{\/} escape sequence -@cindex backslash (@code{\}), @code{\/} escape sequence +@cindex @code{\} (backslash) @subentry @code{\/} escape sequence +@cindex backslash (@code{\}) @subentry @code{\/} escape sequence @item \/ A literal slash (necessary for regexp constants only). This sequence is used when you want to write a regexp @@ -5319,8 +5338,8 @@ Because the regexp is delimited by slashes, you need to escape any slash that is part of the pattern, in order to tell @command{awk} to keep processing the rest of the regexp. 
-@cindex @code{\} (backslash), @code{\"} escape sequence -@cindex backslash (@code{\}), @code{\"} escape sequence +@cindex @code{\} (backslash) @subentry @code{\"} escape sequence +@cindex backslash (@code{\}) @subentry @code{\"} escape sequence @item \" A literal double quote (necessary for string constants only). This sequence is used when you want to write a string @@ -5342,20 +5361,20 @@ means that the next character should be taken literally, even if it would normally be a regexp operator. For example, @code{/a\+b/} matches the three characters @samp{a+b}. -@cindex backslash (@code{\}), in escape sequences -@cindex @code{\} (backslash), in escape sequences +@cindex backslash (@code{\}) @subentry in escape sequences +@cindex @code{\} (backslash) @subentry in escape sequences @cindex portability For complete portability, do not use a backslash before any character not shown in the previous list or that is not an operator. @c 11/2014: Moved so as to not stack sidebars @sidebar Backslash Before Regular Characters -@cindex portability, backslash in escape sequences -@cindex POSIX @command{awk}, backslashes in string constants -@cindex backslash (@code{\}), in escape sequences, POSIX and -@cindex @code{\} (backslash), in escape sequences, POSIX and +@cindex portability @subentry backslash in escape sequences +@cindex POSIX @command{awk} @subentry backslashes in string constants +@cindex backslash (@code{\}) @subentry in escape sequences @subentry POSIX and +@cindex @code{\} (backslash) @subentry in escape sequences @subentry POSIX and -@cindex troubleshooting, backslash before nonspecial character +@cindex troubleshooting @subentry backslash before nonspecial character If you place a backslash in a string constant before something that is not one of the characters previously listed, POSIX @command{awk} purposely leaves what happens as undefined. There are two choices: @@ -5374,8 +5393,9 @@ surrounded by whitespace as the field separator. 
There should be two backslashes in the string: @samp{FS = @w{"[ \t]+\\|[ \t]+"}}.) @c I did this! This is why I added the warning. -@cindex @command{gawk}, escape sequences -@cindex Unix @command{awk}, backslashes in escape sequences +@cindex @command{gawk} @subentry escape sequences +@cindex @command{gawk} @subentry escape sequences @seealso{backslash} +@cindex Unix @command{awk} @subentry backslashes in escape sequences @cindex @command{mawk} utility @item Leave the backslash alone Some other @command{awk} implementations do this. @@ -5404,7 +5424,7 @@ literally. @end itemize @sidebar Escape Sequences for Metacharacters -@cindex metacharacters, escape sequences for +@cindex metacharacters @subentry escape sequences for Suppose you use an octal or hexadecimal escape to represent a regexp metacharacter. @@ -5412,7 +5432,7 @@ escape to represent a regexp metacharacter. Does @command{awk} treat the character as a literal character or as a regexp operator? -@cindex dark corner, escape sequences, for metacharacters +@cindex dark corner @subentry escape sequences @subentry for metacharacters Historically, such characters were taken literally. @value{DARKCORNER} However, the POSIX standard indicates that they should be treated @@ -5425,8 +5445,8 @@ escape sequences literally when used in regexp constants. Thus, @node Regexp Operators @section Regular Expression Operators -@cindex regular expressions, operators -@cindex metacharacters in regular expressions +@cindex regular expressions @subentry operators +@cindex metacharacters @subentry in regular expressions You can combine regular expressions with special characters, called @dfn{regular expression operators} or @dfn{metacharacters}, to @@ -5454,17 +5474,17 @@ sequences and that are not listed here stand for themselves: @c Use @asis so the docbook comes out ok. Sigh. 
@table @asis -@cindex backslash (@code{\}), regexp operator -@cindex @code{\} (backslash), regexp operator +@cindex backslash (@code{\}) @subentry regexp operator +@cindex @code{\} (backslash) @subentry regexp operator @item @code{\} This suppresses the special meaning of a character when matching. For example, @samp{\$} matches the character @samp{$}. -@cindex regular expressions, anchors in -@cindex Texinfo, chapter beginnings in files -@cindex @code{^} (caret), regexp operator -@cindex caret (@code{^}), regexp operator +@cindex regular expressions @subentry anchors in +@cindex Texinfo @subentry chapter beginnings in files +@cindex @code{^} (caret) @subentry regexp operator +@cindex caret (@code{^}) @subentry regexp operator @item @code{^} This matches the beginning of a string. @samp{^@@chapter} matches @samp{@@chapter} at the beginning of a string, @@ -5481,8 +5501,8 @@ The condition is not true in the following example: if ("line1\nLINE 2" ~ /^L/) @dots{} @end example -@cindex @code{$} (dollar sign), regexp operator -@cindex dollar sign (@code{$}), regexp operator +@cindex @code{$} (dollar sign) @subentry regexp operator +@cindex dollar sign (@code{$}) @subentry regexp operator @item @code{$} This is similar to @samp{^}, but it matches only at the end of a string. For example, @samp{p$} @@ -5506,7 +5526,8 @@ concatenation, we can make a regular expression such as @samp{U.A}, which matches any three-character sequence that begins with @samp{U} and ends with @samp{A}. -@cindex POSIX @command{awk}, period (@code{.})@comma{} using +@cindex POSIX mode +@cindex POSIX @command{awk} @subentry period (@code{.}), using In strict POSIX mode (@pxref{Options}), @samp{.} does not match the @sc{nul} character, which is a character with all bits equal to zero. @@ -5516,9 +5537,9 @@ may not be able to match the @sc{nul} character. 
@cindex @code{[]} (square brackets), regexp operator @cindex square brackets (@code{[]}), regexp operator @cindex bracket expressions -@cindex character sets, See Also bracket expressions -@cindex character lists, See bracket expressions -@cindex character classes, See bracket expressions +@cindex character sets (in regular expressions) @seeentry{bracket expressions} +@cindex character lists @seeentry{bracket expressions} +@cindex character classes @seeentry{bracket expressions} @item @code{[}@dots{}@code{]} This is called a @dfn{bracket expression}.@footnote{In other literature, you may see a bracket expression referred to as either a @@ -5530,7 +5551,7 @@ discussion of what can be inside the square brackets of a bracket expression is given in @ref{Bracket Expressions}. -@cindex bracket expressions, complemented +@cindex bracket expressions @subentry complemented @item @code{[^}@dots{}@code{]} This is a @dfn{complemented bracket expression}. The first character after the @samp{[} @emph{must} be a @samp{^}. It matches any characters @@ -5550,8 +5571,8 @@ a lowercase English vowel. The alternation applies to the largest possible regexps on either side. -@cindex @code{()} (parentheses), regexp operator -@cindex parentheses @code{()}, regexp operator +@cindex @code{()} (parentheses) @subentry regexp operator +@cindex parentheses @code{()} @subentry regexp operator @item @code{(}@dots{}@code{)} Parentheses are used for grouping in regular expressions, as in arithmetic. They can be used to concatenate regular expressions @@ -5566,8 +5587,8 @@ one literally, precede it with a backslash. However, the right or closing parenthesis is only special when paired with a left parenthesis; an unpaired right parenthesis is (silently) treated as a regular character. 
-@cindex @code{*} (asterisk), @code{*} operator, as regexp operator -@cindex asterisk (@code{*}), @code{*} operator, as regexp operator +@cindex @code{*} (asterisk) @subentry @code{*} operator @subentry as regexp operator +@cindex asterisk (@code{*}) @subentry @code{*} operator @subentry as regexp operator @item @code{*} This symbol means that the preceding regular expression should be repeated as many times as necessary to find a match. For example, @samp{ph*} @@ -5585,21 +5606,23 @@ Second, @samp{*} finds as many repetitions as possible. If the text to be matched is @samp{phhhhhhhhhhhhhhooey}, @samp{ph*} matches all of the @samp{h}s. -@cindex @code{+} (plus sign), regexp operator -@cindex plus sign (@code{+}), regexp operator +@cindex @code{+} (plus sign) @subentry regexp operator +@cindex plus sign (@code{+}) @subentry regexp operator @item @code{+} This symbol is similar to @samp{*}, except that the preceding expression must be matched at least once. This means that @samp{wh+y} would match @samp{why} and @samp{whhy}, but not @samp{wy}, whereas @samp{wh*y} would match all three. -@cindex @code{?} (question mark), regexp operator -@cindex question mark (@code{?}), regexp operator +@cindex @code{?} (question mark) @subentry regexp operator +@cindex question mark (@code{?}) @subentry regexp operator @item @code{?} This symbol is similar to @samp{*}, except that the preceding expression can be matched either once or not at all. For example, @samp{fe?d} matches @samp{fed} and @samp{fd}, but nothing else. +@cindex @code{@{@}} (braces) @subentry regexp operator +@cindex braces (@code{@{@}}) @subentry regexp operator @cindex interval expressions, regexp operator @item @code{@{}@var{n}@code{@}} @itemx @code{@{}@var{n}@code{,@}} @@ -5624,16 +5647,16 @@ Matches @samp{whhy}, @samp{whhhy}, and so on. 
@end table @end table -@cindex precedence, regexp operators -@cindex regular expressions, operators, precedence of +@cindex precedence @subentry regexp operators +@cindex regular expressions @subentry operators @subentry precedence of In regular expressions, the @samp{*}, @samp{+}, and @samp{?} operators, as well as the braces @samp{@{} and @samp{@}}, have the highest precedence, followed by concatenation, and finally by @samp{|}. As in arithmetic, parentheses can change how operators are grouped. -@cindex POSIX @command{awk}, regular expressions and -@cindex @command{gawk}, regular expressions, precedence +@cindex POSIX @command{awk} @subentry regular expressions and +@cindex @command{gawk} @subentry regular expressions @subentry precedence In POSIX @command{awk} and @command{gawk}, the @samp{*}, @samp{+}, and @samp{?} operators stand for themselves when there is nothing in the regexp that precedes them. For example, @code{/+/} matches a literal @@ -5643,12 +5666,12 @@ usage as a syntax error. @node Interval Expressions @subsection Some Notes On Interval Expressions -@cindex POSIX @command{awk}, interval expressions in +@cindex POSIX @command{awk} @subentry interval expressions in Interval expressions were not traditionally available in @command{awk}. They were added as part of the POSIX standard to make @command{awk} and @command{egrep} consistent with each other. -@cindex @command{gawk}, interval expressions and +@cindex @command{gawk} @subentry interval expressions and Initially, because old programs may use @samp{@{} and @samp{@}} in regexp constants, @command{gawk} did @emph{not} match interval expressions @@ -5681,9 +5704,9 @@ when in compatibility mode (@pxref{Options}). 
@node Bracket Expressions @section Using Bracket Expressions @cindex bracket expressions -@cindex bracket expressions, range expressions +@cindex bracket expressions @subentry range expressions @cindex range expressions (regexps) -@cindex character lists in regular expressions +@cindex bracket expressions @subentry character lists As mentioned earlier, a bracket expression matches any character among those listed between the opening and closing square brackets. @@ -5705,12 +5728,12 @@ the range 0--256). To match a range of characters where the endpoints of the range are larger than 256, enter the multibyte encodings of the characters directly. -@cindex @code{\} (backslash), in bracket expressions -@cindex backslash (@code{\}), in bracket expressions -@cindex @code{^} (caret), in bracket expressions -@cindex caret (@code{^}), in bracket expressions -@cindex @code{-} (hyphen), in bracket expressions -@cindex hyphen (@code{-}), in bracket expressions +@cindex @code{\} (backslash) @subentry in bracket expressions +@cindex backslash (@code{\}) @subentry in bracket expressions +@cindex @code{^} (caret) @subentry in bracket expressions +@cindex caret (@code{^}) @subentry in bracket expressions +@cindex @code{-} (hyphen) @subentry in bracket expressions +@cindex hyphen (@code{-}) @subentry in bracket expressions To include one of the characters @samp{\}, @samp{]}, @samp{-}, or @samp{^} in a bracket expression, put a @samp{\} in front of it. For example: @@ -5724,7 +5747,7 @@ Additionally, if you place @samp{]} right after the opening @samp{[}, the closing bracket is treated as one of the characters to be matched. -@cindex POSIX @command{awk}, bracket expressions and +@cindex POSIX @command{awk} @subentry bracket expressions and @cindex Extended Regular Expressions (EREs) @cindex EREs (Extended Regular Expressions) @cindex @command{egrep} utility @@ -5736,8 +5759,8 @@ of the POSIX specification for Extended Regular Expressions (EREs). 
POSIX EREs are based on the regular expressions accepted by the traditional @command{egrep} utility. -@cindex bracket expressions, character classes -@cindex POSIX @command{awk}, bracket expressions and, character classes +@cindex bracket expressions @subentry character classes +@cindex POSIX @command{awk} @subentry bracket expressions and @subentry character classes @dfn{Character classes} are a feature introduced in the POSIX standard. A character class is a special notation for describing lists of characters that have a specific attribute, but the @@ -5805,6 +5828,7 @@ depends on various factors out of our control. @c Thanks to @c Date: Tue, 01 Jul 2014 07:39:51 +0200 @c From: Hermann Peifer <peifer@gmx.eu> +@cindex ASCII Some utilities that match regular expressions provide a nonstandard @samp{[:ascii:]} character class; @command{awk} does not. However, you can simulate such a construct using @samp{[\x00-\x7F]}. This matches @@ -5819,8 +5843,8 @@ treat @code{[:blank:]} like @code{[:space:]}, incorrectly matching more characters than they should. Caveat Emptor. @end quotation -@cindex bracket expressions, collating elements -@cindex bracket expressions, non-ASCII +@cindex bracket expressions @subentry collating elements +@cindex bracket expressions @subentry non-ASCII @cindex collating elements Two additional special sequences can appear in bracket expressions. These apply to non-ASCII character sets, which can have single symbols @@ -5831,7 +5855,7 @@ and a grave-accented ``@`e'' are equivalent.) These sequences are: @table @asis -@cindex bracket expressions, collating symbols +@cindex bracket expressions @subentry collating symbols @cindex collating symbols @item Collating symbols Multicharacter collating elements enclosed between @@ -5839,7 +5863,7 @@ Multicharacter collating elements enclosed between then @samp{[[.ch.]]} is a regexp that matches this collating element, whereas @samp{[ch]} is a regexp that matches either @samp{c} or @samp{h}. 
-@cindex bracket expressions, equivalence classes +@cindex bracket expressions @subentry equivalence classes @item Equivalence classes Locale-specific names for a list of characters that are equal. The name is enclosed between @@ -5851,9 +5875,9 @@ that matches any of @samp{e}, @samp{@^e}, @samp{@'e}, or @samp{@`e}. These features are very valuable in non-English-speaking locales. -@cindex internationalization, localization, character classes -@cindex @command{gawk}, character classes and -@cindex POSIX @command{awk}, bracket expressions and, character classes +@cindex internationalization @subentry localization @subentry character classes +@cindex @command{gawk} @subentry character classes and +@cindex POSIX @command{awk} @subentry bracket expressions and @subentry character classes @quotation CAUTION The library functions that @command{gawk} uses for regular expression matching currently recognize only POSIX character classes; @@ -5868,7 +5892,7 @@ taken literally. This is also true of @samp{.} and @samp{*}. @node Leftmost Longest @section How Much Text Matches? -@cindex regular expressions, leftmost longest match +@cindex regular expressions @subentry leftmost longest match @c @cindex matching, leftmost longest Consider the following: @@ -5907,12 +5931,12 @@ and also @pxref{Field Separators}). 
@node Computed Regexps @section Using Dynamic Regexps -@cindex regular expressions, computed -@cindex regular expressions, dynamic +@cindex regular expressions @subentry computed +@cindex regular expressions @subentry dynamic @cindex @code{~} (tilde), @code{~} operator @cindex tilde (@code{~}), @code{~} operator -@cindex @code{!} (exclamation point), @code{!~} operator -@cindex exclamation point (@code{!}), @code{!~} operator +@cindex @code{!} (exclamation point) @subentry @code{!~} operator +@cindex exclamation point (@code{!}) @subentry @code{!~} operator @c @cindex operators, @code{~} @c @cindex operators, @code{!~} The righthand side of a @samp{~} or @samp{!~} operator need not be a @@ -5943,11 +5967,11 @@ on the right. This is true of any string-valued expression (such as @code{digits_regexp}, shown in the previous example), not just string constants. @end quotation -@cindex regexp constants, slashes vs.@: quotes -@cindex @code{\} (backslash), in regexp constants -@cindex backslash (@code{\}), in regexp constants -@cindex @code{"} (double quote), in regexp constants -@cindex double quote (@code{"}), in regexp constants +@cindex regexp constants @subentry slashes vs.@: quotes +@cindex @code{\} (backslash) @subentry in regexp constants +@cindex backslash (@code{\}) @subentry in regexp constants +@cindex @code{"} (double quote) @subentry in regexp constants +@cindex double quote (@code{"}) @subentry in regexp constants What difference does it make if the string is scanned twice? The answer has to do with escape sequences, and particularly with backslashes. To get a backslash into a regular expression inside a @@ -5959,9 +5983,9 @@ you have to type @code{"\\*"}. The first backslash escapes the second one so that the string actually contains the two characters @samp{\} and @samp{*}. 
-@cindex troubleshooting, regexp constants vs.@: string constants -@cindex regexp constants, vs.@: string constants -@cindex string constants, vs.@: regexp constants +@cindex troubleshooting @subentry regexp constants vs.@: string constants +@cindex regexp constants @subentry vs.@: string constants +@cindex string @subentry constants @subentry vs.@: regexp constants Given that you can use both regexp and string constants to describe regular expressions, which should you use? The answer is ``regexp constants,'' for several reasons: @@ -5986,8 +6010,8 @@ intend a regexp match. @end itemize @sidebar Using @code{\n} in Bracket Expressions of Dynamic Regexps -@cindex regular expressions, dynamic, with embedded newlines -@cindex newlines, in dynamic regexps +@cindex regular expressions @subentry dynamic @subentry with embedded newlines +@cindex newlines @subentry in dynamic regexps Some older versions of @command{awk} do not allow the newline character to be used inside a bracket expression for a dynamic regexp: @@ -6001,7 +6025,7 @@ $ @kbd{awk '$0 ~ "[ \t\n]"'} @error{} $0 ~ "[ >>> \t\n]" <<< @end example -@cindex newlines, in regexp constants +@cindex newlines @subentry in regexp constants But a newline in a regexp constant works with no problem: @example @@ -6020,10 +6044,10 @@ occur often in practice, but it's worth noting for future reference. 
@c This section adapted (long ago) from the regex-0.12 manual -@cindex regular expressions, operators, @command{gawk} -@cindex @command{gawk}, regular expressions, operators -@cindex operators, GNU-specific -@cindex regular expressions, operators, for words +@cindex regular expressions @subentry operators @subentry @command{gawk} +@cindex @command{gawk} @subentry regular expressions @subentry operators +@cindex operators @subentry GNU-specific +@cindex regular expressions @subentry operators @subentry for words @cindex word, regexp definition of GNU software that deals with regular expressions provides a number of additional regexp operators. These operators are described in this @@ -6035,64 +6059,64 @@ or underscores (@samp{_}): @table @code @c @cindex operators, @code{\s} (@command{gawk}) -@cindex backslash (@code{\}), @code{\s} operator (@command{gawk}) -@cindex @code{\} (backslash), @code{\s} operator (@command{gawk}) +@cindex backslash (@code{\}) @subentry @code{\s} operator (@command{gawk}) +@cindex @code{\} (backslash) @subentry @code{\s} operator (@command{gawk}) @item \s Matches any whitespace character. Think of it as shorthand for @w{@samp{[[:space:]]}}. @c @cindex operators, @code{\S} (@command{gawk}) -@cindex backslash (@code{\}), @code{\S} operator (@command{gawk}) -@cindex @code{\} (backslash), @code{\S} operator (@command{gawk}) +@cindex backslash (@code{\}) @subentry @code{\S} operator (@command{gawk}) +@cindex @code{\} (backslash) @subentry @code{\S} operator (@command{gawk}) @item \S Matches any character that is not whitespace. Think of it as shorthand for @w{@samp{[^[:space:]]}}. 
@c @cindex operators, @code{\w} (@command{gawk}) -@cindex backslash (@code{\}), @code{\w} operator (@command{gawk}) -@cindex @code{\} (backslash), @code{\w} operator (@command{gawk}) +@cindex backslash (@code{\}) @subentry @code{\w} operator (@command{gawk}) +@cindex @code{\} (backslash) @subentry @code{\w} operator (@command{gawk}) @item \w Matches any word-constituent character---that is, it matches any letter, digit, or underscore. Think of it as shorthand for @w{@samp{[[:alnum:]_]}}. @c @cindex operators, @code{\W} (@command{gawk}) -@cindex backslash (@code{\}), @code{\W} operator (@command{gawk}) -@cindex @code{\} (backslash), @code{\W} operator (@command{gawk}) +@cindex backslash (@code{\}) @subentry @code{\W} operator (@command{gawk}) +@cindex @code{\} (backslash) @subentry @code{\W} operator (@command{gawk}) @item \W Matches any character that is not word-constituent. Think of it as shorthand for @w{@samp{[^[:alnum:]_]}}. @c @cindex operators, @code{\<} (@command{gawk}) -@cindex backslash (@code{\}), @code{\<} operator (@command{gawk}) -@cindex @code{\} (backslash), @code{\<} operator (@command{gawk}) +@cindex backslash (@code{\}) @subentry @code{\<} operator (@command{gawk}) +@cindex @code{\} (backslash) @subentry @code{\<} operator (@command{gawk}) @item \< Matches the empty string at the beginning of a word. For example, @code{/\<away/} matches @samp{away} but not @samp{stowaway}. @c @cindex operators, @code{\>} (@command{gawk}) -@cindex backslash (@code{\}), @code{\>} operator (@command{gawk}) -@cindex @code{\} (backslash), @code{\>} operator (@command{gawk}) +@cindex backslash (@code{\}) @subentry @code{\>} operator (@command{gawk}) +@cindex @code{\} (backslash) @subentry @code{\>} operator (@command{gawk}) @item \> Matches the empty string at the end of a word. For example, @code{/stow\>/} matches @samp{stow} but not @samp{stowaway}. 
@c @cindex operators, @code{\y} (@command{gawk}) -@cindex backslash (@code{\}), @code{\y} operator (@command{gawk}) -@cindex @code{\} (backslash), @code{\y} operator (@command{gawk}) -@cindex word boundaries@comma{} matching +@cindex backslash (@code{\}) @subentry @code{\y} operator (@command{gawk}) +@cindex @code{\} (backslash) @subentry @code{\y} operator (@command{gawk}) +@cindex word boundaries, matching @item \y Matches the empty string at either the beginning or the end of a word (i.e., the word boundar@strong{y}). For example, @samp{\yballs?\y} matches either @samp{ball} or @samp{balls}, as a separate word. @c @cindex operators, @code{\B} (@command{gawk}) -@cindex backslash (@code{\}), @code{\B} operator (@command{gawk}) -@cindex @code{\} (backslash), @code{\B} operator (@command{gawk}) +@cindex backslash (@code{\}) @subentry @code{\B} operator (@command{gawk}) +@cindex @code{\} (backslash) @subentry @code{\B} operator (@command{gawk}) @item \B Matches the empty string that occurs between two word-constituent characters. For example, @@ -6100,9 +6124,9 @@ word-constituent characters. For example, @samp{\B} is essentially the opposite of @samp{\y}. @end table -@cindex buffers, operators for -@cindex regular expressions, operators, for buffers -@cindex operators, string-matching, for buffers +@cindex buffers @subentry operators for +@cindex regular expressions @subentry operators @subentry for buffers +@cindex operators @subentry string-matching @subentry for buffers There are two other operators that work on buffers. In Emacs, a @dfn{buffer} is, naturally, an Emacs buffer. 
Other GNU programs, including @command{gawk}, @@ -6112,31 +6136,31 @@ The operators are: @table @code @item \` @c @cindex operators, @code{\`} (@command{gawk}) -@cindex backslash (@code{\}), @code{\`} operator (@command{gawk}) -@cindex @code{\} (backslash), @code{\`} operator (@command{gawk}) +@cindex backslash (@code{\}) @subentry @code{\`} operator (@command{gawk}) +@cindex @code{\} (backslash) @subentry @code{\`} operator (@command{gawk}) Matches the empty string at the beginning of a buffer (string) @c @cindex operators, @code{\'} (@command{gawk}) -@cindex backslash (@code{\}), @code{\'} operator (@command{gawk}) -@cindex @code{\} (backslash), @code{\'} operator (@command{gawk}) +@cindex backslash (@code{\}) @subentry @code{\'} operator (@command{gawk}) +@cindex @code{\} (backslash) @subentry @code{\'} operator (@command{gawk}) @item \' Matches the empty string at the end of a buffer (string) @end table -@cindex @code{^} (caret), regexp operator -@cindex caret (@code{^}), regexp operator -@cindex @code{?} (question mark), regexp operator -@cindex question mark (@code{?}), regexp operator +@cindex @code{^} (caret) @subentry regexp operator +@cindex caret (@code{^}) @subentry regexp operator +@cindex @code{?} (question mark) @subentry regexp operator +@cindex question mark (@code{?}) @subentry regexp operator Because @samp{^} and @samp{$} always work in terms of the beginning and end of strings, these operators don't add any new capabilities for @command{awk}. They are provided for compatibility with other GNU software. -@cindex @command{gawk}, word-boundary operator +@cindex @command{gawk} @subentry word-boundary operator @cindex word-boundary operator (@command{gawk}) -@cindex operators, word-boundary (@command{gawk}) +@cindex operators @subentry word-boundary (@command{gawk}) In other GNU software, the word-boundary operator is @samp{\b}. 
However, that conflicts with the @command{awk} language's definition of @samp{\b} as backspace, so @command{gawk} uses a different letter. @@ -6145,8 +6169,8 @@ GNU operators, but this was deemed too confusing. The current method of using @samp{\y} for the GNU @samp{\b} appears to be the lesser of two evils. -@cindex regular expressions, @command{gawk}, command-line options -@cindex @command{gawk}, command-line options, and regular expressions +@cindex regular expressions @subentry @command{gawk}, command-line options +@cindex @command{gawk} @subentry command-line options, regular expressions and The various command-line options (@pxref{Options}) control how @command{gawk} interprets characters in regexps: @@ -6189,8 +6213,8 @@ Otherwise, interval expressions are available by default. @node Case-sensitivity @section Case Sensitivity in Matching -@cindex regular expressions, case sensitivity -@cindex case sensitivity, regexps and +@cindex regular expressions @subentry case sensitivity +@cindex case sensitivity @subentry regexps and Case is normally significant in regular expressions, both when matching ordinary characters (i.e., not metacharacters) and inside bracket expressions. Thus, a @samp{w} in a regular expression matches only a lowercase @@ -6216,15 +6240,15 @@ tolower($1) ~ /foo/ @{ @dots{} @} converts the first field to lowercase before matching against it. This works in any POSIX-compliant @command{awk}. 
-@cindex @command{gawk}, regular expressions, case sensitivity -@cindex case sensitivity, @command{gawk} -@cindex differences in @command{awk} and @command{gawk}, regular expressions +@cindex @command{gawk} @subentry regular expressions @subentry case sensitivity +@cindex case sensitivity @subentry @command{gawk} +@cindex differences in @command{awk} and @command{gawk} @subentry regular expressions @cindex @code{~} (tilde), @code{~} operator @cindex tilde (@code{~}), @code{~} operator -@cindex @code{!} (exclamation point), @code{!~} operator -@cindex exclamation point (@code{!}), @code{!~} operator -@cindex @code{IGNORECASE} variable, with @code{~} and @code{!~} operators -@cindex @command{gawk}, @code{IGNORECASE} variable in +@cindex @code{!} (exclamation point) @subentry @code{!~} operator +@cindex exclamation point (@code{!}) @subentry @code{!~} operator +@cindex @code{IGNORECASE} variable @subentry with @code{~} and @code{!~} operators +@cindex @command{gawk} @subentry @code{IGNORECASE} variable in @c @cindex variables, @code{IGNORECASE} Another method, specific to @command{gawk}, is to set the variable @code{IGNORECASE} to a nonzero value (@pxref{Built-in Variables}). @@ -6329,7 +6353,7 @@ versions, use @code{tolower()} or @code{toupper()}. @chapter Reading Input Files @cindex reading input files -@cindex input files, reading +@cindex input files @subentry reading @cindex input files @cindex @code{FILENAME} variable In the typical @command{awk} program, @@ -6381,8 +6405,8 @@ used with it do not have to be named on the @command{awk} command line @node Records @section How Input Is Split into Records -@cindex input, splitting into records -@cindex records, splitting input into +@cindex input @subentry splitting into records +@cindex records @subentry splitting input into @cindex @code{NR} variable @cindex @code{FNR} variable @command{awk} divides the input for your program into records and fields. 
@@ -6407,7 +6431,7 @@ This mechanism is explained in greater detail shortly. @node awk split records @subsection Record Splitting with Standard @command{awk} -@cindex separators, for records +@cindex separators @subentry for records @cindex record separators Records are separated by a character called the @dfn{record separator}. By default, the record separator is the newline character. @@ -6415,8 +6439,8 @@ This is why records are, by default, single lines. To use a different character for the record separator, simply assign that character to the predefined variable @code{RS}. -@cindex record separators, newlines as -@cindex newlines, as record separators +@cindex record separators @subentry newlines as +@cindex newlines @subentry as record separators @cindex @code{RS} variable Like any other variable, the value of @code{RS} can be changed in the @command{awk} program @@ -6508,8 +6532,8 @@ the newline separating them in the output is the original newline in the @value{DF}, not the one added by @command{awk} when it printed the record! -@cindex record separators, changing -@cindex separators, for records +@cindex record separators @subentry changing +@cindex separators @subentry for records Another way to change the record separator is on the command line, using the variable-assignment feature (@pxref{Other Arguments}): @@ -6550,14 +6574,14 @@ variable @code{NF} is the number of fields in the current record. printing @samp{0} as the result. Most other versions of @command{awk} also act this way.) -@cindex dark corner, input files +@cindex dark corner @subentry input files Reaching the end of an input file terminates the current input record, even if the last character in the file is not the character in @code{RS}. 
@value{DARKCORNER} -@cindex empty strings +@cindex empty strings @seeentry{null strings} @cindex null strings -@cindex strings, empty, See null strings +@cindex strings @subentry empty @seeentry{null strings} The empty string @code{""} (a string without any characters) has a special meaning as the value of @code{RS}. It means that records are separated @@ -6569,15 +6593,15 @@ the new value is used to delimit subsequent records, but the record currently being processed, as well as records already processed, are not affected. -@cindex @command{gawk}, @code{RT} variable in +@cindex @command{gawk} @subentry @code{RT} variable in @cindex @code{RT} variable -@cindex records, terminating +@cindex records @subentry terminating @cindex terminating records -@cindex differences in @command{awk} and @command{gawk}, record separators -@cindex differences in @command{awk} and @command{gawk}, @code{RS}/@code{RT} variables -@cindex regular expressions, as record separators -@cindex record separators, regular expressions as -@cindex separators, for records, regular expressions as +@cindex differences in @command{awk} and @command{gawk} @subentry record separators +@cindex differences in @command{awk} and @command{gawk} @subentry @code{RS}/@code{RT} variables +@cindex regular expressions @subentry as record separators +@cindex record separators @subentry regular expressions as +@cindex separators @subentry for records @subentry regular expressions as After the end of the record has been determined, @command{gawk} sets the variable @code{RT} to the text in the input that matched @code{RS}. 
@@ -6585,8 +6609,8 @@ sets the variable @code{RT} to the text in the input that matched @node gawk split records @subsection Record Splitting with @command{gawk} -@cindex common extensions, @code{RS} as a regexp -@cindex extensions, common@comma{} @code{RS} as a regexp +@cindex common extensions @subentry @code{RS} as a regexp +@cindex extensions @subentry common @subentry @code{RS} as a regexp When using @command{gawk}, the value of @code{RS} is not limited to a one-character string. If it contains more than one character, it is treated as a regular expression @@ -6650,9 +6674,9 @@ that happens to contain newline characters. It is thus best to avoid anchor metacharacters in the value of @code{RS}. @end quotation -@cindex @command{gawk}, @code{RT} variable in +@cindex @command{gawk} @subentry @code{RT} variable in @cindex @code{RT} variable -@cindex differences in @command{awk} and @command{gawk}, @code{RS}/@code{RT} variables +@cindex differences in @command{awk} and @command{gawk} @subentry @code{RS}/@code{RT} variables The use of @code{RS} as a regular expression and the @code{RT} variable are @command{gawk} extensions; they are not available in compatibility mode @@ -6661,7 +6685,7 @@ In compatibility mode, only the first character of the value of @code{RS} determines the end of the record. @sidebar @code{RS = "\0"} Is Not Portable -@cindex portability, data files as single record +@cindex portability @subentry data files as single record There are times when you might want to treat an entire @value{DF} as a single record. The only way to make this happen is to give @code{RS} a value that you know doesn't occur in the input file. This is hard @@ -6676,7 +6700,7 @@ value to use for @code{RS} in this case: BEGIN @{ RS = "\0" @} # whole file becomes one record? 
@end example -@cindex differences in @command{awk} and @command{gawk}, strings, storing +@cindex differences in @command{awk} and @command{gawk} @subentry strings @subentry storing @command{gawk} in fact accepts this, and uses the @sc{nul} character for the record separator. This works for certain special files, such as @file{/proc/environ} on @@ -6684,7 +6708,7 @@ GNU/Linux systems, where the @sc{nul} character is in fact the record separator. However, this usage is @emph{not} portable to most other @command{awk} implementations. -@cindex dark corner, strings, storing +@cindex dark corner @subentry strings, storing Almost all other @command{awk} implementations@footnote{At least that we know about.} store strings internally as C-style strings. C strings use the @sc{nul} character as the string terminator. In effect, this means that @@ -6696,7 +6720,7 @@ character as a record separator. However, this is a special case: @command{mawk} does not allow embedded @sc{nul} characters in strings. (This may change in a future version of @command{mawk}.) -@cindex records, treating files as +@cindex records @subentry treating files as @cindex treating files, as single records @cindex single records, treating files as @xref{Readfile Function} for an interesting way to read @@ -6710,7 +6734,7 @@ Readfile} for another option. @cindex examining fields @cindex fields @cindex accessing fields -@cindex fields, examining +@cindex fields @subentry examining When @command{awk} reads an input record, the record is automatically @dfn{parsed} or separated by the @command{awk} utility into chunks called @dfn{fields}. By default, fields are separated by @dfn{whitespace}, @@ -6727,9 +6751,9 @@ operate on the whole record if you want---but fields are what make simple @command{awk} programs so powerful. 
@cindex field operator @code{$} -@cindex @code{$} (dollar sign), @code{$} field operator -@cindex dollar sign (@code{$}), @code{$} field operator -@cindex field operators@comma{} dollar sign as +@cindex @code{$} (dollar sign) @subentry @code{$} field operator +@cindex dollar sign (@code{$}) @subentry @code{$} field operator +@cindex field operators, dollar sign as You use a dollar sign (@samp{$}) to refer to a field in an @command{awk} program, followed by the number of the field you want. Thus, @code{$1} @@ -6750,7 +6774,7 @@ Here the first field, or @code{$1}, is @samp{This}, the second field, or field. @cindex @code{NF} variable -@cindex fields, number of +@cindex fields @subentry number of @code{NF} is a predefined variable whose value is the number of fields in the current record. @command{awk} automatically updates the value of @code{NF} each time it reads a record. No matter how many fields @@ -6789,7 +6813,7 @@ $ @kbd{awk '/li/ @{ print $1, $NF @}' mail-list} @node Nonconstant Fields @section Nonconstant Field Numbers -@cindex fields, numbers +@cindex fields @subentry numbers @cindex field numbers A field number need not be a constant. Any expression in @@ -6846,7 +6870,7 @@ evaluating @code{NF} and using its value as a field number. @node Changing Fields @section Changing the Contents of a Field -@cindex fields, changing contents of +@cindex fields @subentry changing contents of The contents of a field, as seen by @command{awk}, can be changed within an @command{awk} program; this changes what @command{awk} perceives as the current input record. (The actual input is untouched; @command{awk} @emph{never} @@ -6906,8 +6930,8 @@ $ @kbd{awk '@{ $6 = ($5 + $4 + $3 + $2)} @dots{} @end example -@cindex adding, fields -@cindex fields, adding +@cindex adding @subentry fields +@cindex fields @subentry adding @noindent We've just created @code{$6}, whose value is the sum of fields @code{$2}, @code{$3}, @code{$4}, and @code{$5}. 
The @samp{+} sign @@ -6921,8 +6945,8 @@ the appropriate number of field separators between it and the previously existing fields. @cindex @code{OFS} variable -@cindex output field separator, See @code{OFS} variable -@cindex field separators, See Also @code{OFS} +@cindex output field separator @seeentry{@code{OFS} variable} +@cindex field separator @seealso{@code{OFS}} This recomputation affects and is affected by @code{NF} (the number of fields; @pxref{Fields}). For example, the value of @code{NF} is set to the number of the highest @@ -6979,8 +7003,8 @@ The intervening field, @code{$5}, is created with an empty value (indicated by the second pair of adjacent colons), and @code{NF} is updated with the value six. -@cindex dark corner, @code{NF} variable, decrementing -@cindex @code{NF} variable, decrementing +@cindex dark corner @subentry @code{NF} variable, decrementing +@cindex @code{NF} variable @subentry decrementing Decrementing @code{NF} throws away the values of the fields after the new value of @code{NF} and recomputes @code{$0}. @value{DARKCORNER} @@ -6993,7 +7017,7 @@ $ @kbd{echo a b c d e f | awk '@{ print "NF =", NF;} @print{} a b c @end example -@cindex portability, @code{NF} variable@comma{} decrementing +@cindex portability @subentry @code{NF} variable, decrementing @quotation CAUTION Some versions of @command{awk} don't rebuild @code{$0} when @code{NF} is decremented. @@ -7055,9 +7079,9 @@ with a statement such as @samp{$1 = $1}, as described earlier. @end menu @cindex @code{FS} variable -@cindex fields, separating -@cindex field separators -@cindex fields, separating +@cindex fields @subentry separating +@cindex field separator +@cindex fields @subentry separating The @dfn{field separator}, which is either a single character or a regular expression, controls the way @command{awk} splits an input record into fields. 
@command{awk} scans the input record for character sequences that @@ -7076,13 +7100,13 @@ is split into three fields: @samp{m}, @samp{@bullet{}g}, and @samp{@bullet{}gai@bullet{}pan}. Note the leading spaces in the values of the second and third fields. -@cindex troubleshooting, @command{awk} uses @code{FS} not @code{IFS} +@cindex troubleshooting @subentry @command{awk} uses @code{FS} not @code{IFS} The field separator is represented by the predefined variable @code{FS}. Shell programmers take note: @command{awk} does @emph{not} use the name @code{IFS} that is used by the POSIX-compliant shells (such as the Unix Bourne shell, @command{sh}, or Bash). -@cindex @code{FS} variable, changing value of +@cindex @code{FS} variable @subentry changing value of The value of @code{FS} can be changed in the @command{awk} program with the assignment operator, @samp{=} (@pxref{Assignment Ops}). Often, the right time to do this is at the beginning of execution @@ -7109,9 +7133,9 @@ John Q. Smith, 29 Oak St., Walamazoo, MI 42139 this @command{awk} program extracts and prints the string @samp{@bullet{}29@bullet{}Oak@bullet{}St.}. -@cindex field separators, choice of -@cindex regular expressions, as field separators -@cindex field separators, regular expressions as +@cindex field separator @subentry choice of +@cindex regular expressions @subentry as field separators +@cindex field separator @subentry regular expression as Sometimes the input data contains separator characters that don't separate fields the way you thought they would. For instance, the person's name in the example we just used might have a title or @@ -7134,8 +7158,10 @@ can massage it first with a separate @command{awk} program.) 
@node Default Field Splitting @subsection Whitespace Normally Separates Fields -@cindex field separators, whitespace as -@cindex whitespace, as field separators +@cindex field separator @subentry whitespace as +@cindex whitespace @subentry as field separators +@cindex field separator @subentry @code{FS} variable and +@cindex separators @subentry field @subentry @code{FS} variable and Fields are normally separated by whitespace sequences (spaces, TABs, and newlines), not by single spaces. Two spaces in a row do not delimit an empty field. The default value of the field separator @code{FS} @@ -7156,8 +7182,8 @@ rules. @node Regexp Field Splitting @subsection Using Regular Expressions to Separate Fields -@cindex regular expressions, as field separators -@cindex field separators, regular expressions as +@cindex regular expressions @subentry as field separators +@cindex field separator @subentry regular expression as The previous @value{SUBSECTION} discussed the use of single characters or simple strings as the value of @code{FS}. @@ -7212,8 +7238,7 @@ $ @kbd{echo ' a b c d ' | awk 'BEGIN @{ FS = "[ \t\n]+" @}} @noindent @cindex null strings -@cindex strings, null -@cindex empty strings, See null strings +@cindex strings @subentry null In this case, the first field is null, or empty. The stripping of leading and trailing whitespace also comes into @@ -7234,9 +7259,9 @@ Because the leading whitespace was ignored when finding @code{$1}, it is not part of the new @code{$0}. Finally, the last @code{print} statement prints the new @code{$0}. -@cindex @code{FS}, containing @code{^} -@cindex @code{^} (caret), in @code{FS} -@cindex dark corner, @code{^}, in @code{FS} +@cindex @code{FS} variable @subentry containing @code{^} +@cindex @code{^} (caret) @subentry in @code{FS} +@cindex dark corner @subentry @code{^}, in @code{FS} There is an additional subtlety to be aware of when using regular expressions for field splitting. 
It is not well specified in the POSIX standard, or anywhere else, what @samp{^} @@ -7264,11 +7289,11 @@ $ @kbd{echo 'xxAA xxBxx C' |} @node Single Character Fields @subsection Making Each Character a Separate Field -@cindex common extensions, single character fields -@cindex extensions, common@comma{} single character fields -@cindex differences in @command{awk} and @command{gawk}, single-character fields +@cindex common extensions @subentry single character fields +@cindex extensions @subentry common @subentry single character fields +@cindex differences in @command{awk} and @command{gawk} @subentry single-character fields @cindex single-character fields -@cindex fields, single-character +@cindex fields @subentry single-character There are times when you may want to examine each character of a record separately. This can be done in @command{gawk} by simply assigning the null string (@code{""}) to @code{FS}. @value{COMMONEXT} @@ -7287,8 +7312,8 @@ $ @kbd{echo a b | gawk 'BEGIN @{ FS = "" @}} @print{} Field 3 is b @end example -@cindex dark corner, @code{FS} as null string -@cindex @code{FS} variable, as null string +@cindex dark corner @subentry @code{FS} as null string +@cindex @code{FS} variable @subentry null string as Traditionally, the behavior of @code{FS} equal to @code{""} was not defined. In this case, most versions of Unix @command{awk} simply treat the entire record as only having one field. @@ -7300,10 +7325,10 @@ behaves this way. 
@node Command Line Field Separator @subsection Setting @code{FS} from the Command Line -@cindex @option{-F} option, command-line -@cindex field separator, on command line -@cindex command line, @code{FS} on@comma{} setting -@cindex @code{FS} variable, setting from command line +@cindex @option{-F} option @subentry command-line +@cindex field separator @subentry on command line +@cindex command line @subentry @code{FS} on, setting +@cindex @code{FS} variable @subentry setting from command line @code{FS} can be set on the command line. Use the @option{-F} option to do so. For example: @@ -7329,9 +7354,9 @@ awk -F\\\\ '@dots{}' files @dots{} @end example @noindent -@cindex field separator, backslash (@code{\}) as -@cindex @code{\} (backslash), as field separator -@cindex backslash (@code{\}), as field separator +@cindex field separator @subentry backslash (@code{\}) as +@cindex @code{\} (backslash) @subentry as field separator +@cindex backslash (@code{\}) @subentry as field separator Because @samp{\} is used for quoting in the shell, @command{awk} sees @samp{-F\\}. Then @command{awk} processes the @samp{\\} for escape characters (@pxref{Escape Sequences}), finally yielding @@ -7381,7 +7406,7 @@ separator, instead of the @samp{-} in the phone number that was originally intended. This demonstrates why you have to be careful in choosing your field and record separators. -@cindex Unix @command{awk}, password files@comma{} field separators and +@cindex Unix @command{awk} @subentry password files, field separators and Perhaps the most common use of a single character as the field separator occurs when processing the Unix system password file. On many Unix systems, each user has a separate entry in the system password file, with one @@ -7391,7 +7416,7 @@ encrypted or shadow password. (A shadow password is indicated by the presence of a single @samp{x} in the second field.) 
A password file entry might look like this: -@cindex Robbins, Arnold +@cindex Robbins @subentry Arnold @example arnold:x:2076:10:Arnold Robbins:/home/arnold:/bin/bash @end example @@ -7420,15 +7445,15 @@ When you do this, @code{$1} is the same as @code{$0}. @sidebar Changing @code{FS} Does Not Affect the Fields -@cindex POSIX @command{awk}, field separators and -@cindex field separator, POSIX and +@cindex POSIX @command{awk} @subentry field separators and +@cindex field separator @subentry POSIX and According to the POSIX standard, @command{awk} is supposed to behave as if each record is split into fields at the time it is read. In particular, this means that if you change the value of @code{FS} after a record is read, the values of the fields (i.e., how they were split) should reflect the old value of @code{FS}, not the new one. -@cindex dark corner, field separators +@cindex dark corner @subentry field separators @cindex @command{sed} utility @cindex stream editors However, many older implementations of @command{awk} do not work this way. Instead, @@ -7528,7 +7553,7 @@ will take effect. @cindex data, fixed-width @cindex fixed-width data -@cindex advanced features, fixed-width data +@cindex advanced features @subentry fixed-width data @c O'Reilly doesn't like it as a note the first thing in the section. This @value{SECTION} discusses an advanced @@ -7561,10 +7586,10 @@ on @code{FS} does not work well in this case. Although a portable @code{$0} (@pxref{String Functions}), this is awkward and inefficient for a large number of fields. 
-@cindex troubleshooting, fatal errors, field widths@comma{} specifying +@cindex troubleshooting @subentry fatal errors @subentry field widths, specifying @cindex @command{w} utility @cindex @code{FIELDWIDTHS} variable -@cindex @command{gawk}, @code{FIELDWIDTHS} variable in +@cindex @command{gawk} @subentry @code{FIELDWIDTHS} variable in The splitting of an input record into fixed-width fields is specified by assigning a string containing space-separated numbers to the built-in variable @code{FIELDWIDTHS}. Each number specifies the width of the @@ -7734,7 +7759,7 @@ This @value{SECTION} discusses an advanced feature of @command{gawk}. If you are a novice @command{awk} user, you might want to skip it on the first reading. -@cindex advanced features, specifying field content +@cindex advanced features @subentry specifying field content Normally, when using @code{FS}, @command{gawk} defines the fields as the parts of the record that occur in between each field separator. In other words, @code{FS} defines what a field @emph{is not}, instead of what a field @@ -7760,7 +7785,7 @@ Robbins,Arnold,"1234 A Pretty Street, NE",MyTown,MyState,12345-6789,USA @c endfile @end example -@cindex @command{gawk}, @code{FPAT} variable in +@cindex @command{gawk} @subentry @code{FPAT} variable in @cindex @code{FPAT} variable The @code{FPAT} variable offers a solution for cases like this. The value of @code{FPAT} should be a string that provides a regular expression. @@ -7858,7 +7883,7 @@ available for splitting regular strings (@pxref{String Functions}). @node Testing field creation @section Checking How @command{gawk} Is Splitting Records -@cindex @command{gawk}, splitting fields and +@cindex @command{gawk} @subentry splitting fields and As we've seen, @command{gawk} provides three independent methods to split input records into fields. 
The mechanism used is based on which of the three variables---@code{FS}, @code{FIELDWIDTHS}, or @code{FPAT}---was @@ -7897,15 +7922,15 @@ example of such a function). @section Multiple-Line Records @cindex multiple-line records -@cindex records, multiline -@cindex input, multiline records -@cindex files, reading, multiline records -@cindex input, files, See input files +@cindex records @subentry multiline +@cindex input @subentry multiline records +@cindex files @subentry reading @subentry multiline records +@cindex input, files @seeentry{input files} In some databases, a single line cannot conveniently hold all the information in one entry. In such cases, you can use multiline records. The first step in doing this is to choose your data format. -@cindex record separators, with multiline records +@cindex record separators @subentry with multiline records One technique is to use an unusual character or string to separate records. For example, you could use the formfeed character (written @samp{\f} in @command{awk}, as in C) to separate them, making each record @@ -7914,7 +7939,7 @@ a page of the file. To do this, just set the variable @code{RS} to other character could equally well be used, as long as it won't be part of the data in a record. -@cindex @code{RS} variable, multiline records and +@cindex @code{RS} variable @subentry multiline records and Another technique is to have blank lines separate records. By a special dispensation, an empty string as the value of @code{RS} indicates that records are separated by one or more blank lines. When @code{RS} is set @@ -7926,7 +7951,7 @@ all act as one record separator. whitespace do not count.) @cindex leftmost longest match -@cindex matching, leftmost longest +@cindex matching @subentry leftmost longest You can achieve the same effect as @samp{RS = ""} by assigning the string @code{"\n\n+"} to @code{RS}. This regexp matches the newline at the end of the record and one or more blank lines after the record. 
@@ -7937,7 +7962,7 @@ So, the next record doesn't start until the first nonblank line that follows---no matter how many blank lines appear in a row, they are considered one record separator. -@cindex dark corner, multiline records +@cindex dark corner @subentry multiline records However, there is an important difference between @samp{RS = ""} and @samp{RS = "\n\n+"}. In the first case, leading newlines in the input @value{DF} are ignored, and if a file ends without extra blank lines @@ -7945,8 +7970,8 @@ after the last record, the final newline is removed from the record. In the second case, this special processing is not done. @value{DARKCORNER} -@cindex field separator, in multiline records -@cindex @code{FS}, in multiline records +@cindex field separator @subentry in multiline records +@cindex @code{FS} variable @subentry in multiline records Now that the input is separated into records, the second step is to separate the fields in the records. One way to do this is to divide each of the lines into fields in the normal manner. This happens by default @@ -8069,9 +8094,9 @@ Leading and trailing matches of @var{regexp} delimit empty records. POSIX standard.) @end table -@cindex @command{gawk}, @code{RT} variable in +@cindex @command{gawk} @subentry @code{RT} variable in @cindex @code{RT} variable -@cindex differences in @command{awk} and @command{gawk}, @code{RS}/@code{RT} variables +@cindex differences in @command{awk} and @command{gawk} @subentry @code{RS}/@code{RT} variables If not in compatibility mode (@pxref{Options}), @command{gawk} sets @code{RT} to the input text that matched the value specified by @code{RS}. But if the input file ended without any text that matches @code{RS}, @@ -8080,8 +8105,8 @@ then @command{gawk} sets @code{RT} to the null string. 
@node Getline @section Explicit Input with @code{getline} -@cindex @code{getline} command, explicit input with -@cindex input, explicit +@cindex @code{getline} command @subentry explicit input with +@cindex input @subentry explicit So far we have been getting our input data from @command{awk}'s main input stream---either the standard input (usually your keyboard, sometimes the output from another program) or the @@ -8108,11 +8133,11 @@ Parts I and II @end ifnotinfo and have a good knowledge of how @command{awk} works. -@cindex @command{gawk}, @code{ERRNO} variable in -@cindex @code{ERRNO} variable, with @command{getline} command -@cindex differences in @command{awk} and @command{gawk}, @code{getline} command -@cindex @code{getline} command, return values -@cindex @option{--sandbox} option, input redirection with @code{getline} +@cindex @command{gawk} @subentry @code{ERRNO} variable in +@cindex @code{ERRNO} variable @subentry with @command{getline} command +@cindex differences in @command{awk} and @command{gawk} @subentry @code{getline} command +@cindex @code{getline} command @subentry return values +@cindex @option{--sandbox} option @subentry input redirection with @code{getline} The @code{getline} command returns 1 if it finds a record and 0 if it encounters the end of the file. If there is some error in getting @@ -8235,8 +8260,8 @@ rule in the program. @xref{Next Statement}. @node Getline/Variable @subsection Using @code{getline} into a Variable -@cindex @code{getline} into a variable -@cindex variables, @code{getline} command into@comma{} using +@cindex @code{getline} command @subentry into a variable +@cindex variables @subentry @code{getline} command into, using You can use @samp{getline @var{var}} to read the next record from @command{awk}'s input into the variable @var{var}. No other processing is @@ -8289,12 +8314,12 @@ the value of @code{NF} do not change. 
@node Getline/File @subsection Using @code{getline} from a File -@cindex @code{getline} from a file +@cindex @code{getline} command @subentry from a file @cindex input redirection -@cindex redirection of input -@cindex @code{<} (left angle bracket), @code{<} operator (I/O) -@cindex left angle bracket (@code{<}), @code{<} operator (I/O) -@cindex operators, input/output +@cindex redirection @subentry of input +@cindex @code{<} (left angle bracket) @subentry @code{<} operator (I/O) +@cindex left angle bracket (@code{<}) @subentry @code{<} operator (I/O) +@cindex operators @subentry input/output Use @samp{getline < @var{file}} to read the next record from @var{file}. Here, @var{file} is a string-valued expression that specifies the @value{FN}. @samp{< @var{file}} is called a @dfn{redirection} @@ -8320,7 +8345,7 @@ the normal manner, so the values of @code{$0} and the other fields are changed, resulting in a new value of @code{NF}. @code{RT} is also set. -@cindex POSIX @command{awk}, @code{<} operator and +@cindex POSIX @command{awk} @subentry @code{<} operator and @c Thanks to Paul Eggert for initial wording here According to POSIX, @samp{getline < @var{expression}} is ambiguous if @var{expression} contains unparenthesized operators other than @@ -8331,7 +8356,7 @@ you want your program to be portable to all @command{awk} implementations. 
@node Getline/Variable/File @subsection Using @code{getline} into a Variable from a File -@cindex variables, @code{getline} command into@comma{} using +@cindex variables @subentry @code{getline} command into, using Use @samp{getline @var{var} < @var{file}} to read input from the file @@ -8385,11 +8410,11 @@ Failing that, attention to details would be useful.} @author Brian Kernighan @end quotation -@cindex @code{|} (vertical bar), @code{|} operator (I/O) -@cindex vertical bar (@code{|}), @code{|} operator (I/O) +@cindex @code{|} (vertical bar) @subentry @code{|} operator (I/O) +@cindex vertical bar (@code{|}) @subentry @code{|} operator (I/O) @cindex input pipeline -@cindex pipe, input -@cindex operators, input/output +@cindex pipe @subentry input +@cindex operators @subentry input/output The output of a command can also be piped into @code{getline}, using @samp{@var{command} | getline}. In this case, the string @var{command} is run as a shell command and its output @@ -8436,9 +8461,9 @@ bletch @noindent the program might produce: -@cindex Robbins, Bill -@cindex Robbins, Miriam -@cindex Robbins, Arnold +@cindex Robbins @subentry Bill +@cindex Robbins @subentry Miriam +@cindex Robbins @subentry Arnold @example foo bar @@ -8459,7 +8484,7 @@ value of @code{NF}, and recomputes the value of @code{$0}. The values of @code{NR} and @code{FNR} are not changed. @code{RT} is set. -@cindex POSIX @command{awk}, @code{|} I/O operator and +@cindex POSIX @command{awk} @subentry @code{|} I/O operator and @c Thanks to Paul Eggert for initial wording here According to POSIX, @samp{@var{expression} | getline} is ambiguous if @var{expression} contains unparenthesized operators other than @@ -8485,7 +8510,7 @@ have to worry. 
@node Getline/Variable/Pipe @subsection Using @code{getline} into a Variable from a Pipe -@cindex variables, @code{getline} command into@comma{} using +@cindex variables @subentry @code{getline} command into, using When you use @samp{@var{command} | getline @var{var}}, the output of @var{command} is sent through a pipe to @@ -8517,12 +8542,12 @@ program to be portable to other @command{awk} implementations. @node Getline/Coprocess @subsection Using @code{getline} from a Coprocess -@cindex coprocesses, @code{getline} from -@cindex @code{getline} command, coprocesses@comma{} using from -@cindex @code{|} (vertical bar), @code{|&} operator (I/O) -@cindex vertical bar (@code{|}), @code{|&} operator (I/O) -@cindex operators, input/output -@cindex differences in @command{awk} and @command{gawk}, input/output operators +@cindex coprocesses @subentry @code{getline} from +@cindex @code{getline} command @subentry coprocesses, using from +@cindex @code{|} (vertical bar) @subentry @code{|&} operator (I/O) +@cindex vertical bar (@code{|}) @subentry @code{|&} operator (I/O) +@cindex operators @subentry input/output +@cindex differences in @command{awk} and @command{gawk} @subentry input/output operators Reading input into @code{getline} from a pipe is a one-way operation. The command that is started with @samp{@var{command} | getline} only @@ -8558,7 +8583,7 @@ where coprocesses are discussed in more detail. @node Getline/Variable/Coprocess @subsection Using @code{getline} into a Variable from a Coprocess -@cindex variables, @code{getline} command into@comma{} using +@cindex variables @subentry @code{getline} command into, using When you use @samp{@var{command} |& getline @var{var}}, the output from the coprocess @var{command} is sent through a two-way pipe to @code{getline} @@ -8588,21 +8613,21 @@ When @code{getline} changes the value of @code{$0} and @code{NF}, program and start testing the new record against every pattern. 
However, the new record is tested against any subsequent rules. -@cindex differences in @command{awk} and @command{gawk}, implementation limitations -@cindex implementation issues, @command{gawk}, limits -@cindex @command{awk}, implementations, limits -@cindex @command{gawk}, implementation issues, limits +@cindex differences in @command{awk} and @command{gawk} @subentry implementation limitations +@cindex implementation issues, @command{gawk} @subentry limits +@cindex @command{awk} @subentry implementations @subentry limits +@cindex @command{gawk} @subentry implementation issues @subentry limits @item Some very old @command{awk} implementations limit the number of pipelines that an @command{awk} program may have open to just one. In @command{gawk}, there is no such limit. You can open as many pipelines (and coprocesses) as the underlying operating system permits. -@cindex side effects, @code{FILENAME} variable -@cindex @code{FILENAME} variable, @code{getline}@comma{} setting with -@cindex dark corner, @code{FILENAME} variable -@cindex @code{getline} command, @code{FILENAME} variable and -@cindex @code{BEGIN} pattern, @code{getline} and +@cindex side effects @subentry @code{FILENAME} variable +@cindex @code{FILENAME} variable @subentry @code{getline}, setting with +@cindex dark corner @subentry @code{FILENAME} variable +@cindex @code{getline} command @subentry @code{FILENAME} variable and +@cindex @code{BEGIN} pattern @subentry @code{getline} and @item An interesting side effect occurs if you use @code{getline} without a redirection inside a @code{BEGIN} rule. Because an unredirected @code{getline} @@ -8669,7 +8694,7 @@ know that there is a string value to be assigned. 
@node Getline Summary @subsection Summary of @code{getline} Variants -@cindex @code{getline} command, variants +@cindex @code{getline} command @subentry variants @ref{table-getline-variants} summarizes the eight variants of @code{getline}, @@ -8696,7 +8721,7 @@ Note: for each variant, @command{gawk} sets the @code{RT} predefined variable. @section Reading Input with a Timeout @cindex timeout, reading input -@cindex differences in @command{awk} and @command{gawk}, read timeouts +@cindex differences in @command{awk} and @command{gawk} @subentry read timeouts This @value{SECTION} describes a feature that is specific to @command{gawk}. You may specify a timeout in milliseconds for reading input from the keyboard, @@ -8780,6 +8805,8 @@ worth of data the first time. Because of this, changing the value of timeout like in the preceding example is not very useful. @end quotation +@cindex @env{GAWK_READ_TIMEOUT} environment variable +@cindex environment variables @subentry @env{GAWK_READ_TIMEOUT} If the @code{PROCINFO} element is not present and the @env{GAWK_READ_TIMEOUT} environment variable exists, @command{gawk} uses its value to initialize the timeout value. @@ -8806,7 +8833,7 @@ indefinitely until some other process opens it for writing. @section Retrying Reads After Certain Input Errors @cindex retrying input -@cindex differences in @command{awk} and @command{gawk}, retrying input +@cindex differences in @command{awk} and @command{gawk} @subentry retrying input This @value{SECTION} describes a feature that is specific to @command{gawk}. When @command{gawk} encounters an error while reading input, by @@ -8834,9 +8861,9 @@ descriptor has been configured to behave in a non-blocking fashion. 
@node Command-line directories @section Directories on the Command Line -@cindex differences in @command{awk} and @command{gawk}, command-line directories -@cindex directories, command-line -@cindex command line, directories on +@cindex differences in @command{awk} and @command{gawk} @subentry command-line directories +@cindex directories @subentry command-line +@cindex command line @subentry directories on According to the POSIX standard, files named on the @command{awk} command line must be text files; it is a fatal error if they are not. @@ -8927,6 +8954,7 @@ from the default input stream, from a file, or from a pipe or coprocess. Use @code{PROCINFO[@var{file}, "READ_TIMEOUT"]} to cause reads to time out for @var{file}. +@cindex POSIX mode @item Directories on the command line are fatal for standard @command{awk}; @command{gawk} ignores them if not in POSIX mode. @@ -8952,7 +8980,7 @@ including abstentions, for each item. @chapter Printing Output @cindex printing -@cindex output, printing, See printing +@cindex output, printing @seeentry{printing} One of the most common programming actions is to @dfn{print}, or output, some or all of the input. Use the @code{print} statement for simple output, and the @code{printf} statement @@ -9013,8 +9041,8 @@ The items to print can be constant strings or numbers, fields of the current record (such as @code{$1}), variables, or any @command{awk} expression. Numeric values are converted to strings and then printed. -@cindex records, printing -@cindex lines, blank, printing +@cindex records @subentry printing +@cindex lines @subentry blank, printing @cindex text, printing The simple statement @samp{print} with no items is equivalent to @samp{print $0}: it prints the entire current record. To print a blank @@ -9037,7 +9065,7 @@ isn't limited to only one line. If an item value is a string containing a newline, the newline is output along with the rest of the string. 
A single @code{print} statement can make any number of lines this way. -@cindex newlines, printing +@cindex newlines @subentry printing The following is an example of printing a string that contains embedded @ifinfo newlines @@ -9064,7 +9092,7 @@ $ @kbd{awk 'BEGIN @{ print "line one\nline two\nline three" @}'} @end group @end example -@cindex fields, printing +@cindex fields @subentry printing The next example, which is run on the @file{inventory-shipped} file, prints the first two fields of each input record, with a space between them: @@ -9077,8 +9105,8 @@ $ @kbd{awk '@{ print $1, $2 @}' inventory-shipped} @dots{} @end example -@cindex @code{print} statement, commas, omitting -@cindex troubleshooting, @code{print} statement@comma{} omitting commas +@cindex @code{print} statement @subentry commas, omitting +@cindex troubleshooting @subentry @code{print} statement, omitting commas A common mistake in using the @code{print} statement is to omit the comma between two items. This often has the effect of making the items run together in the output, with no space. The reason for this is that @@ -9093,7 +9121,7 @@ $ @kbd{awk '@{ print $1 $2 @}' inventory-shipped} @dots{} @end example -@cindex @code{BEGIN} pattern, headings@comma{} adding +@cindex @code{BEGIN} pattern @subentry headings, adding To someone unfamiliar with the @file{inventory-shipped} file, neither example's output makes much sense. A heading line at the beginning would make it clearer. Let's add some headings to our table of months @@ -9132,8 +9160,8 @@ awk 'BEGIN @{ print "Month Crates" @end group @end example -@cindex @code{printf} statement, columns@comma{} aligning -@cindex columns, aligning +@cindex @code{printf} statement @subentry columns, aligning +@cindex columns @subentry aligning Lining up columns this way can get pretty complicated when there are many columns to fix. Counting spaces for two or three columns is simple, but any more than this can take up @@ -9141,8 +9169,8 @@ a lot of time. 
This is why the @code{printf} statement was created (@pxref{Printf}); one of its specialties is lining up columns of data. -@cindex line continuations, in @code{print} statement -@cindex @code{print} statement, line continuations and +@cindex line continuations @subentry in @code{print} statement +@cindex @code{print} statement @subentry line continuations and @quotation NOTE You can continue either a @code{print} or @code{printf} statement simply by putting a newline after any comma @@ -9168,10 +9196,10 @@ then outputs a string called the @dfn{output record separator} (or (i.e., a newline character). Thus, each @code{print} statement normally makes a separate line. -@cindex output, records -@cindex output record separator, See @code{ORS} variable +@cindex output @subentry records +@cindex output record separator @seeentry{@code{ORS} variable} @cindex @code{ORS} variable -@cindex @code{BEGIN} pattern, @code{OFS}/@code{ORS} variables, assigning values to +@cindex @code{BEGIN} pattern @subentry @code{OFS}/@code{ORS} variables, assigning values to In order to change how output fields and records are separated, assign new values to the variables @code{OFS} and @code{ORS}. The usual place to do this is in the @code{BEGIN} rule @@ -9217,8 +9245,8 @@ runs together on a single line. @node OFMT @section Controlling Numeric Output with @code{print} -@cindex numeric, output format -@cindex formats@comma{} numeric output +@cindex numeric @subentry output format +@cindex formats, numeric output When printing numeric values with the @code{print} statement, @command{awk} internally converts each number to a string of characters and prints that string. 
@command{awk} uses the @code{sprintf()} function @@ -9233,7 +9261,7 @@ more fully in @cindexawkfunc{sprintf} @cindex @code{OFMT} variable -@cindex output, format specifier@comma{} @code{OFMT} +@cindex output @subentry format specifier, @code{OFMT} The predefined variable @code{OFMT} contains the format specification that @code{print} uses with @code{sprintf()} when it wants to convert a number to a string for printing. @@ -9250,9 +9278,9 @@ $ @kbd{awk 'BEGIN @{} @end example @noindent -@cindex dark corner, @code{OFMT} variable -@cindex POSIX @command{awk}, @code{OFMT} variable and -@cindex @code{OFMT} variable, POSIX @command{awk} and +@cindex dark corner @subentry @code{OFMT} variable +@cindex POSIX @command{awk} @subentry @code{OFMT} variable and +@cindex @code{OFMT} variable @subentry POSIX @command{awk} and According to the POSIX standard, @command{awk}'s behavior is undefined if @code{OFMT} contains anything but a floating-point conversion specification. @value{DARKCORNER} @@ -9261,8 +9289,8 @@ if @code{OFMT} contains anything but a floating-point conversion specification. @section Using @code{printf} Statements for Fancier Printing @cindex @code{printf} statement -@cindex output, formatted -@cindex formatting output +@cindex output @subentry formatted +@cindex formatting @subentry output For more precise control over the output format than what is provided by @code{print}, use @code{printf}. With @code{printf} you can @@ -9281,7 +9309,7 @@ after the decimal point). @node Basic Printf @subsection Introduction to the @code{printf} Statement -@cindex @code{printf} statement, syntax of +@cindex @code{printf} statement @subentry syntax of A simple @code{printf} statement looks like this: @example @@ -9329,8 +9357,8 @@ the output message. 
@node Control Letters @subsection Format-Control Letters -@cindex @code{printf} statement, format-control characters -@cindex format specifiers, @code{printf} statement +@cindex @code{printf} statement @subentry format-control characters +@cindex format specifiers @subentry @code{printf} statement A format specifier starts with the character @samp{%} and ends with a @dfn{format-control letter}---it tells the @code{printf} statement @@ -9365,8 +9393,8 @@ Print a number as a character; thus, @samp{printf "%c", 65} outputs the letter @samp{A}. The output for a string value is the first character of the string. -@cindex dark corner, format-control characters -@cindex @command{gawk}, format-control characters +@cindex dark corner @subentry format-control characters +@cindex @command{gawk} @subentry format-control characters @quotation NOTE The POSIX standard says the first character of a string is printed. In locales with multibyte characters, @command{gawk} attempts to @@ -9464,8 +9492,8 @@ This does not consume an argument and it ignores any modifiers. @end table -@cindex dark corner, format-control characters -@cindex @command{gawk}, format-control characters +@cindex dark corner @subentry format-control characters +@cindex @command{gawk} @subentry format-control characters @quotation NOTE When using the integer format-control letters for values that are outside the range of the widest C integer type, @command{gawk} switches to @@ -9490,8 +9518,8 @@ the IEEE standard. Further details are provided in @node Format Modifiers @subsection Modifiers for @code{printf} Formats -@cindex @code{printf} statement, modifiers -@cindex modifiers@comma{} in format specifiers +@cindex @code{printf} statement @subentry modifiers +@cindex modifiers, in format specifiers A format specification can also include @dfn{modifiers} that can control how much of the item's value is printed, as well as how much space it gets. The modifiers come between the @samp{%} and the format-control letter. 
@@ -9501,8 +9529,8 @@ spaces in the output. Here are the possible modifiers, in the order in which they may appear: @table @asis -@cindex differences in @command{awk} and @command{gawk}, @code{print}/@code{printf} statements -@cindex @code{printf} statement, positional specifiers +@cindex differences in @command{awk} and @command{gawk} @subentry @code{print}/@code{printf} statements +@cindex @code{printf} statement @subentry positional specifiers @c the code{} does NOT start a secondary @cindex positional specifiers, @code{printf} statement @item @code{@var{N}$} @@ -9687,8 +9715,8 @@ printf "%" w "." p "s\n", s This is not particularly easy to read, but it does work. @c @cindex lint checks -@cindex troubleshooting, fatal errors, @code{printf} format strings -@cindex POSIX @command{awk}, @code{printf} format strings and +@cindex troubleshooting @subentry fatal errors @subentry @code{printf} format strings +@cindex POSIX @command{awk} @subentry @code{printf} format strings and C programmers may be used to supplying additional modifiers (@samp{h}, @samp{j}, @samp{l}, @samp{L}, @samp{t}, and @samp{z}) in @code{printf} format strings. These are not valid in @command{awk}. Most @command{awk} @@ -9780,8 +9808,8 @@ awk 'BEGIN @{ format = "%-10s %s\n" @section Redirecting Output of @code{print} and @code{printf} @cindex output redirection -@cindex redirection of output -@cindex @option{--sandbox} option, output redirection with @code{print}, @code{printf} +@cindex redirection @subentry of output +@cindex @option{--sandbox} option @subentry output redirection with @code{print} @subentry @code{printf} So far, the output from @code{print} and @code{printf} has gone to the standard output, usually the screen. Both @code{print} and @code{printf} can @@ -9798,17 +9826,17 @@ Redirections in @command{awk} are written just like redirections in shell commands, except that they are written inside the @command{awk} program. 
@c the commas here are part of the see also -@cindex @code{print} statement, See Also redirection@comma{} of output -@cindex @code{printf} statement, See Also redirection@comma{} of output +@cindex @code{print} statement @seealso{redirection of output} +@cindex @code{printf} statement @seealso{redirection of output} There are four forms of output redirection: output to a file, output appended to a file, output through a pipe to another command, and output to a coprocess. We show them all for the @code{print} statement, but they work identically for @code{printf}: @table @code -@cindex @code{>} (right angle bracket), @code{>} operator (I/O) -@cindex right angle bracket (@code{>}), @code{>} operator (I/O) -@cindex operators, input/output +@cindex @code{>} (right angle bracket) @subentry @code{>} operator (I/O) +@cindex right angle bracket (@code{>}) @subentry @code{>} operator (I/O) +@cindex operators @subentry input/output @item print @var{items} > @var{output-file} This redirection prints the items into the output file named @var{output-file}. The @value{FN} @var{output-file} can be any @@ -9840,8 +9868,8 @@ $ @kbd{cat name-list} @noindent Each output file contains one name or number per line. -@cindex @code{>} (right angle bracket), @code{>>} operator (I/O) -@cindex right angle bracket (@code{>}), @code{>>} operator (I/O) +@cindex @code{>} (right angle bracket) @subentry @code{>>} operator (I/O) +@cindex right angle bracket (@code{>}) @subentry @code{>>} operator (I/O) @item print @var{items} >> @var{output-file} This redirection prints the items into the preexisting output file named @var{output-file}. The difference between this and the @@ -9850,9 +9878,9 @@ single-@samp{>} redirection is that the old contents (if any) of appended to the file. If @var{output-file} does not exist, then it is created. 
-@cindex @code{|} (vertical bar), @code{|} operator (I/O) -@cindex pipe, output -@cindex output, pipes +@cindex @code{|} (vertical bar) @subentry @code{|} operator (I/O) +@cindex pipe @subentry output +@cindex output @subentry pipes @item print @var{items} | @var{command} It is possible to send output to another program through a pipe instead of into a file. This redirection opens a pipe to @@ -9907,9 +9935,9 @@ because (if you mean to refer to that same file or command) every time. @cindex coprocesses -@cindex @code{|} (vertical bar), @code{|&} operator (I/O) -@cindex operators, input/output -@cindex differences in @command{awk} and @command{gawk}, input/output operators +@cindex @code{|} (vertical bar) @subentry @code{|&} operator (I/O) +@cindex operators @subentry input/output +@cindex differences in @command{awk} and @command{gawk} @subentry input/output operators @item print @var{items} |& @var{command} This redirection prints the items to the input of @var{command}. The difference between this and the @@ -9939,7 +9967,7 @@ asks the system to open a file, pipe, or coprocess only if the particular @var{file} or @var{command} you specify has not already been written to by your program or if it has been closed since it was last written to. -@cindex troubleshooting, printing +@cindex troubleshooting @subentry printing It is a common error to use @samp{>} redirection for the first @code{print} to a file, and then to use @samp{>>} for subsequent output: @@ -9960,10 +9988,10 @@ output is produced in the expected order. However, mixing the operators for the same file is definitely poor style, and is confusing to readers of your program.) 
-@cindex differences in @command{awk} and @command{gawk}, implementation limitations -@cindex implementation issues, @command{gawk}, limits -@cindex @command{awk}, implementation issues, pipes -@cindex @command{gawk}, implementation issues, pipes +@cindex differences in @command{awk} and @command{gawk} @subentry implementation limitations +@cindex implementation issues, @command{gawk} @subentry limits +@cindex @command{awk} @subentry implementation issues @subentry pipes +@cindex @command{gawk} @subentry implementation issues @subentry pipes @ifnotinfo As mentioned earlier (@pxref{Getline Notes}), @@ -9981,7 +10009,7 @@ program may have open to just one! In @command{gawk}, there is no such limit. open as many pipelines as the underlying operating system permits. @sidebar Piping into @command{sh} -@cindex shells, piping commands into +@cindex shells @subentry piping commands into A particularly powerful way to use redirection is to build command lines and pipe them into the shell, @command{sh}. For example, suppose you @@ -10010,13 +10038,13 @@ command lines to be fed to the shell. @node Special FD @section Special Files for Standard Preopened Data Streams @cindex standard input -@cindex input, standard +@cindex input @subentry standard @cindex standard output -@cindex output, standard +@cindex output @subentry standard @cindex error output @cindex standard error @cindex file descriptors -@cindex files, descriptors, See file descriptors +@cindex files @subentry descriptors @seeentry{file descriptors} Running programs conventionally have three input and output streams already available to them for reading and writing. These are known @@ -10031,7 +10059,7 @@ is typically used for writing error messages; the reason there are two separate streams, standard output and standard error, is so that they can be redirected separately. 
-@cindex differences in @command{awk} and @command{gawk}, error messages +@cindex differences in @command{awk} and @command{gawk} @subentry error messages @cindex error handling In traditional implementations of @command{awk}, the only way to write an error message to standard error in an @command{awk} program is as follows: @@ -10074,15 +10102,15 @@ the descriptor that the @value{FN} stands for. These special @value{FN}s work for all operating systems that @command{gawk} has been ported to, not just those that are POSIX-compliant: -@cindex common extensions, @code{/dev/stdin} special file -@cindex common extensions, @code{/dev/stdout} special file -@cindex common extensions, @code{/dev/stderr} special file -@cindex extensions, common@comma{} @code{/dev/stdin} special file -@cindex extensions, common@comma{} @code{/dev/stdout} special file -@cindex extensions, common@comma{} @code{/dev/stderr} special file -@cindex file names, standard streams in @command{gawk} +@cindex common extensions @subentry @code{/dev/stdin} special file +@cindex common extensions @subentry @code{/dev/stdout} special file +@cindex common extensions @subentry @code{/dev/stderr} special file +@cindex extensions @subentry common @subentry @code{/dev/stdin} special file +@cindex extensions @subentry common @subentry @code{/dev/stdout} special file +@cindex extensions @subentry common @subentry @code{/dev/stderr} special file +@cindex file names @subentry standard streams in @command{gawk} @cindex @code{/dev/@dots{}} special files -@cindex files, @code{/dev/@dots{}} special files +@cindex files @subentry @code{/dev/@dots{}} special files @cindex @code{/dev/fd/@var{N}} special files (@command{gawk}) @table @file @item /dev/stdin @@ -10102,7 +10130,7 @@ the proper way to write an error message then becomes: print "Serious error detected!" 
> "/dev/stderr" @end example -@cindex troubleshooting, quotes with file names +@cindex troubleshooting @subentry quotes with file names Note the use of quotes around the @value{FN}. Like with any other redirection, the value must be a string. It is a common error to omit the quotes, which leads @@ -10115,7 +10143,7 @@ invoked with the @option{--traditional} option (@pxref{Options}). @node Special Files @section Special @value{FFN}s in @command{gawk} -@cindex @command{gawk}, file names in +@cindex @command{gawk} @subentry file names in Besides access to standard input, standard output, and standard error, @command{gawk} provides access to any open file descriptor. @@ -10154,8 +10182,8 @@ above two, does actually close the given file descriptor. @node Special Network @subsection Special Files for Network Communications -@cindex networks, support for -@cindex TCP/IP, support for +@cindex networks @subentry support for +@cindex TCP/IP @subentry support for @command{gawk} programs can open a two-way @@ -10184,8 +10212,9 @@ Here are some things to bear in mind when using the special @value{FN}s that @command{gawk} provides: @itemize @value{BULLET} -@cindex compatibility mode (@command{gawk}), file names -@cindex file names, in compatibility mode +@cindex compatibility mode (@command{gawk}) @subentry file names +@cindex file names @subentry in compatibility mode +@cindex POSIX mode @item Recognition of the @value{FN}s for the three standard preopened files is disabled only in POSIX mode. @@ -10208,12 +10237,12 @@ Doing so results in unpredictable behavior. 
@node Close Files And Pipes @section Closing Input and Output Redirections -@cindex files, output, See output files -@cindex input files, closing -@cindex output, files@comma{} closing -@cindex pipe, closing -@cindex coprocesses, closing -@cindex @code{getline} command, coprocesses@comma{} using from +@cindex files @subentry output @seeentry{output files} +@cindex input files @subentry closing +@cindex output @subentry files, closing +@cindex pipe @subentry closing +@cindex coprocesses @subentry closing +@cindex @code{getline} command @subentry coprocesses, using from If the same @value{FN} or the same shell command is used with @code{getline} more than once during the execution of an @command{awk} program @@ -10315,9 +10344,9 @@ program closes the pipe after each line of output, then each line makes a separate message. @end itemize -@cindex differences in @command{awk} and @command{gawk}, @code{close()} function -@cindex portability, @code{close()} function and -@cindex @code{close()} function, portability +@cindex differences in @command{awk} and @command{gawk} @subentry @code{close()} function +@cindex portability @subentry @code{close()} function and +@cindex @code{close()} function @subentry portability If you use more files than the system allows you to have open, @command{gawk} attempts to multiplex the available open files among your @value{DF}s. @command{gawk}'s ability to do this depends upon the @@ -10365,7 +10394,7 @@ It is, more likely, a close of a file that was never opened with a redirection, so @command{awk} silently does nothing, except return a negative value. -@cindex @code{|} (vertical bar), @code{|&} operator (I/O), pipes@comma{} closing +@cindex @code{|} (vertical bar) @subentry @code{|&} operator (I/O) @subentry pipes, closing When using the @samp{|&} operator to communicate with a coprocess, it is occasionally useful to be able to close one end of the two-way pipe without closing the other. 
@@ -10381,11 +10410,11 @@ delayed until which describes it in more detail and gives an example. @sidebar Using @code{close()}'s Return Value -@cindex dark corner, @code{close()} function -@cindex @code{close()} function, return value -@cindex return value@comma{} @code{close()} function -@cindex differences in @command{awk} and @command{gawk}, @code{close()} function -@cindex Unix @command{awk}, @code{close()} function and +@cindex dark corner @subentry @code{close()} function +@cindex @code{close()} function @subentry return value +@cindex return value, @code{close()} function +@cindex differences in @command{awk} and @command{gawk} @subentry @code{close()} function +@cindex Unix @command{awk} @subentry @code{close()} function and In many older versions of Unix @command{awk}, the @code{close()} function is actually a statement. @@ -10399,8 +10428,8 @@ command | getline info retval = close(command) # syntax error in many Unix awks @end example -@cindex @command{gawk}, @code{ERRNO} variable in -@cindex @code{ERRNO} variable, with @command{close()} function +@cindex @command{gawk} @subentry @code{ERRNO} variable in +@cindex @code{ERRNO} variable @subentry with @command{close()} function @command{gawk} treats @code{close()} as a function. The return value is @minus{}1 if the argument names something that was never opened with a redirection, or if there is @@ -10429,6 +10458,7 @@ if it fails. @end multitable @end float +@cindex POSIX mode The POSIX standard is very vague; it says that @code{close()} returns zero on success and a nonzero value otherwise. In general, different implementations vary in what they report when closing @@ -10496,6 +10526,8 @@ For standard output, you may use @code{PROCINFO["-", "NONFATAL"]} or @code{PROCINFO["/dev/stdout", "NONFATAL"]}. For standard error, use @code{PROCINFO["/dev/stderr", "NONFATAL"]}. 
+@cindex @env{GAWK_SOCK_RETRIES} environment variable +@cindex environment variables @subentry @env{GAWK_SOCK_RETRIES} When attempting to open a TCP/IP socket (@pxref{TCP/IP Networking}), @command{gawk} tries multiple times. The @env{GAWK_SOCK_RETRIES} environment variable (@pxref{Other Environment Variables}) allows you to @@ -10617,7 +10649,7 @@ that provide the values used in expressions. @node Constants @subsection Constant Expressions -@cindex constants, types of +@cindex constants @subentry types of The simplest type of expression is the @dfn{constant}, which always has the same value. There are three types of constants: numeric, @@ -10636,8 +10668,8 @@ have different forms, but are internally stored in an identical manner. @node Scalar Constants @subsubsection Numeric and String Constants -@cindex constants, numeric -@cindex numeric constants +@cindex constants @subentry numeric +@cindex numeric @subentry constants A @dfn{numeric constant} stands for a number. This number can be an integer, a decimal fraction, or a number in scientific (exponential) notation.@footnote{The internal representation of all numbers, @@ -10653,7 +10685,8 @@ have the same value: 1050e-1 @end example -@cindex string constants +@cindex string @subentry constants +@cindex constants @subentry string A @dfn{string constant} consists of a sequence of characters enclosed in double quotation marks. For example: @@ -10662,8 +10695,9 @@ double quotation marks. For example: @end example @noindent -@cindex differences in @command{awk} and @command{gawk}, strings -@cindex strings, length limitations +@cindex differences in @command{awk} and @command{gawk} @subentry strings +@cindex strings @subentry length limitations +@cindex ASCII represents the string whose contents are @samp{parrot}. Strings in @command{gawk} can be of any length, and they can contain any of the possible eight-bit ASCII characters, including ASCII @sc{nul} (character code zero). 
@@ -10702,9 +10736,9 @@ $ @kbd{gawk 'BEGIN @{ print "hello, } @print{} gawk: cmd. line:1: ^ syntax error @end example -@cindex dark corner, string continuation -@cindex strings, continuation across lines -@cindex differences in @command{awk} and @command{gawk}, strings +@cindex dark corner @subentry string continuation +@cindex strings @subentry continuation across lines +@cindex differences in @command{awk} and @command{gawk} @subentry strings Although POSIX doesn't define what happens if you use an escaped newline, as in the previous C example, all known versions of @command{awk} allow you to do so. Unfortunately, what each one @@ -10718,6 +10752,7 @@ $ @kbd{gawk 'BEGIN @{ print "hello, \} @print{} hello, world @end example +@cindex POSIX mode In POSIX mode (@pxref{Options}), @command{gawk} does not allow escaped newlines. Otherwise, it behaves as just described. @@ -10736,8 +10771,8 @@ $ @kbd{nawk 'BEGIN @{ print "hello, \} @subsubsection Octal and Hexadecimal Numbers @cindex octal numbers @cindex hexadecimal numbers -@cindex numbers, octal -@cindex numbers, hexadecimal +@cindex numbers @subentry octal +@cindex numbers @subentry hexadecimal In @command{awk}, all numbers are in decimal (i.e., base 10). Many other programming languages allow you to specify numbers in other bases, often @@ -10782,8 +10817,8 @@ Being able to use octal and hexadecimal constants in your programs is most useful when working with data that cannot be represented conveniently as characters or as regular numbers, such as binary data of various sorts. -@cindex @command{gawk}, octal numbers and -@cindex @command{gawk}, hexadecimal numbers and +@cindex @command{gawk} @subentry octal numbers and +@cindex @command{gawk} @subentry hexadecimal numbers and @command{gawk} allows the use of octal and hexadecimal constants in your program text. 
However, such numbers in the input data are not treated differently; doing so by default would break old @@ -10810,8 +10845,8 @@ $ @kbd{gawk 'BEGIN @{ print "021 is", 021 ; print 018 @}'} @print{} 18 @end example -@cindex compatibility mode (@command{gawk}), octal numbers -@cindex compatibility mode (@command{gawk}), hexadecimal numbers +@cindex compatibility mode (@command{gawk}) @subentry octal numbers +@cindex compatibility mode (@command{gawk}) @subentry hexadecimal numbers Octal and hexadecimal source code constants are a @command{gawk} extension. If @command{gawk} is in compatibility mode (@pxref{Options}), @@ -10838,8 +10873,8 @@ $ @kbd{gawk 'BEGIN @{ printf "0x11 is <%s>\n", 0x11 @}'} @cindex regexp constants @cindex @code{~} (tilde), @code{~} operator @cindex tilde (@code{~}), @code{~} operator -@cindex @code{!} (exclamation point), @code{!~} operator -@cindex exclamation point (@code{!}), @code{!~} operator +@cindex @code{!} (exclamation point) @subentry @code{!~} operator +@cindex exclamation point (@code{!}) @subentry @code{!~} operator A @dfn{regexp constant} is a regular expression description enclosed in slashes, such as @code{@w{/^beginning and end$/}}. Most regexps used in @command{awk} programs are constant, but the @samp{~} and @samp{!~} @@ -10864,7 +10899,7 @@ POSIX @command{awk} and @command{gawk}, and then goes on to describe @node Standard Regexp Constants @subsubsection Standard Regular Expression Constants -@cindex dark corner, regexp constants +@cindex dark corner @subentry regexp constants When used on the righthand side of the @samp{~} or @samp{!~} operators, a regexp constant merely stands for the regexp that is to be matched. 
@@ -10902,8 +10937,8 @@ if (/foo/ ~ $1) print "found foo" @c @cindex automatic warnings @c @cindex warnings, automatic -@cindex @command{gawk}, regexp constants and -@cindex regexp constants, in @command{gawk} +@cindex @command{gawk} @subentry regexp constants and +@cindex regexp constants @subentry in @command{gawk} @noindent This code is ``obviously'' testing @code{$1} for a match against the regexp @code{/foo/}. But in fact, the expression @samp{/foo/ ~ $1} really means @@ -10924,8 +10959,8 @@ matches = /foo/ assigns either zero or one to the variable @code{matches}, depending upon the contents of the current input record. -@cindex differences in @command{awk} and @command{gawk}, regexp constants -@cindex dark corner, regexp constants, as arguments to user-defined functions +@cindex differences in @command{awk} and @command{gawk} @subentry regexp constants +@cindex dark corner @subentry regexp constants @subentry as arguments to user-defined functions @cindexgawkfunc{gensub} @cindexawkfunc{sub} @cindexawkfunc{gsub} @@ -11000,7 +11035,7 @@ it would be nice to have regexp constants that are @dfn{strongly typed}; in other words, that denote a regexp useful for matching, and not an expression. -@cindex values, regexp +@cindex values @subentry regexp @command{gawk} provides this feature. A strongly typed regexp constant looks almost like a regular regexp constant, except that it is preceded by an @samp{@@} sign: @@ -11063,8 +11098,8 @@ value of the original regexp text. @node Variables @subsection Variables -@cindex variables, user-defined -@cindex user-defined, variables +@cindex variables @subentry user-defined +@cindex user-defined @subentry variables @dfn{Variables} are ways of storing values at one point in your program for use later in another part of your program. 
They can be manipulated entirely within the program text, and they can also be assigned values @@ -11100,8 +11135,8 @@ change a variable's value, and the @code{match()}, @code{split()}, and @code{patsplit()} functions can change the contents of their array parameters (@pxref{String Functions}). -@cindex variables, built-in -@cindex variables, initializing +@cindex variables @subentry built-in +@cindex variables @subentry initializing A few variables have special built-in meanings, such as @code{FS} (the field separator) and @code{NF} (the number of fields in the current input record). @xref{Built-in Variables} for a list of the predefined variables. @@ -11118,8 +11153,8 @@ which is what you would do in C and in most other traditional languages. @node Assignment Options @subsubsection Assigning Variables on the Command Line -@cindex variables, assigning on command line -@cindex command line, variables@comma{} assigning on +@cindex variables @subentry assigning on command line +@cindex command line @subentry variables, assigning on Any @command{awk} variable can be set by including a @dfn{variable assignment} among the arguments on the command line when @command{awk} is invoked @@ -11173,7 +11208,7 @@ $ @kbd{awk '@{ print $n @}' n=4 inventory-shipped n=2 mail-list} @dots{} @end example -@cindex dark corner, command-line arguments +@cindex dark corner @subentry command-line arguments Command-line arguments are made available for explicit examination by the @command{awk} program in the @code{ARGV} array (@pxref{ARGC and ARGV}). @@ -11214,10 +11249,10 @@ this @value{SECTION} discusses this important facet of @command{awk}. 
@node Strings And Numbers @subsubsection How @command{awk} Converts Between Strings and Numbers -@cindex converting, strings to numbers -@cindex strings, converting -@cindex numbers, converting -@cindex converting, numbers to strings +@cindex converting @subentry string to numbers +@cindex strings @subentry converting +@cindex numbers @subentry converting +@cindex converting @subentry numbers to strings Strings are converted to numbers and numbers are converted to strings, if the context of the @command{awk} program demands it. For example, if the value of either @code{foo} or @code{bar} in the expression @samp{foo + bar} @@ -11238,8 +11273,8 @@ the variables @code{two} and @code{three} are converted to strings and concatenated together. The resulting string is converted back to the number 23, to which 4 is then added. -@cindex null strings, converting numbers to strings -@cindex type conversion +@cindex null strings @subentry converting numbers to strings +@cindex type @subentry conversion If, for some reason, you need to force a number to be converted to a string, concatenate that number with the empty string, @code{""}. To force a string to be converted to a number, add zero to that string. @@ -11265,7 +11300,7 @@ On most modern machines, value exactly.@footnote{Pathological cases can require up to 752 digits (!), but we doubt that you need to worry about this.} -@cindex dark corner, @code{CONVFMT} variable +@cindex dark corner @subentry @code{CONVFMT} variable Strange results can occur if you set @code{CONVFMT} to a string that doesn't tell @code{sprintf()} how to format floating-point numbers in a useful way. 
For example, if you forget the @samp{%} in the format, @command{awk} converts @@ -11286,10 +11321,10 @@ b = a "" @value{DARKCORNER} @sidebar Pre-POSIX @command{awk} Used @code{OFMT} for String Conversion -@cindex POSIX @command{awk}, @code{OFMT} variable and +@cindex POSIX @command{awk} @subentry @code{OFMT} variable and @cindex @code{OFMT} variable -@cindex portability, new @command{awk} vs.@: old @command{awk} -@cindex @command{awk}, new vs.@: old, @code{OFMT} variable +@cindex portability @subentry new @command{awk} vs.@: old @command{awk} +@cindex @command{awk} @subentry new vs.@: old @subentry @code{OFMT} variable Prior to the POSIX standard, @command{awk} used the value of @code{OFMT} for converting numbers to strings. @code{OFMT} specifies the output format to use when printing numbers with @code{print}. @@ -11314,7 +11349,7 @@ non-English locales use the comma (@samp{,}) as the decimal point character. European locales often use either a space or a period as the thousands separator, if they have one. -@cindex dark corner, locale's decimal point character +@cindex dark corner @subentry locale's decimal point character The POSIX standard says that @command{awk} always uses the period as the decimal point when reading the @command{awk} program source code, and for command-line variable assignments (@pxref{Other Arguments}). However, @@ -11342,6 +11377,7 @@ the decimal point separator. In the normal @code{"C"} locale, @command{gawk} treats @samp{4,321} as 4, while in the Danish locale, it's treated as the full number including the fractional part, 4.321. +@cindex POSIX mode Some earlier versions of @command{gawk} fully complied with this aspect of the standard. However, many users in non-English locales complained about this behavior, because their data used a period as the decimal @@ -11389,7 +11425,7 @@ of the values provided by constants and variables. 
@node Arithmetic Ops @subsection Arithmetic Operators @cindex arithmetic operators -@cindex operators, arithmetic +@cindex operators @subentry arithmetic @c @cindex addition @c @cindex subtraction @c @cindex multiplication @@ -11428,9 +11464,9 @@ The following list provides the arithmetic operators in @command{awk}, in order from the highest precedence to the lowest: @table @code -@cindex common extensions, @code{**} operator -@cindex extensions, common@comma{} @code{**} operator -@cindex POSIX @command{awk}, arithmetic operators and +@cindex common extensions @subentry @code{**} operator +@cindex extensions @subentry common @subentry @code{**} operator +@cindex POSIX @command{awk} @subentry arithmetic operators and @item @var{x} ^ @var{y} @itemx @var{x} ** @var{y} Exponentiation; @var{x} raised to the @var{y} power. @samp{2 ^ 3} has @@ -11446,7 +11482,7 @@ Unary plus; the expression is converted to a number. @item @var{x} * @var{y} Multiplication. -@cindex troubleshooting, division +@cindex troubleshooting @subentry division @cindex division @item @var{x} / @var{y} Division; because all numbers in @command{awk} are floating-point @@ -11471,7 +11507,7 @@ Unary plus and minus have the same precedence, the multiplication operators all have the same precedence, and addition and subtraction have the same precedence. -@cindex differences in @command{awk} and @command{gawk}, trunc-mod operation +@cindex differences in @command{awk} and @command{gawk} @subentry trunc-mod operation @cindex trunc-mod operation When computing the remainder of @samp{@var{x} % @var{y}}, the quotient is rounded toward zero to an integer and @@ -11494,9 +11530,9 @@ In other @command{awk} implementations, the signedness of the remainder may be machine-dependent. @c FIXME !!! what does posix say? 
-@cindex portability, @code{**} operator and -@cindex @code{*} (asterisk), @code{**} operator -@cindex asterisk (@code{*}), @code{**} operator +@cindex portability @subentry @code{**} operator and +@cindex @code{*} (asterisk) @subentry @code{**} operator +@cindex asterisk (@code{*}) @subentry @code{**} operator @quotation NOTE The POSIX standard only specifies the use of @samp{^} for exponentiation. @@ -11511,8 +11547,8 @@ For maximum portability, do not use the @samp{**} operator. @author Brian Kernighan @end quotation -@cindex string operators -@cindex operators, string +@cindex string @subentry operators +@cindex operators @subentry string @cindex concatenating There is only one string operation: concatenation. It does not have a specific operator to represent it. Instead, concatenation is performed by @@ -11535,7 +11571,7 @@ $ @kbd{awk '@{ print "Field number one:" $1 @}' mail-list} @dots{} @end example -@cindex troubleshooting, string concatenation +@cindex troubleshooting @subentry string concatenation Because string concatenation does not have an explicit operator, it is often necessary to ensure that it happens at the right time by using parentheses to enclose the items to concatenate. For example, @@ -11562,7 +11598,7 @@ print "something meaningful" > (file name) @end example @cindex order of evaluation, concatenation -@cindex evaluation order, concatenation +@cindex evaluation order @subentry concatenation @cindex side effects Parentheses should be used around concatenation in all but the most common contexts, such as on the righthand side of @samp{=}. @@ -11643,10 +11679,10 @@ you're never quite sure what you'll get. 
@node Assignment Ops @subsection Assignment Expressions @cindex assignment operators -@cindex operators, assignment -@cindex expressions, assignment -@cindex @code{=} (equals sign), @code{=} operator -@cindex equals sign (@code{=}), @code{=} operator +@cindex operators @subentry assignment +@cindex expressions @subentry assignment +@cindex @code{=} (equals sign) @subentry @code{=} operator +@cindex equals sign (@code{=}) @subentry @code{=} operator An @dfn{assignment} is an expression that stores a (usually different) value into a variable. For example, let's assign the value one to the variable @code{z}: @@ -11669,7 +11705,7 @@ message = "this " thing " is " predicate @end example @noindent -@cindex side effects, assignment expressions +@cindex side effects @subentry assignment expressions This also illustrates string concatenation. The @samp{=} sign is called an @dfn{assignment operator}. It is the simplest assignment operator because the value of the righthand @@ -11683,8 +11719,8 @@ a @dfn{side effect}. @cindex lvalues/rvalues @cindex rvalues/lvalues -@cindex assignment operators, lvalues/rvalues -@cindex operators, assignment +@cindex assignment operators @subentry lvalues/rvalues +@cindex operators @subentry assignment The lefthand operand of an assignment need not be a variable (@pxref{Variables}); it can also be a field (@pxref{Changing Fields}) or @@ -11695,7 +11731,7 @@ The righthand operand may be any expression; it produces the new value that the assignment stores in the specified variable, field, or array element. (Such values are called @dfn{rvalues}.) -@cindex variables, types of +@cindex variables @subentry types of It is important to note that variables do @emph{not} have permanent types. A variable's type is simply the type of whatever value was last assigned to it. In the following program fragment, the variable @@ -11753,8 +11789,8 @@ and then test whether @code{x} equals one. 
But this style tends to make programs hard to read; such nesting of assignments should be avoided, except perhaps in a one-shot program. -@cindex @code{+} (plus sign), @code{+=} operator -@cindex plus sign (@code{+}), @code{+=} operator +@cindex @code{+} (plus sign) @subentry @code{+=} operator +@cindex plus sign (@code{+}) @subentry @code{+=} operator Aside from @samp{=}, there are several other assignment operators that do arithmetic with the old value of the variable. For example, the operator @samp{+=} computes a new value by adding the righthand value @@ -11797,8 +11833,8 @@ BEGIN @{ @end group @end example -@cindex operators, assignment, evaluation order -@cindex assignment operators, evaluation order +@cindex operators @subentry assignment @subentry evaluation order +@cindex assignment operators @subentry evaluation order @noindent The indices of @code{bar} are practically guaranteed to be different, because @code{rand()} returns different values each time it is called. @@ -11831,18 +11867,18 @@ The value of @code{a[3]} could be either two or four. case, the righthand operand is an expression whose value is converted to a number. 
-@cindex @code{-} (hyphen), @code{-=} operator -@cindex hyphen (@code{-}), @code{-=} operator -@cindex @code{*} (asterisk), @code{*=} operator -@cindex asterisk (@code{*}), @code{*=} operator -@cindex @code{/} (forward slash), @code{/=} operator -@cindex forward slash (@code{/}), @code{/=} operator -@cindex @code{%} (percent sign), @code{%=} operator -@cindex percent sign (@code{%}), @code{%=} operator -@cindex @code{^} (caret), @code{^=} operator -@cindex caret (@code{^}), @code{^=} operator -@cindex @code{*} (asterisk), @code{**=} operator -@cindex asterisk (@code{*}), @code{**=} operator +@cindex @code{-} (hyphen) @subentry @code{-=} operator +@cindex hyphen (@code{-}) @subentry @code{-=} operator +@cindex @code{*} (asterisk) @subentry @code{*=} operator +@cindex asterisk (@code{*}) @subentry @code{*=} operator +@cindex @code{/} (forward slash) @subentry @code{/=} operator +@cindex forward slash (@code{/}) @subentry @code{/=} operator +@cindex @code{%} (percent sign) @subentry @code{%=} operator +@cindex percent sign (@code{%}) @subentry @code{%=} operator +@cindex @code{^} (caret) @subentry @code{^=} operator +@cindex caret (@code{^}) @subentry @code{^=} operator +@cindex @code{*} (asterisk) @subentry @code{**=} operator +@cindex asterisk (@code{*}) @subentry @code{**=} operator @float Table,table-assign-ops @caption{Arithmetic assignment operators} @multitable @columnfractions .30 .70 @@ -11852,32 +11888,32 @@ to a number. @item @var{lvalue} @code{*=} @var{coefficient} @tab Multiply the value of @var{lvalue} by @var{coefficient}. @item @var{lvalue} @code{/=} @var{divisor} @tab Divide the value of @var{lvalue} by @var{divisor}. @item @var{lvalue} @code{%=} @var{modulus} @tab Set @var{lvalue} to its remainder by @var{modulus}. 
-@cindex common extensions, @code{**=} operator -@cindex extensions, common@comma{} @code{**=} operator -@cindex @command{awk} language, POSIX version +@cindex common extensions @subentry @code{**=} operator +@cindex extensions @subentry common @subentry @code{**=} operator +@cindex @command{awk} @subentry language, POSIX version @cindex POSIX @command{awk} @item @var{lvalue} @code{^=} @var{power} @tab Raise @var{lvalue} to the power @var{power}. @item @var{lvalue} @code{**=} @var{power} @tab Raise @var{lvalue} to the power @var{power}. @value{COMMONEXT} @end multitable @end float -@cindex POSIX @command{awk}, @code{**=} operator and -@cindex portability, @code{**=} operator and +@cindex POSIX @command{awk} @subentry @code{**=} operator and +@cindex portability @subentry @code{**=} operator and @quotation NOTE Only the @samp{^=} operator is specified by POSIX. For maximum portability, do not use the @samp{**=} operator. @end quotation @sidebar Syntactic Ambiguities Between @samp{/=} and Regular Expressions -@cindex dark corner, regexp constants, @code{/=} operator and -@cindex @code{/} (forward slash), @code{/=} operator, vs. @code{/=@dots{}/} regexp constant -@cindex forward slash (@code{/}), @code{/=} operator, vs. @code{/=@dots{}/} regexp constant -@cindex regexp constants, @code{/=@dots{}/}, @code{/=} operator and +@cindex dark corner @subentry regexp constants @subentry @code{/=} operator and +@cindex @code{/} (forward slash) @subentry @code{/=} operator @subentry vs. @code{/=@dots{}/} regexp constant +@cindex forward slash (@code{/}) @subentry @code{/=} operator @subentry vs. @code{/=@dots{}/} regexp constant +@cindex regexp constants @subentry @code{/=@dots{}/} @subentry @code{/=} operator and @c derived from email from "Nelson H. F. Beebe" <beebe@math.utah.edu> @c Date: Mon, 1 Sep 1997 13:38:35 -0600 (MDT) -@cindex dark corner, @code{/=} operator vs. @code{/=@dots{}/} regexp constant +@cindex dark corner @subentry @code{/=} operator vs. 
@code{/=@dots{}/} regexp constant @cindex ambiguity, syntactic: @code{/=} operator vs. @code{/=@dots{}/} regexp constant @cindex syntactic ambiguity: @code{/=} operator vs. @code{/=@dots{}/} regexp constant @cindex @code{/=} operator vs. @code{/=@dots{}/} regexp constant @@ -11910,16 +11946,16 @@ and @command{mawk} also do not. @subsection Increment and Decrement Operators @cindex increment operators -@cindex operators, decrement/increment +@cindex operators @subentry decrement/increment @dfn{Increment} and @dfn{decrement operators} increase or decrease the value of a variable by one. An assignment operator can do the same thing, so the increment operators add no power to the @command{awk} language; however, they are convenient abbreviations for very common operations. @cindex side effects -@cindex @code{+} (plus sign), @code{++} operator -@cindex plus sign (@code{+}), @code{++} operator -@cindex side effects, decrement/increment operators +@cindex @code{+} (plus sign) @subentry @code{++} operator +@cindex plus sign (@code{+}) @subentry @code{++} operator +@cindex side effects @subentry decrement/increment operators The operator used for adding one is written @samp{++}. It can be used to increment a variable either before or after taking its value. To @dfn{pre-increment} a variable @code{v}, write @samp{++v}. This adds @@ -11950,8 +11986,8 @@ long as you stick to numbers that are fairly small (less than @end ifnotinfo @end ifnottex -@cindex @code{$} (dollar sign), incrementing fields and arrays -@cindex dollar sign (@code{$}), incrementing fields and arrays +@cindex @code{$} (dollar sign) @subentry incrementing fields and arrays +@cindex dollar sign (@code{$}) @subentry incrementing fields and arrays Fields and array elements are incremented just like variables. (Use @samp{$(i++)} when you want to do a field reference and a variable increment at the same time. 
The parentheses are necessary @@ -11964,8 +12000,8 @@ the lvalue to pre-decrement or after it to post-decrement. Following is a summary of increment and decrement expressions: @table @code -@cindex @code{+} (plus sign), @code{++} operator -@cindex plus sign (@code{+}), @code{++} operator +@cindex @code{+} (plus sign) @subentry @code{++} operator +@cindex plus sign (@code{+}) @subentry @code{++} operator @item ++@var{lvalue} Increment @var{lvalue}, returning the new value as the value of the expression. @@ -11974,8 +12010,8 @@ value of the expression. Increment @var{lvalue}, returning the @emph{old} value of @var{lvalue} as the value of the expression. -@cindex @code{-} (hyphen), @code{--} operator -@cindex hyphen (@code{-}), @code{--} operator +@cindex @code{-} (hyphen) @subentry @code{--} operator +@cindex hyphen (@code{-}) @subentry @code{--} operator @item --@var{lvalue} Decrement @var{lvalue}, returning the new value as the value of the expression. @@ -11991,8 +12027,8 @@ like @samp{@var{lvalue}++}, but instead of adding, it subtracts.) @sidebar Operator Evaluation Order @cindex precedence -@cindex operators, precedence of -@cindex portability, operators +@cindex operators @subentry precedence of +@cindex portability @subentry operators @cindex evaluation order @cindex Marx, Groucho @quotation @@ -12084,7 +12120,7 @@ BEGIN @{ @} @end example -@cindex dark corner, @code{"0"} is actually true +@cindex dark corner @subentry @code{"0"} is actually true There is a surprising consequence of the ``nonzero or non-null'' rule: the string constant @code{"0"} is actually true, because it is non-null. @value{DARKCORNER} @@ -12100,13 +12136,12 @@ the string constant @code{"0"} is actually true, because it is non-null. @c leave it alone. 
@cindex comparison expressions -@cindex expressions, comparison -@cindex expressions, matching, See comparison expressions -@cindex matching, expressions, See comparison expressions -@cindex relational operators, See comparison operators -@cindex operators, relational, See operators@comma{} comparison -@cindex variable typing -@cindex variables, types of, comparison expressions and +@cindex expressions @subentry comparison +@cindex expressions, matching @seeentry{comparison expressions} +@cindex matching @subentry expressions @seeentry{comparison expressions} +@cindex relational operators @seeentry{comparison operators} +@cindex operators, relational @seeentry{operators, comparison} +@cindex variables @subentry types of @subentry comparison expressions and Unlike in other programming languages, in @command{awk} variables do not have a fixed type. Instead, they can be either a number or a string, depending upon the value that is assigned to them. @@ -12200,9 +12235,9 @@ $ @kbd{echo 37 | awk '@{ printf("%s %s < 42\n", $1,} Here are the rules for when @command{awk} treats data as a number, and for when it treats data as a string. -@cindex numeric, strings -@cindex strings, numeric -@cindex POSIX @command{awk}, numeric strings and +@cindex numeric @subentry strings +@cindex strings @subentry numeric +@cindex POSIX @command{awk} @subentry numeric strings and The POSIX standard uses the term @dfn{numeric string} for input data that looks numeric. The @samp{37} in the previous example is a numeric string. So what is the type of a numeric string? Answer: numeric. @@ -12414,28 +12449,29 @@ $ @kbd{echo hello 37 | gawk '@{ print typeof($1), typeof($2) @}'} @node Comparison Operators @subsubsection Comparison Operators +@cindex operators @subentry comparison @dfn{Comparison expressions} compare strings or numbers for relationships such as equality. They are written using @dfn{relational operators}, which are a superset of those in C. @ref{table-relational-ops} describes them. 
-@cindex @code{<} (left angle bracket), @code{<} operator -@cindex left angle bracket (@code{<}), @code{<} operator -@cindex @code{<} (left angle bracket), @code{<=} operator -@cindex left angle bracket (@code{<}), @code{<=} operator -@cindex @code{>} (right angle bracket), @code{>=} operator -@cindex right angle bracket (@code{>}), @code{>=} operator -@cindex @code{>} (right angle bracket), @code{>} operator -@cindex right angle bracket (@code{>}), @code{>} operator -@cindex @code{=} (equals sign), @code{==} operator -@cindex equals sign (@code{=}), @code{==} operator -@cindex @code{!} (exclamation point), @code{!=} operator -@cindex exclamation point (@code{!}), @code{!=} operator +@cindex @code{<} (left angle bracket) @subentry @code{<} operator +@cindex left angle bracket (@code{<}) @subentry @code{<} operator +@cindex @code{<} (left angle bracket) @subentry @code{<=} operator +@cindex left angle bracket (@code{<}) @subentry @code{<=} operator +@cindex @code{>} (right angle bracket) @subentry @code{>=} operator +@cindex right angle bracket (@code{>}) @subentry @code{>=} operator +@cindex @code{>} (right angle bracket) @subentry @code{>} operator +@cindex right angle bracket (@code{>}) @subentry @code{>} operator +@cindex @code{=} (equals sign) @subentry @code{==} operator +@cindex equals sign (@code{=}) @subentry @code{==} operator +@cindex @code{!} (exclamation point) @subentry @code{!=} operator +@cindex exclamation point (@code{!}) @subentry @code{!=} operator @cindex @code{~} (tilde), @code{~} operator @cindex tilde (@code{~}), @code{~} operator -@cindex @code{!} (exclamation point), @code{!~} operator -@cindex exclamation point (@code{!}), @code{!~} operator +@cindex @code{!} (exclamation point) @subentry @code{!~} operator +@cindex exclamation point (@code{!}) @subentry @code{!~} operator @cindex @code{in} operator @float Table,table-relational-ops @caption{Relational operators} @@ -12464,7 +12500,7 @@ and so on. Thus, @code{"10"} is less than @code{"9"}. 
If there are two strings where one is a prefix of the other, the shorter string is less than the longer one. Thus, @code{"abc"} is less than @code{"abcd"}. -@cindex troubleshooting, @code{==} operator +@cindex troubleshooting @subentry @code{==} operator It is very easy to accidentally mistype the @samp{==} operator and leave off one of the @samp{=} characters. The result is still valid @command{awk} code, but the program does not do what is intended: @@ -12516,7 +12552,7 @@ $ @kbd{echo 1e2 3 | awk '@{ print ($1 < $2) ? "true" : "false" @}'} @print{} false @end example -@cindex comparison expressions, string vs.@: regexp +@cindex comparison expressions @subentry string vs.@: regexp @c @cindex string comparison vs.@: regexp comparison @c @cindex regexp comparison vs.@: string comparison @noindent @@ -12548,15 +12584,15 @@ has the value one if @code{x} contains @samp{foo}, such as @cindex @code{~} (tilde), @code{~} operator @cindex tilde (@code{~}), @code{~} operator -@cindex @code{!} (exclamation point), @code{!~} operator -@cindex exclamation point (@code{!}), @code{!~} operator +@cindex @code{!} (exclamation point) @subentry @code{!~} operator +@cindex exclamation point (@code{!}) @subentry @code{!~} operator The righthand operand of the @samp{~} and @samp{!~} operators may be either a regexp constant (@code{/}@dots{}@code{/}) or an ordinary expression. In the latter case, the value of the expression as a string is used as a dynamic regexp (@pxref{Regexp Usage}; also @pxref{Computed Regexps}). -@cindex @command{awk}, regexp constants and +@cindex @command{awk} @subentry regexp constants and @cindex regexp constants A constant regular expression in slashes by itself is also an expression. 
@@ -12584,6 +12620,7 @@ comparison.@footnote{Technically, string comparison is supposed to behave the same way as if the strings were compared with the C @code{strcoll()} function.} +@cindex POSIX mode Because this behavior differs considerably from existing practice, @command{gawk} only implemented it when in POSIX mode (@pxref{Options}). Here is an example to illustrate the difference, in an @code{en_US.UTF-8} @@ -12615,6 +12652,7 @@ a <= b && a >= b @end example @end quotation +@cindex POSIX mode As of @value{PVERSION} 4.2, @command{gawk} continues to use locale collating order for @code{<}, @code{<=}, @code{>}, and @code{>=} only in POSIX mode. @@ -12629,12 +12667,12 @@ and http://austingroupbugs.net/view.php?id=1070. @cindex and Boolean-logic operator @cindex or Boolean-logic operator @cindex not Boolean-logic operator -@cindex expressions, Boolean +@cindex expressions @subentry Boolean @cindex Boolean expressions -@cindex operators, Boolean, See Boolean expressions -@cindex Boolean operators, See Boolean expressions -@cindex logical operators, See Boolean expressions -@cindex operators, logical, See Boolean expressions +@cindex operators, Boolean @seeentry{Boolean expressions} +@cindex Boolean operators @seeentry{Boolean expressions} +@cindex logical operators @seeentry{Boolean expressions} +@cindex operators, logical @seeentry{Boolean expressions} A @dfn{Boolean expression} is a combination of comparison expressions or matching expressions, using the Boolean operators ``or'' @@ -12666,7 +12704,7 @@ both @samp{edu} and @samp{li}: if ($0 ~ /edu/ && $0 ~ /li/) print @end example -@cindex side effects, Boolean operators +@cindex side effects @subentry Boolean operators The subexpression @var{boolean2} is evaluated only if @var{boolean1} is true. This can make a difference when @var{boolean2} contains expressions that have side effects. In the case of @samp{$0 ~ /foo/ && @@ -12706,11 +12744,11 @@ BEGIN @{ if (! 
("HOME" in ENVIRON)) @end table @cindex short-circuit operators -@cindex operators, short-circuit -@cindex @code{&} (ampersand), @code{&&} operator -@cindex ampersand (@code{&}), @code{&&} operator -@cindex @code{|} (vertical bar), @code{||} operator -@cindex vertical bar (@code{|}), @code{||} operator +@cindex operators @subentry short-circuit +@cindex @code{&} (ampersand) @subentry @code{&&} operator +@cindex ampersand (@code{&}) @subentry @code{&&} operator +@cindex @code{|} (vertical bar) @subentry @code{||} operator +@cindex vertical bar (@code{|}) @subentry @code{||} operator The @samp{&&} and @samp{||} operators are called @dfn{short-circuit} operators because of the way they work. Evaluation of the full expression is ``short-circuited'' if the result can be determined partway through @@ -12722,10 +12760,10 @@ by putting a newline after them. But you cannot put a newline in front of either of these operators without using backslash continuation (@pxref{Statements/Lines}). -@cindex @code{!} (exclamation point), @code{!} operator -@cindex exclamation point (@code{!}), @code{!} operator +@cindex @code{!} (exclamation point) @subentry @code{!} operator +@cindex exclamation point (@code{!}) @subentry @code{!} operator @cindex newlines -@cindex variables, flag +@cindex variables @subentry flag @cindex flag variables The actual value of an expression using the @samp{!} operator is either one or zero, depending upon the truth value of the expression it @@ -12779,8 +12817,8 @@ The reason it's there is to avoid printing the bracketing @node Conditional Exp @subsection Conditional Expressions @cindex conditional expressions -@cindex expressions, conditional -@cindex expressions, selecting +@cindex expressions @subentry conditional +@cindex expressions @subentry selecting A @dfn{conditional expression} is a special kind of expression that has three operands. 
It allows you to use one expression's value to select @@ -12804,7 +12842,7 @@ For example, the following expression produces the absolute value of @code{x}: x >= 0 ? x : -x @end example -@cindex side effects, conditional expressions +@cindex side effects @subentry conditional expressions Each time the conditional expression is computed, only one of @var{if-true-exp} and @var{if-false-exp} is used; the other is ignored. This is important when the expressions have side effects. For example, @@ -12822,9 +12860,9 @@ and the other is not. @xref{Arrays}, for more information about arrays. -@cindex differences in @command{awk} and @command{gawk}, line continuations -@cindex line continuations, @command{gawk} -@cindex @command{gawk}, line continuation in +@cindex differences in @command{awk} and @command{gawk} @subentry line continuations +@cindex line continuations @subentry @command{gawk} +@cindex @command{gawk} @subentry line continuation in As a minor @command{gawk} extension, a statement that uses @samp{?:} can be continued simply by putting a newline after either character. @@ -12843,7 +12881,7 @@ This enables you to ask for it by name at any point in the program. For example, the function @code{sqrt()} computes the square root of a number. -@cindex functions, built-in +@cindex functions @subentry built-in A fixed set of functions are @dfn{built in}, which means they are available in every @command{awk} program. The @code{sqrt()} function is one of these. @xref{Built-in} for a list of built-in @@ -12854,7 +12892,7 @@ for instructions on how to do this. Finally, @command{gawk} lets you write functions in C or C++ that may be called from your program (@pxref{Dynamic Extensions}). -@cindex arguments, in function calls +@cindex arguments @subentry in function calls The way to use a function is with a @dfn{function call} expression, which consists of the function name followed immediately by a list of @dfn{arguments} in parentheses. 
The arguments are expressions that @@ -12869,7 +12907,7 @@ atan2(y, x) @ii{two arguments} rand() @ii{no arguments} @end example -@cindex troubleshooting, function call syntax +@cindex troubleshooting @subentry function call syntax @quotation CAUTION Do not put any space between the function name and the opening parenthesis! A user-defined function name looks just like the name of a @@ -12904,7 +12942,7 @@ which is a way to choose the function to call at runtime, instead of when you write the source code to your program. We defer discussion of this feature until later; see @ref{Indirect Calls}. -@cindex side effects, function calls +@cindex side effects @subentry function calls Like every other expression, the function call has a value, often called the @dfn{return value}, which is computed by the function based on the arguments you give it. In this example, the return value @@ -12954,7 +12992,7 @@ $ @kbd{awk -f matchit.awk} @node Precedence @section Operator Precedence (How Operators Nest) @cindex precedence -@cindex operators, precedence +@cindex operators @subentry precedence of @dfn{Operator precedence} determines how operators are grouped when different operators appear close by in one expression. For example, @@ -13001,47 +13039,47 @@ to lowest precedence: @item @code{(}@dots{}@code{)} Grouping. -@cindex @code{$} (dollar sign), @code{$} field operator -@cindex dollar sign (@code{$}), @code{$} field operator +@cindex @code{$} (dollar sign) @subentry @code{$} field operator +@cindex dollar sign (@code{$}) @subentry @code{$} field operator @item @code{$} Field reference. 
-@cindex @code{+} (plus sign), @code{++} operator -@cindex plus sign (@code{+}), @code{++} operator -@cindex @code{-} (hyphen), @code{--} operator -@cindex hyphen (@code{-}), @code{--} operator +@cindex @code{+} (plus sign) @subentry @code{++} operator +@cindex plus sign (@code{+}) @subentry @code{++} operator +@cindex @code{-} (hyphen) @subentry @code{--} operator +@cindex hyphen (@code{-}) @subentry @code{--} operator @item @code{++ --} Increment, decrement. -@cindex @code{^} (caret), @code{^} operator -@cindex caret (@code{^}), @code{^} operator -@cindex @code{*} (asterisk), @code{**} operator -@cindex asterisk (@code{*}), @code{**} operator +@cindex @code{^} (caret) @subentry @code{^} operator +@cindex caret (@code{^}) @subentry @code{^} operator +@cindex @code{*} (asterisk) @subentry @code{**} operator +@cindex asterisk (@code{*}) @subentry @code{**} operator @item @code{^ **} Exponentiation. These operators group right to left. -@cindex @code{+} (plus sign), @code{+} operator -@cindex plus sign (@code{+}), @code{+} operator -@cindex @code{-} (hyphen), @code{-} operator -@cindex hyphen (@code{-}), @code{-} operator -@cindex @code{!} (exclamation point), @code{!} operator -@cindex exclamation point (@code{!}), @code{!} operator +@cindex @code{+} (plus sign) @subentry @code{+} operator +@cindex plus sign (@code{+}) @subentry @code{+} operator +@cindex @code{-} (hyphen) @subentry @code{-} operator +@cindex hyphen (@code{-}) @subentry @code{-} operator +@cindex @code{!} (exclamation point) @subentry @code{!} operator +@cindex exclamation point (@code{!}) @subentry @code{!} operator @item @code{+ - !} Unary plus, minus, logical ``not.'' -@cindex @code{*} (asterisk), @code{*} operator, as multiplication operator -@cindex asterisk (@code{*}), @code{*} operator, as multiplication operator -@cindex @code{/} (forward slash), @code{/} operator -@cindex forward slash (@code{/}), @code{/} operator -@cindex @code{%} (percent sign), @code{%} operator -@cindex percent sign 
(@code{%}), @code{%} operator +@cindex @code{*} (asterisk) @subentry @code{*} operator @subentry as multiplication operator +@cindex asterisk (@code{*}) @subentry @code{*} operator @subentry as multiplication operator +@cindex @code{/} (forward slash) @subentry @code{/} operator +@cindex forward slash (@code{/}) @subentry @code{/} operator +@cindex @code{%} (percent sign) @subentry @code{%} operator +@cindex percent sign (@code{%}) @subentry @code{%} operator @item @code{* / %} Multiplication, division, remainder. -@cindex @code{+} (plus sign), @code{+} operator -@cindex plus sign (@code{+}), @code{+} operator -@cindex @code{-} (hyphen), @code{-} operator -@cindex hyphen (@code{-}), @code{-} operator +@cindex @code{+} (plus sign) @subentry @code{+} operator +@cindex plus sign (@code{+}) @subentry @code{+} operator +@cindex @code{-} (hyphen) @subentry @code{-} operator +@cindex hyphen (@code{-}) @subentry @code{-} operator @item @code{+ -} Addition, subtraction. @@ -13050,35 +13088,35 @@ There is no special symbol for concatenation. The operands are simply written side by side (@pxref{Concatenation}). 
-@cindex @code{<} (left angle bracket), @code{<} operator -@cindex left angle bracket (@code{<}), @code{<} operator -@cindex @code{<} (left angle bracket), @code{<=} operator -@cindex left angle bracket (@code{<}), @code{<=} operator -@cindex @code{>} (right angle bracket), @code{>=} operator -@cindex right angle bracket (@code{>}), @code{>=} operator -@cindex @code{>} (right angle bracket), @code{>} operator -@cindex right angle bracket (@code{>}), @code{>} operator -@cindex @code{=} (equals sign), @code{==} operator -@cindex equals sign (@code{=}), @code{==} operator -@cindex @code{!} (exclamation point), @code{!=} operator -@cindex exclamation point (@code{!}), @code{!=} operator -@cindex @code{>} (right angle bracket), @code{>>} operator (I/O) -@cindex right angle bracket (@code{>}), @code{>>} operator (I/O) -@cindex operators, input/output -@cindex @code{|} (vertical bar), @code{|} operator (I/O) -@cindex vertical bar (@code{|}), @code{|} operator (I/O) -@cindex operators, input/output -@cindex @code{|} (vertical bar), @code{|&} operator (I/O) -@cindex vertical bar (@code{|}), @code{|&} operator (I/O) -@cindex operators, input/output +@cindex @code{<} (left angle bracket) @subentry @code{<} operator +@cindex left angle bracket (@code{<}) @subentry @code{<} operator +@cindex @code{<} (left angle bracket) @subentry @code{<=} operator +@cindex left angle bracket (@code{<}) @subentry @code{<=} operator +@cindex @code{>} (right angle bracket) @subentry @code{>=} operator +@cindex right angle bracket (@code{>}) @subentry @code{>=} operator +@cindex @code{>} (right angle bracket) @subentry @code{>} operator +@cindex right angle bracket (@code{>}) @subentry @code{>} operator +@cindex @code{=} (equals sign) @subentry @code{==} operator +@cindex equals sign (@code{=}) @subentry @code{==} operator +@cindex @code{!} (exclamation point) @subentry @code{!=} operator +@cindex exclamation point (@code{!}) @subentry @code{!=} operator +@cindex @code{>} (right angle bracket) 
@subentry @code{>>} operator (I/O) +@cindex right angle bracket (@code{>}) @subentry @code{>>} operator (I/O) +@cindex operators @subentry input/output +@cindex @code{|} (vertical bar) @subentry @code{|} operator (I/O) +@cindex vertical bar (@code{|}) @subentry @code{|} operator (I/O) +@cindex operators @subentry input/output +@cindex @code{|} (vertical bar) @subentry @code{|&} operator (I/O) +@cindex vertical bar (@code{|}) @subentry @code{|&} operator (I/O) +@cindex operators @subentry input/output @item @code{< <= == != > >= >> | |&} Relational and redirection. The relational operators and the redirections have the same precedence level. Characters such as @samp{>} serve both as relationals and as redirections; the context distinguishes between the two meanings. -@cindex @code{print} statement, I/O operators in -@cindex @code{printf} statement, I/O operators in +@cindex @code{print} statement @subentry I/O operators in +@cindex @code{printf} statement @subentry I/O operators in Note that the I/O redirection operators in @code{print} and @code{printf} statements belong to the statement level, not to expressions. The redirection does not produce an expression that could be the operand of @@ -13090,8 +13128,8 @@ The correct way to write this statement is @samp{print foo > (a ? b : c)}. @cindex @code{~} (tilde), @code{~} operator @cindex tilde (@code{~}), @code{~} operator -@cindex @code{!} (exclamation point), @code{!~} operator -@cindex exclamation point (@code{!}), @code{!~} operator +@cindex @code{!} (exclamation point) @subentry @code{!~} operator +@cindex exclamation point (@code{!}) @subentry @code{!~} operator @item @code{~ !~} Matching, nonmatching. @@ -13099,43 +13137,43 @@ Matching, nonmatching. @item @code{in} Array membership. 
-@cindex @code{&} (ampersand), @code{&&} operator -@cindex ampersand (@code{&}), @code{&&} operator +@cindex @code{&} (ampersand) @subentry @code{&&} operator +@cindex ampersand (@code{&}) @subentry @code{&&} operator @item @code{&&} Logical ``and.'' -@cindex @code{|} (vertical bar), @code{||} operator -@cindex vertical bar (@code{|}), @code{||} operator +@cindex @code{|} (vertical bar) @subentry @code{||} operator +@cindex vertical bar (@code{|}) @subentry @code{||} operator @item @code{||} Logical ``or.'' -@cindex @code{?} (question mark), @code{?:} operator -@cindex question mark (@code{?}), @code{?:} operator -@cindex @code{:} (colon), @code{?:} operator -@cindex colon (@code{:}), @code{?:} operator +@cindex @code{?} (question mark) @subentry @code{?:} operator +@cindex question mark (@code{?}) @subentry @code{?:} operator +@cindex @code{:} (colon) @subentry @code{?:} operator +@cindex colon (@code{:}) @subentry @code{?:} operator @item @code{?:} Conditional. This operator groups right to left. 
-@cindex @code{+} (plus sign), @code{+=} operator -@cindex plus sign (@code{+}), @code{+=} operator -@cindex @code{-} (hyphen), @code{-=} operator -@cindex hyphen (@code{-}), @code{-=} operator -@cindex @code{*} (asterisk), @code{*=} operator -@cindex asterisk (@code{*}), @code{*=} operator -@cindex @code{*} (asterisk), @code{**=} operator -@cindex asterisk (@code{*}), @code{**=} operator -@cindex @code{/} (forward slash), @code{/=} operator -@cindex forward slash (@code{/}), @code{/=} operator -@cindex @code{%} (percent sign), @code{%=} operator -@cindex percent sign (@code{%}), @code{%=} operator -@cindex @code{^} (caret), @code{^=} operator -@cindex caret (@code{^}), @code{^=} operator +@cindex @code{+} (plus sign) @subentry @code{+=} operator +@cindex plus sign (@code{+}) @subentry @code{+=} operator +@cindex @code{-} (hyphen) @subentry @code{-=} operator +@cindex hyphen (@code{-}) @subentry @code{-=} operator +@cindex @code{*} (asterisk) @subentry @code{*=} operator +@cindex asterisk (@code{*}) @subentry @code{*=} operator +@cindex @code{*} (asterisk) @subentry @code{**=} operator +@cindex asterisk (@code{*}) @subentry @code{**=} operator +@cindex @code{/} (forward slash) @subentry @code{/=} operator +@cindex forward slash (@code{/}) @subentry @code{/=} operator +@cindex @code{%} (percent sign) @subentry @code{%=} operator +@cindex percent sign (@code{%}) @subentry @code{%=} operator +@cindex @code{^} (caret) @subentry @code{^=} operator +@cindex caret (@code{^}) @subentry @code{^=} operator @item @code{= += -= *= /= %= ^= **=} Assignment. These operators group right to left. @end table -@cindex POSIX @command{awk}, @code{**} operator and -@cindex portability, operators, not in POSIX @command{awk} +@cindex POSIX @command{awk} @subentry @code{**} operator and +@cindex portability @subentry operators @subentry not in POSIX @command{awk} @quotation NOTE The @samp{|&}, @samp{**}, and @samp{**=} operators are not specified by POSIX. 
For maximum portability, do not use them. @@ -13282,7 +13320,7 @@ building something useful. * Empty:: The empty pattern, which matches every record. @end menu -@cindex patterns, types of +@cindex patterns @subentry types of Patterns in @command{awk} control the execution of rules---a rule is executed when its pattern matches the current input record. The following is a summary of the types of @command{awk} patterns: @@ -13323,8 +13361,8 @@ The empty pattern matches every input record. @node Regexp Patterns @subsection Regular Expressions as Patterns -@cindex patterns, regular expressions as -@cindex regular expressions, as patterns +@cindex patterns @subentry regexp constants as +@cindex regular expressions @subentry as patterns Regular expressions are one of the first kinds of patterns presented in this book. @@ -13340,8 +13378,8 @@ END @{ print buzzwords, "buzzwords seen" @} @node Expression Patterns @subsection Expressions as Patterns -@cindex expressions, as patterns -@cindex patterns, expressions as +@cindex expressions @subentry as patterns +@cindex patterns @subentry expressions as Any @command{awk} expression is valid as an @command{awk} pattern. The pattern matches if the expression's value is nonzero (if a @@ -13352,8 +13390,8 @@ value depends directly on the new input record's text; otherwise, it depends on only what has happened so far in the execution of the @command{awk} program. -@cindex comparison expressions, as patterns -@cindex patterns, comparison expressions as +@cindex comparison expressions @subentry as patterns +@cindex patterns @subentry comparison expressions as Comparison expressions, using the comparison operators described in @ref{Typing and Comparison}, are a very common kind of pattern. 
@@ -13366,12 +13404,12 @@ is used as a dynamic regular expression The following example prints the second field of each input record whose first field is precisely @samp{li}: -@cindex @code{/} (forward slash), patterns and -@cindex forward slash (@code{/}), patterns and +@cindex @code{/} (forward slash) @subentry patterns and +@cindex forward slash (@code{/}) @subentry patterns and @cindex @code{~} (tilde), @code{~} operator @cindex tilde (@code{~}), @code{~} operator -@cindex @code{!} (exclamation point), @code{!~} operator -@cindex exclamation point (@code{!}), @code{!~} operator +@cindex @code{!} (exclamation point) @subentry @code{!~} operator +@cindex exclamation point (@code{!}) @subentry @code{!~} operator @example $ @kbd{awk '$1 == "li" @{ print $2 @}' mail-list} @end example @@ -13387,15 +13425,15 @@ $ @kbd{awk '$1 ~ /li/ @{ print $2 @}' mail-list} @print{} 555-6699 @end example -@cindex regexp constants, as patterns -@cindex patterns, regexp constants as +@cindex regexp constants @subentry as patterns +@cindex patterns @subentry regexp constants as A regexp constant as a pattern is also a special case of an expression pattern. The expression @code{/li/} has the value one if @samp{li} appears in the current input record. Thus, as a pattern, @code{/li/} matches any record containing @samp{li}. -@cindex Boolean expressions, as patterns -@cindex patterns, Boolean expressions as +@cindex Boolean expressions @subentry as patterns +@cindex patterns @subentry Boolean expressions as Boolean expressions are also commonly used as patterns. Whether the pattern matches an input record depends on whether its subexpressions match. @@ -13437,10 +13475,10 @@ $ @kbd{awk '! 
/li/' mail-list} @end group @end example -@cindex @code{BEGIN} pattern, Boolean patterns and -@cindex @code{END} pattern, Boolean patterns and -@cindex @code{BEGINFILE} pattern, Boolean patterns and -@cindex @code{ENDFILE} pattern, Boolean patterns and +@cindex @code{BEGIN} pattern @subentry Boolean patterns and +@cindex @code{END} pattern @subentry Boolean patterns and +@cindex @code{BEGINFILE} pattern @subentry Boolean patterns and +@cindex @code{ENDFILE} pattern @subentry Boolean patterns and The subexpressions of a Boolean operator in a pattern can be constant regular expressions, comparisons, or any other @command{awk} expressions. Range patterns are not expressions, so they cannot appear inside Boolean @@ -13456,8 +13494,8 @@ patterns is described in @ref{Precedence}. @subsection Specifying Record Ranges with Patterns @cindex range patterns -@cindex patterns, ranges in -@cindex lines, matching ranges of +@cindex patterns @subentry ranges in +@cindex lines @subentry matching ranges of @cindex @code{,} (comma), in range patterns @cindex comma (@code{,}), in range patterns A @dfn{range pattern} is made of two patterns separated by a comma, in @@ -13482,7 +13520,7 @@ input record; when this succeeds, the range pattern is @dfn{turned off} again for the following record. Then the range pattern goes back to checking @var{begpat} against each record. -@cindex @code{if} statement, actions@comma{} changing +@cindex @code{if} statement @subentry actions, changing The record that turns on the range pattern and the one that turns it off both match the range pattern. 
If you don't want to operate on these records, you can write @code{if} statements in the rule's action @@ -13507,13 +13545,13 @@ looks like this: @end example @noindent -@cindex lines, skipping between markers +@cindex lines @subentry skipping between markers @c @cindex flag variables This program fails because the range pattern is both turned on and turned off by the first line, which just has a @samp{%} on it. To accomplish this task, write the program in the following manner, using a flag: -@cindex @code{!} (exclamation point), @code{!} operator +@cindex @code{!} (exclamation point) @subentry @code{!} operator @example /^%$/ @{ skip = ! skip; next @} skip == 1 @{ next @} # skip lines with `skip' set @@ -13538,8 +13576,8 @@ $ @kbd{echo Yes | gawk '(/1/,/2/) || /Yes/'} @error{} gawk: cmd. line:1: ^ syntax error @end example -@cindex range patterns, line continuation and -@cindex dark corner, range patterns, line continuation and +@cindex range patterns @subentry line continuation and +@cindex dark corner @subentry range patterns, line continuation and As a minor point of interest, although it is poor style, POSIX allows you to put a newline after the comma in a range pattern. @value{DARKCORNER} @@ -13581,8 +13619,8 @@ $ @kbd{awk '} @print{} "li" appears in 4 records. @end example -@cindex @code{BEGIN} pattern, operators and -@cindex @code{END} pattern, operators and +@cindex @code{BEGIN} pattern @subentry operators and +@cindex @code{END} pattern @subentry operators and This program finds the number of records in the input file @file{mail-list} that contain the string @samp{li}. The @code{BEGIN} rule prints a title for the report. There is no need to use the @code{BEGIN} rule to @@ -13630,7 +13668,7 @@ rule checks the @code{FNR} and @code{NR} variables. 
@node I/O And BEGIN/END @subsubsection Input/Output from @code{BEGIN} and @code{END} Rules -@cindex input/output, from @code{BEGIN} and @code{END} +@cindex input/output @subentry from @code{BEGIN} and @code{END} There are several (sometimes subtle) points to be aware of when doing I/O from a @code{BEGIN} or @code{END} rule. The first has to do with the value of @code{$0} in a @code{BEGIN} @@ -13643,11 +13681,11 @@ without a variable (@pxref{Getline}). Another way is simply to assign a value to @code{$0}. @cindex Brian Kernighan's @command{awk} -@cindex differences in @command{awk} and @command{gawk}, @code{BEGIN}/@code{END} patterns -@cindex POSIX @command{awk}, @code{BEGIN}/@code{END} patterns -@cindex @code{print} statement, @code{BEGIN}/@code{END} patterns and -@cindex @code{BEGIN} pattern, @code{print} statement and -@cindex @code{END} pattern, @code{print} statement and +@cindex differences in @command{awk} and @command{gawk} @subentry @code{BEGIN}/@code{END} patterns +@cindex POSIX @command{awk} @subentry @code{BEGIN}/@code{END} patterns +@cindex @code{print} statement @subentry @code{BEGIN}/@code{END} patterns and +@cindex @code{BEGIN} pattern @subentry @code{print} statement and +@cindex @code{END} pattern @subentry @code{print} statement and The second point is similar to the first, but from the other direction. Traditionally, due largely to implementation issues, @code{$0} and @code{NF} were @emph{undefined} inside an @code{END} rule. @@ -13670,10 +13708,10 @@ this in @code{BEGIN} rules, it is a very bad idea in @code{END} rules, at least in @command{gawk}. It is also poor style, because if an empty line is needed in the output, the program should print one explicitly. 
-@cindex @code{next} statement, @code{BEGIN}/@code{END} patterns and -@cindex @code{nextfile} statement, @code{BEGIN}/@code{END} patterns and -@cindex @code{BEGIN} pattern, @code{next}/@code{nextfile} statements and -@cindex @code{END} pattern, @code{next}/@code{nextfile} statements and +@cindex @code{next} statement @subentry @code{BEGIN}/@code{END} patterns and +@cindex @code{nextfile} statement @subentry @code{BEGIN}/@code{END} patterns and +@cindex @code{BEGIN} pattern @subentry @code{next}/@code{nextfile} statements and +@cindex @code{END} pattern @subentry @code{next}/@code{nextfile} statements and Finally, the @code{next} and @code{nextfile} statements are not allowed in a @code{BEGIN} rule, because the implicit read-a-record-and-match-against-the-rules loop has not started yet. Similarly, those statements @@ -13690,7 +13728,7 @@ are not valid in an @code{END} rule, because all the input has been read. @subsection The @code{BEGINFILE} and @code{ENDFILE} Special Patterns @cindex @code{BEGINFILE} pattern @cindex @code{ENDFILE} pattern -@cindex differences in @command{awk} and @command{gawk}, @code{BEGINFILE}/@code{ENDFILE} patterns +@cindex differences in @command{awk} and @command{gawk} @subentry @code{BEGINFILE}/@code{ENDFILE} patterns This @value{SECTION} describes a @command{gawk}-specific feature. @@ -13725,9 +13763,9 @@ file named on the command line cannot be opened for reading. However, you can bypass the fatal error and move on to the next file on the command line. 
-@cindex @command{gawk}, @code{ERRNO} variable in -@cindex @code{ERRNO} variable, with @code{BEGINFILE} pattern -@cindex @code{nextfile} statement, @code{BEGINFILE}/@code{ENDFILE} patterns and +@cindex @command{gawk} @subentry @code{ERRNO} variable in +@cindex @code{ERRNO} variable @subentry with @code{BEGINFILE} pattern +@cindex @code{nextfile} statement @subentry @code{BEGINFILE}/@code{ENDFILE} patterns and You do this by checking if the @code{ERRNO} variable is not the empty string; if so, then @command{gawk} was not able to open the file. In this case, your program can execute the @code{nextfile} statement @@ -13754,13 +13792,13 @@ rule is present, the error becomes non-fatal, and instead @code{ERRNO} is set. This makes it possible to catch and process I/O errors at the level of the @command{awk} program. -@cindex @code{next} statement, @code{BEGINFILE}/@code{ENDFILE} patterns and +@cindex @code{next} statement @subentry @code{BEGINFILE}/@code{ENDFILE} patterns and The @code{next} statement (@pxref{Next Statement}) is not allowed inside either a @code{BEGINFILE} or an @code{ENDFILE} rule. The @code{nextfile} statement is allowed only inside a @code{BEGINFILE} rule, not inside an @code{ENDFILE} rule. -@cindex @code{getline} statement, @code{BEGINFILE}/@code{ENDFILE} patterns and +@cindex @code{getline} command @subentry @code{BEGINFILE}/@code{ENDFILE} patterns and The @code{getline} statement (@pxref{Getline}) is restricted inside both @code{BEGINFILE} and @code{ENDFILE}: only redirected forms of @code{getline} are allowed. @@ -13795,7 +13833,7 @@ rule to grab it before moving on to the next file.] @subsection The Empty Pattern @cindex empty pattern -@cindex patterns, empty +@cindex patterns @subentry empty An empty (i.e., nonexistent) pattern is considered to match @emph{every} input record. For example, the program: @@ -13808,8 +13846,8 @@ prints the first field of every record. 
@node Using Shell Variables @section Using Shell Variables in Programs -@cindex shells, variables -@cindex @command{awk} programs, shell variables in +@cindex shells @subentry variables +@cindex @command{awk} programs @subentry shell variables in @c @cindex shell and @command{awk} interaction @command{awk} programs are often used as components in larger @@ -13819,7 +13857,7 @@ hold a pattern that the @command{awk} program searches for. There are two ways to get the value of the shell variable into the body of the @command{awk} program. -@cindex shells, quoting +@cindex shells @subentry quoting A common method is to use shell quoting to substitute the variable's value into the program inside the script. For example, consider the following program: @@ -13895,12 +13933,12 @@ in outline, an @command{awk} program generally looks like this: @dots{} @end display -@cindex @code{@{@}} (braces), actions and -@cindex braces (@code{@{@}}), actions and -@cindex separators, for statements in actions -@cindex newlines, separating statements in actions -@cindex @code{;} (semicolon), separating statements in actions -@cindex semicolon (@code{;}), separating statements in actions +@cindex @code{@{@}} (braces) @subentry actions and +@cindex braces (@code{@{@}}) @subentry actions and +@cindex separators @subentry for statements in actions +@cindex newlines @subentry separating statements in actions +@cindex @code{;} (semicolon) @subentry separating statements in actions +@cindex semicolon (@code{;}) @subentry separating statements in actions An action consists of one or more @command{awk} @dfn{statements}, enclosed in braces (@samp{@{@r{@dots{}}@}}). Each statement specifies one thing to do. The statements are separated by newlines or semicolons. @@ -13917,7 +13955,7 @@ well. 
An omitted action is equivalent to @samp{@{ print $0 @}}: The following types of statements are supported in @command{awk}: @table @asis -@cindex side effects, statements +@cindex side effects @subentry statements @item Expressions Call functions or assign values to variables (@pxref{Expressions}). Executing @@ -13956,21 +13994,21 @@ For deleting array elements. @node Statements @section Control Statements in Actions @cindex control statements -@cindex statements, control, in actions -@cindex actions, control statements in +@cindex statements @subentry control, in actions +@cindex actions @subentry control statements in @dfn{Control statements}, such as @code{if}, @code{while}, and so on, control the flow of execution in @command{awk} programs. Most of @command{awk}'s control statements are patterned after similar statements in C. -@cindex compound statements@comma{} control statements and -@cindex statements, compound@comma{} control statements and -@cindex body, in actions -@cindex @code{@{@}} (braces), statements, grouping -@cindex braces (@code{@{@}}), statements, grouping -@cindex newlines, separating statements in actions -@cindex @code{;} (semicolon), separating statements in actions -@cindex semicolon (@code{;}), separating statements in actions +@cindex compound statements, control statements and +@cindex statements @subentry compound, control statements and +@cindex body @subentry in actions +@cindex @code{@{@}} (braces) @subentry statements, grouping +@cindex braces (@code{@{@}}) @subentry statements, grouping +@cindex newlines @subentry separating statements in actions +@cindex @code{;} (semicolon) @subentry separating statements in actions +@cindex semicolon (@code{;}) @subentry separating statements in actions All the control statements start with special keywords, such as @code{if} and @code{while}, to distinguish them from simple expressions. Many control statements contain other statements. 
For example, the @@ -14052,8 +14090,8 @@ the first thing on its line. @subsection The @code{while} Statement @cindex @code{while} statement @cindex loops -@cindex loops, @code{while} -@cindex loops, See Also @code{while} statement +@cindex loops @subentry @code{while} +@cindex loops @seealso{@code{while} statement} In programming, a @dfn{loop} is a part of a program that can be executed two or more times in succession. @@ -14066,7 +14104,7 @@ while (@var{condition}) @var{body} @end example -@cindex body, in loops +@cindex body @subentry in loops @noindent @var{body} is a statement called the @dfn{body} of the loop, and @var{condition} is an expression that controls how long the loop @@ -14114,7 +14152,7 @@ program is harder to read without it. @node Do Statement @subsection The @code{do}-@code{while} Statement @cindex @code{do}-@code{while} statement -@cindex loops, @code{do}-@code{while} +@cindex loops @subentry @code{do}-@code{while} The @code{do} loop is a variation of the @code{while} looping statement. The @code{do} loop executes the @var{body} once and then repeats the @@ -14160,7 +14198,7 @@ occasionally is there a real use for a @code{do} statement. @node For Statement @subsection The @code{for} Statement @cindex @code{for} statement -@cindex loops, @code{for}, iterative +@cindex loops @subentry @code{for} @subentry iterative The @code{for} statement makes it more convenient to count iterations of a loop. The general form of the @code{for} statement looks like this: @@ -14235,7 +14273,7 @@ while (@var{condition}) @{ @} @end example -@cindex loops, @code{continue} statements and +@cindex loops @subentry @code{continue} statement and @noindent The only exception is when the @code{continue} statement (@pxref{Continue Statement}) is used @@ -14337,8 +14375,8 @@ described in @ref{Getopt Function}.) 
@node Break Statement @subsection The @code{break} Statement @cindex @code{break} statement -@cindex loops, exiting -@cindex loops, @code{break} statement and +@cindex loops @subentry exiting +@cindex loops @subentry @code{break} statement and The @code{break} statement jumps out of the innermost @code{for}, @code{while}, or @code{do} loop that encloses it. The following example @@ -14399,9 +14437,9 @@ This is discussed in @ref{Switch Statement}. @c @cindex @code{break}, outside of loops @c @cindex historical features @c @cindex @command{awk} language, POSIX version -@cindex POSIX @command{awk}, @code{break} statement and -@cindex dark corner, @code{break} statement -@cindex @command{gawk}, @code{break} statement in +@cindex POSIX @command{awk} @subentry @code{break} statement and +@cindex dark corner @subentry @code{break} statement +@cindex @command{gawk} @subentry @code{break} statement in @cindex Brian Kernighan's @command{awk} The @code{break} statement has no meaning when used outside the body of a loop or @code{switch}. @@ -14465,9 +14503,9 @@ the increment (@samp{x++}) is never reached. @c @cindex @code{continue}, outside of loops @c @cindex historical features @c @cindex @command{awk} language, POSIX version -@cindex POSIX @command{awk}, @code{continue} statement and -@cindex dark corner, @code{continue} statement -@cindex @command{gawk}, @code{continue} statement in +@cindex POSIX @command{awk} @subentry @code{continue} statement and +@cindex dark corner @subentry @code{continue} statement +@cindex @command{gawk} @subentry @code{continue} statement in @cindex Brian Kernighan's @command{awk} The @code{continue} statement has no special meaning with respect to the @code{switch} statement, nor does it have any meaning when used outside the @@ -14500,7 +14538,7 @@ Contrast this with the effect of the @code{getline} function flow of control in any way (i.e., the rest of the current action executes with a new input record). 
-@cindex @command{awk} programs, execution of +@cindex @command{awk} programs @subentry execution of At the highest level, @command{awk} program execution is a loop that reads an input record and then tests each rule's pattern against it. If you think of this loop as a @code{for} statement whose body contains the @@ -14535,13 +14573,13 @@ then the code in any @code{END} rules is executed. The @code{next} statement is not allowed inside @code{BEGINFILE} and @code{ENDFILE} rules. @xref{BEGINFILE/ENDFILE}. -@c @cindex @command{awk} language, POSIX version @c @cindex @code{next}, inside a user-defined function -@cindex @code{BEGIN} pattern, @code{next}/@code{nextfile} statements and -@cindex @code{END} pattern, @code{next}/@code{nextfile} statements and -@cindex POSIX @command{awk}, @code{next}/@code{nextfile} statements and -@cindex @code{next} statement, user-defined functions and -@cindex functions, user-defined, @code{next}/@code{nextfile} statements and +@cindex @command{awk} @subentry language, POSIX version +@cindex @code{BEGIN} pattern @subentry @code{next}/@code{nextfile} statements and +@cindex @code{END} pattern @subentry @code{next}/@code{nextfile} statements and +@cindex POSIX @command{awk} @subentry @code{next}/@code{nextfile} statements and +@cindex @code{next} statement @subentry user-defined functions and +@cindex functions @subentry user-defined @subentry @code{next}/@code{nextfile} statements and According to the POSIX standard, the behavior is undefined if the @code{next} statement is used in a @code{BEGIN} or @code{END} rule. @command{gawk} treats it as a syntax error. Although POSIX does not disallow it, @@ -14604,8 +14642,8 @@ inclusion into the POSIX standard. See @uref{http://austingroupbugs.net/view.php?id=607, the Austin Group website}. 
@end quotation -@cindex functions, user-defined, @code{next}/@code{nextfile} statements and -@cindex @code{nextfile} statement, user-defined functions and +@cindex functions @subentry user-defined @subentry @code{next}/@code{nextfile} statements and +@cindex @code{nextfile} statement @subentry user-defined functions and @cindex Brian Kernighan's @command{awk} @cindex @command{mawk} utility The current version of BWK @command{awk} and @command{mawk} @@ -14627,8 +14665,8 @@ is ignored. The @code{exit} statement is written as follows: @code{exit} [@var{return code}] @end display -@cindex @code{BEGIN} pattern, @code{exit} statement and -@cindex @code{END} pattern, @code{exit} statement and +@cindex @code{BEGIN} pattern @subentry @code{exit} statement and +@cindex @code{END} pattern @subentry @code{exit} statement and When an @code{exit} statement is executed from a @code{BEGIN} rule, the program stops processing everything immediately. No input records are read. However, if an @code{END} rule is present, @@ -14651,7 +14689,7 @@ the @code{END} rule. @xref{Assert Function} for an example that does this. -@cindex dark corner, @code{exit} statement +@cindex dark corner @subentry @code{exit} statement If an argument is supplied to @code{exit}, its value is used as the exit status code for the @command{awk} process. If no argument is supplied, @code{exit} causes @command{awk} to return a ``success'' status. @@ -14661,7 +14699,7 @@ called a second time from an @code{END} rule with no argument, @command{awk} uses the previously supplied exit value. @value{DARKCORNER} @xref{Exit Status} for more information. -@cindex programming conventions, @code{exit} statement +@cindex programming conventions @subentry @code{exit} statement For example, suppose an error condition occurs that is difficult or impossible to handle. Conventionally, programs report this by exiting with a nonzero status. 
An @command{awk} program can do this @@ -14693,7 +14731,7 @@ results across different operating systems. @node Built-in Variables @section Predefined Variables @cindex predefined variables -@cindex variables, predefined +@cindex variables @subentry predefined Most @command{awk} variables are available to use for your own purposes; they never change unless your program assigns values to @@ -14704,7 +14742,7 @@ to tell @command{awk} how to do certain things. Others are set automatically by @command{awk}, so that they carry information from the internal workings of @command{awk} to your program. -@cindex @command{gawk}, predefined variables and +@cindex @command{gawk} @subentry predefined variables and This @value{SECTION} documents all of @command{gawk}'s predefined variables, most of which are also documented in the @value{CHAPTER}s describing their areas of activity. @@ -14719,7 +14757,7 @@ their areas of activity. @node User-modified @subsection Built-in Variables That Control @command{awk} -@cindex predefined variables, user-modifiable +@cindex predefined variables @subentry user-modifiable @cindex user-modifiable variables The following is an alphabetical list of variables that you can change to @@ -14734,8 +14772,8 @@ in the description of each variable.) @table @code @cindex @code{BINMODE} variable @cindex binary input/output -@cindex input/output, binary -@cindex differences in @command{awk} and @command{gawk}, @code{BINMODE} variable +@cindex input/output @subentry binary +@cindex differences in @command{awk} and @command{gawk} @subentry @code{BINMODE} variable @item BINMODE # On non-POSIX systems, this variable specifies use of binary mode for all I/O. Numeric values of one, two, or three specify that input @@ -14751,9 +14789,9 @@ detail in @ref{PC Using}. @command{mawk} (@pxref{Other Versions}) also supports this variable, but only using numeric values. 
@cindex @code{CONVFMT} variable -@cindex POSIX @command{awk}, @code{CONVFMT} variable and -@cindex numbers, converting, to strings -@cindex strings, converting, numbers to +@cindex POSIX @command{awk} @subentry @code{CONVFMT} variable and +@cindex numbers @subentry converting @subentry to strings +@cindex strings @subentry converting @subentry numbers to @item @code{CONVFMT} A string that controls the conversion of numbers to strings (@pxref{Conversion}). @@ -14763,11 +14801,11 @@ It works by being passed, in effect, as the first argument to the Its default value is @code{"%.6g"}. @code{CONVFMT} was introduced by the POSIX standard. -@cindex @command{gawk}, @code{FIELDWIDTHS} variable in +@cindex @command{gawk} @subentry @code{FIELDWIDTHS} variable in @cindex @code{FIELDWIDTHS} variable -@cindex differences in @command{awk} and @command{gawk}, @code{FIELDWIDTHS} variable -@cindex field separators, @code{FIELDWIDTHS} variable and -@cindex separators, field, @code{FIELDWIDTHS} variable and +@cindex differences in @command{awk} and @command{gawk} @subentry @code{FIELDWIDTHS} variable +@cindex field separator @subentry @code{FIELDWIDTHS} variable and +@cindex separators @subentry field @subentry @code{FIELDWIDTHS} variable and @item FIELDWIDTHS # A space-separated list of columns that tells @command{gawk} how to split input with fixed columnar boundaries. @@ -14778,11 +14816,11 @@ Assigning a value to @code{FIELDWIDTHS} overrides the use of @code{FS} and @code{FPAT} for field splitting. @xref{Constant Size} for more information. 
-@cindex @command{gawk}, @code{FPAT} variable in +@cindex @command{gawk} @subentry @code{FPAT} variable in @cindex @code{FPAT} variable -@cindex differences in @command{awk} and @command{gawk}, @code{FPAT} variable -@cindex field separators, @code{FPAT} variable and -@cindex separators, field, @code{FPAT} variable and +@cindex differences in @command{awk} and @command{gawk} @subentry @code{FPAT} variable +@cindex field separator @subentry @code{FPAT} variable and +@cindex separators @subentry field @subentry @code{FPAT} variable and @item FPAT # A regular expression (as a string) that tells @command{gawk} to create the fields based on text that matches the regular expression. @@ -14791,8 +14829,8 @@ overrides the use of @code{FS} and @code{FIELDWIDTHS} for field splitting. @xref{Splitting By Content} for more information. @cindex @code{FS} variable -@cindex separators, field -@cindex field separators +@cindex separators @subentry field +@cindex field separator @item FS The input field separator (@pxref{Field Separators}). The value is a single-character string or a multicharacter regular @@ -14817,19 +14855,19 @@ You can set the value of @code{FS} on the command line using the awk -F, '@var{program}' @var{input-files} @end example -@cindex @command{gawk}, field separators and +@cindex @command{gawk} @subentry field separators and If @command{gawk} is using @code{FIELDWIDTHS} or @code{FPAT} for field splitting, assigning a value to @code{FS} causes @command{gawk} to return to the normal, @code{FS}-based field splitting. An easy way to do this is to simply say @samp{FS = FS}, perhaps with an explanatory comment. 
-@cindex @command{gawk}, @code{IGNORECASE} variable in +@cindex @command{gawk} @subentry @code{IGNORECASE} variable in @cindex @code{IGNORECASE} variable -@cindex differences in @command{awk} and @command{gawk}, @code{IGNORECASE} variable -@cindex case sensitivity, string comparisons and -@cindex case sensitivity, regexps and -@cindex regular expressions, case sensitivity +@cindex differences in @command{awk} and @command{gawk} @subentry @code{IGNORECASE} variable +@cindex case sensitivity @subentry string comparisons and +@cindex case sensitivity @subentry regexps and +@cindex regular expressions @subentry case sensitivity @item IGNORECASE # If @code{IGNORECASE} is nonzero or non-null, then all string comparisons and all regular expression matching are case-independent. @@ -14844,9 +14882,9 @@ and it does not affect field splitting when using a single-character field separator. @xref{Case-sensitivity}. -@cindex @command{gawk}, @code{LINT} variable in +@cindex @command{gawk} @subentry @code{LINT} variable in @cindex @code{LINT} variable -@cindex differences in @command{awk} and @command{gawk}, @code{LINT} variable +@cindex differences in @command{awk} and @command{gawk} @subentry @code{LINT} variable @cindex lint checking @item LINT # When this variable is true (nonzero or non-null), @command{gawk} @@ -14868,8 +14906,8 @@ of lint warnings during program execution is independent of the flavor of @command{awk} being executed. @cindex @code{OFMT} variable -@cindex numbers, converting, to strings -@cindex strings, converting, numbers to +@cindex numbers @subentry converting @subentry to strings +@cindex strings @subentry converting @subentry numbers to @item OFMT A string that controls conversion of numbers to strings (@pxref{Conversion}) for @@ -14880,10 +14918,10 @@ Its default value is @code{"%.6g"}. Earlier versions of @command{awk} used @code{OFMT} to specify the format for converting numbers to strings in general expressions; this is now done by @code{CONVFMT}. 
-@cindex @code{print} statement, @code{OFMT} variable and +@cindex @code{print} statement @subentry @code{OFMT} variable and @cindex @code{OFS} variable -@cindex separators, field -@cindex field separators +@cindex separators @subentry field +@cindex field separator @item OFS The output field separator (@pxref{Output Separators}). It is output between the fields printed by a @code{print} statement. Its @@ -14907,7 +14945,7 @@ numbers, by default @code{"N"} (@code{roundTiesToEven} in the IEEE 754 standard; @pxref{Setting the rounding mode}). @cindex @code{RS} variable -@cindex separators, for records +@cindex separators @subentry for records @cindex record separators @item @code{RS} The input record separator. Its default value is a string @@ -14927,7 +14965,7 @@ or if @command{gawk} is in compatibility mode just the first character of @code{RS}'s value is used. @cindex @code{SUBSEP} variable -@cindex separators, subscript +@cindex separators @subentry subscript @cindex subscript separators @item @code{SUBSEP} The subscript separator. It has the default value of @@ -14936,10 +14974,10 @@ multidimensional array. Thus, the expression @samp{@w{foo["A", "B"]}} really accesses @code{foo["A\034B"]} (@pxref{Multidimensional}). -@cindex @command{gawk}, @code{TEXTDOMAIN} variable in +@cindex @command{gawk} @subentry @code{TEXTDOMAIN} variable in @cindex @code{TEXTDOMAIN} variable -@cindex differences in @command{awk} and @command{gawk}, @code{TEXTDOMAIN} variable -@cindex internationalization, localization +@cindex differences in @command{awk} and @command{gawk} @subentry @code{TEXTDOMAIN} variable +@cindex internationalization @subentry localization @item TEXTDOMAIN # Used for internationalization of programs at the @command{awk} level. It sets the default text domain for specially @@ -14952,8 +14990,8 @@ The default value of @code{TEXTDOMAIN} is @code{"messages"}. 
@node Auto-set @subsection Built-in Variables That Convey Information -@cindex predefined variables, conveying information -@cindex variables, predefined, conveying information +@cindex predefined variables @subentry conveying information +@cindex variables @subentry predefined @subentry conveying information The following is an alphabetical list of variables that @command{awk} sets automatically on certain occasions in order to provide information to your program. @@ -14966,8 +15004,8 @@ mode (@pxref{Options}), they are not special: @c @asis for docbook @table @asis @cindex @code{ARGC}/@code{ARGV} variables -@cindex arguments, command-line -@cindex command line, arguments +@cindex arguments @subentry command-line +@cindex command line @subentry arguments @item @code{ARGC}, @code{ARGV} The command-line arguments available to @command{awk} programs are stored in an array called @code{ARGV}. @code{ARGC} is the number of command-line @@ -14995,12 +15033,12 @@ contains @samp{inventory-shipped}, and @code{ARGV[2]} contains index of the last element in @code{ARGV}, because the elements are numbered from zero. -@cindex programming conventions, @code{ARGC}/@code{ARGV} variables +@cindex programming conventions @subentry @code{ARGC}/@code{ARGV} variables The names @code{ARGC} and @code{ARGV}, as well as the convention of indexing the array from 0 to @code{ARGC} @minus{} 1, are derived from the C language's method of accessing command-line arguments. -@cindex dark corner, value of @code{ARGV[0]} +@cindex dark corner @subentry value of @code{ARGV[0]} The value of @code{ARGV[0]} can vary from system to system. Also, you should note that the program text is @emph{not} included in @code{ARGV}, nor are any of @command{awk}'s command-line options. @@ -15009,7 +15047,7 @@ about how @command{awk} uses these variables. 
@value{DARKCORNER} @cindex @code{ARGIND} variable -@cindex differences in @command{awk} and @command{gawk}, @code{ARGIND} variable +@cindex differences in @command{awk} and @command{gawk} @subentry @code{ARGIND} variable @item @code{ARGIND #} The index in @code{ARGV} of the current file being processed. Every time @command{gawk} opens a new @value{DF} for processing, it sets @@ -15017,18 +15055,18 @@ Every time @command{gawk} opens a new @value{DF} for processing, it sets When @command{gawk} is processing the input files, @samp{FILENAME == ARGV[ARGIND]} is always true. -@cindex files, processing@comma{} @code{ARGIND} variable and +@cindex files @subentry processing, @code{ARGIND} variable and This variable is useful in file processing; it allows you to tell how far along you are in the list of @value{DF}s as well as to distinguish between successive instances of the same @value{FN} on the command line. -@cindex file names, distinguishing +@cindex file names @subentry distinguishing While you can change the value of @code{ARGIND} within your @command{awk} program, @command{gawk} automatically sets it to a new value when it opens the next file. @cindex @code{ENVIRON} array -@cindex environment variables, in @code{ENVIRON} array +@cindex environment variables @subentry in @code{ENVIRON} array @item @code{ENVIRON} An associative array containing the values of the environment. The array indices are the environment variable names; the elements are the values of @@ -15064,10 +15102,10 @@ On such systems, the @code{ENVIRON} array is empty (except for @pxref{AWKLIBPATH Variable}). 
@end ifnotdocbook -@cindex @command{gawk}, @code{ERRNO} variable in +@cindex @command{gawk} @subentry @code{ERRNO} variable in @cindex @code{ERRNO} variable -@cindex differences in @command{awk} and @command{gawk}, @code{ERRNO} variable -@cindex error handling, @code{ERRNO} variable and +@cindex differences in @command{awk} and @command{gawk} @subentry @code{ERRNO} variable +@cindex error handling @subentry @code{ERRNO} variable and @item @code{ERRNO #} If a system error occurs during a redirection for @code{getline}, during a read for @code{getline}, or during a @code{close()} operation, then @@ -15090,7 +15128,7 @@ of @code{errno}. For non-system errors, @code{PROCINFO["errno"]} will be zero. @cindex @code{FILENAME} variable -@cindex dark corner, @code{FILENAME} variable +@cindex dark corner @subentry @code{FILENAME} variable @item @code{FILENAME} The name of the current input file. When no @value{DF}s are listed on the command line, @command{awk} reads from the standard input and @@ -15124,8 +15162,8 @@ to @code{NF} can be used to create fields in or remove fields from the current record. @xref{Changing Fields}. @cindex @code{FUNCTAB} array -@cindex @command{gawk}, @code{FUNCTAB} array in -@cindex differences in @command{awk} and @command{gawk}, @code{FUNCTAB} variable +@cindex @command{gawk} @subentry @code{FUNCTAB} array in +@cindex differences in @command{awk} and @command{gawk} @subentry @code{FUNCTAB} variable @item @code{FUNCTAB #} An array whose indices and corresponding values are the names of all the built-in, user-defined, and extension functions in the program. @@ -15143,9 +15181,9 @@ the beginning of the program's execution (@pxref{Records}). @command{awk} increments @code{NR} each time it reads a new record. 
-@cindex @command{gawk}, @code{PROCINFO} array in +@cindex @command{gawk} @subentry @code{PROCINFO} array in @cindex @code{PROCINFO} array -@cindex differences in @command{awk} and @command{gawk}, @code{PROCINFO} array +@cindex differences in @command{awk} and @command{gawk} @subentry @code{PROCINFO} array @item @code{PROCINFO #} The elements of this array provide access to information about the running @command{awk} program. @@ -15154,7 +15192,7 @@ are guaranteed to be available: @table @code @item PROCINFO["argv"] -@cindex command line, arguments +@cindex command line @subentry arguments The @code{PROCINFO["argv"]} array contains all of the command-line arguments (after glob expansion and redirection processing on platforms where that must be done manually by the program) with subscripts ranging from 0 through @@ -15234,6 +15272,7 @@ while the program runs. @item PROCINFO["platform"] @cindex platform running on +@cindex @code{PROCINFO} array @subentry platform running on This element gives a string indicating the platform for which @command{gawk} was compiled. The value will be one of the following: @@ -15277,8 +15316,8 @@ Assigning a new value to this element changes the default. The value of the @code{getuid()} system call. @item PROCINFO["version"] -@cindex version of @command{gawk} -@cindex @command{gawk} version +@cindex version of @subentry @command{gawk} +@cindex @command{gawk} @subentry version of The version of @command{gawk}. @end table @@ -15289,10 +15328,10 @@ if your version of @command{gawk} supports arbitrary-precision arithmetic @table @code @item PROCINFO["gmp_version"] -@cindex version of GNU MP library +@cindex version of @subentry GNU MP library The version of the GNU MP library. -@cindex version of GNU MPFR library +@cindex version of @subentry GNU MPFR library @item PROCINFO["mpfr_version"] The version of the GNU MPFR library. 
@@ -15312,8 +15351,8 @@ of @command{gawk} supports dynamic loading of extension functions @table @code @item PROCINFO["api_major"] -@cindex version of @command{gawk} extension API -@cindex extension API, version number +@cindex version of @subentry @command{gawk} extension API +@cindex extension API @subentry version number The major version of the extension API. @item PROCINFO["api_minor"] @@ -15385,16 +15424,16 @@ The start index in characters of the substring that is matched by the is the position of the string where the matched substring starts, or zero if no match was found. -@cindex @command{gawk}, @code{RT} variable in +@cindex @command{gawk} @subentry @code{RT} variable in @cindex @code{RT} variable -@cindex differences in @command{awk} and @command{gawk}, @code{RS}/@code{RT} variables +@cindex differences in @command{awk} and @command{gawk} @subentry @code{RS}/@code{RT} variables @item @code{RT #} The input text that matched the text denoted by @code{RS}, the record separator. It is set every time a record is read. -@cindex @command{gawk}, @code{SYMTAB} array in +@cindex @command{gawk} @subentry @code{SYMTAB} array in @cindex @code{SYMTAB} array -@cindex differences in @command{awk} and @command{gawk}, @code{SYMTAB} variable +@cindex differences in @command{awk} and @command{gawk} @subentry @code{SYMTAB} variable @item @code{SYMTAB #} An array whose indices are the names of all defined global variables and arrays in the program. @code{SYMTAB} makes @command{gawk}'s symbol table @@ -15471,9 +15510,9 @@ is available as an element within the @code{SYMTAB} array. 
@end table @sidebar Changing @code{NR} and @code{FNR} -@cindex @code{NR} variable, changing -@cindex @code{FNR} variable, changing -@cindex dark corner, @code{FNR}/@code{NR} variables +@cindex @code{NR} variable @subentry changing +@cindex @code{FNR} variable @subentry changing +@cindex dark corner @subentry @code{FNR}/@code{NR} variables @command{awk} increments @code{NR} and @code{FNR} each time it reads a record, instead of setting them to the absolute value of the number of records read. This means that a program can @@ -15504,9 +15543,9 @@ changed. @node ARGC and ARGV @subsection Using @code{ARGC} and @code{ARGV} -@cindex @code{ARGC}/@code{ARGV} variables, how to use -@cindex arguments, command-line -@cindex command line, arguments +@cindex @code{ARGC}/@code{ARGV} variables @subentry how to use +@cindex arguments @subentry command-line +@cindex command line @subentry arguments @ref{Auto-set} presented the following program describing the information contained in @code{ARGC} @@ -15619,7 +15658,7 @@ BEGIN @{ @} @end example -@cindex differences in @command{awk} and @command{gawk}, @code{ARGC}/@code{ARGV} variables +@cindex differences in @command{awk} and @command{gawk} @subentry @code{ARGC}/@code{ARGV} variables Ending the @command{awk} options with @option{--} isn't necessary in @command{gawk}. Unless @option{--posix} has been specified, @command{gawk} silently puts any unrecognized options @@ -15835,10 +15874,10 @@ Only the values are stored; the indices are implicit from the order of the values. Here, eight is the value at index zero, because eight appears in the position with zero elements before it. -@cindex arrays, indexing +@cindex arrays @subentry indexing @cindex indexing arrays @cindex associative arrays -@cindex arrays, associative +@cindex arrays @subentry associative Arrays in @command{awk} are different---they are @dfn{associative}. 
This means that each array is a collection of pairs---an index and its corresponding array element value: @@ -15961,7 +16000,7 @@ whose value is @w{@code{"number ten"}}. The result is: @noindent @cindex sparse arrays -@cindex arrays, sparse +@cindex arrays @subentry sparse Now the array is @dfn{sparse}, which just means some indices are missing. It has elements 0--3 and 10, but doesn't have elements 4, 5, 6, 7, 8, or 9. @@ -16029,10 +16068,10 @@ array subscripts; this is discussed in more detail in Here, the number @code{1} isn't double-quoted, because @command{awk} automatically converts it to a string. -@cindex @command{gawk}, @code{IGNORECASE} variable in -@cindex case sensitivity, array indices and -@cindex arrays, @code{IGNORECASE} variable and -@cindex @code{IGNORECASE} variable, array indices and +@cindex @command{gawk} @subentry @code{IGNORECASE} variable in +@cindex case sensitivity @subentry array indices and +@cindex arrays @subentry @code{IGNORECASE} variable and +@cindex @code{IGNORECASE} variable @subentry array indices and The value of @code{IGNORECASE} has no effect upon array subscripting. The identical string value used to store an array element must be used to retrieve it. @@ -16046,9 +16085,9 @@ is independent of the number of elements in the array. @node Reference to Elements @subsection Referring to an Array Element -@cindex arrays, referencing elements +@cindex arrays @subentry referencing elements @cindex array members -@cindex elements of arrays +@cindex elements in arrays The principal way to use an array is to refer to one of its elements. An @dfn{array reference} is an expression as follows: @@ -16068,7 +16107,7 @@ The value of the array reference is the current value of that array element. For example, @code{foo[4.3]} is an expression referencing the element of array @code{foo} at index @samp{4.3}. 
-@cindex arrays, unassigned elements +@cindex arrays @subentry unassigned elements @cindex unassigned array elements @cindex empty array elements A reference to an array element that has no recorded value yields a value of @@ -16077,7 +16116,7 @@ that have not been assigned any value as well as elements that have been deleted (@pxref{Delete}). @cindex non-existent array elements -@cindex arrays, elements that don't exist +@cindex arrays @subentry elements @subentry that don't exist @quotation NOTE A reference to an element that does not exist @emph{automatically} creates that array element, with the null string as its value. (In some cases, @@ -16098,7 +16137,7 @@ an array element equal to the empty string. @end quotation @c @cindex arrays, @code{in} operator and -@cindex @code{in} operator, testing if array element exists +@cindex @code{in} operator @subentry testing if array element exists To determine whether an element exists in an array at a certain index, use the following expression: @@ -16106,7 +16145,7 @@ the following expression: @var{indx} in @var{array} @end example -@cindex side effects, array indexing +@cindex side effects @subentry array indexing @noindent This expression tests whether the particular index @var{indx} exists, without the side effect of creating that element if it is not present. @@ -16139,8 +16178,8 @@ if (frequencies[2] != "") @node Assigning Elements @subsection Assigning Array Elements -@cindex arrays, elements, assigning values -@cindex elements in arrays, assigning values +@cindex arrays @subentry elements @subentry assigning values +@cindex elements in arrays @subentry assigning values Array elements can be assigned values just like @command{awk} variables: @@ -16157,7 +16196,7 @@ assign to that element of the array. 
@node Array Example @subsection Basic Array Example -@cindex arrays, an example of using +@cindex arrays @subentry example of using The following program takes a list of lines, each beginning with a line number, and prints them out in order of line number. The line numbers @@ -16232,10 +16271,10 @@ END @{ @node Scanning an Array @subsection Scanning All Elements of an Array -@cindex elements in arrays, scanning +@cindex elements in arrays @subentry scanning @cindex scanning arrays -@cindex arrays, scanning -@cindex loops, @code{for}, array scanning +@cindex arrays @subentry scanning +@cindex loops @subentry @code{for} @subentry array scanning In programs that use arrays, it is often necessary to use a loop that executes once for each element of an array. In other languages, where @@ -16254,12 +16293,12 @@ for (@var{var} in @var{array}) @end example @noindent -@cindex @code{in} operator, use in loops +@cindex @code{in} operator @subentry use in loops This loop executes @var{body} once for each index in @var{array} that the program has previously used, with the variable @var{var} set to that index. -@cindex arrays, @code{for} statement and -@cindex @code{for} statement, looping over arrays +@cindex arrays @subentry @code{for} statement and +@cindex @code{for} statement @subentry looping over arrays The following program uses this form of the @code{for} statement. The first rule scans the input records and notes which words appear (at least once) in the input, by storing a one into the array @code{used} with @@ -16297,9 +16336,9 @@ END @{ @xref{Word Sorting} for a more detailed example of this type. 
-@cindex arrays, elements, order of access by @code{in} operator -@cindex elements in arrays, order of access by @code{in} operator -@cindex @code{in} operator, order of array access +@cindex arrays @subentry elements @subentry order of access by @code{in} operator +@cindex elements in arrays @subentry order of access by @code{in} operator +@cindex @code{in} operator @subentry order of array access The order in which elements of the array are accessed by this statement is determined by the internal arrangement of the array elements within @command{awk} and in standard @command{awk} cannot be controlled @@ -16377,7 +16416,7 @@ to use for comparison of array elements. This advanced feature is described later in @ref{Array Sorting}. @end itemize -@cindex @code{PROCINFO}, values of @code{sorted_in} +@cindex @code{PROCINFO} array @subentry values of @code{sorted_in} The following special values for @code{PROCINFO["sorted_in"]} are available: @table @code @@ -16512,11 +16551,11 @@ sorting arrays; see @ref{Array Sorting Functions}. @node Numeric Array Subscripts @section Using Numbers to Subscript Arrays -@cindex numbers, as array subscripts -@cindex array subscripts, numbers as -@cindex arrays, numeric subscripts -@cindex subscripts in arrays, numbers as -@cindex @code{CONVFMT} variable, array subscripts and +@cindex numbers @subentry as array subscripts +@cindex array subscripts @subentry numbers as +@cindex arrays @subentry numeric subscripts +@cindex subscripts in arrays @subentry numbers as +@cindex @code{CONVFMT} variable @subentry array subscripts and An important aspect to remember about arrays is that @emph{array subscripts are always strings}. When a numeric value is used as a subscript, it is converted to a string value before being used for subscripting @@ -16546,7 +16585,7 @@ string value from @code{xyz}---this time @code{"12.15"}---because the value of @code{CONVFMT} only allows two significant digits. 
This test fails, because @code{"12.15"} is different from @code{"12.153"}. -@cindex converting integer array subscripts +@cindex converting @subentry integer array subscripts to strings @cindex integer array indices According to the rules for conversions (@pxref{Conversion}), integer @@ -16580,10 +16619,10 @@ effect on your programs. @node Uninitialized Subscripts @section Using Uninitialized Variables as Subscripts -@cindex variables, uninitialized@comma{} as array subscripts +@cindex variables @subentry uninitialized, as array subscripts @cindex uninitialized variables, as array subscripts -@cindex subscripts in arrays, uninitialized variables as -@cindex arrays, subscripts, uninitialized variables as +@cindex subscripts in arrays @subentry uninitialized variables as +@cindex arrays @subentry subscripts, uninitialized variables as Suppose it's necessary to write a program to print the input data in reverse order. A reasonable attempt to do so (with some test @@ -16627,10 +16666,10 @@ Here, the @samp{++} forces @code{lines} to be numeric, thus making the ``old value'' numeric zero. This is then converted to @code{"0"} as the array subscript. -@cindex array subscripts, null strings as -@cindex null strings, as array subscripts -@cindex dark corner, array subscripts -@cindex lint checking, array subscripts +@cindex array subscripts @subentry null string as +@cindex null strings @subentry as array subscripts +@cindex dark corner @subentry array subscripts +@cindex lint checking @subentry array subscripts Even though it is somewhat unusual, the null string (@code{""}) is a valid array subscript. @value{DARKCORNER} @@ -16641,9 +16680,9 @@ on the command line (@pxref{Options}). 
@node Delete @section The @code{delete} Statement @cindex @code{delete} statement -@cindex deleting elements in arrays -@cindex arrays, elements, deleting -@cindex elements in arrays, deleting +@cindex deleting @subentry elements in arrays +@cindex arrays @subentry elements @subentry deleting +@cindex elements in arrays @subentry deleting To remove an individual element of an array, use the @code{delete} statement: @@ -16674,7 +16713,7 @@ if (4 in foo) print "This will never be printed" @end example -@cindex null strings, deleting array elements and +@cindex null strings @subentry deleting array elements and It is important to note that deleting an element is @emph{not} the same as assigning it a null value (the empty string, @code{""}). For example: @@ -16687,19 +16726,19 @@ if (4 in foo) @end group @end example -@cindex lint checking, array elements +@cindex lint checking @subentry array subscripts It is not an error to delete an element that does not exist. However, if @option{--lint} is provided on the command line (@pxref{Options}), @command{gawk} issues a warning message when an element that is not in the array is deleted. 
-@cindex common extensions, @code{delete} to delete entire arrays -@cindex extensions, common@comma{} @code{delete} to delete entire arrays -@cindex arrays, deleting entire contents -@cindex deleting entire arrays +@cindex common extensions @subentry @code{delete} to delete entire arrays +@cindex extensions @subentry common @subentry @code{delete} to delete entire arrays +@cindex arrays @subentry deleting entire contents +@cindex deleting @subentry entire arrays @cindex @code{delete} @var{array} -@cindex differences in @command{awk} and @command{gawk}, array elements, deleting +@cindex differences in @command{awk} and @command{gawk} @subentry array elements, deleting All the elements of an array may be deleted with a single statement by leaving off the subscript in the @code{delete} statement, as follows: @@ -16725,7 +16764,7 @@ POSIX standard. See @uref{http://austingroupbugs.net/view.php?id=544, the Austin Group website}. @end quotation -@cindex portability, deleting array elements +@cindex portability @subentry deleting array elements @cindex Brennan, Michael The following statement provides a portable but nonobvious way to clear out an array:@footnote{Thanks to Michael Brennan for pointing this out.} @@ -16734,7 +16773,7 @@ out an array:@footnote{Thanks to Michael Brennan for pointing this out.} split("", array) @end example -@cindex @code{split()} function, array elements@comma{} deleting +@cindex @code{split()} function @subentry array elements, deleting The @code{split()} function (@pxref{String Functions}) clears out the target array first. This call asks it to split @@ -16760,8 +16799,8 @@ a = 3 * Multiscanning:: Scanning multidimensional arrays. 
@end menu -@cindex subscripts in arrays, multidimensional -@cindex arrays, multidimensional +@cindex subscripts in arrays @subentry multidimensional +@cindex arrays @subentry multidimensional A @dfn{multidimensional array} is an array in which an element is identified by a sequence of indices instead of a single index. For example, a two-dimensional array requires two indices. The usual way (in many @@ -16769,7 +16808,7 @@ languages, including @command{awk}) to refer to an element of a two-dimensional array named @code{grid} is with @code{grid[@var{x},@var{y}]}. -@cindex @code{SUBSEP} variable, multidimensional arrays and +@cindex @code{SUBSEP} variable @subentry multidimensional arrays and Multidimensional arrays are supported in @command{awk} through concatenation of indices into one string. @command{awk} converts the indices into strings @@ -16801,7 +16840,7 @@ combined strings that are ambiguous. Suppose that @code{SUBSEP} is "b@@c"]}} are indistinguishable because both are actually stored as @samp{foo["a@@b@@c"]}. -@cindex @code{in} operator, index existence in multidimensional arrays +@cindex @code{in} operator @subentry index existence in multidimensional arrays To test whether a particular index sequence exists in a multidimensional array, use the same operator (@code{in}) that is used for single-dimensional arrays. Write the whole sequence of indices @@ -16870,8 +16909,8 @@ There is no special @code{for} statement for scanning a multidimensional arrays or elements---there is only a multidimensional @emph{way of accessing} an array. 
-@cindex subscripts in arrays, multidimensional, scanning -@cindex arrays, multidimensional, scanning +@cindex subscripts in arrays @subentry multidimensional @subentry scanning +@cindex arrays @subentry multidimensional @subentry scanning @cindex scanning multidimensional arrays However, if your program has an array that is always accessed as multidimensional, you can get the effect of scanning it by combining @@ -16914,7 +16953,7 @@ separate indices is recovered. @node Arrays of Arrays @section Arrays of Arrays -@cindex arrays of arrays +@cindex arrays @subentry arrays of arrays @command{gawk} goes beyond standard @command{awk}'s multidimensional array access and provides true arrays of @@ -17136,7 +17175,7 @@ element is itself a subarray. @node Functions @chapter Functions -@cindex functions, built-in +@cindex functions @subentry built-in @cindex built-in functions This @value{CHAPTER} describes @command{awk}'s built-in functions, which fall into three categories: numeric, string, and I/O. @@ -17190,17 +17229,17 @@ the function followed by arguments in parentheses. For example, @samp{atan2(y + z, 1)} is a call to the function @code{atan2()} and has two arguments. -@cindex programming conventions, functions, calling -@cindex whitespace, functions@comma{} calling +@cindex programming conventions @subentry functions @subentry calling +@cindex whitespace @subentry functions, calling Whitespace is ignored between the built-in function name and the opening parenthesis, but nonetheless it is good practice to avoid using whitespace there. User-defined functions do not permit whitespace in this way, and it is easier to avoid mistakes by following a simple convention that always works---no whitespace after a function name. 
-@cindex troubleshooting, @command{gawk}, fatal errors@comma{} function arguments -@cindex @command{gawk}, function arguments and -@cindex differences in @command{awk} and @command{gawk}, function arguments (@command{gawk}) +@cindex troubleshooting @subentry @command{gawk} @subentry fatal errors, function arguments +@cindex @command{gawk} @subentry function arguments and +@cindex differences in @command{awk} and @command{gawk} @subentry function arguments Each built-in function accepts a certain number of arguments. In some cases, arguments can be omitted. The defaults for omitted arguments vary from function to function and are described under the @@ -17217,9 +17256,9 @@ i = 4 j = sqrt(i++) @end example -@cindex evaluation order, functions -@cindex functions, built-in, evaluation order -@cindex built-in functions, evaluation order +@cindex evaluation order @subentry functions +@cindex functions @subentry built-in @subentry evaluation order +@cindex built-in functions @subentry evaluation order @noindent the variable @code{i} is incremented to the value five before @code{sqrt()} is called with a value of four for its actual parameter. @@ -17241,7 +17280,7 @@ two arguments 11 and 10. @node Numeric Functions @subsection Numeric Functions -@cindex numeric functions +@cindex numeric @subentry functions The following list describes all of the built-in functions that work with numbers. @@ -17310,7 +17349,7 @@ is negative. @cindex Beebe, Nelson H.F.@: @item @code{rand()} @cindexawkfunc{rand} -@cindex random numbers, @code{rand()}/@code{srand()} functions +@cindex random numbers @subentry @code{rand()}/@code{srand()} functions Return a random number. The values of @code{rand()} are uniformly distributed between zero and one. 
The value could be zero but is never one.@footnote{The C version of @@ -17356,7 +17395,7 @@ function roll(n) @{ return 1 + int(rand() * n) @} @end example @cindex seeding random number generator -@cindex random numbers, seed of +@cindex random numbers @subentry seed of @quotation CAUTION In most @command{awk} implementations, including @command{gawk}, @code{rand()} starts generating numbers from the same @@ -17459,7 +17498,7 @@ pound sign (@samp{#}). They are not available in compatibility mode @itemx @code{asorti(}@var{source} [@code{,} @var{dest} [@code{,} @var{how} ] ]@code{) #} @cindexgawkfunc{asorti} @cindex sort array -@cindex arrays, elements, retrieving number of +@cindex arrays @subentry elements @subentry retrieving number of @cindexgawkfunc{asort} @cindex sort array indices These two functions are similar in behavior, so they are described @@ -17479,7 +17518,7 @@ sequential integers starting with one. If the optional array @var{dest} is specified, then @var{source} is duplicated into @var{dest}. @var{dest} is then sorted, leaving the indices of @var{source} unchanged. -@cindex @command{gawk}, @code{IGNORECASE} variable in +@cindex @command{gawk} @subentry @code{IGNORECASE} variable in When comparing strings, @code{IGNORECASE} affects the sorting (@pxref{Array Sorting Functions}). If the @var{source} array contains subarrays as values (@pxref{Arrays of @@ -17610,7 +17649,7 @@ and the third argument must be assignable. @item @code{index(@var{in}, @var{find})} @cindexawkfunc{index} -@cindex search in string +@cindex search for substring @cindex find substring in string Search the string @var{in} for the first occurrence of the string @var{find}, and return the position in characters where that occurrence @@ -17624,7 +17663,7 @@ $ @kbd{awk 'BEGIN @{ print index("peanut", "an") @}'} @noindent If @var{find} is not found, @code{index()} returns zero. 
-@cindex dark corner, regexp as second argument to @code{index()} +@cindex dark corner @subentry regexp as second argument to @code{index()} With BWK @command{awk} and @command{gawk}, it is a fatal error to use a regexp constant for @var{find}. Other implementations allow it, simply treating the regexp @@ -17632,7 +17671,7 @@ constant as an expression meaning @samp{$0 ~ /regexp/}. @value{DARKCORNER} @item @code{length(}[@var{string}]@code{)} @cindexawkfunc{length} -@cindex string length +@cindex string @subentry length @cindex length of string Return the number of characters in @var{string}. If @var{string} is a number, the length of the digit string representing @@ -17657,8 +17696,8 @@ three characters. If no argument is supplied, @code{length()} returns the length of @code{$0}. @c @cindex historical features -@cindex portability, @code{length()} function -@cindex POSIX @command{awk}, functions and, @code{length()} +@cindex portability @subentry @code{length()} function +@cindex POSIX @command{awk} @subentry functions and @subentry @code{length()} @quotation NOTE In older versions of @command{awk}, the @code{length()} function could be called @@ -17668,7 +17707,7 @@ support historical practice. For programs to be maximally portable, always supply the parentheses. @end quotation -@cindex dark corner, @code{length()} function +@cindex dark corner @subentry @code{length()} function If @code{length()} is called with a variable that has not been used, @command{gawk} forces the variable to be a scalar. Other implementations of @command{awk} leave the variable without a type. @@ -17689,11 +17728,11 @@ If @option{--lint} has been specified on the command line, @command{gawk} issues a warning about this. 
-@cindex common extensions, @code{length()} applied to an array -@cindex extensions, common@comma{} @code{length()} applied to an array -@cindex differences in @command{awk} and @command{gawk}, @code{length()} function +@cindex common extensions @subentry @code{length()} applied to an array +@cindex extensions @subentry common @subentry @code{length()} applied to an array +@cindex differences in @command{awk} and @command{gawk} @subentry @code{length()} function @cindex number of array elements -@cindex array, number of elements +@cindex arrays @subentry number of elements With @command{gawk} and several other @command{awk} implementations, when given an array argument, the @code{length()} function returns the number of elements in the array. @value{COMMONEXT} @@ -17708,7 +17747,7 @@ If @option{--posix} is supplied, using an array argument is a fatal error @item @code{match(@var{string}, @var{regexp}} [@code{, @var{array}}]@code{)} @cindexawkfunc{match} -@cindex string, regular expression match +@cindex string @subentry regular expression match of @cindex match regexp in string Search @var{string} for the longest, leftmost substring matched by the regular expression @@ -17729,9 +17768,11 @@ functions that work with regular expressions, such as for @code{match()}, the order is the same as for the @samp{~} operator: @samp{@var{string} ~ @var{regexp}}. -@cindex @code{RSTART} variable, @code{match()} function and -@cindex @code{RLENGTH} variable, @code{match()} function and -@cindex @code{match()} function, @code{RSTART}/@code{RLENGTH} variables +@cindex @code{RSTART} variable @subentry @code{match()} function and +@cindex @code{RLENGTH} variable @subentry @code{match()} function and +@cindex @code{match()} function @subentry @code{RSTART}/@code{RLENGTH} variables +@cindex @code{match()} function @subentry side effects +@cindex side effects @subentry @code{match()} function The @code{match()} function sets the predefined variable @code{RSTART} to the index. 
It also sets the predefined variable @code{RLENGTH} to the length in characters of the matched substring. If no match is found, @@ -17779,7 +17820,7 @@ Match of ru+n found at 12 in My program runs Match of Melvin found at 1 in Melvin was here. @end example -@cindex differences in @command{awk} and @command{gawk}, @code{match()} function +@cindex differences in @command{awk} and @command{gawk} @subentry @code{match()} function If @var{array} is present, it is cleared, and then the zeroth element of @var{array} is set to the entire portion of @var{string} matched by @var{regexp}. If @var{regexp} contains parentheses, @@ -17816,7 +17857,7 @@ subexpression, because they may not all have matched text; thus, they should be tested for with the @code{in} operator (@pxref{Reference to Elements}). -@cindex troubleshooting, @code{match()} function +@cindex troubleshooting @subentry @code{match()} function The @var{array} argument to @code{match()} is a @command{gawk} extension. In compatibility mode (@pxref{Options}), @@ -17879,7 +17920,7 @@ split("cul-de-sac", a, "-", seps) @end example @noindent -@cindex strings splitting, example +@cindex strings @subentry splitting, example splits the string @code{"cul-de-sac"} into three fields using @samp{-} as the separator. It sets the contents of the array @code{a} as follows: @@ -17899,7 +17940,7 @@ seps[2] = "-" @noindent The value returned by this call to @code{split()} is three. -@cindex differences in @command{awk} and @command{gawk}, @code{split()} function +@cindex differences in @command{awk} and @command{gawk} @subentry @code{split()} function As with input field-splitting, when the value of @var{fieldsep} is @w{@code{" "}}, leading and trailing whitespace is ignored in values assigned to the elements of @@ -17915,7 +17956,7 @@ Note, however, that @code{RS} has no effect on the way @code{split()} works. 
Even though @samp{RS = ""} causes the newline character to also be an input field separator, this does not affect how @code{split()} splits strings. -@cindex dark corner, @code{split()} function +@cindex dark corner @subentry @code{split()} function Modern implementations of @command{awk}, including @command{gawk}, allow the third argument to be a regexp constant (@w{@code{/}@dots{}@code{/}}) as well as a string. @value{DARKCORNER} @@ -17935,11 +17976,12 @@ If @var{string} does not match @var{fieldsep} at all (but is not null), @var{array} has one element only. The value of that element is the original @var{string}. +@cindex POSIX mode In POSIX mode (@pxref{Options}), the fourth argument is not allowed. @item @code{sprintf(@var{format}, @var{expression1}, @dots{})} @cindexawkfunc{sprintf} -@cindex formatting strings +@cindex formatting @subentry strings Return (without printing) the string that @code{printf} would have printed out with the same arguments (@pxref{Printf}). @@ -17953,7 +17995,7 @@ pival = sprintf("pi = %.2f (approx.)", 22/7) assigns the string @w{@samp{pi = 3.14 (approx.)}} to the variable @code{pival}. @cindexgawkfunc{strtonum} -@cindex convert string to number +@cindex converting @subentry string to numbers @item @code{strtonum(@var{str}) #} Examine @var{str} and return its numeric value. If @var{str} begins with a leading @samp{0}, @code{strtonum()} assumes that @var{str} @@ -18052,8 +18094,10 @@ an @samp{&}: @{ sub(/\|/, "\\&"); print @} @end example -@cindex @code{sub()} function, arguments of -@cindex @code{gsub()} function, arguments of +@cindex @code{sub()} function @subentry arguments of +@cindex @code{gsub()} function @subentry arguments of +@cindex side effects @subentry @code{sub()} function +@cindex side effects @subentry @code{gsub()} function As mentioned, the third argument to @code{sub()} must be a variable, field, or array element. 
Some versions of @command{awk} allow the third argument to @@ -18068,7 +18112,7 @@ sub(/USA/, "United States", "the USA and Canada") @end example @noindent -@cindex troubleshooting, @code{gsub()}/@code{sub()} functions +@cindex troubleshooting @subentry @code{gsub()}/@code{sub()} functions For historical compatibility, @command{gawk} accepts such erroneous code. However, using any other nonchangeable object as the third parameter causes a fatal error and your program @@ -18103,7 +18147,7 @@ in the string, @code{substr()} returns the null string. Similarly, if @var{length} is present but less than or equal to zero, the null string is returned. -@cindex troubleshooting, @code{substr()} function +@cindex troubleshooting @subentry @code{substr()} function The string returned by @code{substr()} @emph{cannot} be assigned. Thus, it is a mistake to attempt to change a portion of a string, as shown in the following example: @@ -18122,7 +18166,7 @@ of @code{sub()} or @code{gsub()}: gsub(/xyz/, "pdq", substr($0, 5, 20)) # WRONG @end example -@cindex portability, @code{substr()} function +@cindex portability @subentry @code{substr()} function (Some commercial versions of @command{awk} treat @code{substr()} as assignable, but doing so is not portable.) @@ -18135,11 +18179,11 @@ string = "abcdef" string = substr(string, 1, 2) "CDE" substr(string, 6) @end example -@cindex case sensitivity, converting case -@cindex strings, converting letter case +@cindex case sensitivity @subentry converting case +@cindex strings @subentry converting letter case @item @code{tolower(@var{string})} @cindexawkfunc{tolower} -@cindex convert string to lower case +@cindex converting @subentry string to lower case Return a copy of @var{string}, with each uppercase character in the string replaced with its corresponding lowercase character. Nonalphabetic characters are left unchanged. For example, @@ -18147,7 +18191,7 @@ Nonalphabetic characters are left unchanged. 
For example, @item @code{toupper(@var{string})} @cindexawkfunc{toupper} -@cindex convert string to upper case +@cindex converting @subentry string to upper case Return a copy of @var{string}, with each lowercase character in the string replaced with its corresponding uppercase character. Nonalphabetic characters are left unchanged. For example, @@ -18155,10 +18199,10 @@ Nonalphabetic characters are left unchanged. For example, @end table @sidebar Matching the Null String -@cindex matching, null strings -@cindex null strings, matching -@cindex @code{*} (asterisk), @code{*} operator, null strings@comma{} matching -@cindex asterisk (@code{*}), @code{*} operator, null strings@comma{} matching +@cindex matching @subentry null strings +@cindex null strings @subentry matching +@cindex @code{*} (asterisk) @subentry @code{*} operator @subentry null strings, matching +@cindex asterisk (@code{*}) @subentry @code{*} operator @subentry null strings, matching In @command{awk}, the @samp{*} operator can match the null string. This is particularly important for the @code{sub()}, @code{gsub()}, @@ -18177,14 +18221,14 @@ Although this makes a certain amount of sense, it can be surprising. 
@node Gory Details @subsubsection More about @samp{\} and @samp{&} with @code{sub()}, @code{gsub()}, and @code{gensub()} -@cindex escape processing, @code{gsub()}/@code{gensub()}/@code{sub()} functions -@cindex @code{sub()} function, escape processing -@cindex @code{gsub()} function, escape processing -@cindex @code{gensub()} function (@command{gawk}), escape processing -@cindex @code{\} (backslash), @code{gsub()}/@code{gensub()}/@code{sub()} functions and -@cindex backslash (@code{\}), @code{gsub()}/@code{gensub()}/@code{sub()} functions and -@cindex @code{&} (ampersand), @code{gsub()}/@code{gensub()}/@code{sub()} functions and -@cindex ampersand (@code{&}), @code{gsub()}/@code{gensub()}/@code{sub()} functions and +@cindex escape processing @subentry @code{gsub()}/@code{gensub()}/@code{sub()} functions +@cindex @code{sub()} function @subentry escape processing +@cindex @code{gsub()} function @subentry escape processing +@cindex @code{gensub()} function (@command{gawk}) @subentry escape processing +@cindex @code{\} (backslash) @subentry @code{gsub()}/@code{gensub()}/@code{sub()} functions and +@cindex backslash (@code{\}) @subentry @code{gsub()}/@code{gensub()}/@code{sub()} functions and +@cindex @code{&} (ampersand) @subentry @code{gsub()}/@code{gensub()}/@code{sub()} functions and +@cindex ampersand (@code{&}) @subentry @code{gsub()}/@code{gensub()}/@code{sub()} functions and @quotation CAUTION This subsubsection has been reported to cause headaches. @@ -18350,7 +18394,7 @@ was expected. In addition, the @command{gawk} maintainer's proposal was lost during the standardization process. The final rules are somewhat simpler. The results are similar except for one case. 
-@cindex POSIX @command{awk}, functions and, @code{gsub()}/@code{sub()} +@cindex POSIX @command{awk} @subentry functions and @subentry @code{gsub()}/@code{sub()} The POSIX rules state that @samp{\&} in the replacement string produces a literal @samp{&}, @samp{\\} produces a literal @samp{\}, and @samp{\} followed by anything else is not special; the @samp{\} is placed straight into the output. @@ -18480,7 +18524,7 @@ to do substitutions. @node I/O Functions @subsection Input/Output Functions -@cindex input/output functions +@cindex input/output @subentry functions The following functions relate to input/output (I/O). Optional parameters are enclosed in square brackets ([ ]): @@ -18488,7 +18532,7 @@ Optional parameters are enclosed in square brackets ([ ]): @table @asis @item @code{close(}@var{filename} [@code{,} @var{how}]@code{)} @cindexawkfunc{close} -@cindex files, closing +@cindex files @subentry closing @cindex close file or coprocess Close the file @var{filename} for input or output. Alternatively, the argument may be a shell command that was used for creating a coprocess, or @@ -18516,8 +18560,8 @@ Flush any buffered output associated with @var{filename}, which is either a file opened for writing or a shell command for redirecting output to a pipe or coprocess. -@cindex buffers, flushing -@cindex output, buffering +@cindex buffers @subentry flushing +@cindex output @subentry buffering Many utility programs @dfn{buffer} their output (i.e., they save information to write to a disk file or the screen in memory until there is enough for it to be worthwhile to send the data to the output device). @@ -18529,7 +18573,7 @@ This is the purpose of the @code{fflush()} function---@command{gawk} also buffers its output, and the @code{fflush()} function forces @command{gawk} to flush its buffers. 
-@cindex extensions, common@comma{} @code{fflush()} function +@cindex extensions @subentry common @subentry @code{fflush()} function @cindex Brian Kernighan's @command{awk} Brian Kernighan added @code{fflush()} to his @command{awk} in April 1992. For two decades, it was a common extension. In December @@ -18556,7 +18600,7 @@ only the standard output. @c @cindex automatic warnings @c @cindex warnings, automatic -@cindex troubleshooting, @code{fflush()} function +@cindex troubleshooting @subentry @code{fflush()} function @code{fflush()} returns zero if the buffer is successfully flushed; otherwise, it returns a nonzero value. (@command{gawk} returns @minus{}1.) In the case where all buffers are flushed, the return value is zero @@ -18572,7 +18616,7 @@ In such a case, @code{fflush()} returns @minus{}1, as well. @end table @sidebar Interactive Versus Noninteractive Buffering -@cindex buffering, interactive vs.@: noninteractive +@cindex buffering @subentry interactive vs.@: noninteractive As a side point, buffering issues can be even more confusing if your program is @dfn{interactive} (i.e., communicating @@ -18647,8 +18691,8 @@ close("/bin/sh") @end example @noindent -@cindex troubleshooting, @code{system()} function -@cindex @option{--sandbox} option, disabling @code{system()} function +@cindex troubleshooting @subentry @code{system()} function +@cindex @option{--sandbox} option @subentry disabling @code{system()} function However, if your @command{awk} program is interactive, @code{system()} is useful for running large self-contained programs, such as a shell or an editor. @@ -18694,9 +18738,9 @@ As of August, 2018, BWK @command{awk} now follows @command{gawk}'s behavior for the return value of @code{system()}. 
@sidebar Controlling Output Buffering with @code{system()} -@cindex buffers, flushing -@cindex buffering, input/output -@cindex output, buffering +@cindex buffers @subentry flushing +@cindex buffering @subentry input/output +@cindex output @subentry buffering The @code{fflush()} function provides explicit control over output buffering for individual files and pipes. However, its use is not portable to many older @@ -18756,9 +18800,9 @@ you would see the latter (undesirable) output. @cindex timestamps @cindex log files, timestamps in -@cindex files, log@comma{} timestamps in -@cindex @command{gawk}, timestamps -@cindex POSIX @command{awk}, timestamps and +@cindex files @subentry log, timestamps in +@cindex @command{gawk} @subentry timestamps +@cindex POSIX @command{awk} @subentry timestamps and @command{awk} programs are commonly used to process log files containing timestamp information, indicating when a particular log record was written. Many programs log their timestamps @@ -18789,8 +18833,8 @@ which is sufficient to represent times through including negative timestamps that represent times before the epoch. -@cindex @command{date} utility, GNU -@cindex time, retrieving +@cindex @command{date} utility @subentry GNU +@cindex time @subentry retrieving In order to make it easier to process such log files and to produce useful reports, @command{gawk} provides the following functions for working with timestamps. They are @command{gawk} extensions; they are @@ -18833,7 +18877,7 @@ whether daylight savings time is in effect for the specified time. If @var{datespec} does not contain enough elements or if the resulting time is out of range, @code{mktime()} returns @minus{}1. 
-@cindex @command{gawk}, @code{PROCINFO} array in +@cindex @command{gawk} @subentry @code{PROCINFO} array in @cindex @code{PROCINFO} array @item @code{strftime(}[@var{format} [@code{,} @var{timestamp} [@code{,} @var{utc-flag}] ] ]@code{)} @cindexgawkfunc{strftime} @@ -18871,9 +18915,9 @@ log file with the current time of day. In particular, it is easy to determine how long ago a particular record was logged. It also allows you to produce log records using the ``seconds since the epoch'' format. -@cindex converting, dates to timestamps -@cindex dates, converting to timestamps -@cindex timestamps, converting dates to +@cindex converting @subentry dates to timestamps +@cindex dates @subentry converting to timestamps +@cindex timestamps @subentry converting dates to The @code{mktime()} function allows you to convert a textual representation of a date and time into a timestamp. This makes it easy to do before/after comparisons of dates and times, particularly when dealing with date and @@ -18887,7 +18931,7 @@ in that it copies nonformat specification characters verbatim to the returned string, while substituting date and time values for format specifications in the @var{format} string. -@cindex format specifiers, @code{strftime()} function (@command{gawk}) +@cindex format specifiers @subentry @code{strftime()} function (@command{gawk}) @code{strftime()} is guaranteed by the 1999 ISO C standard@footnote{Unfortunately, not every system's @code{strftime()} necessarily @@ -18987,7 +19031,7 @@ The weekday as a decimal number (1--7). Monday is day one. The week number of the year (with the first Sunday as the first day of week one) as a decimal number (00--53). -@c @cindex ISO 8601 +@cindex ISO @subentry ISO 8601 date and time standard @item %V The week number of the year (with the first Monday as the first day of week one) as a decimal number (01--53). @@ -19088,8 +19132,8 @@ The date in VMS format (e.g., @samp{20-JUN-1991}). 
Additionally, the alternative representations are recognized but their normal representations are used. -@cindex @code{date} utility, POSIX -@cindex POSIX @command{awk}, @code{date} utility and +@cindex @code{date} utility @subentry POSIX +@cindex POSIX @command{awk} @subentry @code{date} utility and The following example is an @command{awk} implementation of the POSIX @command{date} utility. Normally, the @command{date} utility prints the current date and time of day in a well-known format. However, if you @@ -19138,7 +19182,7 @@ gawk 'BEGIN @{ @node Bitwise Functions @subsection Bit-Manipulation Functions @cindex bit-manipulation functions -@cindex bitwise, operations +@cindex bitwise @subentry operations @cindex AND bitwise operation @cindex OR bitwise operation @cindex XOR bitwise operation @@ -19265,7 +19309,7 @@ Operands | 0 | 1 | 0 | 1 | 0 | 1 @end docbook @end float -@cindex bitwise complement +@cindex bitwise @subentry complement @cindex complement, bitwise As you can see, the result of an AND operation is 1 only when @emph{both} bits are 1. @@ -19276,7 +19320,7 @@ The next operation is the @dfn{complement}; the complement of 1 is 0 and the complement of 0 is 1. Thus, this operation ``flips'' all the bits of a given value. -@cindex bitwise, shift +@cindex bitwise @subentry shift @cindex left shift, bitwise @cindex right shift, bitwise @cindex shift, bitwise @@ -19291,35 +19335,33 @@ bits, you end up with @samp{11001000}. The following list describes @command{gawk}'s built-in functions that implement the bitwise operations. Optional parameters are enclosed in square brackets ([ ]): -@cindex @command{gawk}, bitwise operations in +@cindex @command{gawk} @subentry bitwise operations in @table @asis @cindexgawkfunc{and} -@cindex bitwise AND +@cindex bitwise @subentry AND @item @code{and(}@var{v1}@code{,} @var{v2} [@code{,} @dots{}]@code{)} Return the bitwise AND of the arguments. There must be at least two. 
@cindexgawkfunc{compl} -@cindex bitwise complement +@cindex bitwise @subentry complement @item @code{compl(@var{val})} Return the bitwise complement of @var{val}. @cindexgawkfunc{lshift} -@cindex left shift @item @code{lshift(@var{val}, @var{count})} Return the value of @var{val}, shifted left by @var{count} bits. @cindexgawkfunc{or} -@cindex bitwise OR +@cindex bitwise @subentry OR @item @code{or(}@var{v1}@code{,} @var{v2} [@code{,} @dots{}]@code{)} Return the bitwise OR of the arguments. There must be at least two. @cindexgawkfunc{rshift} -@cindex right shift @item @code{rshift(@var{val}, @var{count})} Return the value of @var{val}, shifted right by @var{count} bits. @cindexgawkfunc{xor} -@cindex bitwise XOR +@cindex bitwise @subentry XOR @item @code{xor(}@var{v1}@code{,} @var{v2} [@code{,} @dots{}]@code{)} Return the bitwise XOR of the arguments. There must be at least two. @end table @@ -19335,6 +19377,7 @@ Here is a user-defined function (@pxref{User-defined}) that illustrates the use of these functions: @cindex @code{bits2str()} user-defined function +@cindex user-defined @subentry function @subentry @code{bits2str()} @cindex @code{testbits.awk} program @example @group @@ -19408,11 +19451,11 @@ $ @kbd{gawk -f testbits.awk} @print{} rshift(0x99, 2) = 0x26 = 00100110 @end example -@cindex converting, strings to numbers -@cindex strings, converting -@cindex numbers, converting -@cindex converting, numbers to strings -@cindex numbers, as string of bits +@cindex converting @subentry string to numbers +@cindex strings @subentry converting +@cindex numbers @subentry converting +@cindex converting @subentry numbers to strings +@cindex numbers @subentry as string of bits The @code{bits2str()} function turns a binary number into a string. Initializing @code{mask} to one creates a binary value where the rightmost bit @@ -19505,8 +19548,8 @@ that traverses every element of an array of arrays Return a true value if @var{x} is an array. Otherwise, return false. 
@cindexgawkfunc{typeof} -@cindex variable type -@cindex type, of variable +@cindex variable type, @code{typeof()} function (@command{gawk}) +@cindex type @subentry of variable, @code{typeof()} function (@command{gawk}) @item typeof(@var{x}) Return one of the following strings, depending upon the type of @var{x}: @@ -19590,11 +19633,11 @@ not change their arguments from untyped to unassigned. @node I18N Functions @subsection String-Translation Functions -@cindex @command{gawk}, string-translation functions -@cindex functions, string-translation +@cindex @command{gawk} @subentry string-translation functions +@cindex functions @subentry string-translation @cindex string-translation functions @cindex internationalization -@cindex @command{awk} programs, internationalizing +@cindex @command{awk} programs @subentry internationalizing @command{gawk} provides facilities for internationalizing @command{awk} programs. These include the functions described in the following list. @@ -19640,8 +19683,8 @@ The default value for @var{category} is @code{"LC_MESSAGES"}. @node User-defined @section User-Defined Functions -@cindex user-defined functions -@cindex functions, user-defined +@cindex user-defined @subentry functions +@cindex functions @subentry user-defined Complicated @command{awk} programs can often be simplified by defining your own functions. User-defined functions can be called just like built-in ones (@pxref{Function Calls}), but it is up to you to define @@ -19665,7 +19708,7 @@ variable definitions is appallingly awful.} @author Brian Kernighan @end quotation -@cindex functions, defining +@cindex functions @subentry defining Definitions of functions can appear anywhere between the rules of an @command{awk} program. 
Thus, the general form of an @command{awk} program is extended to include sequences of rules @emph{and} user-defined function @@ -19685,9 +19728,9 @@ The definition of a function named @var{name} looks like this: @end group @end display -@cindex names, functions -@cindex functions, names of -@cindex namespace issues, functions +@cindex names @subentry functions +@cindex functions @subentry names of +@cindex naming issues @subentry functions @noindent Here, @var{name} is the name of the function to define. A valid function name is like a valid variable name: a sequence of letters, digits, and @@ -19742,13 +19785,13 @@ arguments on some occasions and local variables on others. Another way to think of this is that omitted arguments default to the null string. -@cindex programming conventions, functions, writing +@cindex programming conventions @subentry functions @subentry writing Usually when you write a function, you know how many names you intend to use for arguments and how many you intend to use as local variables. It is conventional to place some extra space between the arguments and the local variables, in order to document how your function is supposed to be used. -@cindex variables, shadowing +@cindex variables @subentry shadowing @cindex shadowing of variable values During execution of the function body, the arguments and local variable values hide, or @dfn{shadow}, any variables of the same names used in the @@ -19763,7 +19806,7 @@ is executing. Once the body finishes, you can once again access the variables that were shadowed while the function was running. @cindex recursive functions -@cindex functions, recursive +@cindex functions @subentry recursive The function body can contain expressions that call functions. They can even call this function, either directly or by way of another function. When this happens, we say the function is @dfn{recursive}. @@ -19775,11 +19818,11 @@ which is described in detail in @ref{Return Statement}. 
Many of the subsequent examples in this @value{SECTION} use the @code{return} statement. -@cindex common extensions, @code{func} keyword -@cindex extensions, common@comma{} @code{func} keyword -@c @cindex @command{awk} language, POSIX version +@cindex common extensions @subentry @code{func} keyword +@cindex extensions @subentry common @subentry @code{func} keyword @c @cindex POSIX @command{awk} -@cindex POSIX @command{awk}, @code{function} keyword in +@cindex @command{awk} @subentry language, POSIX version +@cindex POSIX @command{awk} @subentry @code{function} keyword in In many @command{awk} implementations, including @command{gawk}, the keyword @code{function} may be abbreviated @code{func}. @value{COMMONEXT} @@ -19802,7 +19845,7 @@ syntactically valid, because functions may be used before they are defined in @command{awk} programs.@footnote{This program won't actually run, because @code{foo()} is undefined.}) -@cindex portability, functions@comma{} defining +@cindex portability @subentry functions, defining To ensure that your @command{awk} programs are portable, always use the keyword @code{function} when defining a function. @@ -19877,6 +19920,7 @@ already empty: @c 8/2014: Thanks to Mike Brennan for the improved formulation @cindex @code{rev()} user-defined function +@cindex user-defined @subentry function @subentry @code{rev()} @example function rev(str) @{ @@ -19903,6 +19947,7 @@ The following example uses the built-in @code{strftime()} function to create an @command{awk} version of @code{ctime()}: @cindex @code{ctime()} user-defined function +@cindex user-defined @subentry function @subentry @code{ctime()} @example @c file eg/lib/ctime.awk # ctime.awk @@ -19928,7 +19973,7 @@ code could have changed @code{PROCINFO["strftime"]}. 
@node Function Calling @subsection Calling User-Defined Functions -@cindex functions, user-defined, calling +@cindex functions @subentry user-defined @subentry calling @dfn{Calling a function} means causing the function to run and do its job. A function call is an expression and its value is the value returned by the function. @@ -19966,8 +20011,8 @@ an error. @node Variable Scope @subsubsection Controlling Variable Scope -@cindex local variables, in a function -@cindex variables, local to a function +@cindex local variables @subentry in a function +@cindex variables @subentry local to a function Unlike in many languages, there is no way to make a variable local to a @code{@{} @dots{} @code{@}} block in @command{awk}, but you can make a variable local to a function. It is @@ -20152,8 +20197,8 @@ cannot alter this outer value, because it is shadowed during the execution of @code{myfunc()} and cannot be seen or changed from there. @cindex call by reference -@cindex arrays, as parameters to functions -@cindex functions, arrays as parameters to +@cindex arrays @subentry as parameters to functions +@cindex functions @subentry arrays as parameters to However, when arrays are the parameters to functions, they are @emph{not} copied. Instead, the array itself is made available for direct manipulation by the function. This is usually termed @dfn{call by reference}. @@ -20188,7 +20233,7 @@ prints @samp{a[1] = 1, a[2] = two, a[3] = 3}, because @subsubsection Other Points About Calling Functions @cindex undefined functions -@cindex functions, undefined +@cindex functions @subentry undefined Some @command{awk} implementations allow you to call a function that has not been defined. They only report a problem at runtime, when the program actually tries to call the function. For example: @@ -20209,12 +20254,12 @@ Because the @samp{if} statement will never be true, it is not really a problem that @code{foo()} has not been defined. 
Usually, though, it is a problem if a program calls an undefined function. -@cindex lint checking, undefined functions +@cindex lint checking @subentry undefined functions If @option{--lint} is specified (@pxref{Options}), @command{gawk} reports calls to undefined functions. -@cindex portability, @code{next} statement in user-defined functions +@cindex portability @subentry @code{next} statement in user-defined functions Some @command{awk} implementations generate a runtime error if you use either the @code{next} statement or the @code{nextfile} statement @@ -20260,7 +20305,7 @@ don't like the warning, fix your code! It's incorrect, after all.) @node Return Statement @subsection The @code{return} Statement -@cindex @code{return} statement@comma{} user-defined functions +@cindex @code{return} statement, user-defined functions As seen in several earlier examples, the body of a user-defined function can contain a @code{return} statement. @@ -20305,7 +20350,7 @@ function maxelt(vec, i, ret) @} @end example -@cindex programming conventions, function parameters +@cindex programming conventions @subentry function parameters @noindent You call @code{maxelt()} with one argument, which is an array name. The local variables @code{i} and @code{ret} are not intended to be arguments; @@ -20391,10 +20436,10 @@ being aware of them. @section Indirect Function Calls @cindex indirect function calls -@cindex function calls, indirect +@cindex function calls @subentry indirect @cindex function pointers @cindex pointers to functions -@cindex differences in @command{awk} and @command{gawk}, indirect function calls +@cindex differences in @command{awk} and @command{gawk} @subentry indirect function calls This section describes an advanced, @command{gawk}-specific extension. @@ -20446,9 +20491,10 @@ This style of programming works, but can be awkward. With @dfn{indirect} function calls, you tell @command{gawk} to use the @emph{value} of a variable as the @emph{name} of the function to call. 
-@cindex @code{@@}-notation for indirect function calls -@cindex indirect function calls, @code{@@}-notation -@cindex function calls, indirect, @code{@@}-notation for +@cindex @code{@@} (at-sign) @subentry @code{@@}-notation for indirect function calls +@cindex at-sign (@code{@@}) @subentry @code{@@}-notation for indirect function calls +@cindex indirect function calls @subentry @code{@@}-notation +@cindex function calls @subentry indirect @subentry @code{@@}-notation for The syntax is similar to that of a regular function call: an identifier immediately followed by an opening parenthesis, any arguments, and then a closing parenthesis, with the addition of a leading @samp{@@} @@ -20861,8 +20907,8 @@ It contains the following chapters: @node Library Functions @chapter A Library of @command{awk} Functions @cindex libraries of @command{awk} functions -@cindex functions, library -@cindex functions, user-defined, library of +@cindex functions @subentry library +@cindex functions @subentry user-defined @subentry library of @ref{User-defined} describes how to write your own @command{awk} functions. Writing functions is important, because @@ -20909,7 +20955,7 @@ and would like to contribute them to the @command{awk} user community, see @ref{How To Contribute}, for more information. @end ifclear -@cindex portability, example programs +@cindex portability @subentry example programs The programs in this @value{CHAPTER} and in @ref{Sample Programs}, freely use @command{gawk}-specific features. @@ -20929,8 +20975,8 @@ to skip any remaining input in the input file. @item @c 12/2000: Thanks to Nelson Beebe for pointing out the output issue. -@cindex case sensitivity, example programs -@cindex @code{IGNORECASE} variable, in example programs +@cindex case sensitivity @subentry example programs +@cindex @code{IGNORECASE} variable @subentry in example programs Finally, some of the programs choose to ignore upper- and lowercase distinctions in their input. 
They do so by assigning one to @code{IGNORECASE}. You can achieve almost the same effect@footnote{The effects are @@ -20967,19 +21013,19 @@ comparisons use only lowercase letters. @node Library Names @section Naming Library Function Global Variables -@cindex names, arrays/variables -@cindex names, functions -@cindex namespace issues -@cindex @command{awk} programs, documenting -@cindex documentation, of @command{awk} programs +@cindex names @subentry arrays/variables +@cindex names @subentry functions +@cindex naming issues +@cindex @command{awk} programs @subentry documenting +@cindex documentation @subentry of @command{awk} programs Due to the way the @command{awk} language evolved, variables are either @dfn{global} (usable by the entire program) or @dfn{local} (usable just by a specific function). There is no intermediate state analogous to @code{static} variables in C. -@cindex variables, global, for library functions +@cindex variables @subentry global @subentry for library functions @cindex private variables -@cindex variables, private +@cindex variables @subentry private Library functions often need to have global variables that they can use to preserve state information between calls to the function---for example, @code{getopt()}'s variable @code{_opti} @@ -20993,15 +21039,15 @@ either another library function or a user's main program. For example, a name like @code{i} or @code{j} is not a good choice, because user programs often use variable names like these for their own purposes. -@cindex programming conventions, private variable names +@cindex programming conventions @subentry private variable names The example programs shown in this @value{CHAPTER} all start the names of their private variables with an underscore (@samp{_}). Users generally don't use leading underscores in their variable names, so this convention immediately decreases the chances that the variable names will be accidentally shared with the user's program. 
-@cindex @code{_} (underscore), in names of private variables -@cindex underscore (@code{_}), in names of private variables +@cindex @code{_} (underscore) @subentry in names of private variables +@cindex underscore (@code{_}) @subentry in names of private variables In addition, several of the library functions use a prefix that helps indicate what function or set of functions use the variables---for example, @code{_pw_byname()} in the user database routines @@ -21023,7 +21069,7 @@ The leading capital letter indicates that it is global, while the fact that the variable name is not all capital letters indicates that the variable is not one of @command{awk}'s predefined variables, such as @code{FS}. -@cindex @option{--dump-variables} option, using for library functions +@cindex @option{--dump-variables} option @subentry using for library functions It is also important that @emph{all} variables in library functions that do not need to save state are, in fact, declared local.@footnote{@command{gawk}'s @option{--dump-variables} command-line @@ -21041,9 +21087,9 @@ function lib_func(x, y, l1, l2) @} @end example -@cindex arrays, associative, library functions and -@cindex libraries of @command{awk} functions, associative arrays and -@cindex functions, library, associative arrays and +@cindex arrays @subentry associative @subentry library functions and +@cindex libraries of @command{awk} functions @subentry associative arrays and +@cindex functions @subentry library @subentry associative arrays and @cindex Tcl A different convention, common in the Tcl community, is to use a single associative array to hold the values needed by the library function(s), or @@ -21191,9 +21237,10 @@ be tested with @command{gawk} and the results compared to the built-in @cindex assertions @cindex @code{assert()} function (C library) -@cindex libraries of @command{awk} functions, assertions -@cindex functions, library, assertions -@cindex @command{awk} programs, lengthy, assertions +@cindex C 
library functions @subentry @code{assert()} +@cindex libraries of @command{awk} functions @subentry assertions +@cindex functions @subentry library @subentry assertions +@cindex @command{awk} programs @subentry lengthy @subentry assertions When writing large programs, it is often useful to know that a condition or set of conditions is true. Before proceeding with a particular computation, you make a statement about what you believe to be @@ -21224,6 +21271,7 @@ prog.c:5: assertion failed: a <= 5 && b >= 17.1 @end example @cindex @code{assert()} user-defined function +@cindex user-defined @subentry function @subentry @code{assert()} The C language makes it possible to turn the condition into a string for use in printing the diagnostic message. This is not possible in @command{awk}, so this @code{assert()} function also requires a string version of the condition @@ -21295,7 +21343,7 @@ If the assertion fails, you see a message similar to the following: mydata:1357: assertion failed: a <= 5 && b >= 17.1 @end example -@cindex @code{END} pattern, @code{assert()} user-defined function and +@cindex @code{END} pattern @subentry @code{assert()} user-defined function and There is a small problem with this version of @code{assert()}. An @code{END} rule is automatically added to the program calling @code{assert()}. Normally, if a program consists @@ -21305,7 +21353,7 @@ attempts to read the input @value{DF}s or standard input (@pxref{Using BEGIN/END}), most likely causing the program to hang as it waits for input. -@cindex @code{BEGIN} pattern, @code{assert()} user-defined function and +@cindex @code{BEGIN} pattern @subentry @code{assert()} user-defined function and There is a simple workaround to this: make sure that such a @code{BEGIN} rule always ends with an @code{exit} statement. @@ -21314,12 +21362,12 @@ with an @code{exit} statement. 
@subsection Rounding Numbers @cindex rounding numbers -@cindex numbers, rounding -@cindex libraries of @command{awk} functions, rounding numbers -@cindex functions, library, rounding numbers -@cindex @code{print} statement, @code{sprintf()} function and -@cindex @code{printf} statement, @code{sprintf()} function and -@cindex @code{sprintf()} function, @code{print}/@code{printf} statements and +@cindex numbers @subentry rounding +@cindex libraries of @command{awk} functions @subentry rounding numbers +@cindex functions @subentry library @subentry rounding numbers +@cindex @code{print} statement @subentry @code{sprintf()} function and +@cindex @code{printf} statement @subentry @code{sprintf()} function and +@cindex @code{sprintf()} function @subentry @code{print}/@code{printf} statements and The way @code{printf} and @code{sprintf()} (@pxref{Printf}) perform rounding often depends upon the system's C @code{sprintf()} @@ -21333,6 +21381,7 @@ traditional rounding; it might be useful if your @command{awk}'s @code{printf} does unbiased rounding: @cindex @code{round()} user-defined function +@cindex user-defined @subentry function @subentry @code{round()} @example @c file eg/lib/round.awk # round.awk --- do normal rounding @@ -21380,10 +21429,10 @@ function round(x, ival, aval, fraction) @node Cliff Random Function @subsection The Cliff Random Number Generator -@cindex random numbers, Cliff +@cindex random numbers @subentry Cliff @cindex Cliff random numbers -@cindex numbers, Cliff random -@cindex functions, library, Cliff random numbers +@cindex numbers @subentry Cliff random +@cindex functions @subentry library @subentry Cliff random numbers The @uref{http://mathworld.wolfram.com/CliffRandomNumberGenerator.html, Cliff random number generator} @@ -21392,6 +21441,7 @@ for randomness by showing no structure.'' It is easily programmed, in less than 10 lines of @command{awk} code: @cindex @code{cliff_rand()} user-defined function +@cindex user-defined @subentry function 
@subentry @code{cliff_rand()} @example @c file eg/lib/cliff_rand.awk # cliff_rand.awk --- generate Cliff random numbers @@ -21426,10 +21476,10 @@ isn't random enough, you might try using this function instead. @node Ordinal Functions @subsection Translating Between Characters and Numbers -@cindex libraries of @command{awk} functions, character values as numbers -@cindex functions, library, character values as numbers -@cindex characters, values of as numbers -@cindex numbers, as values of characters +@cindex libraries of @command{awk} functions @subentry character values as numbers +@cindex functions @subentry library @subentry character values as numbers +@cindex characters @subentry values of as numbers +@cindex numbers @subentry as values of characters One commercial implementation of @command{awk} supplies a built-in function, @code{ord()}, which takes a character and returns the numeric value for that character in the machine's character set. If the string passed to @@ -21441,8 +21491,11 @@ Both functions are written very nicely in @command{awk}; there is no real reason to build them into the @command{awk} interpreter: @cindex @code{ord()} user-defined function +@cindex user-defined @subentry function @subentry @code{ord()} @cindex @code{chr()} user-defined function +@cindex user-defined @subentry function @subentry @code{chr()} @cindex @code{_ord_init()} user-defined function +@cindex user-defined @subentry function @subentry @code{_ord_init()} @example @c file eg/lib/ord.awk # ord.awk --- do ord and chr @@ -21551,10 +21604,10 @@ function. It is commented out for production use. 
@node Join Function @subsection Merging an Array into a String -@cindex libraries of @command{awk} functions, merging arrays into strings -@cindex functions, library, merging arrays into strings -@cindex strings, merging arrays into -@cindex arrays, merging into strings +@cindex libraries of @command{awk} functions @subentry merging arrays into strings +@cindex functions @subentry library @subentry merging arrays into strings +@cindex strings @subentry merging arrays into +@cindex arrays @subentry merging into strings When doing string processing, it is often useful to be able to join all the strings in an array into one long string. The following function, @code{join()}, accomplishes this task. It is used later in several of @@ -21569,6 +21622,7 @@ assumption, as the array was likely created with @code{split()} (@pxref{String Functions}): @cindex @code{join()} user-defined function +@cindex user-defined @subentry function @subentry @code{join()} @example @c file eg/lib/join.awk # join.awk --- join an array into a string @@ -21611,10 +21665,10 @@ more difficult than they really need to be.} @node Getlocaltime Function @subsection Managing the Time of Day -@cindex libraries of @command{awk} functions, managing, time -@cindex functions, library, managing time -@cindex timestamps, formatted -@cindex time, managing +@cindex libraries of @command{awk} functions @subentry managing @subentry time +@cindex functions @subentry library @subentry managing time +@cindex timestamps @subentry formatted +@cindex time @subentry managing The @code{systime()} and @code{strftime()} functions described in @ref{Time Functions} provide the minimum functionality necessary for dealing with the time of day @@ -21627,6 +21681,7 @@ with preformatted time information. 
It returns a string with the current time formatted in the same way as the @command{date} utility: @cindex @code{getlocaltime()} user-defined function +@cindex user-defined @subentry function @subentry @code{getlocaltime()} @example @c file eg/lib/gettime.awk # getlocaltime.awk --- get the time of day in a usable format @@ -21738,6 +21793,7 @@ The following function, based on a suggestion by Denis Shirokov, reads the entire contents of the named file in one shot: @cindex @code{readfile()} user-defined function +@cindex user-defined @subentry function @subentry @code{readfile()} @example @c file eg/lib/readfile.awk # readfile.awk --- read an entire file at once @@ -21867,9 +21923,9 @@ function shell_quote(s, # parameter @node Data File Management @section @value{DDF} Management -@cindex files, managing -@cindex libraries of @command{awk} functions, managing, data files -@cindex functions, library, managing data files +@cindex files @subentry managing +@cindex libraries of @command{awk} functions @subentry managing @subentry data files +@cindex functions @subentry library @subentry managing data files This @value{SECTION} presents functions that are useful for managing command-line @value{DF}s. @@ -21884,8 +21940,8 @@ command-line @value{DF}s. @node Filetrans Function @subsection Noting @value{DDF} Boundaries -@cindex files, managing, data file boundaries -@cindex files, initialization and cleanup +@cindex files @subentry managing @subentry data file boundaries +@cindex files @subentry initialization and cleanup The @code{BEGIN} and @code{END} rules are each executed exactly once, at the beginning and end of your @command{awk} program, respectively (@pxref{BEGIN/END}). @@ -21950,7 +22006,9 @@ supplied in the ``main'' program, @code{endfile()} is called first. Once again, the value of multiple @code{BEGIN} and @code{END} rules should be clear. 
@cindex @code{beginfile()} user-defined function +@cindex user-defined @subentry function @subentry @code{beginfile()} @cindex @code{endfile()} user-defined function +@cindex user-defined @subentry function @subentry @code{endfile()} If the same @value{DF} occurs twice in a row on the command line, then @code{endfile()} and @code{beginfile()} are not executed at the end of the first pass and at the beginning of the second pass. @@ -22005,7 +22063,7 @@ For more information, refer to @ref{BEGINFILE/ENDFILE}. @node Rewind Function @subsection Rereading the Current File -@cindex files, reading +@cindex files @subentry reading Another request for a new built-in function was for a function that would make it possible to reread the current file. The requesting user didn't want to have to use @code{getline} @@ -22018,6 +22076,7 @@ and then start over with it from the top. For lack of a better name, we'll call the function @code{rewind()}: @cindex @code{rewind()} user-defined function +@cindex user-defined @subentry function @subentry @code{rewind()} @example @c file eg/lib/rewind.awk # rewind.awk --- rewind the current file and start over @@ -22091,9 +22150,9 @@ $ @kbd{gawk -f rewind.awk -f test.awk data } @node File Checking @subsection Checking for Readable @value{DDF}s -@cindex troubleshooting, readable data files -@cindex readable data files@comma{} checking -@cindex files, skipping +@cindex troubleshooting @subentry readable data files +@cindex readable data files, checking +@cindex files @subentry skipping Normally, if you give @command{awk} a @value{DF} that isn't readable, it stops with a fatal error. There are times when you might want to just ignore such files and keep going.@footnote{The @code{BEGINFILE} @@ -22131,7 +22190,7 @@ BEGIN @{ @c endfile @end example -@cindex troubleshooting, @code{getline} function +@cindex troubleshooting @subentry @code{getline} command This works, because the @code{getline} won't be fatal. 
Removing the element from @code{ARGV} with @code{delete} skips the file (because it's no longer in the list). @@ -22209,7 +22268,7 @@ not @samp{<}. @subsection Treating Assignments as @value{FFN}s @cindex assignments as file names -@cindex file names, assignments as +@cindex file names @subentry assignments as Occasionally, you might not want @command{awk} to process command-line variable assignments (@pxref{Assignment Options}). @@ -22268,12 +22327,12 @@ are left alone. @node Getopt Function @section Processing Command-Line Options -@cindex libraries of @command{awk} functions, command-line options -@cindex functions, library, command-line options -@cindex command-line options, processing -@cindex options, command-line, processing -@cindex functions, library, C library -@cindex arguments, processing +@cindex libraries of @command{awk} functions @subentry command-line options +@cindex functions @subentry library @subentry command-line options +@cindex command line @subentry options @subentry processing +@cindex options @subentry command-line @subentry processing +@cindex functions @subentry library @subentry C library +@cindex arguments @subentry processing Most utilities on POSIX-compatible systems take options on the command line that can be used to change the way a program behaves. @command{awk} is an example of such a program @@ -22285,6 +22344,7 @@ The first occurrence on the command line of either @option{--} or a string that does not begin with @samp{-} ends the options. @cindex @code{getopt()} function (C library) +@cindex C library functions @subentry @code{getopt()} Modern Unix systems provide a C function named @code{getopt()} for processing command-line arguments. The programmer provides a string describing the one-letter options. 
If an option requires an argument, it is followed in the @@ -22388,6 +22448,7 @@ We have left it alone, as using @code{substr()} is more portable.} The discussion that follows walks through the code a bit at a time: @cindex @code{getopt()} user-defined function +@cindex user-defined @subentry function @subentry @code{getopt()} @example @c file eg/lib/getopt.awk # getopt.awk --- Do C library getopt(3) function in awk @@ -22430,6 +22491,7 @@ a string of options (the @code{options} parameter). If @code{options} has a zero length, @code{getopt()} immediately returns @minus{}1: @cindex @code{getopt()} user-defined function +@cindex user-defined @subentry function @subentry @code{getopt()} @example @c file eg/lib/getopt.awk function getopt(argc, argv, options, thisopt, i) @@ -22627,10 +22689,10 @@ use @code{getopt()} to process their arguments. @node Passwd Functions @section Reading the User Database -@cindex libraries of @command{awk} functions, user database, reading -@cindex functions, library, user database@comma{} reading -@cindex user database@comma{} reading -@cindex database, users@comma{} reading +@cindex libraries of @command{awk} functions @subentry user database, reading +@cindex functions @subentry library @subentry user database, reading +@cindex user database, reading +@cindex database @subentry users, reading @cindex @code{PROCINFO} array The @code{PROCINFO} array (@pxref{Built-in Variables}) @@ -22644,12 +22706,14 @@ user database. @xref{Group Functions} for a similar suite that retrieves information from the group database. 
@cindex @code{getpwent()} function (C library) +@cindex C library functions @subentry @code{getpwent()} @cindex @code{getpwent()} user-defined function -@cindex users, information about, retrieving +@cindex user-defined @subentry function @subentry @code{getpwent()} +@cindex users, information about @subentry retrieving @cindex login information @cindex account information @cindex password file -@cindex files, password +@cindex files @subentry password The POSIX standard does not define the file where user information is kept. Instead, it provides the @code{<pwd.h>} header file and several C language subroutines for obtaining user information. @@ -22777,8 +22841,8 @@ shell, such as Bash. A few lines representative of @command{pwcat}'s output are as follows: @cindex Jacobs, Andrew -@cindex Robbins, Arnold -@cindex Robbins, Miriam +@cindex Robbins @subentry Arnold +@cindex Robbins @subentry Miriam @example $ @kbd{pwcat} @print{} root:x:0:1:Operator:/:/bin/sh @@ -22797,6 +22861,7 @@ information. There are several functions here, corresponding to the C functions of the same names: @cindex @code{_pw_init()} user-defined function +@cindex user-defined @subentry function @subentry @code{_pw_init()} @example @c file eg/lib/passwdawk.in # passwd.awk --- access password file information @@ -22850,7 +22915,7 @@ function _pw_init( oldfs, oldrs, olddol0, pwcat, using_fw, using_fpat) @c endfile @end example -@cindex @code{BEGIN} pattern, @code{pwcat} program +@cindex @code{BEGIN} pattern @subentry @code{pwcat} program The @code{BEGIN} rule sets a private variable to the directory where @command{pwcat} is stored. Because it is used to help out an @command{awk} library routine, we have chosen to put it in @file{/usr/local/libexec/awk}; @@ -22863,8 +22928,8 @@ occurrence (@code{_pw_bycount}). The variable @code{_pw_inited} is used for efficiency, as @code{_pw_init()} needs to be called only once. 
-@cindex @code{PROCINFO} array, testing the field splitting -@cindex @code{getline} command, @code{_pw_init()} function +@cindex @code{PROCINFO} array @subentry testing the field splitting +@cindex @code{getline} command @subentry @code{_pw_init()} function Because this function uses @code{getline} to read information from @command{pwcat}, it first saves the values of @code{FS}, @code{RS}, and @code{$0}. It notes in the variable @code{using_fw} whether field splitting @@ -22889,12 +22954,14 @@ if necessary), @code{RS}, and @code{$0}. The use of @code{@w{_pw_count}} is explained shortly. @cindex @code{getpwnam()} function (C library) +@cindex C library functions @subentry @code{getpwnam()} The @code{getpwnam()} function takes a username as a string argument. If that user is in the database, it returns the appropriate line. Otherwise, it relies on the array reference to a nonexistent element to create the element with the null string as its value: @cindex @code{getpwnam()} user-defined function +@cindex user-defined @subentry function @subentry @code{getpwnam()} @example @group @c file eg/lib/passwdawk.in @@ -22908,11 +22975,13 @@ function getpwnam(name) @end example @cindex @code{getpwuid()} function (C library) +@cindex C library functions @subentry @code{getpwuid()} Similarly, the @code{getpwuid()} function takes a user ID number argument. If that user number is in the database, it returns the appropriate line. Otherwise, it returns the null string: @cindex @code{getpwuid()} user-defined function +@cindex user-defined @subentry function @subentry @code{getpwuid()} @example @c file eg/lib/passwdawk.in function getpwuid(uid) @@ -22924,11 +22993,13 @@ function getpwuid(uid) @end example @cindex @code{getpwent()} function (C library) +@cindex C library functions @subentry @code{getpwent()} The @code{getpwent()} function simply steps through the database, one entry at a time. 
It uses @code{_pw_count} to track its current position in the @code{_pw_bycount} array: @cindex @code{getpwent()} user-defined function +@cindex user-defined @subentry function @subentry @code{getpwent()} @example @c file eg/lib/passwdawk.in function getpwent() @@ -22942,10 +23013,12 @@ function getpwent() @end example @cindex @code{endpwent()} function (C library) +@cindex C library functions @subentry @code{endpwent()} The @code{@w{endpwent()}} function resets @code{@w{_pw_count}} to zero, so that subsequent calls to @code{getpwent()} start over again: @cindex @code{endpwent()} user-defined function +@cindex user-defined @subentry function @subentry @code{endpwent()} @example @c file eg/lib/passwdawk.in function endpwent() @@ -22980,17 +23053,19 @@ uses these functions. @node Group Functions @section Reading the Group Database -@cindex libraries of @command{awk} functions, group database, reading -@cindex functions, library, group database@comma{} reading +@cindex libraries of @command{awk} functions @subentry group database, reading +@cindex functions @subentry library @subentry group database, reading @cindex group database, reading -@cindex database, group, reading -@cindex @code{PROCINFO} array, group membership and +@cindex database @subentry group, reading +@cindex @code{PROCINFO} array @subentry group membership and @cindex @code{getgrent()} function (C library) +@cindex C library functions @subentry @code{getgrent()} @cindex @code{getgrent()} user-defined function -@cindex groups@comma{} information about +@cindex user-defined @subentry function @subentry @code{getgrent()} +@cindex groups, information about @cindex account information @cindex group file -@cindex files, group +@cindex files @subentry group Much of the discussion presented in @ref{Passwd Functions} applies to the group database as well. Although there has traditionally @@ -23126,8 +23201,9 @@ $ @kbd{grcat} Here are the functions for obtaining information from the group database. 
There are several, modeled after the C library functions of the same names: -@cindex @code{getline} command, @code{_gr_init()} user-defined function +@cindex @code{getline} command @subentry @code{_gr_init()} user-defined function @cindex @code{_gr_init()} user-defined function +@cindex user-defined @subentry function @subentry @code{_gr_init()} @example @c file eg/lib/groupawk.in # group.awk --- functions for dealing with the group file @@ -23241,6 +23317,7 @@ initializes @code{_gr_count} to zero (it is used later), and makes @code{_gr_inited} nonzero. @cindex @code{getgrnam()} function (C library) +@cindex C library functions @subentry @code{getgrnam()} The @code{getgrnam()} function takes a group name as its argument, and if that group exists, it is returned. Otherwise, it @@ -23248,6 +23325,7 @@ relies on the array reference to a nonexistent element to create the element with the null string as its value: @cindex @code{getgrnam()} user-defined function +@cindex user-defined @subentry function @subentry @code{getgrnam()} @example @c file eg/lib/groupawk.in function getgrnam(group) @@ -23259,10 +23337,12 @@ function getgrnam(group) @end example @cindex @code{getgrgid()} function (C library) +@cindex C library functions @subentry @code{getgrgid()} The @code{getgrgid()} function is similar; it takes a numeric group ID and looks up the information associated with that group ID: @cindex @code{getgrgid()} user-defined function +@cindex user-defined @subentry function @subentry @code{getgrgid()} @example @c file eg/lib/groupawk.in function getgrgid(gid) @@ -23274,10 +23354,12 @@ function getgrgid(gid) @end example @cindex @code{getgruser()} function (C library) +@cindex C library functions @subentry @code{getgruser()} The @code{getgruser()} function does not have a C counterpart. 
It takes a username and returns the list of groups that have the user as a member: -@cindex @code{getgruser()} function, user-defined +@cindex @code{getgruser()} user-defined function +@cindex user-defined @subentry function @subentry @code{getgruser()} @example @c file eg/lib/groupawk.in function getgruser(user) @@ -23289,10 +23371,12 @@ function getgruser(user) @end example @cindex @code{getgrent()} function (C library) +@cindex C library functions @subentry @code{getgrent()} The @code{getgrent()} function steps through the database one entry at a time. It uses @code{_gr_count} to track its position in the list: @cindex @code{getgrent()} user-defined function +@cindex user-defined @subentry function @subentry @code{getgrent()} @example @c file eg/lib/groupawk.in function getgrent() @@ -23308,10 +23392,12 @@ function getgrent() @end example @cindex @code{endgrent()} function (C library) +@cindex C library functions @subentry @code{endgrent()} The @code{endgrent()} function resets @code{_gr_count} to zero so that @code{getgrent()} can start over again: @cindex @code{endgrent()} user-defined function +@cindex user-defined @subentry function @subentry @code{endgrent()} @example @c file eg/lib/groupawk.in function endgrent() @@ -23348,6 +23434,7 @@ You call it with the array and a string representing the name of the array: @cindex @code{walk_array()} user-defined function +@cindex user-defined @subentry function @subentry @code{walk_array()} @example @c file eg/lib/walkarray.awk function walk_array(arr, name, i) @@ -23573,7 +23660,7 @@ output identical to that of the original version. 
@node Sample Programs @chapter Practical @command{awk} Programs -@cindex @command{awk} programs, examples of +@cindex @command{awk} programs @subentry examples of @c FULLXREF ON @ref{Library Functions}, @@ -23642,7 +23729,7 @@ cut.awk -- -c1-8 myfiles > results @node Clones @section Reinventing Wheels for Fun and Profit -@cindex POSIX, programs@comma{} implementing in @command{awk} +@cindex POSIX @subentry programs, implementing in @command{awk} This @value{SECTION} presents a number of POSIX utilities implemented in @command{awk}. Reinventing these programs in @command{awk} is often enjoyable, @@ -23673,8 +23760,8 @@ The programs are presented in alphabetical order. @cindex @command{cut} utility @cindex @command{cut} utility -@cindex fields, cutting -@cindex columns, cutting +@cindex fields @subentry cutting +@cindex columns @subentry cutting The @command{cut} utility selects, or ``cuts,'' characters or fields from its standard input and sends them to its standard output. Fields are separated by TABs by default, @@ -23754,8 +23841,8 @@ function usage() @c endfile @end example -@cindex @code{BEGIN} pattern, running @command{awk} programs and -@cindex @code{FS} variable, running @command{awk} programs and +@cindex @code{BEGIN} pattern @subentry running @command{awk} programs and +@cindex @code{FS} variable @subentry running @command{awk} programs and Next comes a @code{BEGIN} rule that parses the command-line options. It sets @code{FS} to a single TAB character, because that is @command{cut}'s default field separator. The rule then sets the output field separator to be the @@ -23801,7 +23888,7 @@ BEGIN @{ @c endfile @end example -@cindex field separators, spaces as +@cindex field separator @subentry spaces as The code must take special care when the field delimiter is a space. Using a single space (@code{@w{" "}}) for the value of @code{FS} is @@ -23993,9 +24080,9 @@ of picking the input line apart by characters. 
@node Egrep Program @subsection Searching for Regular Expressions in Files -@cindex regular expressions, searching for -@cindex searching, files for regular expressions -@cindex files, searching for regular expressions +@cindex regular expressions @subentry searching for +@cindex searching @subentry files for regular expressions +@cindex files @subentry searching for regular expressions @cindex @command{egrep} utility The @command{egrep} utility searches files for patterns. It uses regular expressions that are almost identical to those available in @command{awk} @@ -24208,8 +24295,8 @@ print the @value{FN}, and then skip to the next file with @code{nextfile}. Finally, each line is printed, with a leading @value{FN} and colon if necessary: -@cindex @code{!} (exclamation point), @code{!} operator -@cindex exclamation point (@code{!}), @code{!} operator +@cindex @code{!} (exclamation point) @subentry @code{!} operator +@cindex exclamation point (@code{!}) @subentry @code{!} operator @example @c file eg/prog/egrep.awk @{ @@ -24269,8 +24356,8 @@ function usage() @node Id Program @subsection Printing Out User Information -@cindex printing, user information -@cindex users, information about, printing +@cindex printing @subentry user information +@cindex users, information about @subentry printing @cindex @command{id} utility The @command{id} utility lists a user's real and effective user ID numbers, real and effective group ID numbers, and the user's group set, if any. @@ -24283,7 +24370,7 @@ $ @kbd{id} @print{} uid=1000(arnold) gid=1000(arnold) groups=1000(arnold),4(adm),7(lp),27(sudo) @end example -@cindex @code{PROCINFO} array, user and group ID numbers and +@cindex @code{PROCINFO} array @subentry user and group ID numbers and This information is part of what is provided by @command{gawk}'s @code{PROCINFO} array (@pxref{Built-in Variables}). 
However, the @command{id} utility provides a more palatable output than just @@ -24410,7 +24497,7 @@ the empty string into this function saves several lines of code. @c FIXME: One day, update to current POSIX version of split -@cindex files, splitting +@cindex files @subentry splitting @cindex @code{split} utility The @command{split} program splits large text files into smaller pieces. Usage is as follows:@footnote{This is the traditional usage. The @@ -24552,8 +24639,8 @@ way as to solve the EBCDIC issue as well. @node Tee Program @subsection Duplicating Output into Multiple Files -@cindex files, multiple@comma{} duplicating output into -@cindex output, duplicating into files +@cindex files @subentry multiple, duplicating output into +@cindex output @subentry duplicating into files @cindex @code{tee} utility The @code{tee} program is known as a ``pipe fitting.'' @code{tee} copies its standard input to its standard output and also duplicates it to the @@ -24676,8 +24763,8 @@ END @{ @c FIXME: One day, update to current POSIX version of uniq -@cindex printing, unduplicated lines of text -@cindex text@comma{} printing, unduplicated lines of +@cindex printing @subentry unduplicated lines of text +@cindex text, printing @subentry unduplicated lines of @cindex @command{uniq} utility The @command{uniq} utility reads sorted lines of data on its standard input, and by default removes duplicate lines. In other words, it only @@ -24957,11 +25044,11 @@ suggestion. 
@c FIXME: One day, update to current POSIX version of wc -@cindex counting -@cindex input files, counting elements in -@cindex words, counting -@cindex characters, counting -@cindex lines, counting +@cindex counting words, lines, and characters +@cindex input files @subentry counting elements in +@cindex words @subentry counting +@cindex characters @subentry counting +@cindex lines @subentry counting @cindex @command{wc} utility The @command{wc} (word count) utility counts lines, words, and characters in one or more input files. Its usage is as follows: @@ -25159,9 +25246,9 @@ We hope you find them both interesting and enjoyable. @node Dupword Program @subsection Finding Duplicated Words in a Document -@cindex words, duplicate@comma{} searching for -@cindex searching, for words -@cindex documents@comma{} searching +@cindex words @subentry duplicate, searching for +@cindex searching @subentry for words +@cindex documents, searching A common error when writing large amounts of prose is to accidentally duplicate words. Typically you will see this in text as something like ``the the program does the following@dots{}'' When the text is online, often @@ -25229,7 +25316,7 @@ word, comparing it to the previous one: @node Alarm Program @subsection An Alarm Clock Program @cindex insomnia, cure for -@cindex Robbins, Arnold +@cindex Robbins @subentry Arnold @quotation @i{Nothing cures insomnia like a ringing alarm clock.} @author Arnold Robbins @@ -25261,7 +25348,7 @@ Aharon Robbins <arnold@skeeve.com> wrote: @author Erik Quanstrom @end quotation -@cindex time, alarm clock example program +@cindex time @subentry alarm clock example program @cindex alarm clock example program The following program is a simple ``alarm clock'' program. You give it a time of day and an optional message. At the specified time, @@ -25272,6 +25359,7 @@ repetitions. This program uses the @code{getlocaltime()} function from @ref{Getlocaltime Function}. 
+@cindex ASCII All the work is done in the @code{BEGIN} rule. The first part is argument checking and setting of defaults: the delay, the count, and the message to print. If the user supplied a message without the ASCII BEL @@ -25417,7 +25505,7 @@ seconds are necessary: @node Translate Program @subsection Transliterating Characters -@cindex characters, transliterating +@cindex characters @subentry transliterating @cindex @command{tr} utility The system @command{tr} utility transliterates characters. For example, it is often used to map uppercase letters into lowercase for further processing: @@ -25569,8 +25657,8 @@ for inspiration. @node Labels Program @subsection Printing Mailing Labels -@cindex printing, mailing labels -@cindex mailing labels@comma{} printing +@cindex printing @subentry mailing labels +@cindex mailing labels, printing Here is a ``real-world''@footnote{``Real world'' is defined as ``a program actually used to get something done.''} program. This @@ -25701,7 +25789,7 @@ END @{ @node Word Sorting @subsection Generating Word-Usage Counts -@cindex words, usage counts@comma{} generating +@cindex words @subentry usage counts, generating When working with large amounts of text, it can be interesting to know how often different words appear. For example, an author may overuse @@ -25832,7 +25920,7 @@ to use the @command{sort} program. @node History Sorting @subsection Removing Duplicates from Unsorted Text -@cindex lines, duplicate@comma{} removing +@cindex lines @subentry duplicate, removing The @command{uniq} program (@pxref{Uniq Program}) removes duplicate lines from @emph{sorted} data. @@ -25902,8 +25990,8 @@ seen. 
@node Extract Program @subsection Extracting Programs from Texinfo Source Files -@cindex Texinfo, extracting programs from source files -@cindex files, Texinfo@comma{} extracting programs from +@cindex Texinfo @subentry extracting programs from source files +@cindex files @subentry Texinfo, extracting programs from @ifnotinfo Both this chapter and the previous chapter (@ref{Library Functions}) @@ -26279,8 +26367,8 @@ value of @code{RT}. @node Igawk Program @subsection An Easy Way to Use Library Functions -@cindex libraries of @command{awk} functions, example program for using -@cindex functions, library, example program for using +@cindex libraries of @command{awk} functions @subentry example program for using +@cindex functions @subentry library @subentry example program for using In @ref{Include Files}, we saw how @command{gawk} provides a built-in file-inclusion capability. However, this is a @command{gawk} extension. This @value{SECTION} provides the motivation for making file inclusion @@ -27112,10 +27200,10 @@ things considerably. What problem does this engender though? 
@c answer, reading from "-" or /dev/stdin @cindex search paths -@cindex search paths, for source files -@cindex source files@comma{} search path for -@cindex files, source@comma{} search path for -@cindex directories, searching +@cindex search paths @subentry for source files +@cindex source files, search path for +@cindex files @subentry source, search path for +@cindex directories @subentry searching @subentry for source files @item As an additional example of the idea that it is not always necessary to add new features to a program, consider the idea of having two files in @@ -27181,8 +27269,8 @@ It contains the following chapters: @node Advanced Features @chapter Advanced Features of @command{gawk} -@cindex @command{gawk}, features, advanced -@cindex advanced features, @command{gawk} +@cindex @command{gawk} @subentry features @subentry advanced +@cindex advanced features @subentry @command{gawk} @ignore Contributed by: Peter Langston <pud!psl@bellcore.bellcore.com> @@ -27250,9 +27338,9 @@ discusses the ability to dynamically add new built-in functions to @node Nondecimal Data @section Allowing Nondecimal Input Data @cindex @option{--non-decimal-data} option -@cindex advanced features, nondecimal input data -@cindex input, data@comma{} nondecimal -@cindex constants, nondecimal +@cindex advanced features @subentry nondecimal input data +@cindex input @subentry data, nondecimal +@cindex constants @subentry nondecimal If you run @command{gawk} with the @option{--non-decimal-data} option, you can have nondecimal values in your input data: @@ -27290,9 +27378,9 @@ Because it is common to have decimal data with leading zeros, and because using this facility could lead to surprising results, the default is to leave it disabled. If you want it, you must explicitly request it. 
-@cindex programming conventions, @code{--non-decimal-data} option -@cindex @option{--non-decimal-data} option, @code{strtonum()} function and -@cindex @code{strtonum()} function (@command{gawk}), @code{--non-decimal-data} option and +@cindex programming conventions @subentry @code{--non-decimal-data} option +@cindex @option{--non-decimal-data} option @subentry @code{strtonum()} function and +@cindex @code{strtonum()} function (@command{gawk}) @subentry @code{--non-decimal-data} option and @quotation CAUTION @emph{Use of this option is not recommended.} It can break old programs very badly. @@ -27583,6 +27671,7 @@ function should use the @code{isarray()} function (@pxref{Type Functions}) to check for this, and choose a defined sorting order for subarrays. +@cindex POSIX mode All sorting based on @code{PROCINFO["sorted_in"]} is disabled in POSIX mode, because the @code{PROCINFO} array is not special in that case. @@ -27599,11 +27688,14 @@ sorted array traversal is not the default. @node Array Sorting Functions @subsection Sorting Array Values and Indices with @command{gawk} -@cindex arrays, sorting +@cindex arrays @subentry sorting @subentry @code{asort()} function (@command{gawk}) +@cindex arrays @subentry sorting @subentry @code{asorti()} function (@command{gawk}) @cindexgawkfunc{asort} -@cindex @code{asort()} function (@command{gawk}), arrays@comma{} sorting +@cindex @code{asort()} function (@command{gawk}) @subentry arrays, sorting +@cindex @code{asort()} function (@command{gawk}) @subentry side effects @cindexgawkfunc{asorti} -@cindex @code{asorti()} function (@command{gawk}), arrays@comma{} sorting +@cindex @code{asorti()} function (@command{gawk}) @subentry arrays, sorting +@cindex @code{asorti()} function (@command{gawk}) @subentry side effects @cindex sort function, arrays, sorting In most @command{awk} implementations, sorting an array requires writing a @code{sort()} function. 
This can be educational for exploring @@ -27627,7 +27719,8 @@ The default comparison is based on the type of the elements All numeric values come before all string values, which in turn come before all subarrays. -@cindex side effects, @code{asort()} function +@cindex side effects @subentry @code{asort()} function +@cindex side effects @subentry @code{asorti()} function An important side effect of calling @code{asort()} is that @emph{the array's original indices are irrevocably lost}. As this isn't always desirable, @code{asort()} accepts a @@ -27696,9 +27789,9 @@ both arrays use the values. @end quotation @c Document It And Call It A Feature. Sigh. -@cindex @command{gawk}, @code{IGNORECASE} variable in -@cindex arrays, sorting, @code{IGNORECASE} variable and -@cindex @code{IGNORECASE} variable, array sorting functions and +@cindex @command{gawk} @subentry @code{IGNORECASE} variable in +@cindex arrays @subentry sorting @subentry @code{IGNORECASE} variable and +@cindex @code{IGNORECASE} variable @subentry array sorting functions and Because @code{IGNORECASE} affects string comparisons, the value of @code{IGNORECASE} also affects sorting for both @code{asort()} and @code{asorti()}. 
Note also that the locale's sorting order does @emph{not} @@ -27797,7 +27890,7 @@ Mike Brennan @end smallexample @end ignore -@cindex advanced features, processes@comma{} communicating with +@cindex advanced features @subentry processes, communicating with @cindex processes, two-way communications with It is often useful to be able to send data to a separate program for @@ -27828,10 +27921,10 @@ Brennan suggests the use of @command{rand()} to generate unique remain more difficult to use than two-way pipes.} @c 8/2014 @cindex coprocesses -@cindex input/output, two-way -@cindex @code{|} (vertical bar), @code{|&} operator (I/O) -@cindex vertical bar (@code{|}), @code{|&} operator (I/O) -@cindex @command{csh} utility, @code{|&} operator, comparison with +@cindex input/output @subentry two-way +@cindex @code{|} (vertical bar) @subentry @code{|&} operator (I/O) +@cindex vertical bar (@code{|}) @subentry @code{|&} operator (I/O) +@cindex @command{csh} utility @subentry @code{|&} operator, comparison with However, with @command{gawk}, it is possible to open a @emph{two-way} pipe to another process. The second process is termed a @dfn{coprocess}, as it runs in parallel with @command{gawk}. @@ -27867,8 +27960,8 @@ standard error goes. It is not possible to read the child's standard error separately. @cindex deadlocks -@cindex buffering, input/output -@cindex @code{getline} command, deadlock and +@cindex buffering @subentry input/output +@cindex @code{getline} command @subentry deadlock and @item I/O buffering may be a problem. @command{gawk} automatically flushes all output down the pipe to the coprocess. @@ -27879,7 +27972,7 @@ known as @dfn{deadlock}, where each process is waiting for the other one to do something. 
@end itemize -@cindex @code{close()} function, two-way pipes and +@cindex @code{close()} function @subentry two-way pipes and It is possible to close just one end of the two-way pipe to a coprocess, by supplying a second argument to the @code{close()} function of either @code{"to"} or @code{"from"} @@ -27888,7 +27981,7 @@ These strings tell @command{gawk} to close the end of the pipe that sends data to the coprocess or the end that reads from it, respectively. -@cindex @command{sort} utility, coprocesses and +@cindex @command{sort} utility @subentry coprocesses and This is particularly necessary in order to use the system @command{sort} utility as part of a coprocess; @command{sort} must read @emph{all} of its input @@ -27923,6 +28016,7 @@ indication. This causes @command{sort} to sort the data and write the sorted data back to the @command{gawk} program. Once all of the data has been read, @command{gawk} terminates the coprocess and exits. +@cindex ASCII As a side note, the assignment @samp{LC_ALL=C} in the @command{sort} command ensures traditional Unix (ASCII) sorting from @command{sort}. This is not strictly necessary here, but it's good to know how to do this. @@ -27945,8 +28039,8 @@ or @code{getline}. @xref{Nonfatal}, for more information. @end quotation -@cindex @command{gawk}, @code{PROCINFO} array in -@cindex @code{PROCINFO} array, communications via ptys and +@cindex @command{gawk} @subentry @code{PROCINFO} array in +@cindex @code{PROCINFO} array @subentry communications via ptys and You may also use pseudo-ttys (ptys) for two-way communication instead of pipes, if your system supports them. This is done on a per-command basis, by setting a special element @@ -28033,15 +28127,15 @@ And now, magically, it works! 
@node TCP/IP Networking @section Using @command{gawk} for Network Programming -@cindex advanced features, network programming -@cindex networks, programming +@cindex advanced features @subentry network programming +@cindex networks @subentry programming @cindex TCP/IP @cindex @code{/inet/@dots{}} special files (@command{gawk}) -@cindex files, @code{/inet/@dots{}} (@command{gawk}) +@cindex files @subentry @code{/inet/@dots{}} (@command{gawk}) @cindex @code{/inet4/@dots{}} special files (@command{gawk}) -@cindex files, @code{/inet4/@dots{}} (@command{gawk}) +@cindex files @subentry @code{/inet4/@dots{}} (@command{gawk}) @cindex @code{/inet6/@dots{}} special files (@command{gawk}) -@cindex files, @code{/inet6/@dots{}} (@command{gawk}) +@cindex files @subentry @code{/inet6/@dots{}} (@command{gawk}) @cindex @code{EMRED} @ifnotdocbook @quotation @@ -28096,6 +28190,7 @@ respectively. TCP should be used for most applications. @item local-port @cindex @code{getaddrinfo()} function (C library) +@cindex C library functions @subentry @code{getaddrinfo()} The local TCP or UDP port number to use. Use a port number of @samp{0} when you want the system to pick a port. This is what you should do when writing a TCP or UDP client. @@ -28113,7 +28208,7 @@ Again, use @samp{0} if you don't care, or else a well-known service name. 
@end table -@cindex @command{gawk}, @code{ERRNO} variable in +@cindex @command{gawk} @subentry @code{ERRNO} variable in @cindex @code{ERRNO} variable @quotation NOTE Failure in opening a two-way socket will result in a nonfatal error @@ -28160,10 +28255,10 @@ no way to access services available over Secure Socket Layer @node Profiling @section Profiling Your @command{awk} Programs -@cindex @command{awk} programs, profiling +@cindex @command{awk} programs @subentry profiling @cindex profiling @command{awk} programs @cindex @code{awkprof.out} file -@cindex files, @code{awkprof.out} +@cindex files @subentry @code{awkprof.out} You may produce execution traces of your @command{awk} programs. This is done by passing the option @option{--profile} to @command{gawk}. @@ -28231,8 +28326,8 @@ Here is the @file{awkprof.out} that results from running the illustrates that @command{awk} programmers sometimes get up very early in the morning to work): -@cindex @code{BEGIN} pattern, profiling and -@cindex @code{END} pattern, profiling and +@cindex @code{BEGIN} pattern @subentry profiling and +@cindex @code{END} pattern @subentry profiling and @example # gawk profile, created Mon Sep 29 05:16:21 2014 @@ -28296,7 +28391,7 @@ Multiple @code{BEGIN} and @code{END} rules retain their separate identities, as do multiple @code{BEGINFILE} and @code{ENDFILE} rules. -@cindex patterns, counts, in a profile +@cindex patterns @subentry counts, in a profile @item Pattern--action rules have two counts. The first count, to the left of the rule, shows how many times @@ -28316,7 +28411,7 @@ is a count showing how many times the condition was true. The count for the @code{else} indicates how many times the test failed. -@cindex loops, count for header, in a profile +@cindex loops @subentry count for header, in a profile @item The count for a loop header (such as @code{for} or @code{while}) shows how many times the loop test was executed. 
@@ -28324,8 +28419,8 @@ or @code{while}) shows how many times the loop test was executed. statement in a rule to determine how many times the rule was executed. If the first statement is a loop, the count is misleading.) -@cindex functions, user-defined, counts@comma{} in a profile -@cindex user-defined, functions, counts@comma{} in a profile +@cindex functions @subentry user-defined @subentry counts, in a profile +@cindex user-defined @subentry functions @subentry counts, in a profile @item For user-defined functions, the count next to the @code{function} keyword indicates how many times the function was called. @@ -28339,8 +28434,8 @@ The layout uses ``K&R'' style with TABs. Braces are used everywhere, even when the body of an @code{if}, @code{else}, or loop is only a single statement. -@cindex @code{()} (parentheses), in a profile -@cindex parentheses @code{()}, in a profile +@cindex @code{()} (parentheses) @subentry in a profile +@cindex parentheses @code{()} @subentry in a profile @item Parentheses are used only where needed, as indicated by the structure of the program and the precedence rules. @@ -28395,8 +28490,9 @@ which is correct, but possibly unexpected. (If a program uses both @samp{print $0} and plain @samp{print}, that distinction is retained.) -@cindex profiling @command{awk} programs, dynamically -@cindex @command{gawk} program, dynamic profiling +@cindex profiling @command{awk} programs @subentry dynamically +@cindex @command{gawk} @subentry dynamic profiling +@cindex @command{gawk} @subentry profiling programs @cindex dynamic profiling Besides creating profiles when a program has completed, @command{gawk} can produce a profile while it is running. 
@@ -28410,10 +28506,10 @@ $ @kbd{gawk --profile -f myprog &} [1] 13992 @end example -@cindex @command{kill} command@comma{} dynamic profiling +@cindex @command{kill} command, dynamic profiling @cindex @code{USR1} signal, for dynamic profiling @cindex @code{SIGUSR1} signal, for dynamic profiling -@cindex signals, @code{USR1}/@code{SIGUSR1}, for profiling +@cindex signals @subentry @code{USR1}/@code{SIGUSR1}, for profiling @noindent The shell prints a job number and process ID number; in this case, 13992. Use the @command{kill} command to send the @code{USR1} signal @@ -28446,16 +28542,16 @@ profile file. @cindex @code{HUP} signal, for dynamic profiling @cindex @code{SIGHUP} signal, for dynamic profiling -@cindex signals, @code{HUP}/@code{SIGHUP}, for profiling +@cindex signals @subentry @code{HUP}/@code{SIGHUP}, for profiling If you use the @code{HUP} signal instead of the @code{USR1} signal, @command{gawk} produces the profile and the function call trace and then exits. @cindex @code{INT} signal (MS-Windows) @cindex @code{SIGINT} signal (MS-Windows) -@cindex signals, @code{INT}/@code{SIGINT} (MS-Windows) +@cindex signals @subentry @code{INT}/@code{SIGINT} (MS-Windows) @cindex @code{QUIT} signal (MS-Windows) @cindex @code{SIGQUIT} signal (MS-Windows) -@cindex signals, @code{QUIT}/@code{SIGQUIT} (MS-Windows) +@cindex signals @subentry @code{QUIT}/@code{SIGQUIT} (MS-Windows) When @command{gawk} runs on MS-Windows systems, it uses the @code{INT} and @code{QUIT} signals for producing the profile, and in the case of the @code{INT} signal, @command{gawk} exits. This is @@ -28475,8 +28571,8 @@ Once upon a time, the @option{--pretty-print} option would also run your program. This is no longer the case. 
@end quotation -@cindex profiling, pretty-printing, difference with -@cindex pretty-printing, profiling, difference with +@cindex profiling, pretty printing, difference with +@cindex pretty printing @subentry profiling, difference with There is a significant difference between the output created when profiling, and that created when pretty-printing. Pretty-printed output preserves the original comments that were in the program, although their @@ -28560,9 +28656,9 @@ countries, they were able to sell more systems. As a result, internationalization and localization of programs and software systems became a common practice. -@cindex internationalization, localization -@cindex @command{gawk}, internationalization and, See internationalization -@cindex internationalization, localization, @command{gawk} and +@cindex internationalization @subentry localization +@cindex @command{gawk} @subentry internationalization @seeentry{internationalization} +@cindex internationalization @subentry localization @subentry @command{gawk} and For many years, the ability to provide internationalization was largely restricted to programs written in C and C++. This @value{CHAPTER} describes the underlying library @command{gawk} @@ -28588,8 +28684,8 @@ a requirement. @section Internationalization and Localization @cindex internationalization -@cindex localization, See internationalization@comma{} localization -@cindex localization +@cindex localization @seeentry{internationalization, localization} +@cindex internationalization @subentry localization @dfn{Internationalization} means writing (or modifying) a program once, in such a way that it can use multiple languages without requiring further source code changes. @@ -28614,7 +28710,7 @@ port doesn't support GNU @command{gettext}. Therefore, these features are not available if you are using one of those operating systems. 
Sorry.} -@cindex portability, @command{gettext} library and +@cindex portability @subentry @command{gettext} library and When using GNU @command{gettext}, each application has its own @dfn{text domain}. This is a unique name, such as @samp{kpilot} or @samp{gawk}, that identifies the application. @@ -28636,15 +28732,16 @@ A table with strings of option names is not (e.g., @command{gawk}'s language). @cindex @code{textdomain()} function (C library) +@cindex C library functions @subentry @code{textdomain()} @item The programmer indicates the application's text domain (@command{"guide"}) to the @command{gettext} library, by calling the @code{textdomain()} function. @cindex @code{.pot} files -@cindex files, @code{.pot} -@cindex portable object template files -@cindex files, portable object template +@cindex files @subentry @code{.pot} +@cindex portable object @subentry template files +@cindex files @subentry portable object @subentry template file (@file{.pot}) @item Messages from the application are extracted from the source code and collected into a portable object template file (@file{guide.pot}), @@ -28654,9 +28751,9 @@ The original (usually English) messages serve as the key for lookup of the translations. @cindex @code{.po} files -@cindex files, @code{.po} -@cindex portable object files -@cindex files, portable object +@cindex files @subentry @code{.po} +@cindex portable object @subentry files +@cindex files @subentry portable object @item For each language with a translator, @file{guide.pot} is copied to a portable object file (@code{.po}) @@ -28664,9 +28761,9 @@ and translations are created and shipped with the application. For example, there might be a @file{fr.po} for a French translation. 
@cindex @code{.gmo} files -@cindex files, @code{.gmo} +@cindex files @subentry @code{.gmo} @cindex message object files -@cindex files, message object +@cindex files @subentry message object @item Each language's @file{.po} file is converted into a binary message object (@file{.gmo}) file. @@ -28679,15 +28776,16 @@ When @command{guide} is built and installed, the binary translation files are installed in a standard place. @cindex @code{bindtextdomain()} function (C library) +@cindex C library functions @subentry @code{bindtextdomain()} @item For testing and development, it is possible to tell @command{gettext} to use @file{.gmo} files in a different directory than the standard one by using the @code{bindtextdomain()} function. -@cindex @code{.gmo} files, specifying directory of -@cindex files, @code{.gmo}, specifying directory of -@cindex message object files, specifying directory of -@cindex files, message object, specifying directory of +@cindex @code{.gmo} files @subentry specifying directory of +@cindex files @subentry @code{.gmo} @subentry specifying directory of +@cindex message object files @subentry specifying directory of +@cindex files @subentry message object @subentry specifying directory of @item At runtime, @command{guide} looks up each string via a call to @code{gettext()}. The returned string is the translated string @@ -28701,6 +28799,7 @@ and forth. @end enumerate @cindex @code{gettext()} function (C library) +@cindex C library functions @subentry @code{gettext()} In C (or C++), the string marking and dynamic translation lookup are accomplished by wrapping each string in a call to @code{gettext()}: @@ -28711,8 +28810,8 @@ printf("%s", gettext("Don't Panic!\n")); The tools that extract messages from source code pull out all strings enclosed in calls to @code{gettext()}. 
-@cindex @code{_} (underscore), C macro -@cindex underscore (@code{_}), C macro +@cindex @code{_} (underscore) @subentry C macro +@cindex underscore (@code{_}) @subentry C macro The GNU @command{gettext} developers, recognizing that typing @samp{gettext(@dots{})} over and over again is both painful and ugly to look at, use the macro @samp{_} (an underscore) to make things easier: @@ -28725,8 +28824,8 @@ at, use the macro @samp{_} (an underscore) to make things easier: printf("%s", _("Don't Panic!\n")); @end example -@cindex internationalization, localization, locale categories -@cindex @command{gettext} library, locale categories +@cindex internationalization @subentry localization @subentry locale categories +@cindex @command{gettext} library @subentry locale categories @cindex locale categories @noindent This reduces the typing overhead to just three extra characters per string @@ -28769,6 +28868,8 @@ such as @code{/[[:alnum:]]/} @cindex monetary information, localization @cindex currency symbols, localization +@cindex internationalization @subentry localization @subentry monetary information +@cindex internationalization @subentry localization @subentry currency symbols @cindex @code{LC_MONETARY} locale category @item LC_MONETARY Monetary information, such as the currency symbol, and whether the @@ -28782,8 +28883,8 @@ use a comma every three decimal places and a period for the decimal point, while many Europeans do exactly the opposite: 1,234.56 versus 1.234,56.} -@cindex time, localization and -@cindex dates, information related to@comma{} localization +@cindex time @subentry localization and +@cindex dates @subentry information related to, localization @cindex @code{LC_TIME} locale category @item LC_TIME Time- and date-related information, such as 12- or 24-hour clock, month printed @@ -28796,6 +28897,7 @@ All of the above. (Not too useful in the context of @command{gettext}.) 
@quotation NOTE @cindex @env{LANGUAGE} environment variable +@cindex environment variables @subentry @env{LANGUAGE} As described in @ref{Locales}, environment variables with the same name as the locale categories (@env{LC_CTYPE}, @env{LC_ALL}, etc.) influence @command{gawk}'s behavior (and that of other utilities). @@ -28809,6 +28911,8 @@ look to see if @env{LANGUAGE} is defined, and if so, use the shell's @command{unset} command to remove it. @end quotation +@cindex @env{GAWK_LOCALE_DIR} environment variable +@cindex environment variables @subentry @env{GAWK_LOCALE_DIR} For testing translations of @command{gawk} itself, you can set the @env{GAWK_LOCALE_DIR} environment variable. See the documentation for the C @code{bindtextdomain()} function and also see @@ -28816,7 +28920,7 @@ for the C @code{bindtextdomain()} function and also see @node Programmer i18n @section Internationalizing @command{awk} Programs -@cindex @command{awk} programs, internationalizing +@cindex @command{awk} programs @subentry internationalizing @command{gawk} provides the following variables for internationalization: @@ -28828,8 +28932,8 @@ This variable indicates the application's text domain. For compatibility with GNU @command{gettext}, the default value is @code{"messages"}. -@cindex internationalization, localization, marked strings -@cindex strings, for localization +@cindex internationalization @subentry localization @subentry marked strings +@cindex strings @subentry for localization @item _"your message here" String constants marked with a leading underscore are candidates for translation at runtime. @@ -28878,10 +28982,10 @@ The default value for @var{category} is @code{"LC_MESSAGES"}. The same remarks about argument order as for the @code{dcgettext()} function apply. 
-@cindex @code{.gmo} files, specifying directory of -@cindex files, @code{.gmo}, specifying directory of -@cindex message object files, specifying directory of -@cindex files, message object, specifying directory of +@cindex @code{.gmo} files @subentry specifying directory of +@cindex files @subentry @code{.gmo} @subentry specifying directory of +@cindex message object files @subentry specifying directory of +@cindex files @subentry message object @subentry specifying directory of @cindexgawkfunc{bindtextdomain} @item @code{bindtextdomain(@var{directory}} [@code{,} @var{domain} ]@code{)} Change the directory in which @@ -28899,8 +29003,8 @@ given @var{domain}. To use these facilities in your @command{awk} program, follow these steps: @enumerate -@cindex @code{BEGIN} pattern, @code{TEXTDOMAIN} variable and -@cindex @code{TEXTDOMAIN} variable, @code{BEGIN} pattern and +@cindex @code{BEGIN} pattern @subentry @code{TEXTDOMAIN} variable and +@cindex @code{TEXTDOMAIN} variable @subentry @code{BEGIN} pattern and @item Set the variable @code{TEXTDOMAIN} to the text domain of your program. This is best done in a @code{BEGIN} rule @@ -28915,8 +29019,8 @@ BEGIN @{ @} @end example -@cindex @code{_} (underscore), translatable string -@cindex underscore (@code{_}), translatable string +@cindex @code{_} (underscore) @subentry translatable strings +@cindex underscore (@code{_}) @subentry translatable strings @item Mark all translatable strings with a leading underscore (@samp{_}) character. It @emph{must} be adjacent to the opening @@ -28962,7 +29066,7 @@ printf(message, ncustomers) @end example -@cindex @code{LC_MESSAGES} locale category, @code{bindtextdomain()} function (@command{gawk}) +@cindex @code{LC_MESSAGES} locale category @subentry @code{bindtextdomain()} function (@command{gawk}) @item During development, you might want to put the @file{.gmo} file in a private directory for testing. This is done @@ -28991,9 +29095,9 @@ and use translations from @command{awk}. 
@section Translating @command{awk} Programs @cindex @code{.po} files -@cindex files, @code{.po} -@cindex portable object files -@cindex files, portable object +@cindex files @subentry @code{.po} +@cindex portable object @subentry files +@cindex files @subentry portable object Once a program's translatable strings have been marked, they must be extracted to create the initial @file{.pot} file. As part of translation, it is often helpful to rearrange the order @@ -29013,10 +29117,10 @@ is covered. @node String Extraction @subsection Extracting Marked Strings -@cindex strings, extracting +@cindex strings @subentry extracting @cindex @option{--gen-pot} option -@cindex command-line options, string extraction -@cindex string extraction (internationalization) +@cindex command line @subentry options @subentry string extraction +@cindex string @subentry extraction (internationalization) @cindex marked string extraction (internationalization) @cindex extraction, of marked strings (internationalization) @@ -29050,7 +29154,7 @@ translations for @command{guide}. @node Printf Ordering @subsection Rearranging @code{printf} Arguments -@cindex @code{printf} statement, positional specifiers +@cindex @code{printf} statement @subentry positional specifiers @cindex positional specifiers, @code{printf} statement Format strings for @code{printf} and @code{sprintf()} (@pxref{Printf}) @@ -29118,9 +29222,9 @@ comes first, then the integer position, and then the @samp{$}. This is somewhat counterintuitive. 
@end quotation -@cindex @code{printf} statement, positional specifiers, mixing with regular formats -@cindex positional specifiers, @code{printf} statement, mixing with regular formats -@cindex format specifiers, mixing regular with positional specifiers +@cindex @code{printf} statement @subentry positional specifiers @subentry mixing with regular formats +@cindex positional specifiers, @code{printf} statement @subentry mixing with regular formats +@cindex format specifiers @subentry mixing regular with positional specifiers @command{gawk} does not allow you to mix regular format specifiers and those with positional specifiers in the same string: @@ -29146,8 +29250,8 @@ is first written. @node I18N Portability @subsection @command{awk} Portability Issues -@cindex portability, internationalization and -@cindex internationalization, localization, portability and +@cindex portability @subentry internationalization and +@cindex internationalization @subentry localization @subentry portability and @command{gawk}'s internationalization features were purposely chosen to have as little impact as possible on the portability of @command{awk} programs that use them to other versions of @command{awk}. @@ -29168,7 +29272,7 @@ However, it is actually almost portable, requiring very little change: @itemize @value{BULLET} -@cindex @code{TEXTDOMAIN} variable, portability and +@cindex @code{TEXTDOMAIN} variable @subentry portability and @item Assignments to @code{TEXTDOMAIN} won't have any effect, because @code{TEXTDOMAIN} is not special in other @command{awk} implementations. @@ -29187,9 +29291,9 @@ and @code{bindtextdomain()}, the @command{awk} program can be made to run, but all the messages are output in the original language. 
For example: -@cindex @code{bindtextdomain()} function (@command{gawk}), portability and -@cindex @code{dcgettext()} function (@command{gawk}), portability and -@cindex @code{dcngettext()} function (@command{gawk}), portability and +@cindex @code{bindtextdomain()} function (@command{gawk}) @subentry portability and +@cindex @code{dcgettext()} function (@command{gawk}) @subentry portability and +@cindex @code{dcngettext()} function (@command{gawk}) @subentry portability and @example @c file eg/lib/libintl.awk function bindtextdomain(dir, domain) @@ -29303,7 +29407,6 @@ msgstr "Like, the scoop is" @c endfile @end example -@cindex Linux @cindex GNU/Linux @quotation NOTE The following instructions apply to GNU/Linux with the GNU C Library. Be @@ -29317,6 +29420,14 @@ file and then to create the @file{guide.mo} file. The directory has the form @file{@var{locale}/LC_MESSAGES}, where @var{locale} is a locale name known to the C @command{gettext} routines. +@cindex @env{LANGUAGE} environment variable +@cindex environment variables @subentry @env{LANGUAGE} +@cindex @env{LC_ALL} environment variable +@cindex environment variables @subentry @env{LC_ALL} +@cindex @env{LANG} environment variable +@cindex environment variables @subentry @env{LANG} +@cindex @env{LC_MESSAGES} environment variable +@cindex environment variables @subentry @env{LC_MESSAGES} How do we know which locale to use? It turns out that there are four different environment variables used by the C @command{gettext} routines. 
In order, they are @env{$LANGUAGE}, @env{$LC_ALL}, @env{$LANG}, and @@ -29336,14 +29447,14 @@ We next make the directories: $ @kbd{mkdir en_US.UTF-8 en_US.UTF-8/LC_MESSAGES} @end example -@cindex @code{.po} files, converting to @code{.mo} -@cindex files, @code{.po}, converting to @code{.mo} +@cindex @code{.po} files @subentry converting to @code{.mo} +@cindex files @subentry @code{.po} @subentry converting to @code{.mo} @cindex @code{.mo} files, converting from @code{.po} -@cindex files, @code{.mo}, converting from @code{.po} -@cindex portable object files, converting to message object files -@cindex files, portable object, converting to message object files -@cindex message object files, converting from portable object files -@cindex files, message object, converting from portable object files +@cindex files @subentry @code{.mo}, converting from @code{.po} +@cindex portable object @subentry files @subentry converting to message object files +@cindex files @subentry portable object @subentry converting to message object files +@cindex message object files @subentry converting from portable object files +@cindex files @subentry message object @subentry converting from portable object files @cindex @command{msgfmt} utility The @command{msgfmt} utility converts the human-readable @file{.po} file into a machine-readable @file{.mo} file. @@ -29440,7 +29551,7 @@ a number of translations for its messages. @node Debugger @chapter Debugging @command{awk} Programs -@cindex debugging @command{awk} programs +@cindex debugging @subentry @command{awk} programs @c The original text for this chapter was contributed by Efraim Yawitz. @@ -29488,7 +29599,7 @@ In that case, what can you expect from such a tool? 
The answer to that depends on the language being debugged, but in general, you can expect at least the following: -@cindex debugger capabilities +@cindex debugger @subentry capabilities @itemize @value{BULLET} @item The ability to watch a program execute its instructions one by one, @@ -29521,14 +29632,14 @@ functional program that you or someone else wrote). @node Debugging Terms @subsection Debugging Concepts -@cindex debugger, concepts +@cindex debugger @subentry concepts Before diving in to the details, we need to introduce several important concepts that apply to just about all debuggers. The following list defines terms used throughout the rest of this @value{CHAPTER}: @table @dfn -@cindex call stack (debugger) +@cindex call stack @subentry explanation of @cindex stack frame (debugger) @item Stack frame Programs generally call functions during the course of their execution. @@ -29551,7 +29662,7 @@ invoked. Commands that print the call stack print information about each stack frame (as detailed later on). @item Breakpoint -@cindex breakpoint (debugger) +@cindex breakpoint During debugging, you often wish to let the program run until it reaches a certain point, and then continue execution from there one statement (or instruction) at a time. The way to do this is to set @@ -29597,7 +29708,7 @@ does not work at the level of machine instructions.} @section Sample @command{gawk} Debugging Session @cindex sample debugging session @cindex example debugging session -@cindex debugging, example session +@cindex debugging @subentry example session In order to illustrate the use of @command{gawk} as a debugger, let's look at a sample debugging session. We will use the @command{awk} implementation of the @@ -29612,7 +29723,7 @@ as our example. 
@node Debugger Invocation @subsection How to Start the Debugger @cindex starting the debugger -@cindex debugger, how to start +@cindex debugger @subentry how to start Starting the debugger is almost exactly like running @command{gawk} normally, except you have to pass an additional option, @option{--debug}, or the @@ -29635,7 +29746,7 @@ in the command line to the debugger rather than as part of the @code{run} command at the debugger prompt.) The @option{-1} is an option to @file{uniq.awk}. -@cindex debugger, prompt +@cindex debugger @subentry prompt Instead of immediately running the program on @file{inputfile}, as @command{gawk} would ordinarily do, the debugger merely loads all the program source files, compiles them internally, and then gives @@ -29685,10 +29796,10 @@ a breakpoint in @file{uniq.awk} is at the beginning of the function @code{are_equal()}, which compares the current line with the previous one. To set the breakpoint, use the @code{b} (breakpoint) command: -@cindex debugger, setting a breakpoint -@cindex debugger, @code{breakpoint} command -@cindex debugger, @code{break} command -@cindex debugger, @code{b} command +@cindex debugger @subentry setting a breakpoint +@cindex debugger @subentry commands @subentry @code{breakpoint} +@cindex debugger @subentry commands @subentry @code{break} +@cindex debugger @subentry commands @subentry @code{b} (@code{break}) @example gawk> @kbd{b are_equal} @print{} Breakpoint 1 set at file `awklib/eg/prog/uniq.awk', line 63 @@ -29698,8 +29809,8 @@ The debugger tells us the file and line number where the breakpoint is. Now type @samp{r} or @samp{run} and the program runs until it hits the breakpoint for the first time: -@cindex debugger, running the program -@cindex debugger, @code{run} command +@cindex debugger @subentry running the program +@cindex debugger @subentry commands @subentry @code{run} @example gawk> @kbd{r} @print{} Starting program: @@ -29715,9 +29826,9 @@ let's see how we got to where we are. 
At the prompt, we type @samp{bt} (short for ``backtrace''), and the debugger responds with a listing of the current stack frames: -@cindex debugger, stack frames, showing -@cindex debugger, @code{bt} command -@cindex debugger, @code{backtrace} command +@cindex debugger @subentry stack frames, showing +@cindex debugger @subentry commands @subentry @code{bt} (@code{backtrace}) +@cindex debugger @subentry commands @subentry @code{backtrace} @example gawk> @kbd{bt} @print{} #0 are_equal(n, m, clast, cline, alast, aline) @@ -29737,8 +29848,8 @@ of some variables. Let's say we type @samp{p n} @code{n}, a parameter to @code{are_equal()}. Actually, the debugger gives us: -@cindex debugger, @code{print} command -@cindex debugger, @code{p} command +@cindex debugger @subentry commands @subentry @code{print} +@cindex debugger @subentry commands @subentry @code{p} (@code{print}) @example gawk> @kbd{p n} @print{} n = untyped variable @@ -29789,8 +29900,8 @@ be inside this function. To investigate further, we must begin ``stepping through'' the lines of @code{are_equal()}. We start by typing @samp{n} (for ``next''): -@cindex debugger, @code{n} command -@cindex debugger, @code{next} command +@cindex debugger @subentry commands @subentry @code{n} (@code{next}) +@cindex debugger @subentry commands @subentry @code{next} @example @group gawk> @kbd{n} @@ -29836,7 +29947,7 @@ This information is useful enough (we now know that none of the words were accidentally left out), but what if we want to see inside the array? 
-@cindex debugger, printing single array elements +@cindex debugger @subentry printing single array elements The first choice would be to use subscripts: @example @@ -29856,7 +29967,7 @@ This would be kind of slow for a 100-member array, though, so @command{gawk} provides a shortcut (reminiscent of another language not to be mentioned): -@cindex debugger, printing all array elements +@cindex debugger @subentry printing all array elements @example gawk> @kbd{p @@alast} @print{} alast["1"] = "awk" @@ -29936,7 +30047,7 @@ Getting information Miscellaneous @end itemize -@cindex debugger, repeating commands +@cindex debugger @subentry repeating commands Each of these are discussed in the following subsections. In the following descriptions, commands that may be abbreviated show the abbreviation on a second description line. @@ -29966,12 +30077,12 @@ will otherwise just run as if it was not under the debugger. The commands for controlling breakpoints are: @table @asis -@cindex debugger commands, @code{b} (@code{break}) -@cindex debugger commands, @code{break} +@cindex debugger @subentry commands @subentry @code{b} (@code{break}) +@cindex debugger @subentry commands @subentry @code{break} @cindex @code{break} debugger command @cindex @code{b} debugger command (alias for @code{break}) @cindex set breakpoint -@cindex breakpoint, setting +@cindex breakpoint @subentry setting @item @code{break} [[@var{filename}@code{:}]@var{n} | @var{function}] [@code{"@var{expression}"}] @itemx @code{b} [[@var{filename}@code{:}]@var{n} | @var{function}] [@code{"@var{expression}"}] Without any argument, set a breakpoint at the next instruction @@ -30001,10 +30112,10 @@ evaluates whenever the breakpoint is reached. If the condition is true, then the debugger stops execution and prompts for a command. Otherwise, it continues executing the program. 
-@cindex debugger commands, @code{clear} +@cindex debugger @subentry commands @subentry @code{clear} @cindex @code{clear} debugger command -@cindex delete breakpoint, at location -@cindex breakpoint at location, how to delete +@cindex delete breakpoint @subentry at location +@cindex breakpoint @subentry at location, how to delete @item @code{clear} [[@var{filename}@code{:}]@var{n} | @var{function}] Without any argument, delete any breakpoint at the next instruction to be executed in the selected stack frame. If the program stops at @@ -30023,9 +30134,9 @@ Delete breakpoint(s) set at line number @var{n} in source file @var{filename}. Delete breakpoint(s) set at entry to function @var{function}. @end table -@cindex debugger commands, @code{condition} +@cindex debugger @subentry commands @subentry @code{condition} @cindex @code{condition} debugger command -@cindex breakpoint condition +@cindex breakpoint @subentry condition @item @code{condition} @var{n} @code{"@var{expression}"} Add a condition to existing breakpoint or watchpoint @var{n}. The condition is an @command{awk} expression @emph{enclosed in double quotes} @@ -30036,27 +30147,27 @@ the debugger continues executing the program. If the condition expression is not specified, any existing condition is removed (i.e., the breakpoint or watchpoint is made unconditional). 
-@cindex debugger commands, @code{d} (@code{delete}) -@cindex debugger commands, @code{delete} +@cindex debugger @subentry commands @subentry @code{d} (@code{delete}) +@cindex debugger @subentry commands @subentry @code{delete} @cindex @code{delete} debugger command @cindex @code{d} debugger command (alias for @code{delete}) -@cindex delete breakpoint, by number -@cindex breakpoint, delete by number +@cindex delete breakpoint @subentry by number +@cindex breakpoint @subentry delete by number @item @code{delete} [@var{n1 n2} @dots{}] [@var{n}--@var{m}] @itemx @code{d} [@var{n1 n2} @dots{}] [@var{n}--@var{m}] Delete specified breakpoints or a range of breakpoints. Delete all defined breakpoints if no argument is supplied. -@cindex debugger commands, @code{disable} +@cindex debugger @subentry commands @subentry @code{disable} @cindex @code{disable} debugger command @cindex disable breakpoint -@cindex breakpoint, how to disable or enable +@cindex breakpoint @subentry how to disable or enable @item @code{disable} [@var{n1 n2} @dots{} | @var{n}--@var{m}] Disable specified breakpoints or a range of breakpoints. Without any argument, disable all breakpoints. -@cindex debugger commands, @code{e} (@code{enable}) -@cindex debugger commands, @code{enable} +@cindex debugger @subentry commands @subentry @code{e} (@code{enable}) +@cindex debugger @subentry commands @subentry @code{enable} @cindex @code{enable} debugger command @cindex @code{e} debugger command (alias for @code{enable}) @cindex enable breakpoint @@ -30077,15 +30188,15 @@ Enable the breakpoints temporarily, then disable each one when the program stops at it. @end table -@cindex debugger commands, @code{ignore} +@cindex debugger @subentry commands @subentry @code{ignore} @cindex @code{ignore} debugger command @cindex ignore breakpoint @item @code{ignore} @var{n} @var{count} Ignore breakpoint number @var{n} the next @var{count} times it is hit. 
-@cindex debugger commands, @code{t} (@code{tbreak}) -@cindex debugger commands, @code{tbreak} +@cindex debugger @subentry commands @subentry @code{t} (@code{tbreak}) +@cindex debugger @subentry commands @subentry @code{tbreak} @cindex @code{tbreak} debugger command @cindex @code{t} debugger command (alias for @code{tbreak}) @cindex temporary breakpoint @@ -30103,13 +30214,13 @@ and observing its behavior. There are more commands for controlling execution of the program than we saw in our earlier example: @table @asis -@cindex debugger commands, @code{commands} +@cindex debugger @subentry commands @subentry @code{commands} @cindex @code{commands} debugger command -@cindex debugger commands, @code{silent} +@cindex debugger @subentry commands @subentry @code{silent} @cindex @code{silent} debugger command -@cindex debugger commands, @code{end} +@cindex debugger @subentry commands @subentry @code{end} @cindex @code{end} debugger command -@cindex breakpoint commands +@cindex breakpoint @subentry commands to execute at @cindex commands to execute at breakpoint @item @code{commands} [@var{n}] @itemx @code{silent} @@ -30136,8 +30247,8 @@ gawk> @kbd{commands} gawk> @end example -@cindex debugger commands, @code{c} (@code{continue}) -@cindex debugger commands, @code{continue} +@cindex debugger @subentry commands @subentry @code{c} (@code{continue}) +@cindex debugger @subentry commands @subentry @code{continue} @cindex continue program, in debugger @cindex @code{continue} debugger command @item @code{continue} [@var{count}] @@ -30146,14 +30257,14 @@ Resume program execution. If continued from a breakpoint and @var{count} is specified, ignore the breakpoint at that location the next @var{count} times before stopping. -@cindex debugger commands, @code{finish} +@cindex debugger @subentry commands @subentry @code{finish} @cindex @code{finish} debugger command @item @code{finish} Execute until the selected stack frame returns. Print the returned value. 
-@cindex debugger commands, @code{n} (@code{next}) -@cindex debugger commands, @code{next} +@cindex debugger @subentry commands @subentry @code{n} (@code{next}) +@cindex debugger @subentry commands @subentry @code{next} @cindex @code{next} debugger command @cindex @code{n} debugger command (alias for @code{next}) @cindex single-step execution, in the debugger @@ -30163,15 +30274,15 @@ Continue execution to the next source line, stepping over function calls. The argument @var{count} controls how many times to repeat the action, as in @code{step}. -@cindex debugger commands, @code{ni} (@code{nexti}) -@cindex debugger commands, @code{nexti} +@cindex debugger @subentry commands @subentry @code{ni} (@code{nexti}) +@cindex debugger @subentry commands @subentry @code{nexti} @cindex @code{nexti} debugger command @cindex @code{ni} debugger command (alias for @code{nexti}) @item @code{nexti} [@var{count}] @itemx @code{ni} [@var{count}] Execute one (or @var{count}) instruction(s), stepping over function calls. -@cindex debugger commands, @code{return} +@cindex debugger @subentry commands @subentry @code{return} @cindex @code{return} debugger command @item @code{return} [@var{value}] Cancel execution of a function call. If @var{value} (either a string or a @@ -30180,8 +30291,8 @@ frame other than the innermost one (the currently executing function; i.e., frame number 0), discard all inner frames in addition to the selected one, and the caller of that frame becomes the innermost frame. -@cindex debugger commands, @code{r} (@code{run}) -@cindex debugger commands, @code{run} +@cindex debugger @subentry commands @subentry @code{r} (@code{run}) +@cindex debugger @subentry commands @subentry @code{run} @cindex @code{run} debugger command @cindex @code{r} debugger command (alias for @code{run}) @item @code{run} @@ -30190,8 +30301,8 @@ Start/restart execution of the program. 
When restarting, the debugger retains the current breakpoints, watchpoints, command history, automatic display variables, and debugger options. -@cindex debugger commands, @code{s} (@code{step}) -@cindex debugger commands, @code{step} +@cindex debugger @subentry commands @subentry @code{s} (@code{step}) +@cindex debugger @subentry commands @subentry @code{step} @cindex @code{step} debugger command @cindex @code{s} debugger command (alias for @code{step}) @item @code{step} [@var{count}] @@ -30201,8 +30312,8 @@ current stack frame, stepping inside any function called within the line. If the argument @var{count} is supplied, steps that many times before stopping, unless it encounters a breakpoint or watchpoint. -@cindex debugger commands, @code{si} (@code{stepi}) -@cindex debugger commands, @code{stepi} +@cindex debugger @subentry commands @subentry @code{si} (@code{stepi}) +@cindex debugger @subentry commands @subentry @code{stepi} @cindex @code{stepi} debugger command @cindex @code{si} debugger command (alias for @code{stepi}) @item @code{stepi} [@var{count}] @@ -30211,8 +30322,8 @@ Execute one (or @var{count}) instruction(s), stepping inside function calls. (For illustration of what is meant by an ``instruction'' in @command{gawk}, see the output shown under @code{dump} in @ref{Miscellaneous Debugger Commands}.) -@cindex debugger commands, @code{u} (@code{until}) -@cindex debugger commands, @code{until} +@cindex debugger @subentry commands @subentry @code{u} (@code{until}) +@cindex debugger @subentry commands @subentry @code{until} @cindex @code{until} debugger command @cindex @code{u} debugger command (alias for @code{until}) @item @code{until} [[@var{filename}@code{:}]@var{n} | @var{function}] @@ -30229,7 +30340,7 @@ stack frame returns. 
The commands for viewing and changing variables inside of @command{gawk} are: @table @asis -@cindex debugger commands, @code{display} +@cindex debugger @subentry commands @subentry @code{display} @cindex @code{display} debugger command @item @code{display} [@var{var} | @code{$}@var{n}] Add variable @var{var} (or field @code{$@var{n}}) to the display list. @@ -30249,7 +30360,7 @@ no such variable of the given name exists. Without argument, @code{display} displays the current values of items on the list. -@cindex debugger commands, @code{eval} +@cindex debugger @subentry commands @subentry @code{eval} @cindex @code{eval} debugger command @cindex evaluate expressions, in debugger @item @code{eval "@var{awk statements}"} @@ -30276,8 +30387,8 @@ This form of @code{eval} is similar, but it allows you to define @var{awk statements}, instead of using variables or function parameters defined by the program. -@cindex debugger commands, @code{p} (@code{print}) -@cindex debugger commands, @code{print} +@cindex debugger @subentry commands @subentry @code{p} (@code{print}) +@cindex debugger @subentry commands @subentry @code{print} @cindex @code{print} debugger command @cindex @code{p} debugger command (alias for @code{print}) @cindex print variables, in debugger @@ -30304,7 +30415,7 @@ gawk> @kbd{print @@a} This prints the indices and the corresponding values for all elements in the array @code{a}. -@cindex debugger commands, @code{printf} +@cindex debugger @subentry commands @subentry @code{printf} @cindex @code{printf} debugger command @item @code{printf} @var{format} [@code{,} @var{arg} @dots{}] Print formatted text. The @var{format} may include escape sequences, @@ -30312,7 +30423,7 @@ such as @samp{\n} (@pxref{Escape Sequences}). No newline is printed unless one is specified. 
-@cindex debugger commands, @code{set} +@cindex debugger @subentry commands @subentry @code{set} @cindex @code{set} debugger command @cindex assign values to variables, in debugger @item @code{set} @var{var}@code{=}@var{value} @@ -30323,8 +30434,8 @@ String values must be enclosed between double quotes (@code{"}@dots{}@code{"}). You can also set special @command{awk} variables, such as @code{FS}, @code{NF}, @code{NR}, and so on. -@cindex debugger commands, @code{w} (@code{watch}) -@cindex debugger commands, @code{watch} +@cindex debugger @subentry commands @subentry @code{w} (@code{watch}) +@cindex debugger @subentry commands @subentry @code{watch} @cindex @code{watch} debugger command @cindex @code{w} debugger command (alias for @code{watch}) @cindex set watchpoint @@ -30342,14 +30453,14 @@ evaluates whenever the watchpoint is reached. If the condition is true, then the debugger stops execution and prompts for a command. Otherwise, @command{gawk} continues executing the program. -@cindex debugger commands, @code{undisplay} +@cindex debugger @subentry commands @subentry @code{undisplay} @cindex @code{undisplay} debugger command @cindex stop automatic display, in debugger @item @code{undisplay} [@var{n}] Remove item number @var{n} (or all items, if no argument) from the automatic display list. -@cindex debugger commands, @code{unwatch} +@cindex debugger @subentry commands @subentry @code{unwatch} @cindex @code{unwatch} debugger command @cindex delete watchpoint @item @code{unwatch} [@var{n}] @@ -30368,13 +30479,13 @@ and also move around in the stack to see what the state of things was in the functions that called the one you are in. 
The commands for doing this are: @table @asis -@cindex debugger commands, @code{bt} (@code{backtrace}) -@cindex debugger commands, @code{backtrace} -@cindex debugger commands, @code{where} (@code{backtrace}) +@cindex debugger @subentry commands @subentry @code{bt} (@code{backtrace}) +@cindex debugger @subentry commands @subentry @code{backtrace} +@cindex debugger @subentry commands @subentry @code{where} (@code{backtrace}) @cindex @code{backtrace} debugger command @cindex @code{bt} debugger command (alias for @code{backtrace}) @cindex @code{where} debugger command (alias for @code{backtrace}) -@cindex call stack, display in debugger +@cindex call stack @subentry display in debugger @cindex traceback, display in debugger @item @code{backtrace} [@var{count}] @itemx @code{bt} [@var{count}] @@ -30386,14 +30497,14 @@ function, the source @value{FN}, and the line number. The alias @code{where} for @code{backtrace} is provided for longtime GDB users who may be used to that command. -@cindex debugger commands, @code{down} +@cindex debugger @subentry commands @subentry @code{down} @cindex @code{down} debugger command @item @code{down} [@var{count}] Move @var{count} (default 1) frames down the stack toward the innermost frame. Then select and print the frame. -@cindex debugger commands, @code{f} (@code{frame}) -@cindex debugger commands, @code{frame} +@cindex debugger @subentry commands @subentry @code{f} (@code{frame}) +@cindex debugger @subentry commands @subentry @code{frame} @cindex @code{frame} debugger command @cindex @code{f} debugger command (alias for @code{frame}) @item @code{frame} [@var{n}] @@ -30404,7 +30515,7 @@ called the innermost one. The highest-numbered frame is the one for the main program. The printed information consists of the frame number, function and argument names, source file, and the source line. 
-@cindex debugger commands, @code{up} +@cindex debugger @subentry commands @subentry @code{up} @cindex @code{up} debugger command @item @code{up} [@var{count}] Move @var{count} (default 1) frames up the stack toward the outermost frame. @@ -30422,8 +30533,8 @@ is used with one of a number of arguments that tell it exactly what you want to know: @table @asis -@cindex debugger commands, @code{i} (@code{info}) -@cindex debugger commands, @code{info} +@cindex debugger @subentry commands @subentry @code{i} (@code{info}) +@cindex debugger @subentry commands @subentry @code{info} @cindex @code{info} debugger command @cindex @code{i} debugger command (alias for @code{info}) @item @code{info} @var{what} @@ -30433,11 +30544,13 @@ The value for @var{what} should be one of the following: @c nested table @table @code @item args -@cindex show function arguments, in debugger +@cindex show in debugger @subentry function arguments +@cindex function arguments, show in debugger List arguments of the selected frame. @item break -@cindex show breakpoints +@cindex show in debugger @subentry breakpoints +@cindex breakpoint @subentry show all in debugger List all currently set breakpoints. @item display @@ -30450,15 +30563,19 @@ Give a description of the selected stack frame. @item functions @cindex list function definitions, in debugger +@cindex function definitions, list in debugger List all function definitions including source @value{FN}s and line numbers. @item locals -@cindex show local variables, in debugger +@cindex show in debugger @subentry local variables +@cindex local variables @subentry show in debugger List local variables of the selected frame. @item source -@cindex show name of current source file, in debugger +@cindex show in debugger @subentry name of current source file +@cindex current source file, show in debugger +@cindex source file, show in debugger Print the name of the current source file. 
Each time the program stops, the current source file is the file containing the current instruction. When the debugger first starts, the current source file is the first file @@ -30467,15 +30584,18 @@ included via the @option{-f} option. The be used at any time to change the current source. @item sources -@cindex show all source files, in debugger +@cindex show in debugger @subentry all source files +@cindex all source files, show in debugger List all program sources. @item variables @cindex list all global variables, in debugger +@cindex global variables, show in debugger List all global variables. @item watch -@cindex show watchpoints +@cindex show in debugger @subentry watchpoints +@cindex watchpoints, show in debugger List all items in the watch list. @end table @end table @@ -30485,12 +30605,12 @@ save the debugger's state, and the ability to run debugger commands from a file. The commands are: @table @asis -@cindex debugger commands, @code{o} (@code{option}) -@cindex debugger commands, @code{option} +@cindex debugger @subentry commands @subentry @code{o} (@code{option}) +@cindex debugger @subentry commands @subentry @code{option} @cindex @code{option} debugger command @cindex @code{o} debugger command (alias for @code{option}) @cindex display debugger options -@cindex debugger, options +@cindex debugger @subentry options @item @code{option} [@var{name}[@code{=}@var{value}]] @itemx @code{o} [@var{name}[@code{=}@var{value}]] Without an argument, display the available debugger options @@ -30503,12 +30623,12 @@ The available options are: @c asis for docbook @table @asis @item @code{history_size} -@cindex debugger, history size +@cindex debugger @subentry history size Set the maximum number of lines to keep in the history file @file{./.gawk_history}. The default is 100. @item @code{listsize} -@cindex debugger, default list amount +@cindex debugger @subentry default list amount Specify the number of lines that @code{list} prints. The default is 15. 
@item @code{outfile} @@ -30518,11 +30638,11 @@ to standard output. An empty string (@code{""}) resets output to standard output. @item @code{prompt} -@cindex debugger, prompt +@cindex debugger @subentry prompt Change the debugger prompt. The default is @samp{@w{gawk> }}. @item @code{save_history} [@code{on} | @code{off}] -@cindex debugger, history file +@cindex debugger @subentry history file Save command history to file @file{./.gawk_history}. The default is @code{on}. @@ -30534,17 +30654,17 @@ Options are read back into the next session upon startup. @item @code{trace} [@code{on} | @code{off}] @cindex instruction tracing, in debugger -@cindex debugger, instruction tracing +@cindex debugger @subentry instruction tracing Turn instruction tracing on or off. The default is @code{off}. @end table -@cindex debugger, save commands to a file +@cindex debugger @subentry save commands to a file @item @code{save} @var{filename} Save the commands from the current session to the given @value{FN}, so that they can be replayed using the @command{source} command. @item @code{source} @var{filename} -@cindex debugger, read commands from a file +@cindex debugger @subentry read commands from a file Run command(s) from a file; an error in any command does not terminate execution of subsequent commands. Comments (lines starting with @samp{#}) are allowed in a command file. @@ -30568,7 +30688,7 @@ There are a few more commands that do not fit into the previous categories, as follows: @table @asis -@cindex debugger commands, @code{dump} +@cindex debugger @subentry commands @subentry @code{dump} @cindex @code{dump} debugger command @item @code{dump} [@var{filename}] Dump byte code of the program to standard output or to the file @@ -30638,8 +30758,8 @@ gawk> Exit the debugger. See the entry for @samp{quit}, later in this list. 
-@cindex debugger commands, @code{h} (@code{help}) -@cindex debugger commands, @code{help} +@cindex debugger @subentry commands @subentry @code{h} (@code{help}) +@cindex debugger @subentry commands @subentry @code{help} @cindex @code{help} debugger command @cindex @code{h} debugger command (alias for @code{help}) @item @code{help} @@ -30648,8 +30768,8 @@ Print a list of all of the @command{gawk} debugger commands with a short summary of their usage. @samp{help @var{command}} prints the information about the command @var{command}. -@cindex debugger commands, @code{l} (@code{list}) -@cindex debugger commands, @code{list} +@cindex debugger @subentry commands @subentry @code{l} (@code{list}) +@cindex debugger @subentry commands @subentry @code{list} @cindex @code{list} debugger command @cindex @code{l} debugger command (alias for @code{list}) @item @code{list} [@code{-} | @code{+} | @var{n} | @var{filename}@code{:}@var{n} | @var{n}--@var{m} | @var{function}] @@ -30682,8 +30802,8 @@ Print lines centered around the beginning of the function @var{function}. This command may change the current source file. @end table -@cindex debugger commands, @code{q} (@code{quit}) -@cindex debugger commands, @code{quit} +@cindex debugger @subentry commands @subentry @code{q} (@code{quit}) +@cindex debugger @subentry commands @subentry @code{quit} @cindex @code{quit} debugger command @cindex @code{q} debugger command (alias for @code{quit}) @cindex exit the debugger @@ -30695,7 +30815,7 @@ and are free to go on to the next one! As we saw earlier, if you are running a program, the debugger warns you when you type @samp{q} or @samp{quit}, to make sure you really want to quit. 
-@cindex debugger commands, @code{trace} +@cindex debugger @subentry commands @subentry @code{trace} @cindex @code{trace} debugger command @item @code{trace} [@code{on} | @code{off}] Turn on or off continuous printing of the instructions that are about to @@ -30711,9 +30831,9 @@ fairly self-explanatory, and using @code{stepi} and @code{nexti} while @node Readline Support @section Readline Support @cindex command completion, in debugger -@cindex debugger, command completion +@cindex debugger @subentry command completion @cindex history expansion, in debugger -@cindex debugger, history expansion +@cindex debugger @subentry history expansion If @command{gawk} is compiled with @uref{http://cnswww.cns.cwru.edu/php/chet/readline/readline.html, @@ -30752,7 +30872,7 @@ and @node Limitations @section Limitations -@cindex debugger, limitations +@cindex debugger @subentry limitations We hope you find the @command{gawk} debugger useful and enjoyable to work with, but as with any program, especially in its early releases, it still has some limitations. A few that it's worth being aware of are: @@ -30944,8 +31064,8 @@ please report them (@xref{Bugs}). @node Global Namespace @section Standard @command{awk}'s Single Namespace -@cindex namespace, definition of -@cindex namespace, standard @command{awk}, global +@cindex namespace @subentry definition of +@cindex namespace @subentry standard @command{awk}, global In standard @command{awk}, there is a single, global, @dfn{namespace}. This means that @emph{all} function names and global variable names must be unique. For example, two different @command{awk} source files cannot @@ -30975,9 +31095,10 @@ simple mechanism to put functions and global variables into separate namespaces. 
 @node Qualified Names
 @section Qualified Names
-@cindex qualified name, definition of
-@cindex namespaces, qualified names
-@cindex @code{::}, namespace separator
+@cindex qualified name @subentry definition of
+@cindex namespaces @subentry qualified names
+@cindex @code{:} (colon) @subentry @code{::} namespace separator
+@cindex colon (@code{:}) @subentry @code{::} namespace separator
 @cindex component name
 A @dfn{qualified name} is an identifier that includes a namespace name, the namespace separator @code{::}, and a @dfn{component} name. For example, one
@@ -30991,7 +31112,7 @@ Unlike C++, the @code{::} is @emph{not} an operator. No spaces are allowed between the namespace name, the @code{::}, and the component name.
 @end quotation
-@cindex qualified name, use of
+@cindex qualified name @subentry use of
 You must use qualified names from one namespace to access variables and functions in another. This is especially important when using variable names to index the special @code{SYMTAB} array (@pxref{Auto-set}),
@@ -31000,9 +31121,9 @@ and when making indirect function calls (@pxref{Indirect Calls}).
 @node Default Namespace
 @section The Default Namespace
-@cindex namespace, default
-@cindex namespace, @code{awk}
-@cindex @code{awk} namespace
+@cindex namespace @subentry default
+@cindex namespace @subentry @code{awk}
+@cindex @code{awk} @subentry namespace
 The default namespace, not surprisingly, is @code{awk}. All of the predefined @command{awk} and @command{gawk} variables are in this namespace, and thus have qualified names like
@@ -31019,8 +31140,10 @@ It also keeps your code looking natural.
 @node Changing The Namespace
 @section Changing The Namespace
-@cindex namespaces, changing
-@cindex @code{@@namespace} directive
+@cindex namespaces @subentry changing
+@cindex @code{@@} (at-sign) @subentry @code{@@namespace} directive
+@cindex at-sign (@code{@@}) @subentry @code{@@namespace} directive
+@cindex @code{@@namespace} directive @sortas{namespace directive}
 In order to set the current namespace, use an @code{@@namespace} directive at the top level of your program:
@@ -31045,7 +31168,7 @@ no concept of a ``current'' namespace once your program starts executing. Be sure you understand this.
 @end quotation
-@cindex namespace, implicit
+@cindex namespace @subentry implicit
 @cindex implicit namespace
 Each source file for @option{-i} and @option{-f} starts out with an implicit @samp{@@namespace "awk"}. Similarly, each chunk of
@@ -31053,7 +31176,7 @@ command-line code supplied with @option{-e} has such an implicit initial statement (@pxref{Options}).
 @cindex current namespace, pushing and popping
-@cindex namespace, pushing and popping
+@cindex namespace @subentry pushing and popping
 Files included with @code{@@include} (@pxref{Include Files}) ``push'' and ``pop'' the current namespace. That is, each @code{@@include} saves the current namespace and starts over with an implicit @samp{@@namespace
@@ -31062,20 +31185,23 @@ directive is seen. When @command{gawk} finishes processing the included file, the saved namespace is restored and processing continues where it left off in the original file.
-@cindex @code{@@namespace}, no effect on @code{BEGIN}@comma{} @code{BEGINFILE}@comma{} @code{END}@comma{} and @code{ENDFILE}
-@cindex @code{BEGIN} pattern, execution order not affected by @code{@@namespace}
-@cindex @code{BEGINFILE} pattern, execution order not affected by @code{@@namespace}
-@cindex @code{END} pattern, execution order not affected by @code{@@namespace}
-@cindex @code{ENDFILE} pattern, execution order not affected by @code{@@namespace}
+@cindex @code{@@} (at-sign) @subentry @code{@@namespace} directive @subentry @code{BEGIN}, @code{BEGINFILE}, @code{END}, @code{ENDFILE} and
+@cindex at-sign (@code{@@}) @subentry @code{@@namespace} directive @subentry @code{BEGIN}, @code{BEGINFILE}, @code{END}, @code{ENDFILE} and
+@cindex @code{BEGIN} pattern @subentry @code{@@namespace} directive and
+@cindex @code{BEGINFILE} pattern @subentry @code{@@namespace} directive and
+@cindex @code{END} pattern @subentry @code{@@namespace} directive and
+@cindex @code{ENDFILE} pattern @subentry @code{@@namespace} directive and
+@cindex @code{@@namespace} directive @sortas{namespace directive}
 The use of @code{@@namespace} has no influence upon the order of execution of @code{BEGIN}, @code{BEGINFILE}, @code{END}, and @code{ENDFILE} rules.
 @node Naming Rules
 @section Namespace and Component Naming Rules
-@cindex naming rules, namespaces and component names
-@cindex namespace names, naming rules
-@cindex component names, naming rules
+@cindex naming rules, namespace and component names
+@cindex namespaces @subentry naming rules
+@c not "component names" to merge with other index entry
+@cindex component name @subentry naming rules
 A number of rules apply to the namespace and component names, as follows.
 @itemize @bullet
@@ -31142,8 +31268,8 @@ $ @kbd{gawk -f systime.awk}
 @section Internal Name Management
 @cindex name management
-@cindex @code{awk} namespace, identifier name storage
-@cindex @code{awk} namespace, use for indirect function calls
+@cindex @code{awk} @subentry namespace @subentry identifier name storage
+@cindex @code{awk} @subentry namespace @subentry use for indirect function calls
 For backwards compatibility, all identifiers in the @code{awk} namespace are stored internally as unadorned identifiers (that is, without a leading @samp{awk::}). This is mainly relevant
@@ -31174,7 +31300,7 @@ function compute() @ii{This is really} report::compute()
 @node Namespace Example
 @section Namespace Example
-@cindex namespace, example code
+@cindex namespace @subentry example code
 The following example is a revised version of the suite of routines developed in @ref{Passwd Functions}. See there for an explanation of how the code works.
@@ -31300,8 +31426,8 @@ $ @kbd{gawk -f ns_passwd.awk -f testpasswd.awk}
 This @value{SECTION} looks briefly at how the namespace facility interacts with other important @command{gawk} features.
-@cindex namespaces, interaction with profiler
-@cindex namespaces, interaction with pretty printer
+@cindex namespaces @subentry interaction with @subentry profiler
+@cindex namespaces @subentry interaction with @subentry pretty printer
 @cindex profiler, interaction with namespaces
 @cindex pretty printer, interaction with namespaces
 The profiler and pretty-printer (@pxref{Profiling}) have been enhanced
@@ -31311,15 +31437,15 @@ namespace together, and has @code{@@namespace} directives in front of rules as necessary. This allows component names to be simple identifiers, instead of using qualified identifiers everywhere.
-@cindex namespaces, interaction with debugger
-@cindex debugger, interaction with namespaces
+@cindex namespaces @subentry interaction with @subentry debugger
+@cindex debugger @subentry interaction with namespaces
 Interaction with the debugger (@pxref{Debugging}) has not had to change (at least as of this writing). Some of the internal byte codes changed in order to accommodate namespaces, and the debugger's @code{dump} command was adjusted to match.
-@cindex namespaces, interaction with extension API
-@cindex extension API interaction with namespaces
+@cindex namespaces @subentry interaction with @subentry extension API
+@cindex extension API @subentry interaction with namespaces
 The extension API (@pxref{Dynamic Extensions}) has always allowed for placing functions into a different namespace, although this was not previously implemented. However, the symbol lookup and symbol update
@@ -31356,7 +31482,7 @@ namespaces smoothly with their operation. This applies most notably to the profiler / pretty-printer (@pxref{Profiling}) and to the extension facility (@pxref{Dynamic Extensions}).
-@cindex namespaces, backwards compatibility
+@cindex namespaces @subentry backwards compatibility
 @item
 Overall, the namespace facility was designed and implemented such that backwards compatibility is paramount. Programs that don't use namespaces
@@ -31369,7 +31495,7 @@ version of @command{gawk}.
 @cindex arbitrary precision
 @cindex multiple precision
 @cindex infinite precision
-@cindex floating-point, numbers@comma{} arbitrary-precision
+@cindex floating-point @subentry numbers @subentry arbitrary-precision
 This @value{CHAPTER} introduces some basic concepts relating to how computers do arithmetic and defines some important terms.
@@ -31438,7 +31564,7 @@ The advantage to integer numbers is that they represent values exactly. The disadvantage is that their range is limited.
 @cindex unsigned integers
-@cindex integers, unsigned
+@cindex integers @subentry unsigned
 In computers, integer values come in two flavors: @dfn{signed} and @dfn{unsigned}. Signed values may be negative or positive, whereas unsigned values are always greater than or equal
@@ -31448,7 +31574,7 @@ In computer systems, integer arithmetic is exact, but the possible range of values is limited. Integer arithmetic is generally faster than floating-point arithmetic.
-@cindex floating-point, numbers
+@cindex floating-point @subentry numbers
 @item Floating-point arithmetic
 Floating-point numbers represent what were called in school ``real'' numbers (i.e., those that have a fractional part, such as 3.1415927).
@@ -31460,9 +31586,9 @@ Modern systems support floating-point arithmetic in hardware, with a limited range of values. There are software libraries that allow the use of arbitrary-precision floating-point calculations.
-@cindex floating-point, numbers@comma{} single-precision
-@cindex floating-point, numbers@comma{} double-precision
-@cindex floating-point, numbers@comma{} arbitrary-precision
+@cindex floating-point @subentry numbers @subentry single-precision
+@cindex floating-point @subentry numbers @subentry double-precision
+@cindex floating-point @subentry numbers @subentry arbitrary-precision
 @cindex single-precision
 @cindex double-precision
 @cindex arbitrary precision
@@ -32132,8 +32258,8 @@ output when you change the rounding mode to be sure.
 @node Arbitrary Precision Integers
 @section Arbitrary-Precision Integer Arithmetic with @command{gawk}
-@cindex integers, arbitrary precision
-@cindex arbitrary precision integers
+@cindex integers @subentry arbitrary precision
+@cindex arbitrary precision @subentry integers
 When given the @option{-M} option, @command{gawk} performs all integer arithmetic using GMP arbitrary-precision
@@ -32365,7 +32491,6 @@ word sizes.
 See
 @node Checking for MPFR
 @section How To Check If MPFR Is Available
-@cindex MPFR, checking availability of
 @cindex checking for MPFR
 @cindex MPFR, checking for
 Occasionally, you might like to be able to check if @command{gawk}
@@ -32519,6 +32644,7 @@ $ @kbd{echo 0xDeadBeef | gawk '@{ print $1 + 0 @}'}
 Thus, @samp{+nan} and @samp{+NaN} are the same.
 @end itemize
+@cindex POSIX mode
 Besides handling input, @command{gawk} also needs to print ``correct'' values on output when a value is either NaN or infinity. Starting with @value{PVERSION} 4.2.2, for such values @command{gawk} prints one of the four strings
@@ -33014,7 +33140,7 @@ the macros as if they were functions.
 @node General Data Types
 @subsection General-Purpose Data Types
-@cindex Robbins, Arnold
+@cindex Robbins @subentry Arnold
 @cindex Ramey, Chet
 @quotation
 @i{I have a true love/hate relationship with unions.}
@@ -33224,7 +33350,7 @@ process and reduces the time needed to create the value.
 @node Memory Allocation Functions
 @subsection Memory Allocation Functions and Convenience Macros
 @cindex allocating memory for extensions
-@cindex extensions, allocating memory
+@cindex extensions @subentry loadable @subentry allocating memory
 @cindex memory, allocating for extensions
 The API provides a number of @dfn{memory allocation} functions for
@@ -33417,8 +33543,8 @@ to be a @samp{char *} value pointing to data previously obtained from
 @node Registration Functions
 @subsection Registration Functions
-@cindex register extension
-@cindex extension registration
+@cindex register loadable extension
+@cindex extensions @subentry loadable @subentry registration
 This @value{SECTION} describes the API functions for registering parts of your extension with @command{gawk}.
@@ -34097,7 +34223,7 @@ Register the two-way processor pointed to by @code{two_way_processor} with
 @node Printing Messages
 @subsection Printing Messages
-@cindex printing messages from extensions
+@cindex printing @subentry messages from extensions
 @cindex messages from extensions
 You can print different kinds of warning messages from your
@@ -34646,7 +34772,7 @@ you should release any cached values that you created, using
 @node Array Manipulation
 @subsection Array Manipulation
 @cindex array manipulation in extensions
-@cindex extensions, array manipulation in
+@cindex extensions @subentry loadable @subentry array manipulation in
 The primary data structure@footnote{OK, the only data structure.} in @command{awk} is the associative array (@pxref{Arrays}).
@@ -35322,8 +35448,8 @@ information about how @command{gawk} was invoked.
 @node Extension Versioning
 @subsubsection API Version Constants and Variables
-@cindex API version
-@cindex extension API version
+@cindex API @subentry version
+@cindex extension API @subentry version number
 The API provides both a ``major'' and a ``minor'' version number. The API versions are available at compile time as C preprocessor defines
@@ -35423,8 +35549,8 @@ calls @code{check_mpfr_version()}.
 @node Extension API Informational Variables
 @subsubsection Informational Variables
-@cindex API informational variables
-@cindex extension API informational variables
+@cindex API @subentry informational variables
+@cindex extension API @subentry informational variables
 The API provides access to several variables that describe whether the corresponding command-line options were enabled when
@@ -35636,7 +35762,7 @@ The @code{get_file()} API is new
 @node Finding Extensions
 @section How @command{gawk} Finds Extensions
-@cindex extension search path
+@cindex extensions @subentry loadable @subentry search path
 @cindex finding extensions
 Compiled extensions have to be installed in a directory where
@@ -35648,7 +35774,7 @@ path with a list of directories to search for compiled extensions.
 @node Extension Example
 @section Example: Some File Functions
-@cindex extension example
+@cindex extensions @subentry loadable @subentry example
 @quotation
 @i{No matter where you go, there you are.}
@@ -35862,7 +35988,7 @@ static const char *ext_version = "filefuncs extension: version 1.0";
 int plugin_is_GPL_compatible;
 @end example
-@cindex programming conventions, @command{gawk} extensions
+@cindex programming conventions @subentry @command{gawk} extensions
 By convention, for an @command{awk} function @code{foo()}, the C function that implements it is called @code{do_foo()}. The function should have two arguments. The first is an @code{int}, usually called @code{nargs},
@@ -36204,7 +36330,7 @@ And that's it!
 @node Using Internal File Ops
 @subsection Integrating the Extensions
-@cindex @command{gawk}, interpreter@comma{} adding code to
+@cindex @command{gawk} @subentry interpreter, adding code to
 Now that the code is written, it must be possible to add it at runtime to the running @command{gawk} interpreter. First, the code must be compiled.
 Assuming that the functions are in
@@ -36286,7 +36412,7 @@ $ @kbd{AWKLIBPATH=$PWD gawk -f testff.awk}
 @node Extension Samples
 @section The Sample Extensions in the @command{gawk} Distribution
-@cindex extensions distributed with @command{gawk}
+@cindex extensions @subentry loadable @subentry distributed with @command{gawk}
 This @value{SECTION} provides a brief overview of the sample extensions that come in the @command{gawk} distribution. Some of them are intended
@@ -36996,8 +37122,7 @@ for more information.
 @node gawkextlib
 @section The @code{gawkextlib} Project
-@cindex @code{gawkextlib}
-@cindex extensions, where to find
+@cindex extensions @subentry loadable @subentry @code{gawkextlib} project
 @cindex @code{gawkextlib} project
 The @uref{https://sourceforge.net/projects/gawkextlib/, @code{gawkextlib}}
@@ -37317,8 +37442,8 @@ online documentation}.
 @node V7/SVR3.1
 @appendixsec Major Changes Between V7 and SVR3.1
-@cindex @command{awk}, versions of
-@cindex @command{awk}, versions of, changes between V7 and SVR3.1
+@cindex @command{awk} @subentry versions of
+@cindex @command{awk} @subentry versions of @subentry changes between V7 and SVR3.1
 The @command{awk} language evolved considerably between the release of Version 7 Unix (1978) and the new version that was first made generally available in
@@ -37408,7 +37533,7 @@ Multidimensional arrays
 @node SVR4
 @appendixsec Changes Between SVR3.1 and SVR4
-@cindex @command{awk}, versions of, changes between SVR3.1 and SVR4
+@cindex @command{awk} @subentry versions of @subentry changes between SVR3.1 and SVR4
 The System V Release 4 (1989) version of Unix @command{awk} added these features (some of which originated in @command{gawk}):
@@ -37466,8 +37591,8 @@ Processing of escape sequences inside command-line variable assignments
 @node POSIX
 @appendixsec Changes Between SVR4 and POSIX @command{awk}
-@cindex @command{awk}, versions of, changes between SVR4 and POSIX @command{awk}
-@cindex POSIX @command{awk}, changes in @command{awk} versions
+@cindex @command{awk} @subentry versions of @subentry changes between SVR4 and POSIX @command{awk}
+@cindex POSIX @command{awk} @subentry changes in @command{awk} versions
 The POSIX Command Language and Utilities standard for @command{awk} (1992) introduced the following changes into the language:
@@ -37522,9 +37647,9 @@ The 2008 POSIX standard can be found online at
 @node BTL
 @appendixsec Extensions in Brian Kernighan's @command{awk}
-@cindex @command{awk}, versions of, See Also Brian Kernighan's @command{awk}
-@cindex extensions, Brian Kernighan's @command{awk}
-@cindex Brian Kernighan's @command{awk}, extensions
+@cindex @command{awk} @subentry versions of @seealso{Brian Kernighan's @command{awk}}
+@cindex extensions @subentry Brian Kernighan's @command{awk}
+@cindex Brian Kernighan's @command{awk} @subentry extensions
 @cindex Kernighan, Brian
 Brian Kernighan has made his version available via his home page
@@ -37563,9 +37688,9 @@ available in his @command{awk}.
 @node POSIX/GNU
 @appendixsec Extensions in @command{gawk} Not in POSIX @command{awk}
-@cindex compatibility mode (@command{gawk}), extensions
-@cindex extensions, in @command{gawk}, not in POSIX @command{awk}
-@cindex POSIX, @command{gawk} extensions not included in
+@cindex compatibility mode (@command{gawk}) @subentry extensions
+@cindex extensions @subentry in @command{gawk}, not in POSIX @command{awk}
+@cindex POSIX @subentry @command{gawk} extensions not included in
 The GNU implementation, @command{gawk}, adds a large number of features. They can all be disabled with either the @option{--traditional} or @option{--posix} options
@@ -38403,6 +38528,7 @@ a global symbol indicating that they are GPL-compatible (@pxref{Plugin License}).
 @item
+@cindex POSIX mode
 In POSIX mode, string comparisons use @code{strcoll()} / @code{wcscoll()} (@pxref{POSIX String Comparison}).
@@ -38661,8 +38787,8 @@ unfortunately.
 @node Common Extensions
 @appendixsec Common Extensions Summary
-@cindex extensions, Brian Kernighan's @command{awk}
-@cindex extensions, @command{mawk}
+@cindex extensions @subentry Brian Kernighan's @command{awk}
+@cindex extensions @subentry @command{mawk}
 The following table summarizes the common extensions supported by @command{gawk}, Brian Kernighan's @command{awk}, and @command{mawk}, the three most widely used freely available versions of @command{awk}
@@ -38693,6 +38819,8 @@ This @value{SECTION} describes the confusing history of ranges within regular expressions and their interactions with locales, and how this affected different versions of @command{gawk}.
+@cindex ASCII
+@cindex EBCDIC
 The original Unix tools that worked with regular expressions defined character ranges (such as @samp{[a-z]}) to match any character between the first character in the range and the last character in the range,
@@ -38750,6 +38878,7 @@ This result is due to the locale setting (and thus you may not see it on your system).
 @cindex Unicode
+@cindex ASCII
 Similar considerations apply to other ranges. For example, @samp{["-/]} is perfectly valid in ASCII, but is not valid in many Unicode locales, such as @code{en_US.UTF-8}.
@@ -38792,7 +38921,7 @@ In all cases @command{gawk} remains POSIX-compliant.
 @node Contributors
 @appendixsec Major Contributors to @command{gawk}
-@cindex @command{gawk}, list of contributors to
+@cindex @command{gawk} @subentry list of contributors to
 @quotation
 @i{Always give credit where credit is due.}
 @author Anonymous
@@ -38979,7 +39108,7 @@ of IBM in Japan, contributed support for multibyte characters.
 Michael Benzinger contributed the initial code for @code{switch} statements.
 @item
-@cindex McPhee, Patrick
+@cindex McPhee, Patrick T.J.@:
 Patrick T.J.@: McPhee contributed the code for dynamic loading in Windows32 environments. (This is no longer supported.)
@@ -39061,7 +39190,7 @@ distribution.
 Juan Manuel Guerrero took over maintenance of the DJGPP port.
 @item
-@cindex Robbins, Arnold
+@cindex Robbins @subentry Arnold
 Arnold Robbins has been working on @command{gawk} since 1988, at first helping David Trueman, and as the primary maintainer since around 1994.
@@ -39089,6 +39218,8 @@ They can be disabled with either the @option{--traditional} or @option{--posix} options.
 @item
+@cindex ASCII
+@cindex EBCDIC
 The interaction of POSIX locales and regexp matching in @command{gawk} has been confusing over the years. Today, @command{gawk} implements Rational Range Interpretation, where ranges of the form @samp{[a-z]} match @emph{only} the characters numerically between
@@ -39106,8 +39237,11 @@ the appropriate credit where credit is due.
 @appendix Installing @command{gawk}
 @c last two commas are part of see also
-@cindex operating systems, See Also GNU/Linux@comma{} PC operating systems@comma{} Unix
-@cindex @command{gawk}, installing
+@cindex operating systems
+@cindex operating systems @seealso{GNU/Linux}
+@cindex operating systems @seealso{PC operating systems}
+@cindex operating systems @seealso{Unix}
+@cindex @command{gawk} @subentry installing
 @cindex installing @command{gawk}
 This appendix provides instructions for installing @command{gawk} on the various platforms that are supported by the developers. The primary
@@ -39130,7 +39264,7 @@ the respective ports.
 @node Gawk Distribution
 @appendixsec The @command{gawk} Distribution
-@cindex source code, @command{gawk}
+@cindex source code @subentry @command{gawk}
 This @value{SECTION} describes how to get the @command{gawk} distribution, how to extract it, and then what is in the various files and
@@ -39144,7 +39278,7 @@ subdirectories.
 @node Getting
 @appendixsubsec Getting the @command{gawk} Distribution
-@cindex @command{gawk}, source code@comma{} obtaining
+@cindex @command{gawk} @subentry source code, obtaining
 There are two ways to get GNU software:
 @itemize @value{BULLET}
@@ -39220,7 +39354,7 @@ a local expert.
 @node Distribution contents
 @appendixsubsec Contents of the @command{gawk} Distribution
-@cindex @command{gawk}, distribution
+@cindex @command{gawk} @subentry distribution
 The @command{gawk} distribution has a number of C source files, documentation files,
@@ -39276,7 +39410,7 @@ A description of behaviors in the POSIX standard for @command{awk} that are left undefined, or where @command{gawk} may not comply fully, as well as a list of things that the POSIX standard should describe but does not.
-@cindex artificial intelligence@comma{} @command{gawk} and
+@cindex artificial intelligence, @command{gawk} and
 @item doc/awkforai.txt
 Pointers to the original draft of a short article describing why @command{gawk} is a good language for
@@ -39528,26 +39662,32 @@ on other platforms, the appropriate location may be different.
 @table @command
 @cindex @command{gawkpath_default} shell function
+@cindex shell function @subentry @command{gawkpath_default}
 @item gawkpath_default
 Reset the @env{AWKPATH} environment variable to its default value.
 @cindex @command{gawkpath_prepend} shell function
+@cindex shell function @subentry @command{gawkpath_prepend}
 @item gawkpath_prepend
 Add the argument to the front of the @env{AWKPATH} environment variable.
 @cindex @command{gawkpath_append} shell function
+@cindex shell function @subentry @command{gawkpath_append}
 @item gawkpath_append
 Add the argument to the end of the @env{AWKPATH} environment variable.
 @cindex @command{gawklibpath_default} shell function
+@cindex shell function @subentry @command{gawklibpath_default}
 @item gawklibpath_default
 Reset the @env{AWKLIBPATH} environment variable to its default value.
 @cindex @command{gawklibpath_prepend} shell function
+@cindex shell function @subentry @command{gawklibpath_prepend}
 @item gawklibpath_prepend
 Add the argument to the front of the @env{AWKLIBPATH} environment variable.
 @cindex @command{gawklibpath_append} shell function
+@cindex shell function @subentry @command{gawklibpath_append}
 @item gawklibpath_append
 Add the argument to the end of the @env{AWKLIBPATH} environment variable.
@@ -39556,8 +39696,8 @@ Add the argument to the end of the @env{AWKLIBPATH} environment variable.
 @node Additional Configuration Options
 @appendixsubsec Additional Configuration Options
-@cindex @command{gawk}, configuring, options
-@cindex configuration options@comma{} @command{gawk}
+@cindex @command{gawk} @subentry configuring @subentry options
+@cindex configuration options, @command{gawk}
 There are several additional options you may use on the @command{configure} command line when compiling @command{gawk} from scratch, including:
@@ -39565,7 +39705,7 @@ command line when compiling @command{gawk} from scratch, including:
 @table @code
 @cindex @option{--disable-extensions} configuration option
-@cindex configuration option, @code{--disable-extensions}
+@cindex configuration option @subentry @code{--disable-extensions}
 @item --disable-extensions
 Disable the extension mechanism within @command{gawk}. With this option, it is not possible to use dynamic extensions. This also
@@ -39577,7 +39717,7 @@ The default action is to dynamically check if the extensions can be configured and compiled.
 @cindex @option{--disable-lint} configuration option
-@cindex configuration option, @code{--disable-lint}
+@cindex configuration option @subentry @code{--disable-lint}
 @item --disable-lint
 Disable all lint checking within @command{gawk}. The @option{--lint} and @option{--lint-old} options
@@ -39600,21 +39740,21 @@ to fail. This option may be removed at a later date.
 @end quotation
 @cindex @option{--disable-mpfr} configuration option
-@cindex configuration option, @code{--disable-mpfr}
+@cindex configuration option @subentry @code{--disable-mpfr}
 @item --disable-mpfr
 Skip checking for the MPFR and GMP libraries. This is useful mainly for the developers, to make sure nothing breaks if MPFR support is not available.
 @cindex @option{--disable-nls} configuration option
-@cindex configuration option, @code{--disable-nls}
+@cindex configuration option @subentry @code{--disable-nls}
 @item --disable-nls
 Disable all message-translation facilities. This is usually not desirable, but it may bring you some slight performance improvement.
 @cindex @option{--enable-versioned-extension-dir} configuration option
-@cindex configuration option, @code{--enable-versioned-extension-dir}
+@cindex configuration option @subentry @code{--enable-versioned-extension-dir}
 @item --enable-versioned-extension-dir
 Use a versioned directory for extensions. The directory name will include the major and minor API versions in it. This makes it possible
@@ -39629,7 +39769,7 @@ options supplied by @command{configure}.
 @node Configuration Philosophy
 @appendixsubsec The Configuration Process
-@cindex @command{gawk}, configuring
+@cindex @command{gawk} @subentry configuring
 This @value{SECTION} is of interest only if you know something about using the C language and Unix-like operating systems.
@@ -39690,8 +39830,8 @@ various non-Unix systems.
 @node PC Installation
 @appendixsubsec Installation on MS-Windows
-@cindex PC operating systems@comma{} @command{gawk} on, installing
-@cindex operating systems, PC@comma{} @command{gawk} on, installing
+@cindex PC operating systems, @command{gawk} on @subentry installing
+@cindex operating systems @subentry PC, @command{gawk} on @subentry installing
 This @value{SECTION} covers installation and usage of @command{gawk} on Intel architecture machines running any version of MS-Windows.
 In this @value{SECTION}, the term ``Windows32''
@@ -39710,6 +39850,7 @@ See also the @file{README_d/README.pc} file in the distribution.
 @node PC Binary Installation
 @appendixsubsubsec Installing a Prepared Distribution for MS-Windows Systems
+@cindex installing @command{gawk} @subentry MS-Windows
 The only supported binary distribution for MS-Windows systems is that provided by Eli Zaretskii's @uref{https://sourceforge.net/projects/ezwinports/,
@@ -39723,7 +39864,7 @@ The file @file{README_d/README.pc} in the @command{gawk} distribution contains additional notes, and @file{pc/Makefile} contains important information on compilation options.
-@cindex compiling @command{gawk} for MS-Windows
+@cindex compiling @command{gawk} @subentry for MS-Windows
 To build @command{gawk} for Windows32, copy the files in the @file{pc} directory (@emph{except} for @file{ChangeLog}) to the directory with the rest of the @command{gawk} sources, then invoke
@@ -39740,7 +39881,7 @@ type @samp{make mingw32}.
 @node PC Using
 @appendixsubsubsec Using @command{gawk} on PC Operating Systems
-@cindex operating systems, PC, @command{gawk} on
+@cindex operating systems @subentry PC, @command{gawk} on
 @cindex PC operating systems, @command{gawk} on
 Information in this section applies to the MinGW and
@@ -39753,20 +39894,21 @@ both the @samp{|&} operator and TCP/IP networking
 The DJGPP environment does not support @samp{|&}.
 @cindex search paths
-@cindex search paths, for source files
-@cindex @command{gawk}, MS-Windows version of
-@cindex @code{;} (semicolon), @env{AWKPATH} variable and
-@cindex semicolon (@code{;}), @env{AWKPATH} variable and
+@cindex search paths @subentry for source files
+@cindex @command{gawk} @subentry MS-Windows version of
+@cindex @code{;} (semicolon) @subentry @env{AWKPATH} variable and
+@cindex semicolon (@code{;}) @subentry @env{AWKPATH} variable and
 @cindex @env{AWKPATH} environment variable
+@cindex environment variables @subentry @env{AWKPATH}
 The MS-Windows version of @command{gawk} searches for program files as described in @ref{AWKPATH Variable}. However, semicolons (rather than colons) separate elements in the @env{AWKPATH} variable. If @env{AWKPATH} is not set or is empty, then the default search path is @samp{@w{.;c:/lib/awk;c:/gnu/lib/awk}}.
-@cindex common extensions, @code{BINMODE} variable
-@cindex extensions, common@comma{} @code{BINMODE} variable
-@cindex differences in @command{awk} and @command{gawk}, @code{BINMODE} variable
+@cindex common extensions @subentry @code{BINMODE} variable
+@cindex extensions @subentry common @subentry @code{BINMODE} variable
+@cindex differences in @command{awk} and @command{gawk} @subentry @code{BINMODE} variable
 @cindex @code{BINMODE} variable
 Under MS-Windows, @command{gawk} (and many other text programs) silently
@@ -39862,7 +40004,7 @@ moved into the @code{BEGIN} rule.
 @node Cygwin
 @appendixsubsubsec Using @command{gawk} In The Cygwin Environment
-@cindex compiling @command{gawk} for Cygwin
+@cindex compiling @command{gawk} @subentry for Cygwin
 @command{gawk} can be built and used ``out of the box'' under MS-Windows if you are using the @uref{http://www.cygwin.com, Cygwin environment}.
@@ -39882,6 +40024,10 @@ When compared to GNU/Linux on the same system, the @samp{configure} step on Cygwin takes considerably longer. However, it does finish, and then the @samp{make} proceeds as usual.
+@cindex installing @command{gawk} @subentry Cygwin +You may also install @command{gawk} using the regular Cygwin installer. +In general Cygwin supplies the latest released version. + Recent versions of Cygwin open all files in binary mode. This means that you should use @samp{RS = "\r?\n"} in order to be able to handle standard MS-Windows text files with carriage-return plus @@ -39909,8 +40055,8 @@ translation of @code{"\r\n"}, because it won't. @c now rankin@pactechdata.com @c now r.pat.rankin@gmail.com -@cindex @command{gawk}, VMS version of -@cindex installation, VMS +@cindex @command{gawk} @subentry VMS version of +@cindex installing @command{gawk} @subentry VMS This @value{SUBSECTION} describes how to compile and install @command{gawk} under VMS. The older designation ``VMS'' is used throughout to refer to OpenVMS. @@ -39926,7 +40072,7 @@ The older designation ``VMS'' is used throughout to refer to OpenVMS. @node VMS Compilation @appendixsubsubsec Compiling @command{gawk} on VMS -@cindex compiling @command{gawk} for VMS +@cindex compiling @command{gawk} @subentry for VMS To compile @command{gawk} under VMS, there is a @code{DCL} command procedure that issues all the necessary @code{CC} and @code{LINK} commands. There is @@ -40124,7 +40270,7 @@ flag is required to force Unix-style parsing rather than @code{DCL} parsing. If any other dash-type options (or multiple parameters such as @value{DF}s to process) are present, there is no ambiguity and @option{--} can be omitted. -@cindex exit status, of @command{gawk}, on VMS +@cindex exit status, of @command{gawk} @subentry on VMS The @code{exit} value is a Unix-style value and is encoded into a VMS exit status value when the program exits. @@ -40150,7 +40296,7 @@ Older versions of @command{gawk} for VMS treated a Unix exit code 0 as 1, a failure as 2, a fatal error as 4, and passed all the other numbers through. This violated the VMS exit status coding requirements. 
-@cindex floating-point, VAX/VMS +@cindex floating-point @subentry numbers @subentry VAX/VMS VAX/VMS floating point uses unbiased rounding. @xref{Round Function}. VMS reports time values in GMT unless one of the @code{SYS$TIMEZONE_RULE} @@ -40160,7 +40306,7 @@ or @code{TZ} logical names is set. Older versions of VMS, such as VAX/VMS @c @cindex directory search @c @cindex path, search @cindex search paths -@cindex search paths, for source files +@cindex search paths @subentry for source files The default search path, when looking for @command{awk} program files specified by the @option{-f} option, is @code{"SYS$DISK:[],AWK_LIBRARY:"}. The logical name @env{AWKPATH} can be used to override this default. The format @@ -40168,7 +40314,7 @@ of @env{AWKPATH} is a comma-separated list of directory specifications. When defining it, the value should be quoted so that it retains a single translation and not a multitranslation @code{RMS} searchlist. -@cindex redirection on VMS +@cindex redirection @subentry on VMS This restriction also applies to running @command{gawk} under GNV, as redirection is always to a DCL command. @@ -40289,7 +40435,7 @@ recommend compiling and using the current version. @c the radio show, not the book. :-) @cindex debugging @command{gawk}, bug reports -@cindex troubleshooting, @command{gawk}, bug reports +@cindex troubleshooting @subentry @command{gawk} @subentry bug reports If you have problems with @command{gawk} or think that you have found a bug, report it to the developers; we cannot promise to do anything, but we might well want to fix it. 
@@ -40423,7 +40569,7 @@ The people maintaining the various @command{gawk} ports are: @cindex Malmberg, John @cindex Pitts, Dave @cindex G., Daniel Richard -@cindex Robbins, Arnold +@cindex Robbins @subentry Arnold @cindex Zaretskii, Eli @cindex Guerrero, Juan Manuel @multitable {MS-Windows with MinGW} {123456789012345678901234567890123456789001234567890} @@ -40448,7 +40594,7 @@ report to the @EMAIL{bug-gawk@@gnu.org,bug-gawk at gnu dot org} email list as we @node Other Versions @appendixsec Other Freely Available @command{awk} Implementations -@cindex @command{awk}, implementations +@cindex @command{awk} @subentry implementations @ignore From: emory!amc.com!brennan (Michael Brennan) Subject: C++ comments in awk programs @@ -40477,9 +40623,9 @@ This @value{SECTION} briefly describes where to get them: @table @asis @cindex Kernighan, Brian -@cindex source code, Brian Kernighan's @command{awk} -@cindex @command{awk}, versions of, See Also Brian Kernighan's @command{awk} -@cindex Brian Kernighan's @command{awk}, source code +@cindex source code @subentry Brian Kernighan's @command{awk} +@cindex @command{awk} @subentry versions of @seealso{Brian Kernighan's @command{awk}} +@cindex Brian Kernighan's @command{awk} @subentry source code @item Unix @command{awk} Brian Kernighan, one of the original designers of Unix @command{awk}, has made his implementation of @@ -40513,7 +40659,7 @@ available at @uref{git://github.com/danfuzz/one-true-awk}. @cindex Brennan, Michael @cindex @command{mawk} utility -@cindex source code, @command{mawk} +@cindex source code @subentry @command{mawk} @item @command{mawk} Michael Brennan wrote an independent implementation of @command{awk}, called @command{mawk}. 
It is available under the @@ -40550,7 +40696,7 @@ His development snapshots are available via Git from the project's @cindex Sumner, Andrew @cindex @command{awka} compiler for @command{awk} -@cindex source code, @command{awka} +@cindex source code @subentry @command{awka} @item @command{awka} Written by Andrew Sumner, @command{awka} translates @command{awk} programs into C, compiles them, @@ -40570,7 +40716,7 @@ since approximately 2001. @cindex Beebe, Nelson H.F.@: @cindex @command{pawk} (profiling version of Brian Kernighan's @command{awk}) -@cindex source code, @command{pawk} +@cindex source code @subentry @command{pawk} (profiling version of Brian Kernighan's @command{awk}) @item @command{pawk} Nelson H.F.@: Beebe at the University of Utah has modified BWK @command{awk} to provide timing and profiling information. @@ -40584,7 +40730,7 @@ or @item BusyBox @command{awk} @cindex BusyBox Awk -@cindex source code, BusyBox Awk +@cindex source code @subentry BusyBox Awk BusyBox is a GPL-licensed program providing small versions of many applications within a single executable. It is aimed at embedded systems. It includes a full implementation of POSIX @command{awk}. When building @@ -40594,7 +40740,7 @@ information, see the @uref{https://busybox.net, project's home page}. @cindex OpenSolaris @cindex Solaris, POSIX-compliant @command{awk} -@cindex source code, Solaris @command{awk} +@cindex source code @subentry Solaris @command{awk} @item The OpenSolaris POSIX @command{awk} The versions of @command{awk} in @file{/usr/xpg4/bin} and @file{/usr/xpg6/bin} on Solaris are more or less POSIX-compliant. @@ -40604,9 +40750,8 @@ with 1--2 hours of work. Making it more generally portable (using GNU Autoconf and/or Automake) would take more work, and this has not been done, at least to our knowledge. 
-@cindex Illumos @cindex Illumos, POSIX-compliant @command{awk} -@cindex source code, Illumos @command{awk} +@cindex source code @subentry Illumos @command{awk} The source code used to be available from the OpenSolaris website. However, that project was ended and the website shut down. Fortunately, the @uref{https://wiki.illumos.org/display/illumos/illumos+Home, Illumos project} @@ -40615,8 +40760,8 @@ makes this implementation available. You can view the files one at a time from @cindex @command{goawk} @cindex Go implementation of @command{awk} -@cindex source code, @command{goawk} -@cindex programming languages, Go +@cindex source code @subentry @command{goawk} +@cindex programming languages @subentry Go @item @command{goawk} This is an @command{awk} interpreter written in the @uref{https://golang.org/, Go programming language}. @@ -40628,7 +40773,7 @@ describing the implementation. @cindex @command{jawk} @cindex Java implementation of @command{awk} -@cindex source code, @command{jawk} +@cindex source code @subentry @command{jawk} @item @command{jawk} This is an interpreter for @command{awk} written in Java. It claims to be a full interpreter, although because it uses Java facilities @@ -40638,12 +40783,12 @@ from POSIX @command{awk}. More information is available on the @item Libmawk @cindex libmawk -@cindex source code, libmawk +@cindex source code @subentry libmawk This is an embeddable @command{awk} interpreter derived from @command{mawk}. For more information, see @uref{http://repo.hu/projects/libmawk/}. -@cindex source code, embeddable @command{awk} interpreter +@cindex source code @subentry embeddable @command{awk} interpreter @cindex Neacsu, Mircea @item Mircea Neacsu's Embeddable @command{awk} Mircea Neacsu has created an embeddable @command{awk} @@ -40651,7 +40796,7 @@ interpreter, based on BWK awk. It's available at @uref{https://github.com/neacsum/awk}. 
@item @code{pawk} -@cindex source code, @command{pawk} (Python version) +@cindex source code @subentry @command{pawk} (Python version) @cindex @code{pawk}, @command{awk}-like facilities for Python This is a Python module that claims to bring @command{awk}-like features to Python. See @uref{https://github.com/alecthomas/pawk} @@ -40660,13 +40805,13 @@ modified version of BWK @command{awk}, described earlier.) @item @w{QSE @command{awk}} @cindex QSE @command{awk} -@cindex source code, QSE @command{awk} +@cindex source code @subentry QSE @command{awk} This is an embeddable @command{awk} interpreter. For more information, see @uref{https://code.google.com/p/qse/}. @c and @uref{http://awk.info/?tools/qse}. @item @command{QTawk} @cindex QuikTrim Awk -@cindex source code, QuikTrim Awk +@cindex source code @subentry QuikTrim Awk This is an independent implementation of @command{awk} distributed under the GPL. It has a large number of extensions over standard @command{awk} and may not be 100% syntactically compatible with it. @@ -40730,7 +40875,7 @@ implementations. Many are POSIX-compliant; others are less so. @ifclear FOR_PRINT @node Notes @appendix Implementation Notes -@cindex @command{gawk}, implementation issues +@cindex @command{gawk} @subentry implementation issues @cindex implementation issues, @command{gawk} This appendix contains information mainly of interest to implementers and @@ -40749,10 +40894,10 @@ maintainers of @command{gawk}. 
Everything in it applies specifically to @node Compatibility Mode @appendixsec Downward Compatibility and Debugging -@cindex @command{gawk}, implementation issues, downward compatibility -@cindex @command{gawk}, implementation issues, debugging -@cindex troubleshooting, @command{gawk} -@cindex implementation issues@comma{} @command{gawk}, debugging +@cindex @command{gawk} @subentry implementation issues @subentry downward compatibility +@cindex @command{gawk} @subentry implementation issues @subentry debugging +@cindex troubleshooting @subentry @command{gawk} +@cindex implementation issues, @command{gawk} @subentry debugging @xref{POSIX/GNU}, for a summary of the GNU extensions to the @command{awk} language and program. @@ -40834,9 +40979,9 @@ that has a Git plug-in for working with Git repositories. @node Adding Code @appendixsubsec Adding New Features -@cindex adding, features to @command{gawk} -@cindex features, adding to @command{gawk} -@cindex @command{gawk}, features, adding +@cindex adding @subentry features to @command{gawk} +@cindex features @subentry adding to @command{gawk} +@cindex @command{gawk} @subentry features @subentry adding You are free to add any new features you like to @command{gawk}. However, if you want your changes to be incorporated into the @command{gawk} distribution, there are several steps that you need to take in order to @@ -40882,7 +41027,7 @@ the GNU Project's @uref{https://www.gnu.org/prep/standards/, website}. Texinfo, Info, and DVI versions are also available.) -@cindex @command{gawk}, coding style in +@cindex @command{gawk} @subentry coding style in @item Use the @command{gawk} coding style. The C code for @command{gawk} follows the instructions in the @@ -41008,8 +41153,8 @@ probably will not. 
@node New Ports @appendixsubsec Porting @command{gawk} to a New Operating System -@cindex portability, @command{gawk} -@cindex operating systems, porting @command{gawk} to +@cindex portability @subentry @command{gawk} +@cindex operating systems @subentry porting @command{gawk} to @cindex porting @command{gawk} If you want to port @command{gawk} to a new operating system, there are @@ -41358,7 +41503,7 @@ Larry @end ignore @cindex Perl @cindex Wall, Larry -@cindex Robbins, Arnold +@cindex Robbins @subentry Arnold @quotation @i{AWK is a language similar to PERL, only considerably more elegant.} @author Arnold Robbins @@ -41666,8 +41811,8 @@ removed from the code base with the 4.2 release. @node Basic Concepts @appendix Basic Programming Concepts -@cindex programming, concepts -@cindex programming, concepts +@cindex programming @subentry concepts +@cindex programming @subentry concepts This @value{APPENDIX} attempts to define some of the basic concepts and terms that are used throughout the rest of this @value{DOCUMENT}. @@ -41723,7 +41868,7 @@ or it may be @dfn{interpreted}. In the latter case, a machine-executable program such as @command{awk} reads your program, and then uses the instructions in your program to process the data. -@cindex programming, basic steps +@cindex programming @subentry basic steps When you write a program, it usually consists of the following, very basic set of steps, @ifnotdocbook @@ -41808,7 +41953,7 @@ and even more often, as ``I/O'' for short. (You will also see ``input'' and ``output'' used as verbs.) @cindex data-driven languages -@cindex languages@comma{} data-driven +@cindex languages, data-driven @command{awk} manages the reading of data for you, as well as the breaking it up into records and fields. Your program's job is to tell @command{awk} what to do with the data. You do this by describing @@ -41831,8 +41976,8 @@ and the fields of the record. You may also group multiple associated values under one name, as an array. 
-@cindex values, numeric -@cindex values, string +@cindex values @subentry numeric +@cindex values @subentry string @cindex scalar values Data, particularly in @command{awk}, consists of either numeric values, such as 42 or 3.1415927, or string values. @@ -41916,7 +42061,7 @@ rule's action. Actions are always enclosed in braces. (@xref{Action Overview}.) @cindex Ada programming language -@cindex programming languages, Ada +@cindex programming languages @subentry Ada @item Ada A programming language originally defined by the U.S.@: Department of Defense for embedded programming. It was designed to enforce good @@ -41925,6 +42070,7 @@ Software Engineering practices. @cindex Spencer, Henry @cindex @command{sed} utility @cindex amazing @command{awk} assembler (@command{aaa}) +@cindex @command{aaa} (amazing @command{awk} assembler) program @item Amazing @command{awk} Assembler Henry Spencer at the University of Toronto wrote a retargetable assembler completely as @command{sed} and @command{awk} scripts. It is thousands @@ -42151,8 +42297,8 @@ See ``Bracket Expression.'' See ``Bracket Expression.'' @cindex ASCII -@cindex ISO 8859-1 -@cindex ISO Latin-1 +@cindex ISO @subentry ISO 8859-1 character set +@cindex ISO @subentry ISO Latin-1 character set @cindex character sets (machine character encodings) @cindex Unicode @item Character Set @@ -42340,6 +42486,7 @@ The epoch on Unix and POSIX systems is 1970-01-01 00:00:00 UTC. See also ``GMT'' and ``UTC.'' @item Escape Sequences +@cindex ASCII A special sequence of characters used for describing nonprinting characters, such as @samp{\n} for newline or @samp{\033} for the ASCII ESC (Escape) character. (@xref{Escape Sequences}.) @@ -42423,8 +42570,6 @@ The @command{gawk} extension API provides constructor functions The GNU implementation of @command{awk}. 
@cindex GPL (General Public License) -@cindex General Public License (GPL) -@cindex GNU General Public License @item General Public License This document describes the terms under which @command{gawk} and its source code may be distributed. (@xref{Copying}.) @@ -42511,7 +42656,7 @@ information about the name of the organization and its language-independent three-letter acronym. @cindex Java programming language -@cindex programming languages, Java +@cindex programming languages @subentry Java @item Java A modern programming language originally developed by Sun Microsystems (now Oracle) supporting Object-Oriented programming. Although usually @@ -42818,7 +42963,6 @@ and POSIX systems. Used for the @command{gawk} functions @code{mktime()}, @code{strftime()}, and @code{systime()}. See also ``Epoch,'' ``GMT,'' and ``UTC.'' -@cindex Linux @cindex GNU/Linux @cindex Unix @cindex BSD-based operating systems