diff --git a/gawk.info-3 b/gawk.info-3
new file mode 100644
index 00000000..5c87ac3a
--- /dev/null
+++ b/gawk.info-3
@@ -0,0 +1,1288 @@
+This is Info file gawk.info, produced by Makeinfo-1.54 from the input
+file gawk.texi.
+
+ This file documents `awk', a program that you can use to select
+particular records in a file and perform operations upon them.
+
+ This is Edition 0.15 of `The GAWK Manual',
+for the 2.15 version of the GNU implementation
+of AWK.
+
+ Copyright (C) 1989, 1991, 1992, 1993 Free Software Foundation, Inc.
+
+ Permission is granted to make and distribute verbatim copies of this
+manual provided the copyright notice and this permission notice are
+preserved on all copies.
+
+ Permission is granted to copy and distribute modified versions of
+this manual under the conditions for verbatim copying, provided that
+the entire resulting derived work is distributed under the terms of a
+permission notice identical to this one.
+
+ Permission is granted to copy and distribute translations of this
+manual into another language, under the above conditions for modified
+versions, except that this permission notice may be stated in a
+translation approved by the Foundation.
+
+
+File: gawk.info, Node: Output Separators, Next: OFMT, Prev: Print Examples, Up: Printing
+
+Output Separators
+=================
+
+ As mentioned previously, a `print' statement contains a list of
+items, separated by commas. In the output, the items are normally
+separated by single spaces. But they do not have to be spaces; a
+single space is only the default. You can specify any string of
+characters to use as the "output field separator" by setting the
+built-in variable `OFS'. The initial value of this variable is the
+string `" "', that is, just a single space.
+
+ The output from an entire `print' statement is called an "output
+record". Each `print' statement outputs one output record and then
+outputs a string called the "output record separator". The built-in
+variable `ORS' specifies this string. The initial value of the
+variable is the string `"\n"' containing a newline character; thus,
+normally each `print' statement makes a separate line.
+
+ You can change how output fields and records are separated by
+assigning new values to the variables `OFS' and/or `ORS'. The usual
+place to do this is in the `BEGIN' rule (*note `BEGIN' and `END'
+Special Patterns: BEGIN/END.), so that it happens before any input is
+processed. You may also do this with assignments on the command line,
+before the names of your input files.
+
+ The following example prints the first and second fields of each
+input record separated by a semicolon, with a blank line added after
+each line:
+
+ awk 'BEGIN { OFS = ";"; ORS = "\n\n" }
+ { print $1, $2 }' BBS-list
+
+ If the value of `ORS' does not contain a newline, all your output
+will be run together on a single line, unless you output newlines some
+other way.
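+
+ A minimal sketch of both ways of setting the separators, assuming a
+POSIX shell. The sample records are supplied inline in place of
+`BBS-list', and the shell variable names are arbitrary:
+
```shell
# Separators set in a BEGIN rule, before any input is processed.
begin_out=$(printf 'aardvark 555-5553\nalpo-net 555-3412\n' |
  awk 'BEGIN { OFS = ";"; ORS = "\n\n" }
       { print $1, $2 }')

# The same separators given as command-line assignments, placed
# before the input file name ("-" stands for standard input).
cmdline_out=$(printf 'aardvark 555-5553\nalpo-net 555-3412\n' |
  awk '{ print $1, $2 }' OFS=';' ORS='\n\n' -)

printf '%s\n' "$begin_out"
```
+
+Both runs should produce the same semicolon-separated, double-spaced
+records.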
+
+
+File: gawk.info, Node: OFMT, Next: Printf, Prev: Output Separators, Up: Printing
+
+Controlling Numeric Output with `print'
+=======================================
+
+ When you use the `print' statement to print numeric values, `awk'
+internally converts the number to a string of characters, and prints
+that string. `awk' uses the `sprintf' function to do this conversion.
+For now, it suffices to say that the `sprintf' function accepts a
+"format specification" that tells it how to format numbers (or
+strings), and that there are a number of different ways that numbers
+can be formatted. The different format specifications are discussed
+more fully in *Note Using `printf' Statements for Fancier Printing:
+Printf.
+
+ The built-in variable `OFMT' contains the default format
+specification that `print' uses with `sprintf' when it wants to convert
+a number to a string for printing. By supplying different format
+specifications as the value of `OFMT', you can change how `print' will
+print your numbers. As a brief example:
+
+ awk 'BEGIN { OFMT = "%d" # print numbers as integers
+ print 17.23 }'
+
+will print `17'.
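+
+ A further sketch with a floating-point format, assuming a POSIX
+shell (the shell variable name is arbitrary). In current `awk'
+implementations, a value that is already an integer is printed as an
+integer regardless of `OFMT':
+
```shell
# OFMT controls how print converts non-integer numbers;
# the integer 17 is printed as is.
ofmt_out=$(awk 'BEGIN { OFMT = "%.2f"; print 3.14159, 17 }')
printf '%s\n' "$ofmt_out"
```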
+
+
+File: gawk.info, Node: Printf, Next: Redirection, Prev: OFMT, Up: Printing
+
+Using `printf' Statements for Fancier Printing
+==============================================
+
+ If you want more precise control over the output format than `print'
+gives you, use `printf'. With `printf' you can specify the width to
+use for each item, and you can specify various stylistic choices for
+numbers (such as what radix to use, whether to print an exponent,
+whether to print a sign, and how many digits to print after the decimal
+point). You do this by specifying a string, called the "format
+string", which controls how and where to print the other arguments.
+
+* Menu:
+
+* Basic Printf:: Syntax of the `printf' statement.
+* Control Letters:: Format-control letters.
+* Format Modifiers:: Format-specification modifiers.
+* Printf Examples:: Several examples.
+
+
+File: gawk.info, Node: Basic Printf, Next: Control Letters, Prev: Printf, Up: Printf
+
+Introduction to the `printf' Statement
+--------------------------------------
+
+ The `printf' statement looks like this:
+
+ printf FORMAT, ITEM1, ITEM2, ...
+
+The entire list of arguments may optionally be enclosed in parentheses.
+The parentheses are necessary if any of the item expressions uses a
+relational operator; otherwise it could be confused with a redirection
+(*note Redirecting Output of `print' and `printf': Redirection.). The
+relational operators are `==', `!=', `<', `>', `>=', `<=', `~' and `!~'
+(*note Comparison Expressions: Comparison Ops.).
+
+ The difference between `printf' and `print' is the argument FORMAT.
+This is an expression whose value is taken as a string; it specifies
+how to output each of the other arguments. It is called the "format
+string".
+
+ The format string is the same as in the ANSI C library function
+`printf'. Most of FORMAT is text to be output verbatim. Scattered
+among this text are "format specifiers", one per item. Each format
+specifier says to output the next item at that place in the format.
+
+ The `printf' statement does not automatically append a newline to its
+output. It outputs only what the format specifies. So if you want a
+newline, you must include one in the format. The output separator
+variables `OFS' and `ORS' have no effect on `printf' statements.
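+
+ The last point can be checked with a small sketch, assuming a POSIX
+shell; the separator values are chosen only to make the difference
+visible:
+
```shell
printf_out=$(awk 'BEGIN {
    OFS = ";"; ORS = "!\n"
    printf "%s %s", "a", "b"   # no newline, and OFS/ORS are ignored
    printf "%s\n", "c"         # the newline comes from the format
    print "d", "e"             # print does use OFS and ORS
}')
printf '%s\n' "$printf_out"
```
+
+The first two `printf' statements share one output line because only
+the second format contains a newline.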
+
+
+File: gawk.info, Node: Control Letters, Next: Format Modifiers, Prev: Basic Printf, Up: Printf
+
+Format-Control Letters
+----------------------
+
+ A format specifier starts with the character `%' and ends with a
+"format-control letter"; it tells the `printf' statement how to output
+one item. (If you actually want to output a `%', write `%%'.) The
+format-control letter specifies what kind of value to print. The rest
+of the format specifier is made up of optional "modifiers" which are
+parameters such as the field width to use.
+
+ Here is a list of the format-control letters:
+
+`c'
+ This prints a number as an ASCII character. Thus, `printf "%c",
+ 65' outputs the letter `A'. The output for a string value is the
+ first character of the string.
+
+`d'
+ This prints a decimal integer.
+
+`i'
+ This also prints a decimal integer.
+
+`e'
+ This prints a number in scientific (exponential) notation. For
+ example,
+
+ printf "%4.3e", 1950
+
+ prints `1.950e+03', with a total of four significant figures of
+ which three follow the decimal point. The `4.3' are "modifiers",
+ discussed below.
+
+`f'
+ This prints a number in floating point notation.
+
+`g'
+ This prints a number in either scientific notation or floating
+ point notation, whichever uses fewer characters.
+
+`o'
+ This prints an unsigned octal integer.
+
+`s'
+ This prints a string.
+
+`x'
+ This prints an unsigned hexadecimal integer.
+
+`X'
+ This prints an unsigned hexadecimal integer. However, for the
+ values 10 through 15, it uses the letters `A' through `F' instead
+ of `a' through `f'.
+
+`%'
+ This isn't really a format-control letter, but it does have a
+ meaning when used after a `%': the sequence `%%' outputs one `%'.
+ It does not consume an argument.
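+
+ A short sketch that exercises each control letter once, assuming a
+POSIX shell and an ASCII-compatible character set; the values are
+arbitrary:
+
```shell
letters_out=$(awk 'BEGIN {
    printf "%c|%c\n", 65, "hello"      # number as a character, first character of a string
    printf "%d|%i\n", 42.9, -7         # decimal integers; the fraction is dropped
    printf "%4.3e\n", 1950             # scientific notation
    printf "%f|%g\n", 3.5, 0.0001      # floating point, and the shorter of the two
    printf "%o|%x|%X\n", 8, 255, 255   # octal, hexadecimal (lower and upper case)
    printf "%s|100%%\n", "done"        # string, then a literal percent sign
}')
printf '%s\n' "$letters_out"
```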
+
+
+File: gawk.info, Node: Format Modifiers, Next: Printf Examples, Prev: Control Letters, Up: Printf
+
+Modifiers for `printf' Formats
+------------------------------
+
+ A format specification can also include "modifiers" that can control
+how much of the item's value is printed and how much space it gets. The
+modifiers come between the `%' and the format-control letter. Here are
+the possible modifiers, in the order in which they may appear:
+
+`-'
+ The minus sign, used before the width modifier, says to
+ left-justify the argument within its specified width. Normally
+ the argument is printed right-justified in the specified width.
+ Thus,
+
+ printf "%-4s", "foo"
+
+ prints `foo '.
+
+`WIDTH'
+ This is a number representing the desired width of a field.
+ Inserting any number between the `%' sign and the format control
+ character forces the field to be expanded to this width. The
+ default way to do this is to pad with spaces on the left. For
+ example,
+
+ printf "%4s", "foo"
+
+ prints ` foo'.
+
+ The value of WIDTH is a minimum width, not a maximum. If the item
+ value requires more than WIDTH characters, it can be as wide as
+ necessary. Thus,
+
+ printf "%4s", "foobar"
+
+ prints `foobar'.
+
+ Preceding the WIDTH with a minus sign causes the output to be
+ padded with spaces on the right, instead of on the left.
+
+`.PREC'
+ This is a number that specifies the precision to use when printing.
+ This specifies the number of digits you want printed to the right
+ of the decimal point. For a string, it specifies the maximum
+ number of characters from the string that should be printed.
+
+ The C library `printf''s dynamic WIDTH and PREC capability (for
+example, `"%*.*s"') is supported. Instead of supplying explicit WIDTH
+and/or PREC values in the format string, you pass them in the argument
+list. For example:
+
+ w = 5
+ p = 3
+ s = "abcdefg"
+ printf "<%*.*s>\n", w, p, s
+
+is exactly equivalent to
+
+ s = "abcdefg"
+ printf "<%5.3s>\n", s
+
+Both programs output `<**abc>'. (We have used the bullet symbol "*" to
+represent a space, to clearly show you that there are two spaces in the
+output.)
+
+ Earlier versions of `awk' did not support this capability. You may
+simulate it by using concatenation to build up the format string, like
+so:
+
+ w = 5
+ p = 3
+ s = "abcdefg"
+ printf "<%" w "." p "s>\n", s
+
+This is not particularly easy to read, however.
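+
+ The modifiers above can be tried together in one sketch, assuming a
+POSIX shell and an `awk' that supports the dynamic `*' form. Angle
+brackets make the padding visible:
+
```shell
mods_out=$(awk 'BEGIN {
    printf "<%-4s>\n", "foo"       # left-justified in 4 columns
    printf "<%4s>\n", "foo"        # right-justified in 4 columns
    printf "<%4s>\n", "foobar"     # the width is only a minimum
    printf "<%.2f>\n", 3.14159     # two digits after the decimal point
    w = 5; p = 3; s = "abcdefg"
    printf "<%*.*s>\n", w, p, s    # width and precision taken from arguments
    printf "<%" w "." p "s>\n", s  # the same format built by concatenation
}')
printf '%s\n' "$mods_out"
```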
+
+
+File: gawk.info, Node: Printf Examples, Prev: Format Modifiers, Up: Printf
+
+Examples of Using `printf'
+--------------------------
+
+ Here is how to use `printf' to make an aligned table:
+
+ awk '{ printf "%-10s %s\n", $1, $2 }' BBS-list
+
+prints the names of bulletin boards (`$1') of the file `BBS-list' as a
+string of 10 characters, left justified. It also prints the phone
+numbers (`$2') afterward on the line. This produces an aligned
+two-column table of names and phone numbers:
+
+ aardvark   555-5553
+ alpo-net   555-3412
+ barfly     555-7685
+ bites      555-1675
+ camelot    555-0542
+ core       555-2912
+ fooey      555-1234
+ foot       555-6699
+ macfoo     555-6480
+ sdace      555-3430
+ sabafoo    555-2127
+
+ Did you notice that we did not specify that the phone numbers be
+printed as numbers? They had to be printed as strings because the
+numbers are separated by a dash. This dash would be interpreted as a
+minus sign if we had tried to print the phone numbers as numbers. This
+would have led to some pretty confusing results.
+
+ We did not specify a width for the phone numbers because they are the
+last things on their lines. We don't need to put spaces after them.
+
+ We could make our table look even nicer by adding headings to the
+tops of the columns. To do this, use the `BEGIN' pattern (*note
+`BEGIN' and `END' Special Patterns: BEGIN/END.) to force the header to
+be printed only once, at the beginning of the `awk' program:
+
+ awk 'BEGIN { print "Name       Number"
+ print "----       ------" }
+ { printf "%-10s %s\n", $1, $2 }' BBS-list
+
+ Did you notice that we mixed `print' and `printf' statements in the
+above example? We could have used just `printf' statements to get the
+same results:
+
+ awk 'BEGIN { printf "%-10s %s\n", "Name", "Number"
+ printf "%-10s %s\n", "----", "------" }
+ { printf "%-10s %s\n", $1, $2 }' BBS-list
+
+By outputting each column heading with the same format specification
+used for the elements of the column, we have made sure that the headings
+are aligned just like the columns.
+
+ The fact that the same format specification is used three times can
+be emphasized by storing it in a variable, like this:
+
+ awk 'BEGIN { format = "%-10s %s\n"
+ printf format, "Name", "Number"
+ printf format, "----", "------" }
+ { printf format, $1, $2 }' BBS-list
+
+ See if you can use the `printf' statement to line up the headings and
+table data for our `inventory-shipped' example covered earlier in the
+section on the `print' statement (*note The `print' Statement: Print.).
+
+
+File: gawk.info, Node: Redirection, Next: Special Files, Prev: Printf, Up: Printing
+
+Redirecting Output of `print' and `printf'
+==========================================
+
+ So far we have been dealing only with output that prints to the
+standard output, usually your terminal. Both `print' and `printf' can
+also send their output to other places. This is called "redirection".
+
+ A redirection appears after the `print' or `printf' statement.
+Redirections in `awk' are written just like redirections in shell
+commands, except that they are written inside the `awk' program.
+
+* Menu:
+
+* File/Pipe Redirection:: Redirecting Output to Files and Pipes.
+* Close Output:: How to close output files and pipes.
+
+
+File: gawk.info, Node: File/Pipe Redirection, Next: Close Output, Prev: Redirection, Up: Redirection
+
+Redirecting Output to Files and Pipes
+-------------------------------------
+
+ Here are the three forms of output redirection. They are all shown
+for the `print' statement, but they work identically for `printf' also.
+
+`print ITEMS > OUTPUT-FILE'
+ This type of redirection prints the items onto the output file
+ OUTPUT-FILE. The file name OUTPUT-FILE can be any expression.
+ Its value is changed to a string and then used as a file name
+ (*note Expressions as Action Statements: Expressions.).
+
+ When this type of redirection is used, the OUTPUT-FILE is erased
+ before the first output is written to it. Subsequent writes do not
+ erase OUTPUT-FILE, but append to it. If OUTPUT-FILE does not
+ exist, then it is created.
+
+ For example, here is how one `awk' program can write a list of BBS
+ names to a file `name-list' and a list of phone numbers to a file
+ `phone-list'. Each output file contains one name or number per
+ line.
+
+ awk '{ print $2 > "phone-list"
+ print $1 > "name-list" }' BBS-list
+
+`print ITEMS >> OUTPUT-FILE'
+ This type of redirection prints the items onto the output file
+ OUTPUT-FILE. The difference between this and the single-`>'
+ redirection is that the old contents (if any) of OUTPUT-FILE are
+ not erased. Instead, the `awk' output is appended to the file.
+
+`print ITEMS | COMMAND'
+ It is also possible to send output through a "pipe" instead of
+ into a file. This type of redirection opens a pipe to COMMAND
+ and writes the values of ITEMS through this pipe, to another
+ process created to execute COMMAND.
+
+ The redirection argument COMMAND is actually an `awk' expression.
+ Its value is converted to a string, whose contents give the shell
+ command to be run.
+
+ For example, this produces two files, one unsorted list of BBS
+ names and one list sorted in reverse alphabetical order:
+
+ awk '{ print $1 > "names.unsorted"
+ print $1 | "sort -r > names.sorted" }' BBS-list
+
+ Here the unsorted list is written with an ordinary redirection
+ while the sorted list is written by piping through the `sort'
+ utility.
+
+ Here is an example that uses redirection to mail a message to a
+ mailing list `bug-system'. This might be useful when trouble is
+ encountered in an `awk' script run periodically for system
+ maintenance.
+
+ report = "mail bug-system"
+ print "Awk script failed:", $0 | report
+ print "at record number", FNR, "of", FILENAME | report
+ close(report)
+
+ We call the `close' function here because it's a good idea to close
+ the pipe as soon as all the intended output has been sent to it.
+ *Note Closing Output Files and Pipes: Close Output, for more
+ information on this. This example also illustrates the use of a
+ variable to represent a FILE or COMMAND: it is not necessary to
+ always use a string constant. Using a variable is generally a
+ good idea, since `awk' requires you to spell the string value
+ identically every time.
+
+ Redirecting output using `>', `>>', or `|' asks the system to open a
+file or pipe only if the particular FILE or COMMAND you've specified
+has not already been written to by your program, or if it has been
+closed since it was last written to.
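+
+ The three forms can be tried in a scratch directory. This is a
+sketch assuming a POSIX shell with `mktemp' and `sort' available; the
+three-record `BBS-list' is made up for the occasion:
+
```shell
workdir=$(mktemp -d)
(
    cd "$workdir" || exit 1
    printf 'barfly 555-7685\naardvark 555-5553\ncamelot 555-0542\n' > BBS-list

    # ">" erases names.unsorted once, then later writes append to it;
    # "|" sends the same names through sort.
    awk '{ print $1 > "names.unsorted"
           print $1 | "sort -r > names.sorted" }' BBS-list

    # A second awk run with ">>" keeps the old contents of the file.
    awk 'NR == 1 { print $2 >> "names.unsorted" }' BBS-list
)
unsorted=$(cat "$workdir/names.unsorted")
sorted=$(cat "$workdir/names.sorted")
rm -rf "$workdir"
printf '%s\n' "$unsorted" "$sorted"
```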
+
+
+File: gawk.info, Node: Close Output, Prev: File/Pipe Redirection, Up: Redirection
+
+Closing Output Files and Pipes
+------------------------------
+
+ When a file or pipe is opened, the file name or command associated
+with it is remembered by `awk' and subsequent writes to the same file or
+command are appended to the previous writes. The file or pipe stays
+open until `awk' exits. This is usually convenient.
+
+ Sometimes there is a reason to close an output file or pipe earlier
+than that. To do this, use the `close' function, as follows:
+
+ close(FILENAME)
+
+or
+
+ close(COMMAND)
+
+ The argument FILENAME or COMMAND can be any expression. Its value
+must exactly equal the string used to open the file or pipe to begin
+with--for example, if you open a pipe with this:
+
+ print $1 | "sort -r > names.sorted"
+
+then you must close it with this:
+
+ close("sort -r > names.sorted")
+
+ Here are some reasons why you might need to close an output file:
+
+ * To write a file and read it back later on in the same `awk'
+ program. Close the file when you are finished writing it; then
+ you can start reading it with `getline' (*note Explicit Input with
+ `getline': Getline.).
+
+ * To write numerous files, successively, in the same `awk' program.
+ If you don't close the files, eventually you may exceed a system
+ limit on the number of open files in one process. So close each
+ one when you are finished writing it.
+
+ * To make a command finish. When you redirect output through a pipe,
+ the command reading the pipe normally continues to try to read
+ input as long as the pipe is open. Often this means the command
+ cannot really do its work until the pipe is closed. For example,
+ if you redirect output to the `mail' program, the message is not
+ actually sent until the pipe is closed.
+
+ * To run the same program a second time, with the same arguments.
+ This is not the same thing as giving more input to the first run!
+
+ For example, suppose you pipe output to the `mail' program. If you
+ output several lines redirected to this pipe without closing it,
+ they make a single message of several lines. By contrast, if you
+ close the pipe after each line of output, then each line makes a
+ separate message.
+
+ `close' returns a value of zero if the close succeeded. Otherwise,
+the value will be non-zero. In this case, `gawk' sets the variable
+`ERRNO' to a string describing the error that occurred.
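+
+ The first reason in the list, writing a file and reading it back,
+can be sketched as follows. The sketch assumes a POSIX shell with
+`mktemp'; the `-v' option passes the temporary file name into the
+program as the (arbitrarily named) variable `f':
+
```shell
tmpfile=$(mktemp)
close_out=$(awk -v f="$tmpfile" 'BEGIN {
    print "first"  > f
    print "second" > f
    status = close(f)                  # finish writing; 0 means success
    while ((getline line < f) > 0)     # now read the same file back
        n++
    close(f)
    print status, n, line
}')
rm -f "$tmpfile"
printf '%s\n' "$close_out"
```
+
+Without the first `close', the file would still be open for writing
+when `getline' tries to read it.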
+
+
+File: gawk.info, Node: Special Files, Prev: Redirection, Up: Printing
+
+Standard I/O Streams
+====================
+
+ Running programs conventionally have three input and output streams
+already available to them for reading and writing. These are known as
+the "standard input", "standard output", and "standard error output".
+These streams are, by default, terminal input and output, but they are
+often redirected with the shell, via the `<', `<<', `>', `>>', `>&' and
+`|' operators. Standard error is used only for writing error messages;
+the reason we have two separate streams, standard output and standard
+error, is so that they can be redirected separately.
+
+ In other implementations of `awk', the only way to write an error
+message to standard error in an `awk' program is as follows:
+
+ print "Serious error detected!" | "cat 1>&2"
+
+This works by opening a pipeline to a shell command which can access the
+standard error stream which it inherits from the `awk' process. This
+is far from elegant, and is also inefficient, since it requires a
+separate process. So people writing `awk' programs have often
+neglected to do this. Instead, they have sent the error messages to the
+terminal, like this:
+
+ NF != 4 {
+ printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/tty"
+ }
+
+This has the same effect most of the time, but not always: although the
+standard error stream is usually the terminal, it can be redirected, and
+when that happens, writing to the terminal is not correct. In fact, if
+`awk' is run from a background job, it may not have a terminal at all.
+Then opening `/dev/tty' will fail.
+
+ `gawk' provides special file names for accessing the three standard
+streams. When you redirect input or output in `gawk', if the file name
+matches one of these special names, then `gawk' directly uses the
+stream it stands for.
+
+`/dev/stdin'
+ The standard input (file descriptor 0).
+
+`/dev/stdout'
+ The standard output (file descriptor 1).
+
+`/dev/stderr'
+ The standard error output (file descriptor 2).
+
+`/dev/fd/N'
+ The file associated with file descriptor N. Such a file must have
+ been opened by the program initiating the `awk' execution
+ (typically the shell). Unless you take special pains, only
+ descriptors 0, 1 and 2 are available.
+
+ The file names `/dev/stdin', `/dev/stdout', and `/dev/stderr' are
+aliases for `/dev/fd/0', `/dev/fd/1', and `/dev/fd/2', respectively,
+but they are more self-explanatory.
+
+ The proper way to write an error message in a `gawk' program is to
+use `/dev/stderr', like this:
+
+ NF != 4 {
+ printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/stderr"
+ }
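+
+ To see that the message really travels on standard error, the two
+streams can be captured separately. This is a sketch assuming a POSIX
+shell; `check_fields' is a hypothetical helper name and the input is
+made up:
+
```shell
check_fields() {
    printf 'a b c d\nonly three fields\n' |
    awk 'NF != 4 {
             printf("line %d skipped: does not have 4 fields\n", FNR) > "/dev/stderr"
             next
         }
         { print }'
}
stdout_only=$(check_fields 2>/dev/null)       # discard the error message
stderr_only=$(check_fields 2>&1 >/dev/null)   # keep only the error message
printf '%s\n' "$stdout_only" "$stderr_only"
```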
+
+ `gawk' also provides special file names that give access to
+information about the running `gawk' process. Each of these "files"
+provides a single record of information. To read them more than once,
+you must first close them with the `close' function (*note Closing
+Input Files and Pipes: Close Input.). The filenames are:
+
+`/dev/pid'
+ Reading this file returns the process ID of the current process,
+ in decimal, terminated with a newline.
+
+`/dev/ppid'
+ Reading this file returns the parent process ID of the current
+ process, in decimal, terminated with a newline.
+
+`/dev/pgrpid'
+ Reading this file returns the process group ID of the current
+ process, in decimal, terminated with a newline.
+
+`/dev/user'
+ Reading this file returns a single record terminated with a
+ newline. The fields are separated with blanks. The fields
+ represent the following information:
+
+ `$1'
+ The value of the `getuid' system call.
+
+ `$2'
+ The value of the `geteuid' system call.
+
+ `$3'
+ The value of the `getgid' system call.
+
+ `$4'
+ The value of the `getegid' system call.
+
+ If there are any additional fields, they are the group IDs
+ returned by the `getgroups' system call. (Multiple groups may not be
+ supported on all systems.)
+
+ These special file names may be used on the command line as data
+files, as well as for I/O redirections within an `awk' program. They
+may not be used as source files with the `-f' option.
+
+ Recognition of these special file names is disabled if `gawk' is in
+compatibility mode (*note Invoking `awk': Command Line.).
+
+ *Caution*: Unless your system actually has a `/dev/fd' directory
+ (or any of the other above listed special files), the
+ interpretation of these file names is done by `gawk' itself. For
+ example, using `/dev/fd/4' for output will actually write on file
+ descriptor 4, and not on a new file descriptor that was `dup''ed
+ from file descriptor 4. Most of the time this does not matter;
+ however, it is important to *not* close any of the files related
+ to file descriptors 0, 1, and 2. If you do close one of these
+ files, unpredictable behavior will result.
+
+
+File: gawk.info, Node: One-liners, Next: Patterns, Prev: Printing, Up: Top
+
+Useful "One-liners"
+*******************
+
+ Useful `awk' programs are often short, just a line or two. Here is a
+collection of useful, short programs to get you started. Some of these
+programs contain constructs that haven't been covered yet. The
+description of the program will give you a good idea of what is going
+on, but please read the rest of the manual to become an `awk' expert!
+
+ Since you are reading this in Info, each line of the example code is
+enclosed in quotes, to represent text that you would type literally.
+The examples themselves represent shell commands that use single quotes
+to keep the shell from interpreting the contents of the program. When
+reading the examples, focus on the text between the open and close
+quotes.
+
+`awk '{ if (NF > max) max = NF }'
+` END { print max }''
+ This program prints the maximum number of fields on any input line.
+
+`awk 'length($0) > 80''
+ This program prints every line longer than 80 characters. The sole
+ rule has a relational expression as its pattern, and has no action
+ (so the default action, printing the record, is used).
+
+`awk 'NF > 0''
+ This program prints every line that has at least one field. This
+ is an easy way to delete blank lines from a file (or rather, to
+ create a new file similar to the old file but from which the blank
+ lines have been deleted).
+
+`awk '{ if (NF > 0) print }''
+ This program also prints every line that has at least one field.
+ Here we allow the rule to match every line, then decide in the
+ action whether to print.
+
+`awk 'BEGIN { for (i = 1; i <= 7; i++)'
+` print int(101 * rand()) }''
+ This program prints 7 random numbers from 0 to 100, inclusive.
+
+`ls -l FILES | awk '{ x += $4 } ; END { print "total bytes: " x }''
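+ This program prints the total number of bytes used by FILES. It
+ assumes that the file size is the fourth field of the `ls -l'
+ output; on systems where `ls -l' also lists the group, the size is
+ in `$5'.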
+
+`expand FILE | awk '{ if (x < length()) x = length() }'
+` END { print "maximum line length is " x }''
+ This program prints the maximum line length of FILE. The input is
+ piped through the `expand' program to change tabs into spaces, so
+ the widths compared are actually the right-margin columns.
+
+`awk 'BEGIN { FS = ":" }'
+` { print $1 | "sort" }' /etc/passwd'
+ This program prints a sorted list of the login names of all users.
+
+`awk '{ nlines++ }'
+` END { print nlines }''
+ This program counts lines in a file.
+
+`awk 'END { print NR }''
+ This program also counts lines in a file, but lets `awk' do the
+ work.
+
+`awk '{ print NR, $0 }''
+ This program adds line numbers to all its input files, similar to
+ `cat -n'.
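+
+ A few of these one-liners can be tried without any data file. The
+sketch below assumes a POSIX shell; `sample' is a hypothetical helper
+that emits three lines, the second of them blank:
+
```shell
sample() { printf 'one two three\n\nfour five\n'; }

max_fields=$(sample | awk '{ if (NF > max) max = NF } END { print max }')
non_blank=$(sample | awk 'NF > 0')
line_count=$(sample | awk 'END { print NR }')

printf '%s\n' "$max_fields" "$non_blank" "$line_count"
```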
+
+
+File: gawk.info, Node: Patterns, Next: Actions, Prev: One-liners, Up: Top
+
+Patterns
+********
+
+ Patterns in `awk' control the execution of rules: a rule is executed
+when its pattern matches the current input record. This chapter tells
+all about how to write patterns.
+
+* Menu:
+
+* Kinds of Patterns:: A list of all kinds of patterns.
+ The following subsections describe
+ them in detail.
+* Regexp:: Regular expressions such as `/foo/'.
+* Comparison Patterns:: Comparison expressions such as `$1 > 10'.
+* Boolean Patterns:: Combining comparison expressions.
+* Expression Patterns:: Any expression can be used as a pattern.
+* Ranges:: Pairs of patterns specify record ranges.
+* BEGIN/END:: Specifying initialization and cleanup rules.
+* Empty:: The empty pattern, which matches every record.
+
+
+File: gawk.info, Node: Kinds of Patterns, Next: Regexp, Prev: Patterns, Up: Patterns
+
+Kinds of Patterns
+=================
+
+ Here is a summary of the types of patterns supported in `awk'.
+
+`/REGULAR EXPRESSION/'
+ A regular expression as a pattern. It matches when the text of the
+ input record fits the regular expression. (*Note Regular
+ Expressions as Patterns: Regexp.)
+
+`EXPRESSION'
+ A single expression. It matches when its value is nonzero (if a
+ number) or nonnull (if a string). (*Note Expressions as Patterns:
+ Expression Patterns.)
+
+`PAT1, PAT2'
+ A pair of patterns separated by a comma, specifying a range of
+ records. (*Note Specifying Record Ranges with Patterns: Ranges.)
+
+`BEGIN'
+`END'
+ Special patterns to supply start-up or clean-up information to
+ `awk'. (*Note `BEGIN' and `END' Special Patterns: BEGIN/END.)
+
+`NULL'
+ The empty pattern matches every input record. (*Note The Empty
+ Pattern: Empty.)
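+
+ The kinds of patterns summarized above can be seen side by side in
+one small program. This is a sketch assuming a POSIX shell; the four
+input records and the output labels are made up:
+
```shell
patterns_out=$(printf '1 start\n2 foo\n0 stop\n3 bar\n' | awk '
    BEGIN            { print "begin" }
    /foo/            { print "regexp:", $0 }
    $1 == 0          { print "expression:", $0 }
    /start/, /stop/  { print "range:", $0 }
                     { n++ }               # empty pattern: every record
    END              { print "end", n }')
printf '%s\n' "$patterns_out"
```
+
+Each record is tested against every rule in order, so a record can
+be printed by more than one rule.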
+
+
+File: gawk.info, Node: Regexp, Next: Comparison Patterns, Prev: Kinds of Patterns, Up: Patterns
+
+Regular Expressions as Patterns
+===============================
+
+ A "regular expression", or "regexp", is a way of describing a class
+of strings. A regular expression enclosed in slashes (`/') is an `awk'
+pattern that matches every input record whose text belongs to that
+class.
+
+ The simplest regular expression is a sequence of letters, numbers, or
+both. Such a regexp matches any string that contains that sequence.
+Thus, the regexp `foo' matches any string containing `foo'. Therefore,
+the pattern `/foo/' matches any input record containing `foo'. Other
+kinds of regexps let you specify more complicated classes of strings.
+
+* Menu:
+
+* Regexp Usage:: How to Use Regular Expressions
+* Regexp Operators:: Regular Expression Operators
+* Case-sensitivity:: How to do case-insensitive matching.
+
+
+File: gawk.info, Node: Regexp Usage, Next: Regexp Operators, Prev: Regexp, Up: Regexp
+
+How to Use Regular Expressions
+------------------------------
+
+ A regular expression can be used as a pattern by enclosing it in
+slashes. Then the regular expression is matched against the entire
+text of each record. (Normally, it only needs to match some part of
+the text in order to succeed.) For example, this prints the second
+field of each record that contains `foo' anywhere:
+
+ awk '/foo/ { print $2 }' BBS-list
+
+ Regular expressions can also be used in comparison expressions. Then
+you can specify the string to match against; it need not be the entire
+current input record. These comparison expressions can be used as
+patterns or in `if', `while', `for', and `do' statements.
+
+`EXP ~ /REGEXP/'
+ This is true if the expression EXP (taken as a character string)
+ is matched by REGEXP. The following example matches, or selects,
+ all input records with the upper-case letter `J' somewhere in the
+ first field:
+
+ awk '$1 ~ /J/' inventory-shipped
+
+ So does this:
+
+ awk '{ if ($1 ~ /J/) print }' inventory-shipped
+
+`EXP !~ /REGEXP/'
+ This is true if the expression EXP (taken as a character string)
+ is *not* matched by REGEXP. The following example matches, or
+ selects, all input records whose first field *does not* contain
+ the upper-case letter `J':
+
+ awk '$1 !~ /J/' inventory-shipped
+
+ The right hand side of a `~' or `!~' operator need not be a constant
+regexp (i.e., a string of characters between slashes). It may be any
+expression. The expression is evaluated, and converted if necessary to
+a string; the contents of the string are used as the regexp. A regexp
+that is computed in this way is called a "dynamic regexp". For example:
+
+ identifier_regexp = "[A-Za-z_][A-Za-z_0-9]*"
+ $0 ~ identifier_regexp
+
+sets `identifier_regexp' to a regexp that describes `awk' variable
+names, and tests if the input record matches this regexp.
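+
+ A runnable variation on this idea, assuming a POSIX shell. Unlike
+the fragment above, this sketch anchors the regexp with `^' and `$'
+and matches it against the first field only, so that it selects
+fields that consist entirely of one identifier:
+
```shell
dyn_out=$(printf 'x1 = 5\n42\n_tmp\n' | awk '
    BEGIN { identifier_regexp = "^[A-Za-z_][A-Za-z_0-9]*$" }
    $1 ~ identifier_regexp { print $1 }')
printf '%s\n' "$dyn_out"
```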
+
+
+File: gawk.info, Node: Regexp Operators, Next: Case-sensitivity, Prev: Regexp Usage, Up: Regexp
+
+Regular Expression Operators
+----------------------------
+
+ You can combine regular expressions with the following characters,
+called "regular expression operators", or "metacharacters", to increase
+the power and versatility of regular expressions.
+
+ Here is a table of metacharacters. All characters not listed in the
+table stand for themselves.
+
+`^'
+ This matches the beginning of the string or the beginning of a line
+ within the string. For example:
+
+ ^@chapter
+
+ matches the `@chapter' at the beginning of a string, and can be
+ used to identify chapter beginnings in Texinfo source files.
+
+`$'
+ This is similar to `^', but it matches only at the end of a string
+ or the end of a line within the string. For example:
+
+ p$
+
+ matches a record that ends with a `p'.
+
+`.'
+ This matches any single character except a newline. For example:
+
+ .P
+
+ matches any single character followed by a `P' in a string. Using
+ concatenation we can make regular expressions like `U.A', which
+ matches any three-character sequence that begins with `U' and ends
+ with `A'.
+
+`[...]'
+ This is called a "character set". It matches any one of the
+ characters that are enclosed in the square brackets. For example:
+
+ [MVX]
+
+ matches any one of the characters `M', `V', or `X' in a string.
+
+ Ranges of characters are indicated by using a hyphen between the
+ beginning and ending characters, and enclosing the whole thing in
+ brackets. For example:
+
+ [0-9]
+
+ matches any digit.
+
+ To include the character `\', `]', `-' or `^' in a character set,
+ put a `\' in front of it. For example:
+
+ [d\]]
+
+ matches either `d' or `]'.
+
+ This treatment of `\' is compatible with other `awk'
+ implementations, and is also mandated by the POSIX Command Language
+ and Utilities standard. The regular expressions in `awk' are a
+ superset of the POSIX specification for Extended Regular
+ Expressions (EREs). POSIX EREs are based on the regular
+ expressions accepted by the traditional `egrep' utility.
+
+ In `egrep' syntax, backslash is not syntactically special within
+ square brackets. This means that special tricks have to be used to
+ represent the characters `]', `-' and `^' as members of a
+ character set.
+
+ In `egrep' syntax, to match `-', write it as `---', which is a
+ range containing only `-'. You may also give `-' as the first or
+ last character in the set. To match `^', put it anywhere except
+ as the first character of a set. To match a `]', make it the
+ first character in the set. For example:
+
+ []d^]
+
+ matches either `]', `d' or `^'.
+
+`[^ ...]'
+ This is a "complemented character set". The first character after
+ the `[' *must* be a `^'. It matches any character *except* those
+ in the square brackets (or newline). For example:
+
+ [^0-9]
+
+ matches any character that is not a digit.
+
+`|'
+ This is the "alternation operator" and it is used to specify
+ alternatives. For example:
+
+ ^P|[0-9]
+
+ matches any string that matches either `^P' or `[0-9]'. This
+ means it matches any string that contains a digit or starts with
+ `P'.
+
+ The alternation applies to the largest possible regexps on either
+ side.
+
+`(...)'
+ Parentheses are used for grouping in regular expressions as in
+ arithmetic. They can be used to concatenate regular expressions
+ containing the alternation operator, `|'.
+
+`*'
+ This symbol means that the preceding regular expression is to be
+ repeated as many times as possible to find a match. For example:
+
+ ph*
+
+ applies the `*' symbol to the preceding `h' and looks for matches
+ to one `p' followed by any number of `h's. This will also match
+ just `p' if no `h's are present.
+
+ The `*' repeats the *smallest* possible preceding expression.
+ (Use parentheses if you wish to repeat a larger expression.) It
+ finds as many repetitions as possible. For example:
+
+ awk '/\(c[ad][ad]*r x\)/ { print }' sample
+
+ prints every record in the input containing a string of the form
+ `(car x)', `(cdr x)', `(cadr x)', and so on.
+
+`+'
+ This symbol is similar to `*', but the preceding expression must be
+ matched at least once. This means that:
+
+ wh+y
+
+ would match `why' and `whhy' but not `wy', whereas `wh*y' would
+ match all three of these strings. This is a simpler way of
+ writing the last `*' example:
+
+ awk '/\(c[ad]+r x\)/ { print }' sample
+
+`?'
+ This symbol is similar to `*', but the preceding expression can be
+ matched once or not at all. For example:
+
+ fe?d
+
+ will match `fed' and `fd', but nothing else.
+
+`\'
+ This is used to suppress the special meaning of a character when
+ matching. For example:
+
+ \$
+
+ matches the character `$'.
+
+ The escape sequences used for string constants (*note Constant
+ Expressions: Constants.) are valid in regular expressions as well;
+ they are also introduced by a `\'.
+
+ In regular expressions, the `*', `+', and `?' operators have the
+highest precedence, followed by concatenation, and finally by `|'. As
+in arithmetic, parentheses can change how operators are grouped.
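+
+ The effect of precedence can be seen by comparing a regexp with and
+without parentheses. This is only a sketch; the three input strings
+are made up for the illustration:
+
```shell
# Each input line is a hypothetical test string.
out=$(printf 'Pat\nx7\nxP\n' | awk '
/^P|[0-9]/   { printf "%s: matches ^P|[0-9]\n", $0 }     # the ^ applies to P only
/^(P|[0-9])/ { printf "%s: matches ^(P|[0-9])\n", $0 }   # the ^ applies to both alternatives
')
printf '%s\n' "$out"
```
+
+ `Pat' matches both regexps. `x7' matches only the first, because
+without parentheses the digit may appear anywhere. `xP' matches
+neither.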
+
+
+File: gawk.info, Node: Case-sensitivity, Prev: Regexp Operators, Up: Regexp
+
+Case-sensitivity in Matching
+----------------------------
+
+ Case is normally significant in regular expressions, both when
+matching ordinary characters (i.e., not metacharacters), and inside
+character sets. Thus a `w' in a regular expression matches only a
+lower case `w' and not an upper case `W'.
+
+ The simplest way to do a case-independent match is to use a character
+set: `[Ww]'. However, this can be cumbersome if you need to use it
+often; and it can make the regular expressions harder for humans to
+read. There are two other alternatives that you might prefer.
+
+ One way to do a case-insensitive match at a particular point in the
+program is to convert the data to a single case, using the `tolower' or
+`toupper' built-in string functions (which we haven't discussed yet;
+*note Built-in Functions for String Manipulation: String Functions.).
+For example:
+
+ tolower($1) ~ /foo/ { ... }
+
+converts the first field to lower case before matching against it.
+
+ Another method is to set the variable `IGNORECASE' to a nonzero
+value (*note Built-in Variables::.). When `IGNORECASE' is not zero,
+*all* regexp operations ignore case. Changing the value of
+`IGNORECASE' dynamically controls the case sensitivity of your program
+as it runs. Case is significant by default because `IGNORECASE' (like
+most variables) is initialized to zero.
+
+ x = "aB"
+ if (x ~ /ab/) ... # this test will fail
+
+ IGNORECASE = 1
+ if (x ~ /ab/) ... # now it will succeed
+
+ In general, you cannot use `IGNORECASE' to make certain rules
+case-insensitive and other rules case-sensitive, because there is no way
+to set `IGNORECASE' just for the pattern of a particular rule. To do
+this, you must use character sets or `tolower'. However, one thing you
+can do only with `IGNORECASE' is turn case-sensitivity on or off
+dynamically for all the rules at once.
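+
+ For instance, a program might mix one case-insensitive rule with one
+case-sensitive rule by using `tolower' in just the first. This is a
+sketch on made-up input; the counter names `n_insensitive' and
+`n_sensitive' are hypothetical:
+
```shell
# Hypothetical sample input with mixed case in both fields.
out=$(printf 'Foo bar\nfoo BAR\nFOO Bar\n' | awk '
tolower($1) ~ /foo/ { n_insensitive++ }   # this rule ignores case in the first field
$2 ~ /bar/          { n_sensitive++ }     # this rule stays case-sensitive
END { print n_insensitive, n_sensitive }')
printf '%s\n' "$out"
```
+
+ All three records satisfy the first rule, but only the first record
+satisfies the second.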
+
+ `IGNORECASE' can be set on the command line, or in a `BEGIN' rule.
+Setting `IGNORECASE' from the command line is a way to make a program
+case-insensitive without having to edit it.
+
+ The value of `IGNORECASE' has no effect if `gawk' is in
+compatibility mode (*note Invoking `awk': Command Line.). Case is
+always significant in compatibility mode.
+
+
+File: gawk.info, Node: Comparison Patterns, Next: Boolean Patterns, Prev: Regexp, Up: Patterns
+
+Comparison Expressions as Patterns
+==================================
+
+ "Comparison patterns" test relationships such as equality between
+two strings or numbers. They are a special case of expression patterns
+(*note Expressions as Patterns: Expression Patterns.). They are written
+with "relational operators", which are a superset of those in C. Here
+is a table of them:
+
+`X < Y'
+ True if X is less than Y.
+
+`X <= Y'
+ True if X is less than or equal to Y.
+
+`X > Y'
+ True if X is greater than Y.
+
+`X >= Y'
+ True if X is greater than or equal to Y.
+
+`X == Y'
+ True if X is equal to Y.
+
+`X != Y'
+ True if X is not equal to Y.
+
+`X ~ Y'
+ True if X matches the regular expression described by Y.
+
+`X !~ Y'
+ True if X does not match the regular expression described by Y.
+
+ The operands of a relational operator are compared as numbers if they
+are both numbers. Otherwise they are converted to, and compared as,
+strings (*note Conversion of Strings and Numbers: Conversion., for the
+detailed rules). Strings are compared by comparing the first character
+of each, then the second character of each, and so on, until there is a
+difference. If the two strings are equal until the shorter one runs
+out, the shorter one is considered to be less than the longer one.
+Thus, `"10"' is less than `"9"', and `"abc"' is less than `"abcd"'.
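+
+ The difference between the two kinds of comparison can be checked
+with constants in a `BEGIN' rule. In this sketch the variable names
+are hypothetical; a true comparison has the value 1 and a false one
+has the value 0:
+
```shell
out=$(awk 'BEGIN {
    as_strings = ("10" < "9")      # string constants: compared character by character
    as_numbers = (10 < 9)          # numeric constants: compared as numbers
    prefix     = ("abc" < "abcd")  # the shorter string is the lesser one
    print as_strings, as_numbers, prefix
}')
printf '%s\n' "$out"
```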
+
+ The left operand of the `~' and `!~' operators is a string. The
+right operand is either a constant regular expression enclosed in
+slashes (`/REGEXP/'), or any expression, whose string value is used as
+a dynamic regular expression (*note How to Use Regular Expressions:
+Regexp Usage.).
+
+ The following example prints the second field of each input record
+whose first field is precisely `foo'.
+
+ awk '$1 == "foo" { print $2 }' BBS-list
+
+Contrast this with the following regular expression match, which would
+accept any record with a first field that contains `foo':
+
+ awk '$1 ~ "foo" { print $2 }' BBS-list
+
+or, equivalently, this one:
+
+ awk '$1 ~ /foo/ { print $2 }' BBS-list
+
+
+File: gawk.info, Node: Boolean Patterns, Next: Expression Patterns, Prev: Comparison Patterns, Up: Patterns
+
+Boolean Operators and Patterns
+==============================
+
+ A "boolean pattern" is an expression that combines other patterns
+using the "boolean operators" "or" (`||'), "and" (`&&'), and "not"
+(`!'). Whether the boolean pattern matches an input record depends on
+whether its subpatterns match.
+
+ For example, the following command prints all records in the input
+file `BBS-list' that contain both `2400' and `foo'.
+
+ awk '/2400/ && /foo/' BBS-list
+
+ The following command prints all records in the input file
+`BBS-list' that contain *either* `2400' or `foo', or both.
+
+ awk '/2400/ || /foo/' BBS-list
+
+ The following command prints all records in the input file
+`BBS-list' that do *not* contain the string `foo'.
+
+ awk '! /foo/' BBS-list
+
+ Note that boolean patterns are a special case of expression patterns
+(*note Expressions as Patterns: Expression Patterns.); they are
+expressions that use the boolean operators. *Note Boolean Expressions:
+Boolean Ops, for complete information on the boolean operators.
+
+ The subpatterns of a boolean pattern can be constant regular
+expressions, comparisons, or any other `awk' expressions. Range
+patterns are not expressions, so they cannot appear inside boolean
+patterns. Likewise, the special patterns `BEGIN' and `END', which
+never match any input record, are not expressions and cannot appear
+inside boolean patterns.
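+
+ As an illustration of mixing subpattern kinds, the following sketch
+combines a comparison with two constant regexps. The three input
+records are made up and merely resemble entries with a name and a
+baud rate:
+
```shell
# Hypothetical input: a name in the first field, a number in the second.
out=$(printf 'foo 2400\nfoo 1200\nbar 2400\n' |
      awk '($2 == 2400 || /1200/) && ! /bar/')
printf '%s\n' "$out"
```
+
+ The first two records are printed. The third satisfies the
+parenthesized subpattern but is rejected by `! /bar/'.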
+
+
+File: gawk.info, Node: Expression Patterns, Next: Ranges, Prev: Boolean Patterns, Up: Patterns
+
+Expressions as Patterns
+=======================
+
+ Any `awk' expression is also valid as an `awk' pattern. Then the
+pattern "matches" if the expression's value is nonzero (if a number) or
+nonnull (if a string).
+
+ The expression is reevaluated each time the rule is tested against a
+new input record. If the expression uses fields such as `$1', the
+value depends directly on the new input record's text; otherwise, it
+depends only on what has happened so far in the execution of the `awk'
+program, but that may still be useful.
+
+ Comparison patterns are actually a special case of this. For
+example, the expression `$5 == "foo"' has the value 1 when the value of
+`$5' equals `"foo"', and 0 otherwise; therefore, this expression as a
+pattern matches when the two values are equal.
+
+ Boolean patterns are also special cases of expression patterns.
+
+ A constant regexp as a pattern is also a special case of an
+expression pattern. `/foo/' as an expression has the value 1 if `foo'
+appears in the current input record; thus, as a pattern, `/foo/'
+matches any record containing `foo'.
+
+ Other implementations of `awk' that are not yet POSIX compliant are
+less general than `gawk': they allow comparison expressions, and
+boolean combinations thereof (optionally with parentheses), but not
+necessarily other kinds of expressions.
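+
+ A small sketch of an expression that is neither a comparison nor a
+regexp: the arithmetic expression `NR % 2' is nonzero for
+odd-numbered records, so used as a pattern it selects every other
+record. The input here is made up, and an `awk' that accepts
+arbitrary expressions as patterns is assumed:
+
```shell
# NR % 2 is 1 for records 1, 3, ... and 0 for records 2, 4, ...
out=$(printf 'a\nb\nc\nd\n' | awk 'NR % 2')
printf '%s\n' "$out"
```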
+
+
+File: gawk.info, Node: Ranges, Next: BEGIN/END, Prev: Expression Patterns, Up: Patterns
+
+Specifying Record Ranges with Patterns
+======================================
+
+ A "range pattern" is made of two patterns separated by a comma, of
+the form `BEGPAT, ENDPAT'. It matches ranges of consecutive input
+records. The first pattern BEGPAT controls where the range begins, and
+the second one ENDPAT controls where it ends. For example,
+
+ awk '$1 == "on", $1 == "off"'
+
+prints every record between `on'/`off' pairs, inclusive.
+
+ A range pattern starts out by matching BEGPAT against every input
+record; when a record matches BEGPAT, the range pattern becomes "turned
+on". The range pattern matches this record. As long as it stays
+turned on, it automatically matches every input record read. While
+turned on, it also tests ENDPAT against every input record; when that
+succeeds, the range pattern is turned off again for the following
+record. It then goes back to checking BEGPAT against each record.
+
+ The record that turns on the range pattern and the one that turns it
+off both match the range pattern. If you don't want to operate on
+these records, you can write `if' statements in the rule's action to
+distinguish them.
+
+ It is possible for a range pattern to be turned both on and off by the same
+record, if both conditions are satisfied by that record. Then the
+action is executed for just that record.
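+
+ The behavior described above can be traced on a small, made-up
+input. In this sketch the record `on off' satisfies both BEGPAT and
+ENDPAT, so the range is turned on and off by that one record:
+
```shell
# Hypothetical input: one complete on/off range, then a one-record range.
out=$(printf 'a\non\nb\noff\nc\non off\nd\n' | awk '/on/, /off/')
printf '%s\n' "$out"
```
+
+ The records `on', `b', `off' and `on off' are printed; `a', `c' and
+`d' fall outside every range.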
+
+
+File: gawk.info, Node: BEGIN/END, Next: Empty, Prev: Ranges, Up: Patterns
+
+`BEGIN' and `END' Special Patterns
+==================================
+
+ `BEGIN' and `END' are special patterns. They are not used to match
+input records. Rather, they are used for supplying start-up or
+clean-up information to your `awk' script. A `BEGIN' rule is executed,
+once, before the first input record has been read. An `END' rule is
+executed, once, after all the input has been read. For example:
+
+ awk 'BEGIN { print "Analysis of \"foo\"" }
+ /foo/ { ++foobar }
+ END { print "\"foo\" appears " foobar " times." }' BBS-list
+
+ This program finds the number of records in the input file `BBS-list'
+that contain the string `foo'. The `BEGIN' rule prints a title for the
+report. There is no need to use the `BEGIN' rule to initialize the
+counter `foobar' to zero, as `awk' does this for us automatically
+(*note Variables::.).
+
+ The second rule increments the variable `foobar' every time a record
+containing the pattern `foo' is read. The `END' rule prints the value
+of `foobar' at the end of the run.
+
+ The special patterns `BEGIN' and `END' cannot be used in ranges or
+with boolean operators (indeed, they cannot be used with any operators).
+
+ An `awk' program may have multiple `BEGIN' and/or `END' rules. They
+are executed in the order they appear, all the `BEGIN' rules at
+start-up and all the `END' rules at termination.
+
+ Multiple `BEGIN' and `END' sections are useful for writing library
+functions, since each library can have its own `BEGIN' or `END' rule to
+do its own initialization and/or cleanup. Note that the order in which
+library functions are named on the command line controls the order in
+which their `BEGIN' and `END' rules are executed. Therefore you have
+to be careful to write such rules in library files so that the order in
+which they are executed doesn't matter. *Note Invoking `awk': Command
+Line, for more information on using library functions.
+
+ If an `awk' program has only a `BEGIN' rule, and no other rules,
+then the program exits after the `BEGIN' rule has been run. (Older
+versions of `awk' used to keep reading and ignoring input until end of
+file was seen.) However, if an `END' rule exists as well, then the
+input will be read, even if there are no other rules in the program.
+This is necessary in case the `END' rule checks the `NR' variable.
+
+ `BEGIN' and `END' rules must have actions; there is no default
+action for these rules since there is no current record when they run.
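+
+ The following sketch, run on made-up input, shows two `BEGIN' rules
+executing in the order they appear, and an `END' rule that causes the
+input to be read even though no other rule uses it:
+
```shell
# Three hypothetical input records; no rule other than END looks at them.
out=$(printf 'x\ny\nz\n' | awk '
BEGIN { print "first BEGIN" }
BEGIN { print "second BEGIN" }
END   { print NR " records read" }')
printf '%s\n' "$out"
```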
+
+
+File: gawk.info, Node: Empty, Prev: BEGIN/END, Up: Patterns
+
+The Empty Pattern
+=================
+
+ An empty pattern is considered to match *every* input record. For
+example, the program:
+
+ awk '{ print $1 }' BBS-list
+
+prints the first field of every record.
+