author     Arnold D. Robbins <arnold@skeeve.com>  2010-07-02 15:53:23 +0300
committer  Arnold D. Robbins <arnold@skeeve.com>  2010-07-02 15:53:23 +0300
commit     f3d9dd233ac07f764a554528c85be3768a1d1ddb
tree       f190ab7e0188c66eba76a74b8717e3ad7b16ef04 /gawk-info-2
parent     0f1b7311fbc0e61e3e12194ce3e8484aaa4b7fe6
Now at gawk 2.10.
Diffstat (limited to 'gawk-info-2')
-rw-r--r-- | gawk-info-2 | 1265 |
1 files changed, 1265 insertions, 0 deletions
diff --git a/gawk-info-2 b/gawk-info-2
new file mode 100644
index 00000000..a228c5b9
--- /dev/null
+++ b/gawk-info-2
@@ -0,0 +1,1265 @@
Info file gawk-info, produced by Makeinfo, -*- Text -*- from input
file gawk.texinfo.

This file documents `awk', a program that you can use to select
particular records in a file and perform operations upon them.

Copyright (C) 1989 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.


File: gawk-info, Node: Fields, Next: Non-Constant Fields, Prev: Records, Up: Reading Files

Examining Fields
================

When `awk' reads an input record, the record is automatically
separated or "parsed" by the interpreter into pieces called "fields".
By default, fields are separated by whitespace, like words in a line.
Whitespace in `awk' means any string of one or more spaces and/or
tabs; other characters such as newline, formfeed, and so on, that are
considered whitespace by other languages are *not* considered
whitespace by `awk'.

The purpose of fields is to make it more convenient for you to refer
to these pieces of the record. You don't have to use them--you can
operate on the whole record if you wish--but fields are what make
simple `awk' programs so powerful.
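To see this splitting rule in action, here is a small sketch run from a
shell. It assumes a POSIX `sh' and any POSIX-compatible `awk' (it does
not rely on features specific to this version of `gawk'); the function
name `split_demo' and the sample input are invented for illustration.
For each line it prints the number of fields, `NF', followed by every
field `$i' in brackets; both are explained below.

```shell
#!/bin/sh
# Hypothetical demo: how default field splitting treats runs of
# spaces and tabs. For each input line, print NF followed by every
# field wrapped in brackets.
split_demo() {
    # Line 1 has leading, repeated, and trailing blanks plus a tab.
    printf '  one   two\tthree  \nfour\n' |
    awk '{
        out = NF ":"
        for (i = 1; i <= NF; i++)
            out = out " [" $i "]"
        print out
    }'
}

split_demo
```

This should print `3: [one] [two] [three]' and then `1: [four]'. The
leading and trailing blanks produce no empty fields, and the run of
three spaces counts as a single separator, just as the tab does.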

To refer to a field in an `awk' program, you use a dollar sign, `$',
followed by the number of the field you want. Thus, `$1' refers to
the first field, `$2' to the second, and so on. For example, suppose
the following is a line of input:

     This seems like a pretty nice example.

Here the first field, or `$1', is `This'; the second field, or `$2',
is `seems'; and so on. Note that the last field, `$7', is
`example.'. Because there is no space between the `e' and the `.',
the period is considered part of the seventh field.

No matter how many fields there are, the last field in a record can
be represented by `$NF'. So, in the example above, `$NF' would be
the same as `$7', which is `example.'. Why this works is explained
below (*note Non-Constant Fields::.). If you try to refer to a field
beyond the last one, such as `$8' when the record has only 7 fields,
you get the empty string.

Plain `NF', with no `$', is a special variable whose value is the
number of fields in the current record.

`$0', which looks like an attempt to refer to the zeroth field, is a
special case: it represents the whole input record. This is what you
would use when you aren't interested in fields.

Here are some more examples:

     awk '$1 ~ /foo/ { print $0 }' BBS-list

This example contains the "matching" operator `~' (*note Comparison
Ops::.). Using this operator, all records in the file `BBS-list'
whose first field contains the string `foo' are printed.

By contrast, the following example:

     awk '/foo/ { print $1, $NF }' BBS-list

looks for the string `foo' in *the entire record* and prints the
first field and the last field for each input record containing the
pattern.

The following program will search the system password file, and print
the entries for users who have no password.

     awk -F: '$2 == ""' /etc/passwd

This program uses the `-F' option on the command line to set the
field separator.
(Fields in `/etc/passwd' are separated by colons. The
second field represents a user's encrypted password, but if the field
is empty, that user has no password.)


File: gawk-info, Node: Non-Constant Fields, Next: Changing Fields, Prev: Fields, Up: Reading Files

Non-constant Field Numbers
==========================

The number of a field does not need to be a constant. Any expression
in the `awk' language can be used after a `$' to refer to a field.
The `awk' utility evaluates the expression and uses the "numeric
value" as a field number. Consider this example:

     awk '{ print $NR }'

Recall that `NR' is the number of records read so far: 1 in the first
record, 2 in the second, etc. So this example will print the first
field of the first record, the second field of the second record, and
so on. For the twentieth record, field number 20 will be printed;
most likely this will make a blank line, because the record will not
have 20 fields.

Here is another example of using expressions as field numbers:

     awk '{ print $(2*2) }' BBS-list

The `awk' language must evaluate the expression `(2*2)' and use its
value as the field number to print. The `*' sign represents
multiplication, so the expression `2*2' evaluates to 4. This
example, then, prints the hours of operation (the fourth field) for
every line of the file `BBS-list'.

When you use non-constant field numbers, you may ask for a field
with a negative number. This always results in an empty string, just
like a field whose number is too large for the input record. For
example, `$(1-4)' would try to examine field number -3; it would
result in an empty string.

If the field number you compute is zero, you get the entire record.

The number of fields in the current record is stored in the special
variable `NF' (*note Special::.). The expression `$NF' is not a
special feature: it is the direct consequence of evaluating `NF' and
using its value as a field number.
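The following sketch evaluates several computed field numbers on each
record: `$NR', `$NF', `$(NF - 1)', and the out-of-range `$(NF + 1)',
which comes back empty. As before, it assumes a POSIX shell and a
POSIX-compatible `awk'; the function name `nonconst_demo' and the
three-line input are illustrative only.

```shell
#!/bin/sh
# Hypothetical demo: field numbers computed from expressions.
# Each output line shows NR, $NR, the last field, the next-to-last
# field, and the (empty) field just past the end, in brackets.
nonconst_demo() {
    printf 'a b c\nd e f\ng h i\n' |
    awk '{ print NR, $NR, $NF, $(NF - 1), "[" $(NF + 1) "]" }'
}

nonconst_demo
```

With this input it should print `1 a c b []', `2 e f e []', and
`3 i i h []'. Referencing `$(NF + 1)' only yields an empty string; it
does not add a field to the record.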


File: gawk-info, Node: Changing Fields, Next: Field Separators, Prev: Non-Constant Fields, Up: Reading Files

Changing the Contents of a Field
================================

You can change the contents of a field as seen by `awk' within an
`awk' program; this changes what `awk' perceives as the current input
record. (The actual input is untouched: `awk' never modifies the
input file.)

Look at this example:

     awk '{ $3 = $2 - 10; print $2, $3 }' inventory-shipped

The `-' sign represents subtraction, so this program reassigns field
three, `$3', to be the value of field two minus ten, `$2 - 10'.
(*Note Arithmetic Ops::.) Then field two, and the new value for
field three, are printed.

In order for this to work, the text in field `$2' must make sense as
a number; the string of characters must be converted to a number in
order for the computer to do arithmetic on it. The number resulting
from the subtraction is converted back to a string of characters
which then becomes field 3. *Note Conversion::.

When you change the value of a field (as perceived by `awk'), the
text of the input record is recalculated to contain the new field
where the old one was. From then on, `$0' reflects the altered
field. Thus,

     awk '{ $2 = $2 - 10; print $0 }' inventory-shipped

will print a copy of the input file, with 10 subtracted from the
second field of each line.

You can also assign contents to fields that are out of range. For
example:

     awk '{ $6 = ($5 + $4 + $3 + $2) / 4; print $6 }' inventory-shipped

We've just created `$6', whose value is the average of fields `$2',
`$3', `$4', and `$5'. The `+' sign represents addition, and the `/'
sign represents division. For the file `inventory-shipped', `$6'
represents the average number of parcels shipped for a particular
month.

Creating a new field changes what `awk' interprets as the current
input record. The value of `$0' will be recomputed.
This recomputation affects and is affected by features not yet
discussed, in particular, the "Output Field Separator", `OFS', which
is used to separate the fields (*note Output Separators::.), and `NF'
(the number of fields; *note Fields::.). For example, the value of
`NF' will be set to the number of the highest out-of-range field you
create.

Note, however, that merely *referencing* an out-of-range field will
*not* change the value of either `$0' or `NF'. Referencing an
out-of-range field merely produces a null string. For example:

     {
         if ($(NF+1) != "")
             print "can't happen"
         else
             print "everything is normal"
     }

should print `everything is normal' for every input record. (*Note
If::, for more information about `awk''s `if-else' statements.)


File: gawk-info, Node: Field Separators, Next: Multiple, Prev: Changing Fields, Up: Reading Files

Specifying How Fields Are Separated
===================================

You can change the way `awk' splits a record into fields by changing
the value of the "field separator". The field separator is
represented by the special variable `FS' in an `awk' program, and can
be set by `-F' on the command line. The `awk' language scans each
input line for the field separator character to determine the
positions of fields within that line. Shell programmers take note!
`awk' uses the variable `FS', not `IFS'.

The default value of the field separator is a string containing a
single space. This value is actually a special case; as you know, by
default, fields are separated by whitespace sequences, not by single
spaces: two spaces in a row do not delimit an empty field.
"Whitespace" is defined as sequences of one or more spaces or tab
characters.

You change the value of `FS' by "assigning" it a new value. You can
do this using the special `BEGIN' pattern (*note BEGIN/END::.). This
pattern allows you to change the value of `FS' before any input is
read. The new value of `FS' is written as a string enclosed in double
quotes.
For example, set the value of `FS' to the string `","':

     awk 'BEGIN { FS = "," } ; { print $2 }'

and use the input line:

     John Q. Smith, 29 Oak St., Walamazoo, MI 42139

This `awk' program will extract the string ` 29 Oak St.'. The leading
space is part of the field, because only the comma separates fields.

Sometimes your input data will contain separator characters that
don't separate fields the way you thought they would. For instance,
the person's name in the example we've been using might have a title
or suffix attached, such as `John Q. Smith, LXIX'. If you assigned
`FS' to be `,' then:

     awk 'BEGIN { FS = "," } ; { print $2 }'

would extract ` LXIX' instead of ` 29 Oak St.'. If you were expecting
the program to print the address, you would be surprised. So, choose
your data layout and separator characters carefully to prevent
problems like this from happening.

You can assign `FS' to be a series of characters. For example, the
assignment:

     FS = ", \t"

makes every area of an input line that consists of a comma followed
by a space and a tab into a field separator. (`\t' stands for a tab.)

If `FS' is any single character other than a blank, then that
character is used as the field separator, and two successive
occurrences of that character do delimit an empty field.

If you assign `FS' to a string longer than one character, that string
is evaluated as a "regular expression" (*note Regexp::.). Each part of
the record that matches the regular expression is treated as a field
separator.

`FS' can be set on the command line. You use the `-F' argument to do
so. For example:

     awk -F, 'PROGRAM' INPUT-FILES

sets `FS' to be the `,' character. Notice that the argument uses a
capital `F'. Contrast this with `-f', which specifies a file
containing an `awk' program. Case is significant in command options:
the `-F' and `-f' options have nothing to do with each other. You
can use both options at the same time to set the `FS' variable *and*
get an `awk' program from a file.
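Here is a sketch that applies these rules to the sample address line
used above. The function name `fs_demo' is hypothetical, and a POSIX
shell with a POSIX-compatible `awk' is assumed. Brackets around `$2'
make the leading space visible.

```shell
#!/bin/sh
# Hypothetical demo: three ways of setting the field separator.
fs_demo() {
    line='John Q. Smith, 29 Oak St., Walamazoo, MI 42139'
    # FS assigned in a BEGIN rule: only the comma separates fields,
    # so the space after it stays at the start of $2.
    echo "$line" | awk 'BEGIN { FS = "," } { print "[" $2 "]" }'
    # The same single-character separator given with -F.
    echo "$line" | awk -F, '{ print "[" $2 "]" }'
    # A multi-character FS is a regular expression: here, a comma
    # followed by any number of spaces, so the space is consumed.
    echo "$line" | awk -F', *' '{ print "[" $2 "]" }'
}

fs_demo
```

The first two commands should both print `[ 29 Oak St.]'; the third,
whose separator is a regular expression, prints `[29 Oak St.]'.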

As a special case, if the argument to `-F' is `t', then `FS' is set
to the tab character. (This is because if you type `-F\t', without
the quotes, at the shell, the `\' gets deleted, so `awk' figures that
you really want your fields to be separated with tabs, and not `t's.
Use `FS="t"' if you really do want to separate your fields with `t's.)

For example, let's use an `awk' program file called `baud.awk' that
contains the pattern `/300/', and the action `print $1'. We'll use
the operating system utility `cat' to "look" at our program:

     % cat baud.awk
     /300/ { print $1 }

Let's also set `FS' to be the `-' character. We will apply all this
information to the file `BBS-list'. This `awk' program will now
print a list of the names of the bulletin boards that operate at 300
baud and the first three digits of their phone numbers.

     awk -F- -f baud.awk BBS-list

produces this output:

     aardvark 555
     alpo
     barfly 555
     bites 555
     camelot 555
     core 555
     fooey 555
     foot 555
     macfoo 555
     sdace 555
     sabafoo 555

Note the second line of output. If you check the original file, you
will see that the second line looked like this:

     alpo-net 555-3412 2400/1200/300 A

The `-' as part of the system's name was used as the field separator,
instead of the `-' in the phone number that was originally intended.
This demonstrates why you have to be careful in choosing your field
and record separators.


File: gawk-info, Node: Multiple, Next: Assignment Options, Prev: Field Separators, Up: Reading Files

Multiple-Line Records
=====================

In some databases, a single line cannot conveniently hold all the
information in one entry. Then you will want to use multi-line
records.

The first step in doing this is to choose your data format: when
records are not defined as single lines, how will you want to define
them? What should separate records?

One technique is to use an unusual character or string to separate
records. For example, you could use the formfeed character (written
`\f' in `awk', as in C) to separate them, making each record a page
of the file. To do this, just set the variable `RS' to `"\f"' (a
string containing the formfeed character), or whatever string you
prefer to use.

Another technique is to have blank lines separate records. By a
special dispensation, a null string as the value of `RS' indicates
that records are separated by one or more blank lines. If you set
`RS' to the null string, a record always ends at the first blank
line encountered, and the next record doesn't start until the first
nonblank line that follows. No matter how many blank lines appear in
a row, they are treated as a single record separator.

The second step is to separate the fields in the record. One way to
do this is to put each field on a separate line: to do this, just set
the variable `FS' to the string `"\n"'. (This simple regular
expression matches a single newline.) Another idea is to divide each
of the lines into fields in the normal manner; the regular expression
`"[ \t\n]+"' will do this nicely by treating the newlines inside the
record just like spaces.

When `RS' is set to the null string, the newline character *always*
acts as a field separator. This is in addition to whatever value
`FS' has. The probable reason for this rule is so that you get
rational behavior in the default case (i.e. `FS == " "'). This can
be a problem if you really don't want the newline character to
separate fields, since there is no way to turn this behavior off.
However, you can work around this by using the `split' function to
manually break up your data (*note String Functions::.).
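As a sketch of the blank-line technique combined with one field per
line, the following reads two multi-line records. The function name
`records_demo' and the address data are invented, and a POSIX shell
with a POSIX-compatible `awk' is assumed. The two consecutive blank
lines between the entries count as a single record separator.

```shell
#!/bin/sh
# Hypothetical demo: RS = "" (records end at blank lines) with
# FS = "\n" (each line of a record is one field).
records_demo() {
    printf 'Jane Doe\n123 Main St.\nAnytown\n\n\nJohn Smith\n456 Oak Ave.\nOthertown\n' |
    awk 'BEGIN { RS = ""; FS = "\n" }
         # $1 is the name line, $2 the street line of each record.
         { print NR ": " $1 " / " $2 }'
}

records_demo
```

This should print `1: Jane Doe / 123 Main St.' and
`2: John Smith / 456 Oak Ave.'.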

Here is how to use records separated by blank lines and break each
line into fields normally:

     awk 'BEGIN { RS = ""; FS = "[ \t\n]+" } ; { print $0 }' BBS-list


File: gawk-info, Node: Assignment Options, Next: Getline, Prev: Multiple, Up: Reading Files

Assigning Variables on the Command Line
=======================================

You can include variable "assignments" among the file names on the
command line used to invoke `awk' (*note Command Line::.). Such
assignments have the form:

     VARIABLE=TEXT

and allow you to change variables either at the beginning of the
`awk' run or in between input files. The variable assignment is
performed at a time determined by its position among the input file
arguments: after the processing of the preceding input file argument.
For example:

     awk '{ print $n }' n=4 inventory-shipped n=2 BBS-list

prints the value of field number `n' for all input records. Before
the first file is read, the command line sets the variable `n' equal
to 4. This causes the fourth field of the file `inventory-shipped'
to be printed. After the first file has finished, but before the
second file is started, `n' is set to 2, so that the second field of
the file `BBS-list' will be printed.

Command line arguments are made available for explicit examination by
the `awk' program in an array named `ARGV' (*note Special::.).


File: gawk-info, Node: Getline, Prev: Assignment Options, Up: Reading Files

Explicit Input with `getline'
=============================

So far we have been getting our input files from `awk''s main input
stream--either the standard input (usually your terminal) or the
files specified on the command line. The `awk' language has a
special built-in function called `getline' that can be used to read
input under your explicit control.

This command is quite complex and should *not* be used by beginners.
The command (and its variations) is covered here because this is the
section about input. The examples that follow the explanation of the
`getline' command include material that has not been covered yet.
Therefore, come back and attempt the `getline' command *after* you
have reviewed the rest of this manual and have a good knowledge of
how `awk' works.

When retrieving input, `getline' returns 1 if it found a record and
0 if it reached the end of the file. If there was some error in
getting a record, such as a file that could not be opened, then
`getline' returns -1.

In the following examples, COMMAND stands for a string value that
represents a shell command.

`getline'
     The `getline' function can be used by itself, in an `awk'
     program, to read the next record from the current input. All it
     does in this case is read the next input record and split it up
     into fields. This is useful if you've finished processing the
     current record, but you want to do some special processing
     *right now* on the next record. Here's an example:

          awk '{
            if (t = index($0, "/*")) {
              if (t > 1)
                tmp = substr($0, 1, t - 1)
              else
                tmp = ""
              u = index(substr($0, t + 2), "*/")
              while (! u) {
                if ((getline) <= 0)   # end of input inside a comment
                  break
                t = -1
                u = index($0, "*/")
              }
              if (u && u <= length($0) - 2)
                $0 = tmp substr($0, t + u + 3)   # keep text after "*/"
              else
                $0 = tmp
            }
            print $0
          }'

     This `awk' program deletes C-style comments, `/* ... */', from
     the input. As written, it removes only the first comment that
     begins on each line, and it stops cleanly if the input ends
     inside an unterminated comment. By replacing the `print $0' with
     other statements, you could perform more complicated processing
     on the de-commented input, such as searching it for matches for a
     regular expression.

     This form of the `getline' command sets `NF' (the number of
     fields; *note Fields::.), `NR' (the number of records read so
     far), the `FNR' variable (*note Records::.), and the value of
     `$0'.

     *Note:* The new value of `$0' will be used in testing the
     patterns of any subsequent rules.
     The original value of `$0' that triggered the rule which
     executed `getline' is lost. By contrast, the `next' statement
     reads a new record but immediately begins processing it
     normally, starting with the first rule in the program. *Note
     Next::.

`getline VAR'
     This form of `getline' reads a record into the variable VAR.
     This is useful when you want your program to read the next
     record from the input file, but you don't want to subject the
     record to the normal input processing.

     For example, suppose the next line is a comment, or a special
     string, and you want to read it, but you must make certain that
     it won't accidentally trigger any rules. This version of
     `getline' will allow you to read that line and store it in a
     variable so that the main read-a-line-and-check-each-rule loop
     of `awk' never sees it.

     The following example swaps each pair of input lines. For
     example, given:

          wan
          tew
          free
          phore

     it outputs:

          tew
          wan
          phore
          free

     Here's the program:

          awk '{
            if ((getline tmp) > 0) {
              print tmp
              print $0
            } else
              print $0
          }'

     The `getline' function used in this way sets only `NR' and `FNR'
     (and of course, VAR). The record is not split into fields, so
     the values of the fields (including `$0') and the value of `NF'
     do not change.

`getline < FILE'
     This form of the `getline' function takes its input from the
     file FILE. Here FILE is a string-valued expression that
     specifies the file name.

     This form is useful if you want to read your input from a
     particular file, instead of from the main input stream. For
     example, the following program reads its input record from the
     file `foo.input' when it encounters a first field with a value
     equal to 10 in the current input file.

          awk '{
            if ($1 == 10) {
              getline < "foo.input"
              print
            } else
              print
          }'

     Since the main input stream is not used, the values of `NR' and
     `FNR' are not changed.
     But the record read is split into fields in the normal manner,
     so the values of `$0' and other fields are changed. So is the
     value of `NF'.

     This does not cause the record to be tested against all the
     patterns in the `awk' program, in the way that would happen if
     the record were read normally by the main processing loop of
     `awk'. However, the new record is tested against any subsequent
     rules, just as when `getline' is used without a redirection.

`getline VAR < FILE'
     This form of the `getline' function takes its input from the
     file FILE and puts it in the variable VAR. As above, FILE is a
     string-valued expression that specifies the file to read from.

     In this version of `getline', none of the built-in variables
     are changed, and the record is not split into fields. The only
     variable changed is VAR.

     For example, the following program copies all the input files to
     the output, except for records that say `@include FILENAME'.
     Such a record is replaced by the contents of the file FILENAME.

          awk '{
            if (NF == 2 && $1 == "@include") {
              while ((getline line < $2) > 0)
                print line
              close($2)
            } else
              print
          }'

     Note here how the name of the extra input file is not built into
     the program; it is taken from the data, from the second field on
     the `@include' line.

     The `close' command is used to ensure that if two identical
     `@include' lines appear in the input, the entire specified file
     is included twice. *Note Close Input::.

     One deficiency of this program is that it does not process
     nested `@include' statements the way a true macro preprocessor
     would.

`COMMAND | getline'
     You can "pipe" the output of a command into `getline'. A pipe
     is simply a way to link the output of one program to the input
     of another. In this case, the string COMMAND is run as a shell
     command and its output is piped into `awk' to be used as input.
     This form of `getline' reads one record from the pipe.

     For example, the following program copies input to output,
     except for lines that begin with `@execute', which are replaced
     by the output produced by running the rest of the line as a
     shell command:

          awk '{
            if ($1 == "@execute") {
              tmp = substr($0, 10)   # the text after "@execute "
              while ((tmp | getline) > 0)
                print
              close(tmp)
            } else
              print
          }'

     The `close' command is used to ensure that if two identical
     `@execute' lines appear in the input, the command is run again
     for each one. *Note Close Input::.

     Given the input:

          foo
          bar
          baz
          @execute who
          bletch

     the program might produce:

          foo
          bar
          baz
          hack ttyv0 Jul 13 14:22
          hack ttyp0 Jul 13 14:23 (gnu:0)
          hack ttyp1 Jul 13 14:23 (gnu:0)
          hack ttyp2 Jul 13 14:23 (gnu:0)
          hack ttyp3 Jul 13 14:23 (gnu:0)
          bletch

     Notice that this program ran the command `who' and printed the
     result. (If you try this program yourself, you will get
     different results, showing you logged in.)

     This variation of `getline' sets `$0' to the record read, splits
     it into fields, and sets `NF'. The values of `NR' and `FNR' are
     not changed.

`COMMAND | getline VAR'
     The output of the command COMMAND is sent through a pipe to
     `getline' and into the variable VAR. For example, the following
     program reads the current date and time into the variable
     `current_time', using the utility called `date', and then prints
     it.

          awk 'BEGIN {
            "date" | getline current_time
            close("date")
            print "Report printed on " current_time
          }'

     In this version of `getline', none of the built-in variables
     are changed, and the record is not split into fields.


File: gawk-info, Node: Close Input, Up: Getline

Closing Input Files
-------------------

If the same file name or the same shell command is used with
`getline' more than once during the execution of the `awk' program,
the file is opened (or the command is executed) only the first time.
At that time, the first record of input is read from that file or
command. The next time the same file or command is used in
`getline', another record is read from it, and so on.

This means that if you want to start reading the same file again
from the beginning, or if you want to rerun a shell command (rather
than reading more output from the command), you must take special
steps. What you can do is use the `close' statement:

     close (FILENAME)

This statement closes a file or pipe, represented here by FILENAME.
The string value of FILENAME must be the same value as the string
used to open the file or pipe to begin with.

Once this statement is executed, the next `getline' from that file or
command will reopen the file or rerun the command.


File: gawk-info, Node: Printing, Next: One-liners, Prev: Reading Files, Up: Top

Printing Output
***************

One of the most common things that actions do is to output or "print"
some or all of the input. For simple output, use the `print'
statement. For fancier formatting use the `printf' statement. Both
are described in this chapter.

* Menu:

* Print::               The `print' statement.
* Print Examples::      Simple examples of `print' statements.
* Output Separators::   The output separators and how to change them.

* Redirection::         How to redirect output to multiple files and pipes.
* Close Output::        How to close output files and pipes.

* Printf::              The `printf' statement.


File: gawk-info, Node: Print, Next: Print Examples, Up: Printing

The `print' Statement
=====================

The `print' statement does output with simple, standardized
formatting. You specify only the strings or numbers to be printed,
in a list separated by commas. They are output, separated by single
spaces, followed by a newline. The statement looks like this:

     print ITEM1, ITEM2, ...

The entire list of items may optionally be enclosed in parentheses.
The parentheses are necessary if any of the item expressions uses a
relational operator; otherwise it could be confused with a
redirection (*note Redirection::.). The relational operators are
`==', `!=', `<', `>', `>=', `<=', `~' and `!~' (*note Comparison
Ops::.).

The items printed can be constant strings or numbers, fields of the
current record (such as `$1'), variables, or any `awk' expressions.
The `print' statement is completely general for computing *what*
values to print. With one exception (*note Output Separators::.),
what you can't do is specify *how* to print them--how many columns to
use, whether to use exponential notation or not, and so on. For
that, you need the `printf' statement (*note Printf::.).

To print a fixed piece of text, write a string constant as one item,
such as `"Hello there"'. If you forget the double-quote characters,
your text is taken as an `awk' expression instead: a single word is
treated as a variable name (normally empty), and text containing
punctuation will probably cause a syntax error. Keep in mind that a
space is printed between any two comma-separated items.

The simple statement `print' with no items is equivalent to `print
$0': it prints the entire current record. To print a blank line, use
`print ""', where `""' is the null, or empty, string.

Most often, each `print' statement makes one line of output. But it
isn't limited to one line. If an item value is a string that
contains a newline, the newline is output along with the rest of the
string. A single `print' can make any number of lines this way.


File: gawk-info, Node: Print Examples, Next: Output Separators, Prev: Print, Up: Printing

Examples of `print' Statements
==============================

Here is an example that prints the first two fields of each input
record, with a space between them:

     awk '{ print $1, $2 }' inventory-shipped

Its output looks like this:

     Jan 13
     Feb 15
     Mar 15
     ...

A common mistake in using the `print' statement is to omit the comma
between two items.
This often has the effect of making the items run
together in the output, with no space. The reason for this is that
juxtaposing two string expressions in `awk' means to concatenate
them. For example, without the comma:

     awk '{ print $1 $2 }' inventory-shipped

prints:

     Jan13
     Feb15
     Mar15
     ...

Neither example's output makes much sense to someone unfamiliar with
the file `inventory-shipped'. A heading line at the beginning would
make it clearer. Let's add some headings to our table of months
(`$1') and green crates shipped (`$2'). We do this using the `BEGIN'
pattern (*note BEGIN/END::.) to cause the headings to be printed only
once:

     awk 'BEGIN { print "Month Crates"
                  print "---- -----" }
                { print $1, $2 }' inventory-shipped

Did you already guess what will happen? This program prints the
following:

     Month Crates
     ---- -----
     Jan 13
     Feb 15
     Mar 15
     ...

The headings and the table data don't line up! We can fix this by
printing some spaces between the two fields:

     awk 'BEGIN { print "Month Crates"
                  print "---- -----" }
                { print $1, " ", $2 }' inventory-shipped

You can imagine that this way of lining up columns can get pretty
complicated when you have many columns to fix. Counting spaces for
two or three columns can be simple, but more than this and you can
get "lost" quite easily. This is why the `printf' statement was
created (*note Printf::.); one of its specialties is lining up
columns of data.


File: gawk-info, Node: Output Separators, Next: Redirection, Prev: Print Examples, Up: Printing

Output Separators
=================

As mentioned previously, a `print' statement contains a list of
items, separated by commas. In the output, the items are normally
separated by single spaces. But they do not have to be spaces; a
single space is only the default. You can specify any string of
characters to use as the "output field separator", by setting the
special variable `OFS'.
The initial value of this variable is the string `" "'.

The output from an entire `print' statement is called an "output
record". Each `print' statement outputs one output record and then
outputs a string called the "output record separator". The special
variable `ORS' specifies this string. The initial value of the
variable is the string `"\n"' containing a newline character; thus,
normally each `print' statement makes a separate line.

You can change how output fields and records are separated by
assigning new values to the variables `OFS' and/or `ORS'. The usual
place to do this is in the `BEGIN' rule (*note BEGIN/END::.), so that
it happens before any input is processed. You may also do this with
assignments on the command line, before the names of your input files.

The following example prints the first and second fields of each
input record separated by a semicolon, with a blank line added after
each line:

     awk 'BEGIN { OFS = ";"; ORS = "\n\n" }
          { print $1, $2 }' BBS-list

If the value of `ORS' does not contain a newline, all your output
will be run together on a single line, unless you output newlines
some other way.


File: gawk-info, Node: Redirection, Next: Printf, Prev: Output Separators, Up: Printing

Redirecting Output of `print' and `printf'
==========================================

So far we have been dealing only with output that prints to the
standard output, usually your terminal. Both `print' and `printf'
can be told to send their output to other places. This is called
"redirection".

A redirection appears after the `print' or `printf' statement.
Redirections in `awk' are written just like redirections in shell
commands, except that they are written inside the `awk' program.

Here are the three forms of output redirection. They are all shown
for the `print' statement, but they work for `printf' also.
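Since `print' and `printf' share the same redirection syntax, here is
a small sketch of both writing to one file. The file name
`report.txt', the sample data, and the function name `redirect_demo'
are hypothetical; the sketch assumes a POSIX shell that also provides
`mktemp -d', and a POSIX-compatible `awk'. It also shows that `OFS'
applies to redirected `print' output, and that a file named with `>'
is emptied only when it is first opened, so all four lines survive.

```shell
#!/bin/sh
# Hypothetical demo: print and printf redirected to the same file.
# Runs in a subshell inside a scratch directory that is removed
# afterwards.
redirect_demo() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    printf 'alpo 2400\nbarfly 1200\n' |
    awk 'BEGIN { OFS = ":" }
         {
             print $1, $2 > "report.txt"    # items joined with OFS
             printf "%s runs at %s baud\n", $1, $2 > "report.txt"
         }'
    cat report.txt
    cd / && rm -r "$dir"
)

redirect_demo
```

The file should end up holding `alpo:2400', `alpo runs at 2400 baud',
`barfly:1200', and `barfly runs at 1200 baud', in that order.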
+
+`print ITEMS > OUTPUT-FILE'
+     This type of redirection prints the items onto the output file
+     OUTPUT-FILE. The file name OUTPUT-FILE can be any expression.
+     Its value is changed to a string and then used as a filename
+     (*note Expressions::.).
+
+     When this type of redirection is used, the OUTPUT-FILE is erased
+     before the first output is written to it. Subsequent writes do
+     not erase OUTPUT-FILE, but append to it. If OUTPUT-FILE does
+     not exist, then it is created.
+
+     For example, here is how one `awk' program can write a list of
+     BBS names to a file `name-list' and a list of phone numbers to a
+     file `phone-list'. Each output file contains one name or number
+     per line.
+
+          awk '{ print $2 > "phone-list"
+                 print $1 > "name-list" }' BBS-list
+
+`print ITEMS >> OUTPUT-FILE'
+     This type of redirection prints the items onto the output file
+     OUTPUT-FILE. The difference between this and the single--`>'
+     redirection is that the old contents (if any) of OUTPUT-FILE are
+     not erased. Instead, the `awk' output is appended to the file.
+
+`print ITEMS | COMMAND'
+     It is also possible to send output through a "pipe" instead of
+     into a file. This type of redirection opens a pipe to COMMAND
+     and writes the values of ITEMS through this pipe, to another
+     process created to execute COMMAND.
+
+     The redirection argument COMMAND is actually an `awk'
+     expression. Its value is converted to a string, whose contents
+     give the shell command to be run.
+
+     For example, this produces two files, one unsorted list of BBS
+     names and one list sorted in reverse alphabetical order:
+
+          awk '{ print $1 > "names.unsorted"
+                 print $1 | "sort -r > names.sorted" }' BBS-list
+
+     Here the unsorted list is written with an ordinary redirection
+     while the sorted list is written by piping through the `sort'
+     utility.
+
+     Here is an example that uses redirection to mail a message to a
+     mailing list `bug-system'.
This might be useful when trouble is
+     encountered in an `awk' script run periodically for system
+     maintenance.
+
+          print "Awk script failed:", $0 | "mail bug-system"
+          print "processing record number", FNR, "of", FILENAME | "mail bug-system"
+          close ("mail bug-system")
+
+     We use a `close' statement here because it's a good idea to
+     close the pipe as soon as all the intended output has been sent
+     to it. *Note Close Output::, for more information on this.
+
+Redirecting output using `>', `>>', or `|' asks the system to open a
+file or pipe only if the particular FILE or COMMAND you've specified
+has not already been written to by your program.
+
+
+
+File: gawk-info, Node: Close Output, Up: Redirection
+
+Closing Output Files and Pipes
+------------------------------
+
+When a file or pipe is opened, the filename or command associated
+with it is remembered by `awk' and subsequent writes to the same file
+or command are appended to the previous writes. The file or pipe
+stays open until `awk' exits. This is usually convenient.
+
+Sometimes there is a reason to close an output file or pipe earlier
+than that. To do this, use the `close' command, as follows:
+
+     close (FILENAME)
+
+or
+
+     close (COMMAND)
+
+The argument FILENAME or COMMAND can be any expression. Its value
+must exactly equal the string used to open the file or pipe to begin
+with--for example, if you open a pipe with this:
+
+     print $1 | "sort -r > names.sorted"
+
+then you must close it with this:
+
+     close ("sort -r > names.sorted")
+
+Here are some reasons why you might need to close an output file:
+
+  * To write a file and read it back later on in the same `awk'
+    program. Close the file when you are finished writing it; then
+    you can start reading it with `getline' (*note Getline::.).
+
+  * To write numerous files, successively, in the same `awk'
+    program. If you don't close the files, eventually you will
+    exceed the system limit on the number of open files in one
+    process.
So close each one when you are finished writing it.
+
+  * To make a command finish. When you redirect output through a
+    pipe, the command reading the pipe normally continues to try to
+    read input as long as the pipe is open. Often this means the
+    command cannot really do its work until the pipe is closed. For
+    example, if you redirect output to the `mail' program, the
+    message will not actually be sent until the pipe is closed.
+
+  * To run the same subprogram a second time, with the same arguments.
+    This is not the same thing as giving more input to the first run!
+
+    For example, suppose you pipe output to the `mail' program. If
+    you output several lines redirected to this pipe without closing
+    it, they make a single message of several lines. By contrast,
+    if you close the pipe after each line of output, then each line
+    makes a separate message.
+
+
+
+File: gawk-info, Node: Printf, Prev: Redirection, Up: Printing
+
+Using `printf' Statements For Fancier Printing
+==============================================
+
+If you want more precise control over the output format than `print'
+gives you, use `printf'. With `printf' you can specify the width to
+use for each item, and you can specify various stylistic choices for
+numbers (such as what radix to use, whether to print an exponent,
+whether to print a sign, and how many digits to print after the
+decimal point). You do this by specifying a "format string".
+
+* Menu:
+
+* Basic Printf::        Syntax of the `printf' statement.
+* Format-Control::      Format-control letters.
+* Modifiers::           Format--specification modifiers.
+* Printf Examples::     Several examples.
+
+
+
+File: gawk-info, Node: Basic Printf, Next: Format-Control, Up: Printf
+
+Introduction to the `printf' Statement
+--------------------------------------
+
+The `printf' statement looks like this:
+
+     printf FORMAT, ITEM1, ITEM2, ...
+
+   The entire list of items may optionally be enclosed in parentheses.
+The parentheses are necessary if any of the item expressions uses a
+relational operator; otherwise it could be confused with a
+redirection (*note Redirection::.). The relational operators are
+`==', `!=', `<', `>', `>=', `<=', `~' and `!~' (*note Comparison
+Ops::.).
+
+The difference between `printf' and `print' is the argument FORMAT.
+This is an expression whose value is taken as a string; its job is to
+say how to output each of the other arguments. It is called the
+"format string".
+
+The format string is essentially the same as in the C library
+function `printf'. Most of FORMAT is text to be output verbatim.
+Scattered among this text are "format specifiers", one per item.
+Each format specifier says to output the next item at that place in
+the format.
+
+The `printf' statement does not automatically append a newline to its
+output. It outputs nothing but what the format specifies. So if you
+want a newline, you must include one in the format. The output
+separator variables `OFS' and `ORS' have no effect on `printf'
+statements.
+
+
+
+File: gawk-info, Node: Format-Control, Next: Modifiers, Prev: Basic Printf, Up: Printf
+
+Format--Control Characters
+--------------------------
+
+A format specifier starts with the character `%' and ends with a
+"format--control letter"; it tells the `printf' statement how to
+output one item. (If you actually want to output a `%', write `%%'.)
+The format--control letter specifies what kind of value to print.
+The rest of the format specifier is made up of optional "modifiers"
+which are parameters such as the field width to use.
+
+Here is a list of the format--control letters:
+
+`c'
+     This prints a number as an ASCII character. Thus, `printf "%c",
+     65' outputs the letter `A'. The output for a string value is
+     the first character of the string.
+
+`d'
+     This prints a decimal integer.
+
+`e'
+     This prints a number in scientific (exponential) notation.
For
+     example,
+
+          printf "%4.3e", 1950
+
+     prints `1.950e+03', with 3 digits after the decimal point. The
+     `4.3' are "modifiers", discussed below: `4' is the minimum field
+     width and `.3' is the precision.
+
+`f'
+     This prints a number in floating point notation.
+
+`g'
+     This prints either scientific notation or floating point
+     notation, whichever is shorter.
+
+`o'
+     This prints an unsigned octal integer.
+
+`s'
+     This prints a string.
+
+`x'
+     This prints an unsigned hexadecimal integer.
+
+`%'
+     This isn't really a format--control letter, but it does have a
+     meaning when used after a `%': the sequence `%%' outputs one
+     `%'. It does not consume an argument.
+
+
+
+File: gawk-info, Node: Modifiers, Next: Printf Examples, Prev: Format-Control, Up: Printf
+
+Modifiers for `printf' Formats
+------------------------------
+
+A format specification can also include "modifiers" that can control
+how much of the item's value is printed and how much space it gets.
+The modifiers come between the `%' and the format--control letter.
+Here are the possible modifiers, in the order in which they may appear:
+
+`-'
+     The minus sign, used before the width modifier, says to
+     left--justify the argument within its specified width. Normally
+     the argument is printed right--justified in the specified width.
+
+`WIDTH'
+     This is a number representing the desired width of a field.
+     Inserting any number between the `%' sign and the format control
+     character forces the field to be expanded to this width. The
+     default way to do this is to pad with spaces on the left. The
+     width is a minimum: a value that needs more characters is
+     printed in full, not truncated.
+
+`.PREC'
+     This is a number that specifies the precision to use when
+     printing. For the `e' and `f' letters, it specifies the number
+     of digits you want printed to the right of the decimal point;
+     for `g' it is the maximum number of significant digits, and for
+     `s' it is the maximum number of characters of the string to print.
+
+The C library `printf''s dynamic WIDTH and PREC capability (for
+example, `"%*.*s"') is not supported. However, it can be easily
+simulated using concatenation to dynamically build the format string.
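Here is a minimal sketch of that concatenation approach. It assumes a POSIX-style `awk' run from a Bourne-compatible shell. The variable names `width', `prec' and `fmt' are hypothetical and were chosen only for this illustration. The program builds the format string at run time and passes it to `printf' in place of C's `"%*.*f"':

```shell
# Emulate C's "%*.*f" by building the format string at run time.
# width, prec and fmt are ordinary (hypothetical) variable names, not special to awk.
awk 'BEGIN {
    width = 8; prec = 2                  # the values C would take from the two stars
    fmt = "[%" width "." prec "f]\n"     # concatenation yields "[%8.2f]\n"
    printf fmt, 3.14159                  # prints "[    3.14]"
}'
```

The brackets are there only to make the padding visible. Because `width' and `prec' are ordinary variables, they could just as well be computed while the program runs.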
+
+
+
+File: gawk-info, Node: Printf Examples, Prev: Modifiers, Up: Printf
+
+Examples of Using `printf'
+--------------------------
+
+Here is how to use `printf' to make an aligned table:
+
+     awk '{ printf "%-10s %s\n", $1, $2 }' BBS-list
+
+prints the names of bulletin boards (`$1') of the file `BBS-list' as
+a string of 10 characters, left justified. It also prints the phone
+numbers (`$2') afterward on the line. This will produce an aligned
+two--column table of names and phone numbers, like so:
+
+     aardvark   555-5553
+     alpo-net   555-3412
+     barfly     555-7685
+     bites      555-1675
+     camelot    555-0542
+     core       555-2912
+     fooey      555-1234
+     foot       555-6699
+     macfoo     555-6480
+     sdace      555-3430
+     sabafoo    555-2127
+
+Did you notice that we did not specify that the phone numbers be
+printed as numbers? They had to be printed as strings because the
+numbers contain a dash. Converted to a number, a phone number such
+as `555-5553' yields only the value of its leading digits, `555', so
+printing the phone numbers as numbers would have led to some pretty
+confusing results.
+
+We did not specify a width for the phone numbers because they are the
+last things on their lines. We don't need to put spaces after them.
+
+We could make our table look even nicer by adding headings to the
+tops of the columns. To do this, use the `BEGIN' pattern (*note
+BEGIN/END::.) to cause the header to be printed only once, at the
+beginning of the `awk' program:
+
+     awk 'BEGIN { print "Name       Number"
+                  print "---        -----" }
+                { printf "%-10s %s\n", $1, $2 }' BBS-list
+
+Did you notice that we mixed `print' and `printf' statements in the
+above example? We could have used just `printf' statements to get
+the same results:
+
+     awk 'BEGIN { printf "%-10s %s\n", "Name", "Number"
+                  printf "%-10s %s\n", "---", "-----" }
+                { printf "%-10s %s\n", $1, $2 }' BBS-list
+
+By outputting each column heading with the same format specification
+used for the elements of the column, we have made sure that the
+headings will be aligned just like the columns.
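If the `BBS-list' file is not at hand, the following sketch still shows the alignment. It feeds the all-`printf' program above two records typed in from the shell. This input is made up for illustration by reusing the first two lines of the table shown earlier, so only `$1' and `$2' are present:

```shell
# Two sample records stand in for BBS-list; only $1 and $2 are used.
printf 'aardvark 555-5553\nalpo-net 555-3412\n' |
awk 'BEGIN { printf "%-10s %s\n", "Name", "Number"
             printf "%-10s %s\n", "---", "-----" }
           { printf "%-10s %s\n", $1, $2 }'
# Expected output:
#   Name       Number
#   ---        -----
#   aardvark   555-5553
#   alpo-net   555-3412
```

Every line passes through the same `%-10s' specification. As a result, the headings and the data both start their second column at the same position.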
+
+The fact that the same format specification is used can be emphasized
+by storing it in a variable, like so:
+
+     awk 'BEGIN { format = "%-10s %s\n"
+                  printf format, "Name", "Number"
+                  printf format, "---", "-----" }
+                { printf format, $1, $2 }' BBS-list
+
+See if you can use the `printf' statement to line up the headings and
+table data for our `inventory-shipped' example covered earlier in the
+section on the `print' statement (*note Print::.).
+
+
+
+File: gawk-info, Node: One-liners, Next: Patterns, Prev: Printing, Up: Top
+
+Useful ``One-liners''
+*********************
+
+Useful `awk' programs are often short, just a line or two. Here is a
+collection of useful, short programs to get you started. Some of
+these programs contain constructs that haven't been covered yet. The
+description of the program will give you a good idea of what is going
+on, but please read the rest of the manual to become an `awk' expert!
+
+`awk '{ num_fields = num_fields + NF }'
+`` END { print num_fields }'''
+     This program prints the total number of fields in all input lines.
+
+`awk 'length($0) > 80''
+     This program prints every line longer than 80 characters. The
+     sole rule has a relational expression as its pattern, and has no
+     action (so the default action, printing the record, is used).
+
+`awk 'NF > 0''
+     This program prints every line that has at least one field.
+     This is an easy way to delete blank lines from a file (or
+     rather, to create a new file similar to the old file but from
+     which the blank lines have been deleted).
+
+`awk '{ if (NF > 0) print }''
+     This program also prints every line that has at least one field.
+     Here we allow the rule to match every line, then decide in the
+     action whether to print.
+
+`awk 'BEGIN { for (i = 1; i <= 7; i++)'
+`` print int(101 * rand()) }'''
+     This program prints 7 random numbers from 0 to 100, inclusive.
+
+`ls -l FILES | awk '{ x += $4 } ; END { print "total bytes: " x }''
+     This program prints the total number of bytes used by FILES.
+     It assumes that `ls -l' prints each file's size in the fourth
+     column; on systems whose `ls -l' also shows the group, the size
+     is in the fifth column, so use `$5' instead.
+
+`expand FILE | awk '{ if (x < length()) x = length() }'
+`` END { print "maximum line length is " x }'''
+     This program prints the maximum line length of FILE. The input
+     is piped through the `expand' program to change tabs into
+     spaces, so the widths compared are actually the right--margin
+     columns.
+
+
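You can try several of these one-liners without a data file by piping a few lines into them. In the sketch below, the shell function `sample' and its three lines of input are made up for illustration; the middle line is empty. The last program writes `length()' as `length($0)', which means the same thing:

```shell
# Hypothetical sample input: "a b c", an empty line, then "d e".
sample() { printf '%s\n' 'a b c' '' 'd e'; }

# Total number of fields in all input lines (3 + 0 + 2): prints 5
sample | awk '{ num_fields = num_fields + NF }
              END { print num_fields }'

# Delete blank lines: prints "a b c" and "d e"
sample | awk 'NF > 0'

# Maximum line length: prints "maximum line length is 5"
sample | awk '{ if (x < length($0)) x = length($0) }
              END { print "maximum line length is " x }'
```

The empty line has no fields, so it adds nothing to the field count and fails the `NF > 0' pattern.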