Diffstat (limited to 'gawk-info-3')
-rw-r--r-- | gawk-info-3 | 1385 |
1 files changed, 1385 insertions, 0 deletions
diff --git a/gawk-info-3 b/gawk-info-3 new file mode 100644 index 00000000..b333f57c --- /dev/null +++ b/gawk-info-3 @@ -0,0 +1,1385 @@ +Info file gawk-info, produced by Makeinfo, -*- Text -*- from input +file gawk.texinfo. + +This file documents `awk', a program that you can use to select +particular records in a file and perform operations upon them. + +Copyright (C) 1989 Free Software Foundation, Inc. + +Permission is granted to make and distribute verbatim copies of this +manual provided the copyright notice and this permission notice are +preserved on all copies. + +Permission is granted to copy and distribute modified versions of +this manual under the conditions for verbatim copying, provided that +the entire resulting derived work is distributed under the terms of a +permission notice identical to this one. + +Permission is granted to copy and distribute translations of this +manual into another language, under the above conditions for modified +versions, except that this permission notice may be stated in a +translation approved by the Foundation. + + + +File: gawk-info, Node: Patterns, Next: Actions, Prev: One-liners, Up: Top + +Patterns +******** + +Patterns control the execution of rules: a rule is executed when its +pattern matches the input record. The `awk' language provides +several special patterns that are described in the sections that +follow. Patterns include: + +NULL + The empty pattern, which matches every input record. (*Note The + Empty Pattern: Empty.) + +/REGULAR EXPRESSION/ + A regular expression as a pattern. It matches when the text of + the input record fits the regular expression. (*Note Regular + Expressions as Patterns: Regexp.) + +CONDEXP + A single comparison expression. It matches when it is true. + (*Note Comparison Expressions as Patterns: Comparison Patterns.) + +`BEGIN' +`END' + Special patterns to supply start--up or clean--up information to + `awk'. (*Note Specifying Record Ranges With Patterns: BEGIN/END.) 
+ +PAT1, PAT2 + A pair of patterns separated by a comma, specifying a range of + records. (*Note Specifying Record Ranges With Patterns: Ranges.) + +CONDEXP1 BOOLEAN CONDEXP2 + A "compound" pattern, which combines expressions with the + operators `and', `&&', and `or', `||'. (*Note Boolean + Operators and Patterns: Boolean.) + +! CONDEXP + The pattern CONDEXP is evaluated. Then the `!' performs a + boolean ``not'' or logical negation operation; if the input line + matches the pattern in CONDEXP then the associated action is + *not* executed. If the input line did not match that pattern, + then the action *is* executed. (*Note Boolean Operators and + Patterns: Boolean.) + +(EXPR) + Parentheses may be used to control how operators nest. + +PAT1 ? PAT2 : PAT3 + The first pattern is evaluated. If it is true, the input line + is tested against the second pattern, otherwise it is tested + against the third. (*Note Conditional Patterns: Conditional + Patterns.) + +* Menu: + +The following subsections describe these forms in detail: + +* Empty:: The empty pattern, which matches every record. + +* Regexp:: Regular expressions such as `/foo/'. + +* Comparison Patterns:: Comparison expressions such as `$1 > 10'. + +* Boolean:: Combining comparison expressions. + +* Ranges:: Using pairs of patterns to specify record ranges. + +* BEGIN/END:: Specifying initialization and cleanup rules. + +* Conditional Patterns:: Patterns such as `pat1 ? pat2 : pat3'. + + + +File: gawk-info, Node: Empty, Next: Regexp, Up: Patterns + +The Empty Pattern +================= + +An empty pattern is considered to match *every* input record. For +example, the program: + + awk '{ print $1 }' BBS-list + +prints just the first field of every record. + + + +File: gawk-info, Node: Regexp, Next: Comparison Patterns, Prev: Empty, Up: Patterns + +Regular Expressions as Patterns +=============================== + +A "regular expression", or "regexp", is a way of describing classes +of strings. 
When enclosed in slashes (`/'), it makes an `awk' +pattern that matches every input record that contains a match for the +regexp. + +The simplest regular expression is a sequence of letters, numbers, or +both. Such a regexp matches any string that contains that sequence. +Thus, the regexp `foo' matches any string containing `foo'. (More +complicated regexps let you specify classes of similar strings.) + +* Menu: + +* Usage: Regexp Usage. How regexps are used in patterns. +* Operators: Regexp Operators. How to write a regexp. + + + +File: gawk-info, Node: Regexp Usage, Next: Regexp Operators, Up: Regexp + +How to use Regular Expressions +------------------------------ + +When you enclose `foo' in slashes, you get a pattern that matches a +record that contains `foo'. For example, this prints the second +field of each record that contains `foo' anywhere: + + awk '/foo/ { print $2 }' BBS-list + +Regular expressions can also be used in comparison expressions. Then +you can specify the string to match against; it need not be the +entire current input record. These comparison expressions can be +used as patterns or in `if' and `while' statements. + +`EXP ~ /REGEXP/' + This is true if the expression EXP (taken as a character string) + is matched by REGEXP. The following example matches, or + selects, all input records with the letter `J' in the first field: + + awk '$1 ~ /J/' inventory-shipped + + So does this: + + awk '{ if ($1 ~ /J/) print }' inventory-shipped + +`EXP !~ /REGEXP/' + This is true if the expression EXP (taken as a character string) + is *not* matched by REGEXP. The following example matches, or + selects, all input records whose first field *does not* contain + the letter `J': + + awk '$1 !~ /J/' inventory-shipped + +The right hand side of a `~' or `!~' operator need not be a constant +regexp (i.e. a string of characters between `/'s). It can also be +"computed", or "dynamic". 
For example: + + identifier = "[A-Za-z_][A-Za-z_0-9]+" + $0 ~ identifier + +sets `identifier' to a regexp that describes `awk' variable names, +and tests if the input record matches this regexp. + +A dynamic regexp may actually be any expression. The expression is +evaluated, and the result is treated as a string that describes a +regular expression. + + + +File: gawk-info, Node: Regexp Operators, Prev: Regexp Usage, Up: Regexp + +Regular Expression Operators +---------------------------- + +You can combine regular expressions with the following characters, +called "regular expression operators", or "metacharacters", to +increase the power and versatility of regular expressions. This is a +table of metacharacters: + +`\' + This is used to suppress the special meaning of a character when + matching. For example: + + \$ + + matches the character `$'. + +`^' + This matches the beginning of the string or the beginning of a + line within the string. For example: + + ^@chapter + + matches the `@chapter' at the beginning of a string, and can be + used to identify chapter beginnings in Texinfo source files. + +`$' + This is similar to `^', but it matches only at the end of a + string or the end of a line within the string. For example: + + /p$/ + + as a pattern matches a record that ends with a `p'. + +`.' + This matches any single character except a newline. For example: + + .P + + matches any single character followed by a `P' in a string. + Using concatenation we can make regular expressions like `U.A', + which matches any three--character string that begins with `U' + and ends with `A'. + +`[...]' + This is called a "character set". It matches any one of a group + of characters that are enclosed in the square brackets. For + example: + + [MVX] + + matches any of the characters `M', `V', or `X' in a string. + + Ranges of characters are indicated by using a hyphen between the + beginning and ending characters, and enclosing the whole thing + in brackets. 
For example: + + [0-9] + + matches any string that contains a digit. + + Note that special patterns have to be followed to match the + characters, `]', `-', and `^' when they are enclosed in the + square brackets. To match a `]', make it the first character in + the set. For example: + + []d] + + matches either `]', or `d'. + + To match `-', write it as `--', which is a range containing only + `-'. You may also make the `-' be the first or last character + in the set. To match `^', make it any character except the + first one of a set. + +`[^ ...]' + This is the "complemented character set". The first character + after the `[' *must* be a `^'. This matches any characters + *except* those in the square brackets. For example: + + [^0-9] + + matches any characters that are not digits. + +`|' + This is the "alternation operator" and it is used to specify + alternatives. For example: + + ^P|[0-9] + + matches any string that matches either `^P' or `[0-9]'. This + means it matches any string that contains a digit or starts with + `P'. + +`(...)' + Parentheses are used for grouping in regular expressions as in + arithmetic. They can be used to concatenate regular expressions + containing the alternation operator, `|'. + +`*' + This symbol means that the preceding regular expression is to be + repeated as many times as possible to find a match. For example: + + ph* + + applies the `*' symbol to the preceding `h' and looks for + matches to one `p' followed by any number of `h''s. This will + also match just `p' if no `h''s are present. + + The `*' means repeat the *smallest* possible preceding + expression in order to find a match. The `awk' language + processes a `*' by matching as many repetitions as can be found. + For example: + + awk '/\(c[ad][ad]*r x\)/ { print }' sample + + matches every record in the input containing a string of the + form `(car x)', `(cdr x)', `(cadr x)', and so on. 
+ +`+' + This symbol is similar to `*', but the preceding expression must + be matched at least once. This means that: + + wh+y + + would match `why' and `whhy' but not `wy', whereas `wh*y' would + match all three of these strings. And this is a simpler way of + writing the last `*' example: + + awk '/\(c[ad]+r x\)/ { print }' sample + +`?' + This symbol is similar to `*', but the preceding expression can + be matched once or not at all. For example: + + fe?d + + will match `fed' or `fd', but nothing else. + +In regular expressions, the `*', `+', and `?' operators have the +highest precedence, followed by concatenation, and finally by `|'. +As in arithmetic, parentheses can change how operators are grouped. + +Any other character stands for itself. However, it is important to +note that case in regular expressions *is* significant, both when +matching ordinary (i.e. non--metacharacter) characters, and inside +character sets. Thus a `w' in a regular expression matches only a +lower case `w' and not either an uppercase or lowercase `w'. When +you want to do a case--independent match, you have to use a character +set: `[Ww]'. + + + +File: gawk-info, Node: Comparison Patterns, Next: Ranges, Prev: Regexp, Up: Patterns + +Comparison Expressions as Patterns +================================== + +"Comparison patterns" use "relational operators" to compare strings +or numbers. The relational operators are the same as in C. Here is +a table of them: + +`X < Y' + True if X is less than Y. + +`X <= Y' + True if X is less than or equal to Y. + +`X > Y' + True if X is greater than Y. + +`X >= Y' + True if X is greater than or equal to Y. + +`X == Y' + True if X is equal to Y. + +`X != Y' + True if X is not equal to Y. + +Comparison expressions can be used as patterns to control whether a +rule is executed. The expression is evaluated for each input record +read, and the pattern is considered matched if the condition is "true". 
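As a minimal sketch of a comparison pattern in use, the following command prints only the records whose first field is numerically greater than 10. The three input records are made up for illustration and are not taken from the manual's sample files.

```shell
# Hypothetical three-record input. The pattern $1 > 10 selects the records
# whose first field, compared as a number, exceeds 10. With no action given,
# awk prints each matching record.
out=$(printf '5 apples\n12 pears\n30 plums\n' | awk '$1 > 10')
echo "$out"
```

Only the `12 pears' and `30 plums' records should be printed, since the condition is false for the first record.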
+ +The operands of a relational operator are compared as numbers if they +are both numbers. Otherwise they are converted to, and compared as, +strings (*note Conversion::.). Strings are compared by comparing the +first character of each, then the second character of each, and so on. +Thus, `"10"' is less than `"9"'. + +The following example prints the second field of each input record +whose first field is precisely `foo'. + + awk '$1 == "foo" { print $2 }' BBS-list + +Contrast this with the following regular expression match, which +would accept any record with a first field that contains `foo': + + awk '$1 ~ "foo" { print $2 }' BBS-list + + + +File: gawk-info, Node: Ranges, Next: BEGIN/END, Prev: Comparison Patterns, Up: Patterns + +Specifying Record Ranges With Patterns +====================================== + +A "range pattern" is made of two patterns separated by a comma: +`BEGPAT, ENDPAT'. It matches ranges of consecutive input records. +The first pattern BEGPAT controls where the range begins, and the +second one ENDPAT controls where it ends. + +They work as follows: BEGPAT is matched against every input record; +when a record matches BEGPAT, the range pattern becomes "turned on". +The range pattern matches this record. As long as it stays turned +on, it automatically matches every input record read. But meanwhile, +ENDPAT is matched against every input record, and when it matches, +the range pattern is turned off again for the following record. Now +we go back to checking BEGPAT against each record. For example: + + awk '$1 == "on", $1 == "off"' + +prints every record between on/off pairs, inclusive. + +The record that turns on the range pattern and the one that turns it +off both match the range pattern. If you don't want to operate on +these records, you can write `if' statements in the rule's action to +distinguish them. + +It is possible for a pattern to be turned both on and off by the same +record, if both conditions are satisfied by that record. 
Then the +action is executed for just that record. + + + +File: gawk-info, Node: BEGIN/END, Next: Boolean, Prev: Ranges, Up: Patterns + +`BEGIN' and `END' Special Patterns +================================== + +`BEGIN' and `END' are special patterns. They are not used to match +input records. Rather, they are used for supplying start--up or +clean--up information to your `awk' script. A `BEGIN' rule is +executed, once, before the first input record has been read. An +`END' rule is executed, once, after all the input has been read. For +example: + + awk 'BEGIN { print "Analysis of ``foo'' program" } + /foo/ { ++foobar } + END { print "``foo'' appears " foobar " times." }' BBS-list + +This program finds out how many times the string `foo' appears in the +input file `BBS-list'. The `BEGIN' pattern prints out a title for +the report. There is no need to use the `BEGIN' pattern to +initialize the counter `foobar' to zero, as `awk' does this for us +automatically (*note Variables::.). The second rule increments the +variable `foobar' every time a record containing the pattern `foo' is +read. The last rule prints out the value of `foobar' at the end of +the run. + +The special patterns `BEGIN' and `END' do not combine with other +kinds of patterns. + +An `awk' program may have multiple `BEGIN' and/or `END' rules. The +contents of multiple `BEGIN' or `END' rules are treated as if they +had been enclosed in a single rule, in the order that the rules are +encountered in the `awk' program. (This feature was introduced with +the new version of `awk'.) + +Multiple `BEGIN' and `END' sections are also useful for writing +library functions that need to do initialization and/or cleanup of +their own. Note that the order in which library functions are named +on the command line will affect the order in which their `BEGIN' and +`END' rules will be executed. Therefore you have to be careful how +you write your library functions. 
(*Note Command Line::, for more +information on using library functions.) + +If an `awk' program only has a `BEGIN' rule, and no other rules, then +the program will exit after the `BEGIN' rule has been run. Older +versions of `awk' used to read their input until end of file was +seen. However, if an `END' rule exists as well, then the input will +be read, even if there are no other rules in the program. + +`BEGIN' and `END' rules must have actions; there is no default action +for these rules since there is no current record when they run. + + + +File: gawk-info, Node: Boolean, Next: Conditional Patterns, Prev: BEGIN/END, Up: Patterns + +Boolean Operators and Patterns +============================== + +A boolean pattern is a combination of other patterns using the +boolean operators ``or'' (`||'), ``and'' (`&&'), and ``not'' (`!'), +along with parentheses to control nesting. Whether the boolean +pattern matches an input record is computed from whether its +subpatterns match. + +The subpatterns of a boolean pattern can be regular expressions, +matching expressions, comparisons, or other boolean combinations of +such. Range patterns cannot appear inside boolean operators, since +they don't make sense for classifying a single record, and neither +can the special patterns `BEGIN' and `END', which never match any +input record. + +Here are descriptions of the three boolean operators. + +`PAT1 && PAT2' + Matches if both PAT1 and PAT2 match by themselves. For example, + the following command prints all records in the input file + `BBS-list' that contain both `2400' and `foo'. + + awk '/2400/ && /foo/' BBS-list + + Whether PAT2 matches is tested only if PAT1 succeeds. This can + make a difference when PAT2 contains expressions that have side + effects: in the case of `/foo/ && ($2 == bar++)', the variable + `bar' is not incremented if there is no `foo' in the record. + +`PAT1 || PAT2' + Matches if at least one of PAT1 and PAT2 matches the current + input record. 
For example, the following command prints all + records in the input file `BBS-list' that contain *either* + `2400' or `foo', or both. + + awk '/2400/ || /foo/' BBS-list + + Whether PAT2 matches is tested only if PAT1 fails to match. + This can make a difference when PAT2 contains expressions that + have side effects. + +`!PAT' + Matches if PAT does not match. For example, the following + command prints all records in the input file `BBS-list' that do + *not* contain the string `foo'. + + awk '! /foo/' BBS-list + +Note that boolean patterns are built from other patterns just as +boolean expressions are built from other expressions (*note Boolean +Ops::.). Any boolean expression is also a valid boolean pattern. +But the converse is not true: simple regular expression patterns such +as `/foo/' are not allowed in boolean expressions. Regular +expressions can appear in boolean expressions only in conjunction +with the matching operators, `~' and `!~'. + + + +File: gawk-info, Node: Conditional Patterns, Prev: Boolean, Up: Patterns + +Conditional Patterns +==================== + +Patterns may use a "conditional expression" much like the conditional +expression of the C language. This takes the form: + + PAT1 ? PAT2 : PAT3 + +The first pattern is evaluated. If it evaluates to TRUE, then the +input record is tested against PAT2. Otherwise it is tested against +PAT3. The conditional pattern matches if PAT2 or PAT3 (whichever one +is selected) matches. + + + +File: gawk-info, Node: Actions, Next: Expressions, Prev: Patterns, Up: Top + +Actions: The Basics +******************* + +The "action" part of an `awk' rule tells `awk' what to do once a +match for the pattern is found. An action consists of one or more +`awk' "statements", enclosed in curly braces (`{' and `}'). The +curly braces must be used even if the action contains only one +statement, or even if it contains no statements at all. Action +statements are separated by newlines or semicolons. 
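The shape of an action can be sketched as follows, assuming a made-up one-record input. The rule has no pattern, so its action runs for every record, and the action holds two statements separated by a semicolon.

```shell
# Hypothetical one-record input. The braces enclose the action; inside it,
# an assignment and a print statement are separated by a semicolon.
out=$(printf 'alpha beta\n' | awk '{ n = NF; print "fields:", n }')
echo "$out"
```

Replacing the semicolon with a newline inside the braces should behave the same way.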
+ +Besides the print statements already covered (*note Printing::.), +there are four kinds of action statements: expressions, control +statements, compound statements, and function definitions. + + * "Expressions" include assignments, arithmetic, function calls, + and more (*note Expressions::.). + + * "Control statements" specify the control flow of `awk' programs. + The `awk' language gives you C--like constructs (`if', `for', + `while', and so on) as well as a few special ones (*note + Statements::.). + + * A "compound statement" is just one or more `awk' statements + enclosed in curly braces. This way you can group several + statements to form the body of an `if' or similar statement. + + * You can define "user--defined functions" for use elsewhere in + the `awk' program (*note User-defined::.). + + + +File: gawk-info, Node: Expressions, Next: Statements, Prev: Actions, Up: Top + +Actions: Expressions +******************** + +Expressions are the basic building block of `awk' actions. An +expression evaluates to a value, which you can print, test, store in +a variable or pass to a function. + +But, beyond that, an expression can assign a new value to a variable +or a field, with an assignment operator. + +An expression can serve as a statement on its own. Most other action +statements are made up of various combinations of expressions. As in +other languages, expressions in `awk' include variables, array +references, constants, and function calls, as well as combinations of +these with various operators. + +* Menu: + +* Constants:: String and numeric constants. +* Variables:: Variables give names to values for future use. +* Fields:: Field references such as `$1' are also expressions. +* Arrays:: Array element references are expressions. + +* Arithmetic Ops:: Arithmetic operations (`+', `-', etc.) +* Concatenation:: Concatenating strings. +* Comparison Ops:: Comparison of numbers and strings with `<', etc. 
+* Boolean Ops:: Combining comparison expressions using boolean operators + `||' (``or''), `&&' (``and'') and `!' (``not''). + +* Assignment Ops:: Changing the value of a variable or a field. +* Increment Ops:: Incrementing the numeric value of a variable. + +* Conversion:: The conversion of strings to numbers and vice versa. +* Conditional Exp:: Conditional expressions select between two subexpressions + under control of a third subexpression. +* Function Calls:: A function call is an expression. + + + +File: gawk-info, Node: Constants, Next: Variables, Up: Expressions + +Constant Expressions +==================== + +There are two types of constants: numeric constants and string +constants. + +The "numeric constant" is a number. This number can be an integer, a +decimal fraction, or a number in scientific (exponential) notation. +Note that all numeric values are represented within `awk' in +double--precision floating point. Here are some examples of numeric +constants, which all have the same value: + + 105 + 1.05e+2 + 1050e-1 + +A string constant consists of a sequence of characters enclosed in +double--quote marks. For example: + + "parrot" + +represents the string constant `parrot'. Strings in `gawk' can be of +any length and they can contain all the possible 8--bit ASCII +characters including ASCII NUL. Other `awk' implementations may have +difficulty with some character codes. + +Some characters cannot be included literally in a string. You +represent them instead with "escape sequences", which are character +sequences beginning with a backslash (`\'). + +One use of the backslash is to include double--quote characters in a +string. Since a plain double--quote would end the string, you must +use `\"'. Backslash itself is another character that can't be +included normally; you write `\\' to put one backslash in the string. + +Another use of backslash is to represent unprintable characters such +as newline. 
While there is nothing to stop you from writing these +characters directly in an `awk' program, they may look ugly. + +`\b' + Represents a backspace, `^H'. + +`\f' + Represents a formfeed, `^L'. + +`\n' + Represents a newline, `^J'. + +`\r' + Represents a carriage return, `^M'. + +`\t' + Represents a horizontal tab, `^I'. + +`\v' + Represents a vertical tab, `^K'. + +`\NNN' + Represents the octal value NNN, where NNN is one to three digits + between 0 and 7. For example, the code for the ASCII ESC + (escape) character is `\033'. + + + +File: gawk-info, Node: Variables, Next: Arithmetic Ops, Prev: Constants, Up: Expressions + +Variables +========= + +Variables let you give names to values and refer to them later. You +have already seen variables in many of the examples. The name of a +variable must be a sequence of letters, digits and underscores, but +it may not begin with a digit. Case is significant in variable +names; `a' and `A' are distinct variables. + +A variable name is a valid expression by itself; it represents the +variable's current value. Variables are given new values with +"assignment operators" and "increment operators". *Note Assignment +Ops::. + +A few variables have special built--in meanings, such as `FS', the +field separator, and `NF', the number of fields in the current input +record. *Note Special::, for a list of them. Special variables can +be used and assigned just like all other variables, but their values +are also used or changed automatically by `awk'. Each special +variable's name is made entirely of upper case letters. + +Variables in `awk' can be assigned either numeric values or string +values. By default, variables are initialized to the null string, +which has the numeric value zero. So there is no need to +``initialize'' each variable explicitly in `awk', the way you would +need to do in C or most other traditional programming languages. 
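The default initialization described above can be sketched with a short program. The input records and the variable names `count' and `never_set' are assumptions chosen for illustration; neither variable is assigned before its first use.

```shell
# Hypothetical three-record input. `count' starts out as the null string,
# which acts as zero when incremented. `never_set' is printed between
# brackets to show that an unassigned variable is empty as a string.
out=$(printf 'a\nb\nc\n' | awk '{ count++ } END { print count; print "[" never_set "]" }')
echo "$out"
```

The first output line is the record count, 3, and the second is an empty pair of brackets.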
+ + + +File: gawk-info, Node: Arithmetic Ops, Next: Concatenation, Prev: Variables, Up: Expressions + +Arithmetic Operators +==================== + +The `awk' language uses the common arithmetic operators when +evaluating expressions. All of these arithmetic operators follow +normal precedence rules, and work as you would expect them to. This +example divides field 3 by field 4, adds field 2, stores the result +into field 1, and prints the results: + + awk '{ $1 = $2 + $3 / $4; print }' inventory-shipped + +The arithmetic operators in `awk' are: + +`X + Y' + Addition. + +`X - Y' + Subtraction. + +`- X' + Negation. + +`X / Y' + Division. Since all numbers in `awk' are double--precision + floating point, the result is not rounded to an integer: `3 / 4' + has the value 0.75. + +`X * Y' + Multiplication. + +`X % Y' + Remainder. The quotient is rounded toward zero to an integer, + multiplied by Y and this result is subtracted from X. This + operation is sometimes known as ``trunc--mod''. The following + relation always holds: + + `b * int(a / b) + (a % b) == a' + + One undesirable effect of this definition of remainder is that X + % Y is negative if X is negative. Thus, + + -17 % 8 = -1 + +`X ^ Y' +`X ** Y' + Exponentiation: X raised to the Y power. `2 ^ 3' has the value + 8. The character sequence `**' is equivalent to `^'. + + + +File: gawk-info, Node: Concatenation, Next: Comparison Ops, Prev: Arithmetic Ops, Up: Expressions + +String Concatenation +==================== + +There is only one string operation: concatenation. It does not have +a specific operator to represent it. Instead, concatenation is +performed by writing expressions next to one another, with no +operator. For example: + + awk '{ print "Field number one: " $1 }' BBS-list + +produces, for the first record in `BBS-list': + + Field number one: aardvark + +If you hadn't put the space after the `:', the line would have run +together. 
For example: + + awk '{ print "Field number one:" $1 }' BBS-list + +produces, for the first record in `BBS-list': + + Field number one:aardvark + + + +File: gawk-info, Node: Comparison Ops, Next: Boolean Ops, Prev: Concatenation, Up: Expressions + +Comparison Expressions +====================== + +"Comparison expressions" use "relational operators" to compare +strings or numbers. The relational operators are the same as in C. +Here is a table of them: + +`X < Y' + True if X is less than Y. + +`X <= Y' + True if X is less than or equal to Y. + +`X > Y' + True if X is greater than Y. + +`X >= Y' + True if X is greater than or equal to Y. + +`X == Y' + True if X is equal to Y. + +`X != Y' + True if X is not equal to Y. + +`X ~ REGEXP' + True if regexp REGEXP matches the string X. + +`X !~ REGEXP' + True if regexp REGEXP does not match the string X. + +`SUBSCRIPT in ARRAY' + True if array ARRAY has an element with the subscript SUBSCRIPT. + +Comparison expressions have the value 1 if true and 0 if false. + +The operands of a relational operator are compared as numbers if they +are both numbers. Otherwise they are converted to, and compared as, +strings (*note Conversion::.). Strings are compared by comparing the +first character of each, then the second character of each, and so on. +Thus, `"10"' is less than `"9"'. + +For example, + + $1 == "foo" + +has the value of 1, or is true, if the first field of the current +input record is precisely `foo'. By contrast, + + $1 ~ /foo/ + +has the value 1 if the first field contains `foo'. + + + +File: gawk-info, Node: Boolean Ops, Next: Assignment Ops, Prev: Comparison Ops, Up: Expressions + +Boolean Operators +================= + +A boolean expression is a combination of comparison expressions or +matching expressions, using the boolean operators ``or'' (`||'), +``and'' (`&&'), and ``not'' (`!'), along with parentheses to control +nesting. 
The truth of the boolean expression is computed by +combining the truth values of the component expressions. + +Boolean expressions can be used wherever comparison and matching +expressions can be used. They can be used in `if' and `while' +statements. They have numeric values (1 if true, 0 if false). + +In addition, every boolean expression is also a valid boolean +pattern, so you can use it as a pattern to control the execution of +rules. + +Here are descriptions of the three boolean operators, with an example +of each. It may be instructive to compare these examples with the +analogous examples of boolean patterns (*note Boolean::.), which use +the same boolean operators in patterns instead of expressions. + +`BOOLEAN1 && BOOLEAN2' + True if both BOOLEAN1 and BOOLEAN2 are true. For example, the + following statement prints the current input record if it + contains both `2400' and `foo'. + + if ($0 ~ /2400/ && $0 ~ /foo/) print + + The subexpression BOOLEAN2 is evaluated only if BOOLEAN1 is + true. This can make a difference when BOOLEAN2 contains + expressions that have side effects: in the case of `$0 ~ /foo/ + && ($2 == bar++)', the variable `bar' is not incremented if + there is no `foo' in the record. + +`BOOLEAN1 || BOOLEAN2' + True if at least one of BOOLEAN1 and BOOLEAN2 is true. For + example, the following command prints all records in the input + file `BBS-list' that contain *either* `2400' or `foo', or both. + + awk '{ if ($0 ~ /2400/ || $0 ~ /foo/) print }' BBS-list + + The subexpression BOOLEAN2 is evaluated only if BOOLEAN1 is + false. This can make a difference when BOOLEAN2 contains + expressions that have side effects. + +`!BOOLEAN' + True if BOOLEAN is false. For example, the following program + prints all records in the input file `BBS-list' that do *not* + contain the string `foo'. + + awk '{ if (! 
($0 ~ /foo/)) print }' BBS-list + + + +File: gawk-info, Node: Assignment Ops, Next: Increment Ops, Prev: Boolean Ops, Up: Expressions + +Assignment Operators +==================== + +An "assignment" is an expression that stores a new value into a +variable. For example, let's assign the value 1 to the variable `z': + + z = 1 + +After this expression is executed, the variable `z' has the value 1. +Whatever old value `z' had before the assignment is forgotten. + +The `=' sign is called an "assignment operator". It is the simplest +assignment operator because the value of the right--hand operand is +stored unchanged. + +The left--hand operand of an assignment can be a variable (*note +Variables::.), a field (*note Changing Fields::.) or an array element +(*note Arrays::.). These are all called "lvalues", which means they +can appear on the left side of an assignment operator. The +right--hand operand may be any expression; it produces the new value +which the assignment stores in the specified variable, field or array +element. + +Assignments can store string values also. For example, this would +store the value `"this food is good"' in the variable `message': + + thing = "food" + predicate = "good" + message = "this " thing " is " predicate + +(This also illustrates concatenation of strings.) + +It is important to note that variables do *not* have permanent types. +The type of a variable is simply the type of whatever value it +happens to hold at the moment. In the following program fragment, +the variable `foo' has a numeric value at first, and a string value +later on: + + foo = 1 + print foo + foo = "bar" + print foo + +When the second assignment gives `foo' a string value, the fact that +it previously had a numeric value is forgotten. + +An assignment is an expression, so it has a value: the same value +that is assigned. Thus, `z = 1' as an expression has the value 1. 
+One consequence of this is that you can write multiple assignments +together: + + x = y = z = 0 + +stores the value 0 in all three variables. It does this because the +value of `z = 0', which is 0, is stored into `y', and then the value +of `y = z = 0', which is 0, is stored into `x'. + +You can use an assignment anywhere an expression is called for. For +example, it is valid to write `x != (y = 1)' to set `y' to 1 and then +test whether `x' equals 1. But this style tends to make programs +hard to read; except in a one--shot program, you should rewrite it to +get rid of such nesting of assignments. This is never very hard. + +Aside from `=', there are several other assignment operators that do +arithmetic with the old value of the variable. For example, the +operator `+=' computes a new value by adding the right--hand value to +the old value of the variable. Thus, the following assignment adds 5 +to the value of `foo': + + foo += 5 + +This is precisely equivalent to the following: + + foo = foo + 5 + +Use whichever one makes the meaning of your program clearer. + +Here is a table of the arithmetic assignment operators. In each +case, the right--hand operand is an expression whose value is +converted to a number. + +`LVALUE += INCREMENT' + Adds INCREMENT to the value of LVALUE to make the new value of + LVALUE. + +`LVALUE -= DECREMENT' + Subtracts DECREMENT from the value of LVALUE. + +`LVALUE *= COEFFICIENT' + Multiplies the value of LVALUE by COEFFICIENT. + +`LVALUE /= DIVISOR' + Divides the value of LVALUE by DIVISOR. + +`LVALUE %= MODULUS' + Sets LVALUE to its remainder by MODULUS. + +`LVALUE ^= POWER' +`LVALUE **= POWER' + Raises LVALUE to the power POWER. + + + +File: gawk-info, Node: Increment Ops, Next: Conversion, Prev: Assignment Ops, Up: Expressions + +Increment Operators +=================== + +"Increment operators" increase or decrease the value of a variable by +1. 
You could do the same thing with an assignment operator, so the +increment operators add no power to the `awk' language; but they are +convenient abbreviations for something very common. + +The operator to add 1 is written `++'. There are two ways to use +this operator: pre--incrementation and post--incrementation. + +To pre--increment a variable V, write `++V'. This adds 1 to the +value of V and that new value is also the value of this expression. +The assignment expression `V += 1' is completely equivalent. + +Writing the `++' after the variable specifies post--increment. This +increments the variable value just the same; the difference is that +the value of the increment expression itself is the variable's *old* +value. Thus, if `foo' has value 4, then the expression `foo++' has +the value 4, but it changes the value of `foo' to 5. + +The post--increment `foo++' is nearly equivalent to writing `(foo += +1) - 1'. It is not perfectly equivalent because all numbers in `awk' +are floating point: in floating point, `foo + 1 - 1' does not +necessarily equal `foo'. But the difference will be minute as long +as you stick to numbers that are fairly small (less than a trillion). + +Any lvalue can be incremented. Fields and array elements are +incremented just like variables. + +The decrement operator `--' works just like `++' except that it +subtracts 1 instead of adding. Like `++', it can be used before the +lvalue to pre--decrement or after it to post--decrement. + +Here is a summary of increment and decrement expressions. + +`++LVALUE' + This expression increments LVALUE and the new value becomes the + value of this expression. + +`LVALUE++' + This expression causes the contents of LVALUE to be incremented. + The value of the expression is the *old* value of LVALUE. + +`--LVALUE' + Like `++LVALUE', but instead of adding, it subtracts. It + decrements LVALUE and delivers the value that results. + +`LVALUE--' + Like `LVALUE++', but instead of adding, it subtracts. 
It + decrements LVALUE. The value of the expression is the *old* + value of LVALUE. + + + +File: gawk-info, Node: Conversion, Next: Conditional Exp, Prev: Increment Ops, Up: Expressions + +Conversion of Strings and Numbers +================================= + +Strings are converted to numbers, and numbers to strings, if the +context of your `awk' statement demands it. For example, if the +values of `foo' or `bar' in the expression `foo + bar' happen to be +strings, they are converted to numbers before the addition is +performed. If numeric values appear in string concatenation, they +are converted to strings. Consider this: + + two = 2; three = 3 + print (two three) + 4 + +This eventually prints the (numeric) value `27'. The numeric +variables `two' and `three' are converted to strings and concatenated +together, and the resulting string is converted back to a number +before adding `4'. The resulting numeric value `27' is printed. + +If, for some reason, you need to force a number to be converted to a +string, concatenate the null string with that number. To force a +string to be converted to a number, add zero to that string. Strings +that can't be interpreted as valid numbers are given the numeric +value zero. + +The exact manner in which numbers are converted into strings is +controlled by the `awk' special variable `OFMT' (*note Special::.). +Numbers are converted using a special version of the `sprintf' +function (*note Built-in::.) with `OFMT' as the format specifier. + +`OFMT''s default value is `"%.6g"', which prints a value with at +most six significant digits. You might want to change it to specify +more precision, if your version of `awk' uses double precision +arithmetic. Double precision on most modern machines gives you 16 or +17 decimal digits of precision. + +Strange results can happen if you set `OFMT' to a string that doesn't +tell `sprintf' how to format floating point numbers in a useful way. 
+For example, if you forget the `%' in the format, all numbers will be +converted to the same constant string. + + + +File: gawk-info, Node: Conditional Exp, Next: Function Calls, Prev: Conversion, Up: Expressions + +Conditional Expressions +======================= + +A "conditional expression" is a special kind of expression with three +operands. It allows you to use one expression's value to select one +of two other expressions. + +The conditional expression looks the same as in the C language: + + SELECTOR ? IF-TRUE-EXP : IF-FALSE-EXP + +There are three subexpressions. The first, SELECTOR, is always +computed first. If it is ``true'' (not zero) then IF-TRUE-EXP is +computed next and its value becomes the value of the whole expression. +Otherwise, IF-FALSE-EXP is computed next and its value becomes the +value of the whole expression. + +For example, this expression produces the absolute value of `x': + + x > 0 ? x : -x + +Each time the conditional expression is computed, exactly one of +IF-TRUE-EXP and IF-FALSE-EXP is computed; the other is ignored. This +is important when the expressions contain side effects. For example, +this conditional expression examines element `i' of either array `a' +or array `b', and increments `i'. + + x == y ? a[i++] : b[i++] + +This is guaranteed to increment `i' exactly once, because each time +one or the other of the two increment expressions will be executed +and the other will not be. + + + +File: gawk-info, Node: Function Calls, Prev: Conditional Exp, Up: Expressions + +Function Calls +============== + +A "function" is a name for a particular calculation. Because it has +a name, you can ask for it by name at any point in the program. For +example, the function `sqrt' computes the square root of a number. + +A fixed set of functions are "built in", which means they are +available in every `awk' program. The `sqrt' function is one of +these. *Note Built-in::, for a list of built--in functions and their +descriptions. 
In addition, you can define your own functions in the +program for use elsewhere in the same program. *Note User-defined::, +for how to do this. + +The way to use a function is with a "function call" expression, which +consists of the function name followed by a list of "arguments" in +parentheses. The arguments are expressions which give the raw +materials for the calculation that the function will do. When there +is more than one argument, they are separated by commas. If there +are no arguments, write just `()' after the function name. + +*Do not put any space between the function name and the +open--parenthesis!* A user--defined function name looks just like +the name of a variable, and space would make the expression look like +concatenation of a variable with an expression inside parentheses. +Space before the parenthesis is harmless with built--in functions, +but it is best not to get into the habit of using space, lest you do +likewise for a user--defined function one day by mistake. + +Each function needs a particular number of arguments. For example, +the `sqrt' function must be called with a single argument, like this: + + sqrt(ARGUMENT) + +The argument is the number to take the square root of. + +Some of the built--in functions allow you to omit the final argument. +If you do so, they will use a reasonable default. *Note Built-in::, +for full details. If arguments are omitted in calls to user--defined +functions, then those arguments are treated as local variables, +initialized to the null string (*note User-defined::.). + +Like every other expression, the function call has a value, which is +computed by the function based on the arguments you give it. In this +example, the value of `sqrt(ARGUMENT)' is the square root of the +argument. A function can also have side effects, such as assigning +the values of certain variables or doing I/O. 
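The remark above about omitted arguments in user--defined functions can be sketched as follows. This is only an illustration under stated assumptions: the function name `sum_to', its parameter names and the shell variable `locals_demo' are hypothetical and do not come from this manual. The caller passes one argument, so the remaining two parameters behave as local variables that start out null.

```shell
# Hypothetical example: sum_to is called with one argument only, so the
# extra parameters i and total act as local variables (initially null).
locals_demo=$(awk '
function sum_to(n, i, total) {   # callers pass only n
    for (i = 1; i <= n; i++)
        total += i
    return total
}
BEGIN {
    i = 99                       # a global i, left alone by the call below
    print sum_to(3), i
}')
echo "$locals_demo"              # prints "6 99"
```

Because `i' inside the function is a separate local variable, the global `i' still holds 99 after the call.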
+ +Here is a command to read numbers, one number per line, and print the +square root of each one: + + awk '{ print "The square root of", $1, "is", sqrt($1) }' + + + +File: gawk-info, Node: Statements, Next: Arrays, Prev: Expressions, Up: Top + +Actions: Statements +******************* + +"Control statements" such as `if', `while', and so on control the +flow of execution in `awk' programs. Most of the control statements +in `awk' are patterned on similar statements in C. + +The simplest kind of statement is an expression. The other kinds of +statements start with special keywords such as `if' and `while', to +distinguish them from simple expressions. + +In all the examples in this chapter, BODY can be either a single +statement or a group of statements. Groups of statements are +enclosed in braces, and separated by newlines or semicolons. + +* Menu: + +* Expressions:: One kind of statement simply computes an expression. + +* If:: Conditionally execute some `awk' statements. + +* While:: Loop until some condition is satisfied. + +* Do:: Do specified action while looping until some + condition is satisfied. + +* For:: Another looping statement, that provides + initialization and increment clauses. + +* Break:: Immediately exit the innermost enclosing loop. + +* Continue:: Skip to the end of the innermost enclosing loop. + +* Next:: Stop processing the current input record. + +* Exit:: Stop execution of `awk'. + + + +File: gawk-info, Node: If, Next: While, Up: Statements + +The `if' Statement +================== + +The `if'-`else' statement is `awk''s decision--making statement. The +`else' part of the statement is optional. + + `if (CONDITION) BODY1 else BODY2' + +Here CONDITION is an expression that controls what the rest of the +statement will do. If CONDITION is true, BODY1 is executed; +otherwise, BODY2 is executed (assuming that the `else' clause is +present). The condition is considered true if it is nonzero or +nonnull. 
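The rule that a condition is true when it is nonzero or nonnull can be checked with a short sketch. The constants tested and the shell variable `truth_demo' are chosen purely for illustration.

```shell
# Illustrative only: test two numeric and two string constants as conditions.
truth_demo=$(awk 'BEGIN {
    if (0)     print "0: true";    else print "0: false"
    if (3)     print "3: true";    else print "3: false"
    if ("")    print "null: true"; else print "null: false"
    if ("abc") print "abc: true";  else print "abc: false"
}')
echo "$truth_demo"
```

Zero and the null string select the `else' branch; the nonzero number and the nonnull string select the first branch.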
+ +Here is an example: + + awk '{ if (x % 2 == 0) + print "x is even" + else + print "x is odd" }' + +In this example, if the expression `x % 2 == 0' is true (that is, +the value of `x' is divisible by 2), then the first `print' statement +is executed; otherwise, the second `print' statement is executed. + +If the `else' appears on the same line as BODY1, and BODY1 is a +single statement, then a semicolon must separate BODY1 from `else'. +To illustrate this, let's rewrite the previous example: + + awk '{ if (x % 2 == 0) print "x is even"; else + print "x is odd" }' + +If you forget the `;', `awk' won't be able to parse it, and you will +get a syntax error. + +We would not actually write this example this way, because a human +reader might fail to see the `else' if it were not the first thing on +its line. + + + +File: gawk-info, Node: While, Next: Do, Prev: If, Up: Statements + +The `while' Statement +===================== + +In programming, a loop means a part of a program that is (or at least +can be) executed two or more times in succession. + +The `while' statement is the simplest looping statement in `awk'. It +repeatedly executes a statement as long as a condition is true. It +looks like this: + + while (CONDITION) + BODY + +Here BODY is a statement that we call the "body" of the loop, and +CONDITION is an expression that controls how long the loop keeps +running. + +The first thing the `while' statement does is test CONDITION. If +CONDITION is true, it executes the statement BODY. After BODY has +been executed, CONDITION is tested again and this process is repeated +until CONDITION is no longer true. If CONDITION is initially false, +the body of the loop is never executed. + + awk '{ i = 1 + while (i <= 3) { + print $i + i++ + } + }' + +This example prints the first three input fields, one per line. + +The loop works like this: first, the value of `i' is set to 1. Then, +the `while' tests whether `i' is less than or equal to three. 
This +is the case when `i' equals one, so the `i'-th field is printed. +Then the `i++' increments the value of `i' and the loop repeats. + +When `i' reaches 4, the loop exits. Here BODY is a compound +statement enclosed in braces. As you can see, a newline is not +required between the condition and the body; but using one makes the +program clearer unless the body is a compound statement or is very +simple. + + + +File: gawk-info, Node: Do, Next: For, Prev: While, Up: Statements + +The `do'--`while' Statement +=========================== + +The `do' loop is a variation of the `while' looping statement. The +`do' loop executes the BODY once, then repeats BODY as long as +CONDITION is true. It looks like this: + + do + BODY + while (CONDITION) + +Even if CONDITION is false at the start, BODY is executed at least +once (and only once, unless executing BODY makes CONDITION true). +Contrast this with the corresponding `while' statement: + + while (CONDITION) + BODY + +This statement will not execute BODY even once if CONDITION is false +to begin with. + +Here is an example of a `do' statement: + + awk '{ i = 1 + do { + print $0 + i++ + } while (i <= 10) + }' + +This program prints each input record ten times. It isn't a very +realistic example, since in this case an ordinary `while' would do +just as well. But this is normal; there is only occasionally a real +use for a `do' statement. +
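The difference between `while' and `do' shows up only when the condition is false from the start. The following sketch makes that case visible. The starting value 5, the messages and the shell variable `loop_demo' are assumptions made for illustration.

```shell
# Illustrative only: the condition i < 3 is false before either loop starts.
loop_demo=$(awk 'BEGIN {
    i = 5
    while (i < 3) { print "while body ran"; i++ }   # body is skipped entirely
    do { print "do body ran"; i++ } while (i < 3)    # body runs exactly once
    print "i is now", i
}')
echo "$loop_demo"
```

Only the `do' body produces output, and `i' is incremented once, from 5 to 6.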