author Arnold D. Robbins <arnold@skeeve.com> 2014-11-06 06:19:45 +0200
committer Arnold D. Robbins <arnold@skeeve.com> 2014-11-16 20:00:45 +0200
commit 6e6d960b0964b43f3c94e19872537f7fd4603f59
tree 3f4230badb1a7070c7fe1a17ac25b97c0c894202
parent 757eacd6cf522e56df34372ca7e6968817947cbb
Copyedits. Through page 72 or so in ORA MS.
-rw-r--r-- doc/gawktexi.in | 214
1 files changed, 111 insertions, 103 deletions
diff --git a/doc/gawktexi.in b/doc/gawktexi.in
index dfb52d75..971faae4 100644
--- a/doc/gawktexi.in
+++ b/doc/gawktexi.in
@@ -173,6 +173,9 @@
@macro DBXREF{text}
@xref{\text\}
@end macro
+@macro DBPXREF{text}
+@pxref{\text\}
+@end macro
@end ifdocbook
@ifnotdocbook
@@ -182,6 +185,9 @@
@macro DBXREF{text}
@xref{\text\},
@end macro
+@macro DBPXREF{text}
+@pxref{\text\},
+@end macro
@end ifnotdocbook
@ifclear FOR_PRINT
@@ -5223,7 +5229,7 @@ sequences and that are not listed in the following stand for themselves:
@cindex backslash (@code{\}), regexp operator
@cindex @code{\} (backslash), regexp operator
@item @code{\}
-This is used to suppress the special meaning of a character when
+This suppresses the special meaning of a character when
matching. For example, @samp{\$}
matches the character @samp{$}.
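A minimal sketch of the backslash escape, assuming a POSIX-compatible `awk` on `PATH`; the helper name `dollar_lines` and the sample input are hypothetical. The regexp `\$` selects only lines that contain a literal dollar sign:

```shell
# Hypothetical helper for illustration; assumes a POSIX-compatible awk.
dollar_lines() {
    printf '%s\n' 'price: $5' 'no currency here' 'total $12' |
        awk '/\$/'   # \$ matches a literal "$", not the end-of-string anchor
}
dollar_lines
```

Without the backslash, `/$/` would match every line, because an unescaped `$` anchors the match to the end of the string.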
@@ -5232,8 +5238,9 @@ matches the character @samp{$}.
@cindex @code{^} (caret), regexp operator
@cindex caret (@code{^}), regexp operator
@item @code{^}
-This matches the beginning of a string. For example, @samp{^@@chapter}
-matches @samp{@@chapter} at the beginning of a string and can be used
+This matches the beginning of a string. @samp{^@@chapter}
+matches @samp{@@chapter} at the beginning of a string,
+for example, and can be used
to identify chapter beginnings in Texinfo source files.
The @samp{^} is known as an @dfn{anchor}, because it anchors the pattern to
match only at the beginning of the string.
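A short sketch of the `^` anchor, assuming a POSIX-compatible `awk`; `chapter_lines` and its input are hypothetical. Only records that *begin* with `@chapter` are printed; the line that merely contains it is skipped:

```shell
# Hypothetical helper for illustration; assumes a POSIX-compatible awk.
chapter_lines() {
    printf '%s\n' '@chapter Regexps' 'see @chapter Regexps' '@chapter Fields' |
        awk '/^@chapter/'   # ^ anchors the match to the start of each record
}
chapter_lines
```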
@@ -5339,7 +5346,7 @@ There are two subtle points to understand about how @samp{*} works.
First, the @samp{*} applies only to the single preceding regular expression
component (e.g., in @samp{ph*}, it applies just to the @samp{h}).
To cause @samp{*} to apply to a larger sub-expression, use parentheses:
-@samp{(ph)*} matches @samp{ph}, @samp{phph}, @samp{phphph} and so on.
+@samp{(ph)*} matches @samp{ph}, @samp{phph}, @samp{phphph}, and so on.
Second, @samp{*} finds as many repetitions as possible. If the text
to be matched is @samp{phhhhhhhhhhhhhhooey}, @samp{ph*} matches all of
@@ -5439,7 +5446,7 @@ expressions are not available in regular expressions.
@cindex range expressions (regexps)
@cindex character lists in regular expression
-As mentioned earlier, a bracket expression matches any character amongst
+As mentioned earlier, a bracket expression matches any character among
those listed between the opening and closing square brackets.
Within a bracket expression, a @dfn{range expression} consists of two
@@ -5497,23 +5504,23 @@ a keyword denoting the class, and @samp{:]}.
POSIX standard.
@float Table,table-char-classes
-@caption{POSIX Character Classes}
+@caption{POSIX character classes}
@multitable @columnfractions .15 .85
@headitem Class @tab Meaning
-@item @code{[:alnum:]} @tab Alphanumeric characters.
-@item @code{[:alpha:]} @tab Alphabetic characters.
-@item @code{[:blank:]} @tab Space and TAB characters.
-@item @code{[:cntrl:]} @tab Control characters.
-@item @code{[:digit:]} @tab Numeric characters.
-@item @code{[:graph:]} @tab Characters that are both printable and visible.
-(A space is printable but not visible, whereas an @samp{a} is both.)
-@item @code{[:lower:]} @tab Lowercase alphabetic characters.
-@item @code{[:print:]} @tab Printable characters (characters that are not control characters).
-@item @code{[:punct:]} @tab Punctuation characters (characters that are not letters, digits,
-control characters, or space characters).
-@item @code{[:space:]} @tab Space characters (such as space, TAB, and formfeed, to name a few).
-@item @code{[:upper:]} @tab Uppercase alphabetic characters.
-@item @code{[:xdigit:]} @tab Characters that are hexadecimal digits.
+@item @code{[:alnum:]} @tab Alphanumeric characters
+@item @code{[:alpha:]} @tab Alphabetic characters
+@item @code{[:blank:]} @tab Space and TAB characters
+@item @code{[:cntrl:]} @tab Control characters
+@item @code{[:digit:]} @tab Numeric characters
+@item @code{[:graph:]} @tab Characters that are both printable and visible
+(a space is printable but not visible, whereas an @samp{a} is both)
+@item @code{[:lower:]} @tab Lowercase alphabetic characters
+@item @code{[:print:]} @tab Printable characters (characters that are not control characters)
+@item @code{[:punct:]} @tab Punctuation characters (characters that are not letters, digits,
+control characters, or space characters)
+@item @code{[:space:]} @tab Space characters (such as space, TAB, and formfeed, to name a few)
+@item @code{[:upper:]} @tab Uppercase alphabetic characters
+@item @code{[:xdigit:]} @tab Characters that are hexadecimal digits
@end multitable
@end float
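As a sketch of using one of the classes in the table, assuming an `awk` that supports POSIX character classes (very old `mawk` releases may not); `count_digits` is a hypothetical helper. Note that the class name sits inside an outer bracket expression, `[[:digit:]]`:

```shell
# Hypothetical helper for illustration; assumes awk supports [[:class:]].
count_digits() {
    # gsub() returns the number of substitutions made; replacing each
    # digit with itself ("&") leaves $0 unchanged but counts the matches.
    printf 'abc123 x9\n' | awk '{ print gsub(/[[:digit:]]/, "&") }'
}
count_digits
```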
@@ -5528,7 +5535,7 @@ and numeric characters in your character set.
@c Thanks to
@c Date: Tue, 01 Jul 2014 07:39:51 +0200
@c From: Hermann Peifer <peifer@gmx.eu>
-Some utilities that match regular expressions provide a non-standard
+Some utilities that match regular expressions provide a nonstandard
@code{[:ascii:]} character class; @command{awk} does not. However, you
can simulate such a construct using @code{[\x00-\x7F]}. This matches
all values numerically between zero and 127, which is the defined
@@ -5887,16 +5894,16 @@ in @ref{Regexp Operators}.
@end ifnottex
@item @code{--posix}
-Only POSIX regexps are supported; the GNU operators are not special
+Match only POSIX regexps; the GNU operators are not special
(e.g., @samp{\w} matches a literal @samp{w}). Interval expressions
are allowed.
@cindex Brian Kernighan's @command{awk}
@item @code{--traditional}
-Traditional Unix @command{awk} regexps are matched. The GNU operators
+Match traditional Unix @command{awk} regexps. The GNU operators
are not special, and interval expressions are not available.
-The POSIX character classes (@samp{[[:alnum:]]}, etc.) are supported,
-as BWK @command{awk} supports them.
+Because BWK @command{awk} supports them,
+the POSIX character classes (@samp{[[:alnum:]]}, etc.) are available.
Characters described by octal and hexadecimal escape sequences are
treated literally, even if they represent regexp metacharacters.
@@ -5956,7 +5963,7 @@ When @code{IGNORECASE} is not zero, @emph{all} regexp and string
operations ignore case.
Changing the value of @code{IGNORECASE} dynamically controls the
-case-sensitivity of the program as it runs. Case is significant by
+case sensitivity of the program as it runs. Case is significant by
default because @code{IGNORECASE} (like most variables) is initialized
to zero:
@@ -5969,7 +5976,7 @@ if (x ~ /ab/) @dots{} # now it will succeed
@end example
In general, you cannot use @code{IGNORECASE} to make certain rules
-case-insensitive and other rules case-sensitive, because there is no
+case insensitive and other rules case sensitive, as there is no
straightforward way
to set @code{IGNORECASE} just for the pattern of
a particular rule.@footnote{Experienced C and C++ programmers will note
@@ -5980,7 +5987,7 @@ and
However, this is somewhat obscure and we don't recommend it.}
To do this, use either bracket expressions or @code{tolower()}. However, one
thing you can do with @code{IGNORECASE} only is dynamically turn
-case-sensitivity on or off for all the rules at once.
+case sensitivity on or off for all the rules at once.
@code{IGNORECASE} can be set on the command line or in a @code{BEGIN} rule
(@pxref{Other Arguments}; also
@@ -6023,12 +6030,12 @@ in conditional expressions, or as part of matching expressions
using the @samp{~} and @samp{!~} operators.
@item
-Escape sequences let you represent non-printable characters and
+Escape sequences let you represent nonprintable characters and
also let you represent regexp metacharacters as literal characters
to be matched.
@item
-Regexp operators provide grouping, alternation and repetition.
+Regexp operators provide grouping, alternation, and repetition.
@item
Bracket expressions give you a shorthand for specifying sets
@@ -6043,8 +6050,8 @@ the match, such as for text substitution and when the record separator
is a regexp.
@item
-Matching expressions may use dynamic regexps, that is, string values
-treated as regular expressions.
+Matching expressions may use dynamic regexps (i.e., string values
+treated as regular expressions).
@item
@command{gawk}'s @code{IGNORECASE} variable lets you control the
@@ -6129,7 +6136,7 @@ never automatically reset to zero.
@end menu
@node awk split records
-@subsection Record Splitting With Standard @command{awk}
+@subsection Record Splitting with Standard @command{awk}
@cindex separators, for records
@cindex record separators
@@ -6160,7 +6167,7 @@ awk 'BEGIN @{ RS = "u" @}
@noindent
changes the value of @code{RS} to @samp{u}, before reading any input.
-This is a string whose first character is the letter ``u;'' as a result, records
+This is a string whose first character is the letter ``u''; as a result, records
are separated by the letter ``u.'' Then the input file is read, and the second
rule in the @command{awk} program (the action with no pattern) prints each
record. Because each @code{print} statement adds a newline at the end of
@@ -6276,7 +6283,7 @@ The empty string @code{""} (a string without any characters)
has a special meaning
as the value of @code{RS}. It means that records are separated
by one or more blank lines and nothing else.
-@xref{Multiple Line}, for more details.
+@DBXREF{Multiple Line} for more details.
If you change the value of @code{RS} in the middle of an @command{awk} run,
the new value is used to delimit subsequent records, but the record
@@ -6296,7 +6303,7 @@ sets the variable @code{RT} to the text in the input that matched
@code{RS}.
@node gawk split records
-@subsection Record Splitting With @command{gawk}
+@subsection Record Splitting with @command{gawk}
@cindex common extensions, @code{RS} as a regexp
@cindex extensions, common@comma{} @code{RS} as a regexp
@@ -6340,11 +6347,11 @@ $ @kbd{echo record 1 AAAA record 2 BBBB record 3 |}
The square brackets delineate the contents of @code{RT}, letting you
see the leading and trailing whitespace. The final value of
@code{RT} is a newline.
-@xref{Simple Sed}, for a more useful example
+@DBXREF{Simple Sed} for a more useful example
of @code{RS} as a regexp and @code{RT}.
If you set @code{RS} to a regular expression that allows optional
-trailing text, such as @samp{RS = "abc(XYZ)?"} it is possible, due
+trailing text, such as @samp{RS = "abc(XYZ)?"}, it is possible, due
to implementation constraints, that @command{gawk} may match the leading
part of the regular expression, but not the trailing part, particularly
if the input text that could match the trailing part is fairly long.
@@ -6407,7 +6414,7 @@ character as a record separator. However, this is a special case:
@cindex records, treating files as
@cindex treating files, as single records
-@xref{Readfile Function}, for an interesting way to read
+@DBXREF{Readfile Function} for an interesting way to read
whole files. If you are using @command{gawk}, see @ref{Extension Sample
Readfile}, for another option.
@end sidebar
@@ -6431,9 +6438,9 @@ called @dfn{fields}. By default, fields are separated by @dfn{whitespace},
like words in a line.
Whitespace in @command{awk} means any string of one or more spaces,
TABs, or newlines;@footnote{In POSIX @command{awk}, newlines are not
-considered whitespace for separating fields.} other characters, such as
-formfeed, vertical tab, etc., that are
-considered whitespace by other languages, are @emph{not} considered
+considered whitespace for separating fields.} other characters
+that are considered whitespace by other languages
+(such as formfeed, vertical tab, etc.) are @emph{not} considered
whitespace by @command{awk}.
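A small sketch of default field splitting, assuming a POSIX-compatible `awk`; `split_whitespace` is hypothetical. Leading and trailing blanks are ignored, and any run of spaces or TABs separates fields:

```shell
# Hypothetical helper for illustration; assumes a POSIX-compatible awk.
split_whitespace() {
    # Default FS: runs of spaces/TABs separate fields; edges are trimmed.
    printf '  alpha\tbeta    gamma  \n' |
        awk '{ print NF ":" $1 "," $2 "," $3 }'
}
split_whitespace
```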
The purpose of fields is to make it more convenient for you to refer to
@@ -6450,7 +6457,7 @@ to refer to a field in an @command{awk} program,
followed by the number of the field you want. Thus, @code{$1}
refers to the first field, @code{$2} to the second, and so on.
(Unlike the Unix shells, the field numbers are not limited to single digits.
-@code{$127} is the one hundred twenty-seventh field in the record.)
+@code{$127} is the 127th field in the record.)
For example, suppose the following is a line of input:
@example
@@ -6520,7 +6527,7 @@ awk '@{ print $NR @}'
@noindent
Recall that @code{NR} is the number of records read so far: one in the
-first record, two in the second, etc. So this example prints the first
+first record, two in the second, and so on. So this example prints the first
field of the first record, the second field of the second record, and so
on. For the twentieth record, field number 20 is printed; most likely,
the record has fewer than 20 fields, so this prints a blank line.
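A minimal sketch of `$NR`, assuming a POSIX-compatible `awk`; `diagonal` and its three-line input are hypothetical. Record 1 prints field 1, record 2 prints field 2, and so on:

```shell
# Hypothetical helper for illustration; assumes a POSIX-compatible awk.
diagonal() {
    # NR is 1, 2, 3 on successive records, so $NR walks the diagonal.
    printf '%s\n' 'a b c' 'd e f' 'g h i' | awk '{ print $NR }'
}
diagonal
```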
@@ -6537,7 +6544,7 @@ The parentheses are used so that the multiplication is done before the
@samp{$} operation; they are necessary whenever there is a binary
operator@footnote{A @dfn{binary operator}, such as @samp{*} for
multiplication, is one that takes two operands. The distinction
-is required, since @command{awk} also has unary (one-operand)
+is required, because @command{awk} also has unary (one-operand)
and ternary (three-operand) operators.}
in the field-number expression. This example, then, prints the
type of relationship (the fourth field) for every line of the file
@@ -6611,7 +6618,7 @@ $ @kbd{awk '@{ $2 = $2 - 10; print $0 @}' inventory-shipped}
@dots{}
@end example
-It is also possible to also assign contents to fields that are out
+It is also possible to assign contents to fields that are out
of range. For example:
@example
@@ -6662,9 +6669,9 @@ else
@noindent
should print @samp{everything is normal}, because @code{NF+1} is certain
-to be out of range. (@xref{If Statement},
+to be out of range. (@DBXREF{If Statement}
for more information about @command{awk}'s @code{if-else} statements.
-@xref{Typing and Comparison},
+@DBXREF{Typing and Comparison}
for more information about the @samp{!=} operator.)
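A sketch contrasting a *reference* to an out-of-range field with an *assignment* to one, assuming a POSIX-compatible `awk`; `out_of_range_demo` is hypothetical. Referencing `$(NF+1)` yields the empty string, while assigning to `$5` on a three-field record creates an empty `$4`, sets `NF` to 5, and rebuilds `$0`:

```shell
# Hypothetical helper for illustration; assumes a POSIX-compatible awk.
out_of_range_demo() {
    echo 'a b c' | awk '{
        # Referencing a field beyond NF yields the empty string.
        if ($(NF + 1) != "")
            print "cannot happen"
        else
            print "everything is normal"
        # Assigning beyond NF creates the empty field $4, sets NF to 5,
        # and rebuilds $0 with OFS (a single space by default).
        $5 = "e"
        print NF
        print $0
    }'
}
out_of_range_demo
```

The last line of output contains two spaces between `c` and `e`, one on each side of the empty fourth field.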
It is important to note that making an assignment to an existing field
@@ -6749,7 +6756,7 @@ in a record simply by setting @code{FS} and @code{OFS}, and then
expecting a plain @samp{print} or @samp{print $0} to print the
modified record.
-But this does not work, since nothing was done to change the record
+But this does not work, because nothing was done to change the record
itself. Instead, you must force the record to be rebuilt, typically
with a statement such as @samp{$1 = $1}, as described earlier.
@end sidebar
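A minimal sketch of forcing the record to be rebuilt, assuming a POSIX-compatible `awk`; `rebuild_record` is hypothetical. Setting `OFS` alone does not change `$0`; the assignment `$1 = $1` does:

```shell
# Hypothetical helper for illustration; assumes a POSIX-compatible awk.
rebuild_record() {
    echo 'a b c' | awk 'BEGIN { OFS = "-" }
    {
        print       # $0 still holds the original text
        $1 = $1     # assigning any field makes awk rebuild $0 with OFS
        print       # fields are now joined with "-"
    }'
}
rebuild_record
```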
@@ -6801,7 +6808,7 @@ the Unix Bourne shell, @command{sh}, or Bash).
@cindex @code{FS} variable, changing value of
The value of @code{FS} can be changed in the @command{awk} program with the
assignment operator, @samp{=} (@pxref{Assignment Ops}).
-Often the right time to do this is at the beginning of execution
+Often, the right time to do this is at the beginning of execution
before any input has been processed, so that the very first record
is read with the proper separator. To do this, use the special
@code{BEGIN} pattern
@@ -6957,7 +6964,7 @@ statement prints the new @code{$0}.
@cindex dark corner, @code{^}, in @code{FS}
There is an additional subtlety to be aware of when using regular expressions
for field splitting.
-It is not well-specified in the POSIX standard, or anywhere else, what @samp{^}
+It is not well specified in the POSIX standard, or anywhere else, what @samp{^}
means when splitting fields. Does the @samp{^} match only at the beginning of
the entire record? Or is each field separator a new string? It turns out that
different @command{awk} versions answer this question differently, and you
@@ -7123,11 +7130,11 @@ awk -F: '$5 == ""' /etc/passwd
@end example
@node Full Line Fields
-@subsection Making The Full Line Be A Single Field
+@subsection Making the Full Line Be a Single Field
Occasionally, it's useful to treat the whole input line as a
single field. This can be done easily and portably simply by
-setting @code{FS} to @code{"\n"} (a newline).@footnote{Thanks to
+setting @code{FS} to @code{"\n"} (a newline):@footnote{Thanks to
Andrew Schorr for this tip.}
@example
@@ -7137,42 +7144,6 @@ awk -F'\n' '@var{program}' @var{files @dots{}}
@noindent
When you do this, @code{$1} is the same as @code{$0}.
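A short sketch of the newline-separator trick, assuming a POSIX-compatible `awk`; `whole_line_field` is hypothetical. It sets `FS` in a `BEGIN` rule rather than with `-F'\n'`, so the `\n` escape is processed as an ordinary `awk` string constant. Because the default `RS` already ends each record at a newline, the record contains no `FS` and forms a single field:

```shell
# Hypothetical helper for illustration; assumes a POSIX-compatible awk.
whole_line_field() {
    # With FS = "\n" and the default RS, each line is exactly one field.
    printf 'one two  three\n' |
        awk 'BEGIN { FS = "\n" } { print NF; print ($1 == $0) }'
}
whole_line_field
```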
-@node Field Splitting Summary
-@subsection Field-Splitting Summary
-
-It is important to remember that when you assign a string constant
-as the value of @code{FS}, it undergoes normal @command{awk} string
-processing. For example, with Unix @command{awk} and @command{gawk},
-the assignment @samp{FS = "\.."} assigns the character string @code{".."}
-to @code{FS} (the backslash is stripped). This creates a regexp meaning
-``fields are separated by occurrences of any two characters.''
-If instead you want fields to be separated by a literal period followed
-by any single character, use @samp{FS = "\\.."}.
-
-The following list summarizes how fields are split, based on the value
-of @code{FS} (@samp{==} means ``is equal to''):
-
-@table @code
-@item FS == " "
-Fields are separated by runs of whitespace. Leading and trailing
-whitespace are ignored. This is the default.
-
-@item FS == @var{any other single character}
-Fields are separated by each occurrence of the character. Multiple
-successive occurrences delimit empty fields, as do leading and
-trailing occurrences.
-The character can even be a regexp metacharacter; it does not need
-to be escaped.
-
-@item FS == @var{regexp}
-Fields are separated by occurrences of characters that match @var{regexp}.
-Leading and trailing matches of @var{regexp} delimit empty fields.
-
-@item FS == ""
-Each individual character in the record becomes a separate field.
-(This is a common extension; it is not specified by the POSIX standard.)
-@end table
-
@sidebar Changing @code{FS} Does Not Affect the Fields
@cindex POSIX @command{awk}, field separators and
@@ -7218,6 +7189,42 @@ root:nSijPlPhZZwgE:0:0:Root:/:
@end example
@end sidebar
+@node Field Splitting Summary
+@subsection Field-Splitting Summary
+
+It is important to remember that when you assign a string constant
+as the value of @code{FS}, it undergoes normal @command{awk} string
+processing. For example, with Unix @command{awk} and @command{gawk},
+the assignment @samp{FS = "\.."} assigns the character string @code{".."}
+to @code{FS} (the backslash is stripped). This creates a regexp meaning
+``fields are separated by occurrences of any two characters.''
+If instead you want fields to be separated by a literal period followed
+by any single character, use @samp{FS = "\\.."}.
+
+The following list summarizes how fields are split, based on the value
+of @code{FS} (@samp{==} means ``is equal to''):
+
+@table @code
+@item FS == " "
+Fields are separated by runs of whitespace. Leading and trailing
+whitespace are ignored. This is the default.
+
+@item FS == @var{any other single character}
+Fields are separated by each occurrence of the character. Multiple
+successive occurrences delimit empty fields, as do leading and
+trailing occurrences.
+The character can even be a regexp metacharacter; it does not need
+to be escaped.
+
+@item FS == @var{regexp}
+Fields are separated by occurrences of characters that match @var{regexp}.
+Leading and trailing matches of @var{regexp} delimit empty fields.
+
+@item FS == ""
+Each individual character in the record becomes a separate field.
+(This is a common extension; it is not specified by the POSIX standard.)
+@end table
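A sketch of the single-character case in the summary, assuming a POSIX-compatible `awk`; `split_on_pipe` is hypothetical. The regexp metacharacter `|` is used literally as `FS`, and the leading, doubled, and trailing separators each delimit an empty field:

```shell
# Hypothetical helper for illustration; assumes a POSIX-compatible awk.
split_on_pipe() {
    # Fields of "|a||b|" are "", "a", "", "b", "" (five in all).
    printf '|a||b|\n' |
        awk -F'|' '{ print NF; print "[" $2 "][" $3 "][" $4 "]" }'
}
split_on_pipe
```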
+
@sidebar @code{FS} and @code{IGNORECASE}
The @code{IGNORECASE} variable
@@ -7236,7 +7243,7 @@ print $1
@noindent
The output is @samp{aCa}. If you really want to split fields on an
alphabetic character while ignoring case, use a regexp that will
-do it for you. E.g., @samp{FS = "[c]"}. In this case, @code{IGNORECASE}
+do it for you (e.g., @samp{FS = "[c]"}). In this case, @code{IGNORECASE}
will take effect.
@end sidebar
@@ -7246,18 +7253,19 @@ will take effect.
@node Constant Size
@section Reading Fixed-Width Data
+@cindex data, fixed-width
+@cindex fixed-width data
+@cindex advanced features, fixed-width data
+@command{gawk} provides a facility for dealing with
+fixed-width fields with no distinctive field separator.
+
@quotation NOTE
This @value{SECTION} discusses an advanced
feature of @command{gawk}. If you are a novice @command{awk} user,
you might want to skip it on the first reading.
@end quotation
-@cindex data, fixed-width
-@cindex fixed-width data
-@cindex advanced features, fixed-width data
-@command{gawk} provides a facility for dealing with
-fixed-width fields with no distinctive field separator. For example,
-data of this nature arises in the input for old Fortran programs where
+Fixed-width data arises in the input for old Fortran programs where
numbers are run together, or in the output of programs that did not
anticipate the use of their output as input for other programs.
@@ -7298,15 +7306,10 @@ dave ttyq4 26Jun9115days 46 46 wnewmail
@end group
@end example
-The following program takes the above input, converts the idle time to
+The following program takes this input, converts the idle time to
number of seconds, and prints out the first two fields and the calculated
idle time:
-@quotation NOTE
-This program uses a number of @command{awk} features that
-haven't been introduced yet.
-@end quotation
-
@example
BEGIN @{ FIELDWIDTHS = "9 6 10 6 7 7 35" @}
NR > 2 @{
@@ -7325,6 +7328,11 @@ NR > 2 @{
@}
@end example
+@quotation NOTE
+The preceding program uses a number of @command{awk} features that
+haven't been introduced yet.
+@end quotation
+
Running the program on the data produces the following results:
@example
@@ -7370,7 +7378,7 @@ else
This information is useful when writing a function
that needs to temporarily change @code{FS} or @code{FIELDWIDTHS},
read some records, and then restore the original settings
-(@pxref{Passwd Functions},
+(@DBPXREF{Passwd Functions}
for an example of such a function).
@node Splitting By Content