author Arnold D. Robbins <arnold@skeeve.com> 2010-07-16 14:49:57 +0300
committer Arnold D. Robbins <arnold@skeeve.com> 2010-07-16 14:49:57 +0300
commit 6a2caf2157d87b4b582b2494bdd7d6a688dd0b1f
tree 9a2862cc11be4832f188cfbdce175120ceba5024
parent 315bd501ca696bc3e3c938b4604d8dac7a6f512f
Move to gawk-3.1.6.
Diffstat (limited to 'doc/gawk.1')
-rw-r--r-- doc/gawk.1 237
1 files changed, 164 insertions, 73 deletions
diff --git a/doc/gawk.1 b/doc/gawk.1
index 9859e7d3..3e950452 100644
--- a/doc/gawk.1
+++ b/doc/gawk.1
@@ -3,6 +3,14 @@
.ds AN \s-1ANSI\s+1
.ds GN \s-1GNU\s+1
.ds AK \s-1AWK\s+1
+.de EX
+.nf
+.ft CW
+..
+.de EE
+.ft R
+.fi
+..
.ds EP \fIGAWK: Effective AWK Programming\fP
.if !\n(.g \{\
. if !\w|\*(lq| \{\
@@ -14,7 +22,7 @@
. if \w'\(rq' .ds rq "\(rq
. \}
.\}
-.TH GAWK 1 "June 26 2005" "Free Software Foundation" "Utility Commands"
+.TH GAWK 1 "Oct 19 2007" "Free Software Foundation" "Utility Commands"
.SH NAME
gawk \- pattern scanning and processing language
.SH SYNOPSIS
@@ -53,7 +61,7 @@ file .\|.\|.
.I Gawk
is the \*(GN Project's implementation of the \*(AK programming language.
It conforms to the definition of the language in
-the \*(PX 1003.2 Command Language And Utilities Standard.
+the \*(PX 1003.1 Standard.
This version in turn is based on the description in
.IR "The AWK Programming Language" ,
by Aho, Kernighan, and Weinberger,
@@ -94,7 +102,7 @@ pre-defined \*(AK variables.
.PP
.I Gawk
options may be either traditional \*(PX one letter options,
-or \*(GN style long options. \*(PX options start with a single \*(lq\-\*(rq,
+or \*(GN-style long options. \*(PX options start with a single \*(lq\-\*(rq,
while long options start with \*(lq\-\^\-\*(rq.
Long options are provided for both \*(GN-specific features and
for \*(PX-mandated features.
@@ -119,7 +127,7 @@ remains unique.
.SH OPTIONS
.PP
.I Gawk
-accepts the following options, listed alphabetically.
+accepts the following options, listed by frequency.
.TP
.PD 0
.BI \-F " fs"
@@ -174,7 +182,8 @@ flag sets the maximum number of fields, and the
.B r
flag sets the maximum record size. These two flags and the
.B \-m
-option are from the Bell Laboratories research version of \*(UX
+option are from an earlier version of the Bell Laboratories
+research version of \*(UX
.IR awk .
They are ignored by
.IR gawk ,
@@ -233,7 +242,7 @@ If no
is provided,
.I gawk
uses a file named
-.I awkvars.out
+.B awkvars.out
in the current directory.
.sp .5
Having a list of all the global variables is a good way to look for
@@ -446,6 +455,25 @@ It is intended primarily for medium to large \*(AK programs used
in shell scripts.
.TP
.PD 0
+.B "\-W use\-lc\-numeric"
+.TP
+.PD
+.B \-\^\-use\-lc\-numeric
+This forces
+.I gawk
+to use the locale's decimal point character when parsing input data.
+Although the POSIX standard requires this behavior, and
+.I gawk
+does so when
+.B \-\^\-posix
+is in effect, the default is to follow traditional behavior and use a
+period as the decimal point, even in locales where the period is not the
+decimal point character. This option overrides the default behavior,
+without the full draconian strictness of the
+.B \-\^\-posix
+option.
+.TP
+.PD 0
.B "\-W version"
.TP
.PD
@@ -467,7 +495,7 @@ these options cause an immediate, successful exit.)
.B \-\^\-
Signal the end of options. This is useful to allow further arguments to the
\*(AK program itself to start with a \*(lq\-\*(rq.
-This is mainly for consistency with the argument parsing convention used
+This provides consistency with the argument parsing convention used
by most other \*(PX programs.
.PP
In compatibility mode,
@@ -588,7 +616,7 @@ or both,
depending upon how they are used. \*(AK also has one dimensional
arrays; arrays with multiple dimensions may be simulated.
Several pre-defined variables are set as a program
-runs; these will be described as needed and summarized below.
+runs; these are described as needed and summarized below.
.SS Records
Normally, records are separated by newline characters. You can control how
records are separated by assigning values to the built-in variable
@@ -636,10 +664,10 @@ In the special case that
.B FS
is a single space, fields are separated
by runs of spaces and/or tabs and/or newlines.
-(But see the discussion of
-.BR \-\^\-posix ,
+(But see the section
+.BR "POSIX COMPATIBILITY" ,
below).
-.B NOTE:
+.BR NOTE :
The value of
.B IGNORECASE
(see below) also affects how fields are split when
@@ -828,7 +856,7 @@ and
.B sub()
built-in functions all ignore case when doing regular expression
operations.
-.B NOTE:
+.BR NOTE :
Array subscripting is
.I not
affected.
@@ -858,7 +886,7 @@ the C
facilities such as
.BR isalpha() ,
and
-.BR tolupper() .
+.BR toupper() .
.TP
.B LINT
Provides dynamic control of the
@@ -1003,11 +1031,7 @@ are associative, i.e. indexed by string values.
.PP
The special operator
.B in
-may be used in an
-.B if
-or
-.B while
-statement to see if an array has an index consisting of a particular
+may be used to test if an array has an index consisting of a particular
value.
.PP
.RS
@@ -1040,7 +1064,7 @@ just by specifying the array name without a subscript.
Variables and fields
may be (floating point) numbers, or strings, or both. How the
value of a variable is interpreted depends upon its context. If used in
-a numeric expression, it will be treated as a number, if used as a string
+a numeric expression, it will be treated as a number; if used as a string
it will be treated as a string.
.PP
To force a variable to be treated as a number, add 0 to it; to force it
@@ -1073,6 +1097,16 @@ the variable
.B b
has a string value of \fB"12"\fR and not \fB"12.00"\fR.
.PP
+When operating in POSIX mode (such as with the
+.B \-\^\-posix
+command line option),
+beware that locale settings may interfere with the way
+decimal numbers are treated: the decimal separator of the numbers you
+are feeding to
+.I gawk
+must conform to what your locale would expect, be it
+a comma (,) or a period (.).
+.PP
.I Gawk
performs comparisons as follows:
If two variables are numeric, they are compared numerically.
@@ -1081,13 +1115,6 @@ If one value is numeric and the other has a string value that is a
Otherwise, the numeric value is converted to a string and a string
comparison is performed.
Two strings are compared, of course, as strings.
-Note that the POSIX standard applies the concept of
-\*(lqnumeric string\*(rq everywhere, even to string constants.
-However, this is
-clearly incorrect, and
-.I gawk
-does not do this.
-(Fortunately, this is fixed in the next version of the standard.)
.PP
Note that string constants, such as \fB"57"\fP, are
.I not
@@ -1335,7 +1362,7 @@ matches the end of a string.
character list, matches any of the characters
.IR abc.\|.\|. .
.TP
-.BI [^ abc.\|.\|. ]
+\fB[^\fIabc.\|.\|.\fB]\fR
negated character list, matches any character except
.IR abc.\|.\|. .
.TP
@@ -1429,7 +1456,7 @@ The escape sequences that are valid in string constants (see below)
are also valid in regular expressions.
.PP
.I "Character classes"
-are a new feature introduced in the \*(PX standard.
+are a feature introduced in the \*(PX standard.
A character class is a special notation for describing
lists of characters that have a specific attribute, but where the
actual characters themselves can vary from country to country and/or
@@ -1495,7 +1522,8 @@ match them, and if your character set collated differently from
With the \*(PX character classes, you can write
.BR /[[:alnum:]]/ ,
and this matches
-the alphabetic and numeric characters in your character set.
+the alphabetic and numeric characters in your character set,
+no matter what it is.
.PP
Two additional special sequences can appear in character lists.
These apply to non-\s-1ASCII\s+1 character sets, which can have single symbols
@@ -1505,7 +1533,7 @@ that are represented with more than one
character, as well as several characters that are equivalent for
.IR collating ,
or sorting, purposes. (E.g., in French, a plain \*(lqe\*(rq
-and a grave-accented e\` are equivalent.)
+and a grave-accented \*(lqe\h'-\w:e:u'\`\*(rq are equivalent.)
.TP
Collating Symbols
A collating symbol is a multi-character collating element enclosed in
@@ -1637,6 +1665,13 @@ Addition and subtraction.
.I space
String concatenation.
.TP
+.B "| |&"
+Piped I/O for
+.BR getline ,
+.BR print ,
+and
+.BR printf .
+.TP
.PD 0
.B "< >"
.TP
@@ -1649,7 +1684,7 @@ The regular relational operators.
.TP
.B "~ !~"
Regular expression match, negated match.
-.B NOTE:
+.BR NOTE :
Do not use a constant regular expression
.RB ( /foo/ )
on the left-hand side of a
@@ -1780,6 +1815,10 @@ as above.
Co-processes are a
.I gawk
extension.
+.RI ( command
+can also be a socket. See the subsection
+.BR "Special File Names" ,
+below.)
.TP
.B next
Stop processing the current input record. The next input record
@@ -1856,14 +1895,17 @@ and
.BR printf .
.TP
.BI "print .\|.\|. >>" " file"
-appends output to the
+Appends output to the
.IR file .
.TP
.BI "print .\|.\|. |" " command"
-writes on a pipe.
+Writes on a pipe.
.TP
.BI "print .\|.\|. |&" " command"
-sends data to a co-process.
+Sends data to a co-process or socket.
+(See also the subsection
+.BR "Special File Names" ,
+below.)
.PP
The
.BR getline
@@ -1872,8 +1914,8 @@ Upon an error,
.B ERRNO
contains a string describing the problem.
.PP
-.B NOTE:
-If using a pipe or co-process to
+.BR NOTE :
+If using a pipe, co-process, or socket to
.BR getline ,
or from
.B print
@@ -1883,8 +1925,8 @@ within a loop, you
.I must
use
.B close()
-to create new instances of the command.
-\*(AK does not automatically close pipes or co-processes when
+to create new instances of the command or socket.
+\*(AK does not automatically close pipes, sockets, or co-processes when
they return EOF.
.SS The \fIprintf\fP\^ Statement
.PP
@@ -1907,7 +1949,7 @@ character of that string is printed.
.BR "%d" "," " %i"
A decimal number (the integer part).
.TP
-.B %e , " %E"
+.BR %e , " %E"
A floating point number of the form
.BR [\-]d.dddddde[+\^\-]dd .
The
@@ -1917,11 +1959,22 @@ format uses
instead of
.BR e .
.TP
-.B %f
+.BR %f , " %F"
A floating point number of the form
.BR [\-]ddd.dddddd .
+If the system library supports it,
+.B %F
+is available as well. This is like
+.BR %f ,
+but uses capital letters for special \*(lqnot a number\*(rq
+and \*(lqinfinity\*(rq values. If
+.B %F
+is not available,
+.I gawk
+uses
+.BR %f .
.TP
-.B %g , " %G"
+.BR %g , " %G"
Use
.B %e
or
@@ -1944,7 +1997,7 @@ An unsigned decimal number (again, an integer).
.B %s
A character string.
.TP
-.B %x , " %X"
+.BR %x , " %X"
An unsigned hexadecimal number (an integer).
The
.B %X
@@ -1965,7 +2018,7 @@ outside the range of a C
integer,
.I gawk
switches to the
-.B %g
+.B %0f
format specifier. If
.B \-\^\-lint
is provided on the command line
@@ -2023,8 +2076,9 @@ a nonzero result.
For
.BR %e ,
.BR %E ,
+.B %f
and
-.BR %f ,
+.BR %F ,
the result always contains a
decimal point.
For
@@ -2053,8 +2107,9 @@ A number that specifies the precision to use when printing.
For the
.BR %e ,
.BR %E ,
+.B %f
and
-.BR %f
+.BR %F ,
formats, this specifies the
number of digits you want printed to the right of the decimal point.
For the
@@ -2275,7 +2330,7 @@ The contents of
are sorted using
.IR gawk\^ "'s"
normal rules for
-comparing values, and the indexes of the
+comparing values, and the indices of the
sorted values of
.I s
are replaced with sequential
@@ -2288,7 +2343,7 @@ is first duplicated into
.IR d ,
and then
.I d
-is sorted, leaving the indexes of the
+is sorted, leaving the indices of the
source array
.I s
unchanged.
@@ -2530,6 +2585,16 @@ with all the lower-case characters in
.I str
translated to their corresponding upper-case counterparts.
Non-alphabetic characters are left unchanged.
+.PP
+As of version 3.1.5,
+.I gawk
+is multibyte aware. This means that
+.BR index() ,
+.BR length() ,
+.B substr()
+and
+.B match()
+all work in terms of characters, not bytes.
.SS Time Functions
Since one of the primary uses of \*(AK programs is processing log files
that contain time stamp information,
@@ -2574,11 +2639,15 @@ is out of range,
.B mktime()
returns \-1.
.TP
-\fBstrftime(\fR[\fIformat \fR[\fB, \fItimestamp\fR]]\fB)\fR
+\fBstrftime(\fR[\fIformat \fR[\fB, \fItimestamp\fR[\fB, \fIutc-flag\fR]]]\fB)\fR
Formats
.I timestamp
according to the specification in
.IR format.
+If
+.I utc-flag
+is present and is non-zero or non-null, the result
+is in UTC, otherwise the result is in local time.
The
.I timestamp
should be of the same form as returned by
@@ -2595,14 +2664,6 @@ See the specification for the
.B strftime()
function in \*(AN C for the format conversions that are
guaranteed to be available.
-A public-domain version of
-.IR strftime (3)
-and a man page for it come with
-.IR gawk ;
-if that version was used to build
-.IR gawk ,
-then all of the conversions described in that man page are available to
-.IR gawk.
.TP
.B systime()
Returns the current time of day as the number of seconds since the Epoch
@@ -2613,7 +2674,7 @@ Starting with version 3.1 of
the following bit manipulation functions are available.
They work by converting double-precision floating point
values to
-.B "unsigned long"
+.B uintmax_t
integers, doing the operation, and then converting the
result back to floating point.
The functions are:
@@ -2771,7 +2832,7 @@ function f(p, q, a, b) # a and b are local
The left parenthesis in a function call is required
to immediately follow the function name,
without any intervening white space.
-This is to avoid a syntactic ambiguity with the concatenation operator.
+This avoids a syntactic ambiguity with the concatenation operator.
This restriction does not apply to the built-in functions listed above.
.PP
Functions may call each other and may be recursive.
@@ -2819,7 +2880,7 @@ Returns the value returned by
.ft B
This function is provided and documented in \*(EP,
but everything about this feature is likely to change
-in the next release.
+eventually.
We STRONGLY recommend that you do not use this feature
for anything that you aren't willing to redo.
.ft R
@@ -2834,7 +2895,9 @@ or whatever file was named with the
.B \-\^\-profile
option. It then continues to run.
.B SIGHUP
-causes it to dump the profile and function call stack and then exit.
+causes
+.I pgawk
+to dump the profile and function call stack and then exit.
.SH EXAMPLES
.nf
Print and sort the login names of all users:
@@ -2907,10 +2970,11 @@ action to assign a value to the
.B TEXTDOMAIN
variable to set the text domain to a name associated with your program.
.sp
-.ti +5n
-.ft B
+.RS
+.EX
BEGIN { TEXTDOMAIN = "myprog" }
-.ft R
+.EE
+.RE
.sp
This allows
.I gawk
@@ -2942,9 +3006,9 @@ to generate a
file for your program.
.TP
5.
-Provide appropriate translations, and build and install a corresponding
+Provide appropriate translations, and build and install the corresponding
.B \&.mo
-file.
+files.
.PP
The internationalization features are described in full detail in \*(EP.
.SH POSIX COMPATIBILITY
@@ -3080,7 +3144,9 @@ invoking
.I gawk
with the
.B \-\^\-traditional
-option.
+or
+.B \-\^\-posix
+options.
.PP
The following features of
.I gawk
@@ -3186,6 +3252,10 @@ The ability to use positional specifiers with
.B printf
and
.BR sprintf() .
+.TP
+\(bu
+The ability to pass an array to
+.BR length() .
.\" New keywords or changes to keywords
.TP
\(bu
@@ -3258,7 +3328,7 @@ option is \*(lqt\*(rq, then
is set to the tab character.
Note that typing
.B "gawk \-F\et \&.\|.\|."
-simply causes the shell to quote the \*(lqt,\*(rq, and does not pass
+simply causes the shell to quote the \*(lqt,\*(rq and does not pass
\*(lq\et\*(rq to the
.B \-F
option.
@@ -3310,6 +3380,15 @@ command, then it accepts an additional control-flow statement:
\fB}\fR
.fi
.RE
+.PP
+If
+.I gawk
+is configured with the
+.B \-\^\-disable\-directories-fatal
+option, then it will silently skip directories named on the command line.
+Otherwise, it will do so only if invoked with the
+.B \-\^\-traditional
+option.
.SH ENVIRONMENT VARIABLES
The
.B AWKPATH
@@ -3350,6 +3429,8 @@ Addison-Wesley, 1988. ISBN 0-201-07981-X.
.PP
\*(EP,
Edition 3.0, published by the Free Software Foundation, 2001.
+The current version of this document is available online at
+.BR http://www.gnu.org/software/gawk/manual .
.SH BUGS
The
.B \-F
@@ -3385,13 +3466,16 @@ The initial DOS port was done by Conrad Kwok and Scott Garfinkle.
Scott Deifik is the current DOS maintainer. Pat Rankin did the
port to VMS, and Michal Jaegermann did the port to the Atari ST.
The port to OS/2 was done by Kai Uwe Rommel, with contributions and
-help from Darrel Hankerson. Fred Fish supplied support for the Amiga,
-Stephen Davies provided the Tandem port,
+help from Darrel Hankerson.
+Juan M.\& Guerrero now maintains the OS/2 port.
+Fred Fish supplied support for the Amiga,
and Martin Brown provided the BeOS port.
+Stephen Davies provided the original Tandem port, and
+Matthew Woehlke provided changes for Tandem's POSIX-compliant systems.
.SH VERSION INFORMATION
This man page documents
.IR gawk ,
-version 3.1.5.
+version 3.1.6.
.SH BUG REPORTS
If you find a bug in
.IR gawk ,
@@ -3404,12 +3488,18 @@ Please include your operating system and its revision, the version of
what C compiler you used to compile it, and a test program
and data that are as small as possible for reproducing the problem.
.PP
-Before sending a bug report, please do two things. First, verify that
+Before sending a bug report, please do the following things. First, verify that
you have the latest version of
.IR gawk .
Many bugs (usually subtle ones) are fixed at each release, and if
yours is out of date, the problem may already have been solved.
-Second, please read this man page and the reference manual carefully to
+Second, please see if setting the environment variable
+.B LC_ALL
+to
+.B LC_ALL=C
+causes things to behave as you expect. If so, it's a locale issue,
+and may or may not really be a bug.
+Finally, please read this man page and the reference manual carefully to
be sure that what you think is a bug really is, instead of just a quirk
in the language.
.PP
@@ -3435,7 +3525,8 @@ provided valuable assistance during testing and debugging.
We thank him.
.SH COPYING PERMISSIONS
Copyright \(co 1989, 1991, 1992, 1993, 1994, 1995, 1996,
-1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc.
+1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005, 2007
+Free Software Foundation, Inc.
.PP
Permission is granted to make and distribute verbatim copies of
this manual page provided the copyright notice and this permission