Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- doc/gawk.texi | 312
1 files changed, 282 insertions, 30 deletions
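
Most hunks below apply one pattern: the body of an `@example` is wrapped in `@group` ... `@end group`, which asks Texinfo to keep the enclosed lines together on one page in printed output. A minimal sketch of the pattern follows. The sample command and its output are illustrative assumptions and are not taken from any hunk in this diff.

```texinfo
@c Illustrative only: this is not a hunk from the commit.
@example
@group
$ @kbd{awk 'BEGIN @{ print "hello" @}'}
@print{} hello
@end group
@end example
```

Where an example is too long to keep as a single unit, several hunks use two or more consecutive groups instead, so a page break can still fall between them.
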
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 54c4f913..dec51695 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -1527,9 +1527,11 @@ default @command{awk} utility. A more modern @command{awk} lives in
if you try the test program:
@example
+@group
$ @kbd{awk 1 /dev/null}
@error{} awk: syntax error near line 1
@error{} awk: bailing out near line 1
+@end group
@end example
@noindent
@@ -2991,10 +2993,12 @@ for the
single- and double-quote characters, like so:
@example
+@group
$ @kbd{awk 'BEGIN @{ print "Here is a single quote <\47>" @}'}
@print{} Here is a single quote <'>
$ @kbd{awk 'BEGIN @{ print "Here is a double quote <\42>" @}'}
@print{} Here is a double quote <">
+@end group
@end example
@noindent
@@ -3266,8 +3270,10 @@ action---so it uses the default action, printing the record.
Print the length of the longest input line:
@example
+@group
awk '@{ if (length($0) > max) max = length($0) @}
END @{ print max @}' data
+@end group
@end example
The code associated with @code{END} executes after all
@@ -3584,11 +3590,13 @@ starts a comment, it ignores @emph{everything} on the rest of the
line. For example:
@example
+@group
$ @kbd{gawk 'BEGIN @{ print "dont panic" # a friendly \}
> @kbd{ BEGIN rule}
> @kbd{@}'}
@error{} gawk: cmd. line:2: BEGIN rule
@error{} gawk: cmd. line:2: ^ syntax error
+@end group
@end example
@noindent
@@ -4785,10 +4793,12 @@ The files to be included may be nested; e.g., given a third
script, namely @file{test3}:
@example
+@group
@@include "test2"
BEGIN @{
print "This is script test3."
@}
+@end group
@end example
@noindent
@@ -4875,8 +4885,10 @@ $ @kbd{gawk '@@load "ordchr"; BEGIN @{print chr(65)@}'}
This is equivalent to the following example:
@example
+@group
$ @kbd{gawk -lordchr 'BEGIN @{print chr(65)@}'}
@print{} A
+@end group
@end example
@noindent
@@ -6499,8 +6511,10 @@ with each @samp{u} changed to a newline. Here are the results of running
the program on @file{mail-list}:
@example
+@group
$ @kbd{awk 'BEGIN @{ RS = "u" @}}
> @kbd{@{ print $0 @}' mail-list}
+@end group
@print{} Amelia 555-5553 amelia.zodiac
@print{} sq
@print{} e@@gmail.com F
@@ -6657,9 +6671,11 @@ matches either a newline or a series of one or more uppercase letters
with optional leading and/or trailing whitespace:
@example
+@group
$ @kbd{echo record 1 AAAA record 2 BBBB record 3 |}
> @kbd{gawk 'BEGIN @{ RS = "\n|( *[[:upper:]]+ *)" @}}
> @kbd{@{ print "Record =", $0,"and RT = [" RT "]" @}'}
+@end group
@print{} Record = record 1 and RT = [ AAAA ]
@print{} Record = record 2 and RT = [ BBBB ]
@print{} Record = record 3 and RT = [
@@ -7100,8 +7116,10 @@ values of the fields and @code{OFS}. To do this, use the
seemingly innocuous assignment:
@example
+@group
$1 = $1 # force record to be reconstituted
print $0 # or whatever else with $0
+@end group
@end example
@noindent
@@ -7997,16 +8015,20 @@ Putting this to use, here is a simple program to parse the data:
@example
@c file eg/misc/simple-csv.awk
+@group
BEGIN @{
FPAT = "([^,]+)|(\"[^\"]+\")"
@}
+@end group
+@group
@{
print "NF = ", NF
for (i = 1; i <= NF; i++) @{
printf("$%d = <%s>\n", i, $i)
@}
@}
+@end group
@c endfile
@end example
@@ -8447,6 +8469,7 @@ read-a-line-and-check-each-rule loop of @command{awk} never sees it.
The following example swaps every two lines of input:
@example
+@group
@{
if ((getline tmp) > 0) @{
print tmp
@@ -8454,6 +8477,7 @@ The following example swaps every two lines of input:
@} else
print $0
@}
+@end group
@end example
@noindent
@@ -8596,6 +8620,7 @@ lines that begin with @samp{@@execute}, which are replaced by the output
produced by running the rest of the line as a shell command:
@example
+@group
@{
if ($1 == "@@execute") @{
tmp = substr($0, 10) # Remove "@@execute"
@@ -8605,6 +8630,7 @@ produced by running the rest of the line as a shell command:
@} else
print
@}
+@end group
@end example
@noindent
@@ -8908,12 +8934,14 @@ For example, a TCP client can decide to give up on receiving
any response from the server after a certain amount of time:
@example
+@group
Service = "/inet/tcp/0/localhost/daytime"
PROCINFO[Service, "READ_TIMEOUT"] = 100
if ((Service |& getline) > 0)
print $0
else if (ERRNO != "")
print ERRNO
+@end group
@end example
Here is how to read interactively from the user@footnote{This assumes
@@ -9255,10 +9283,12 @@ newlines:
@end ifnotinfo
@example
+@group
$ @kbd{awk 'BEGIN @{ print "line one\nline two\nline three" @}'}
@print{} line one
@print{} line two
@print{} line three
+@end group
@end example
@cindex fields, printing
@@ -9510,12 +9540,14 @@ The output separator variables @code{OFS} and @code{ORS} have no effect
on @code{printf} statements. For example:
@example
+@group
$ @kbd{awk 'BEGIN @{}
> @kbd{ORS = "\nOUCH!\n"; OFS = "+"}
> @kbd{msg = "Don\47t Panic!"}
> @kbd{printf "%s\n", msg}
> @kbd{@}'}
@print{} Don't Panic!
+@end group
@end example
@noindent
@@ -10038,9 +10070,11 @@ alone for now and let's hope no-one notices.
@end ignore
@example
+@group
awk '@{ print $1 > "names.unsorted"
command = "sort -r > names.sorted"
print $1 | command @}' mail-list
+@end group
@end example
The unsorted list is written with an ordinary redirection, while
@@ -10375,7 +10409,7 @@ The @var{protocol} is one of @samp{tcp} or @samp{udp},
and the other fields represent the other essential pieces of information
for making a networking connection.
These @value{FN}s are used with the @samp{|&} operator for communicating
-with a coprocess
+with @w{a coprocess}
(@pxref{Two-way I/O}).
This is an advanced feature, mentioned here only for completeness.
Full discussion is delayed until
@@ -10474,10 +10508,14 @@ it is good practice to use a variable to store the @value{FN} or command.
The previous example becomes the following:
@example
+@group
sortcom = "sort -r names"
sortcom | getline foo
+@end group
+@group
@dots{}
close(sortcom)
+@end group
@end example
@noindent
@@ -10625,7 +10663,7 @@ if it fails.
@float Table,table-close-pipe-return-values
@caption{Return values from @code{close()} of a pipe}
-@multitable @columnfractions .40 .60
+@multitable @columnfractions .50 .50
@headitem Situation @tab Return value from @code{close()}
@item Normal exit of command @tab Command's exit status
@item Death by signal of command @tab 256 + number of murderous signal
@@ -10691,7 +10729,7 @@ if it fails.
@float Table,table-close-pipe-return-values
@caption{Return values from @code{close()} of a pipe}
-@multitable @columnfractions .40 .60
+@multitable @columnfractions .50 .50
@headitem Situation @tab Return value from @code{close()}
@item Normal exit of command @tab Command's exit status
@item Death by signal of command @tab 256 + number of murderous signal
@@ -10721,7 +10759,8 @@ disk) is a fatal error.
@example
$ @kbd{gawk 'BEGIN @{ print "hi" > "/no/such/file" @}'}
-@error{} gawk: cmd. line:1: fatal: can't redirect to `/no/such/file' (No such file or directory)
+@error{} gawk: cmd. line:1: fatal: can't redirect to `/no/such/file' (No
+@error{} such file or directory)
@end example
@command{gawk} makes it possible to detect that an error has
@@ -11181,6 +11220,7 @@ confusion can arise when attempting to use regexp constants as arguments
to user-defined functions (@pxref{User-defined}). For example:
@example
+@group
function mysub(pat, repl, str, global)
@{
if (global)
@@ -11189,13 +11229,16 @@ function mysub(pat, repl, str, global)
sub(pat, repl, str)
return str
@}
+@end group
+@group
@{
@dots{}
text = "hi! hi yourself!"
mysub(/hi/, "howdy", text, 1)
@dots{}
@}
+@end group
@end example
@c @cindex automatic warnings
@@ -11443,8 +11486,10 @@ is performed. If numeric values appear in string concatenation, they
are converted to strings. Consider the following:
@example
+@group
two = 2; three = 3
print (two three) + 4
+@end group
@end example
@noindent
@@ -11946,10 +11991,14 @@ to it. In the following program fragment, the variable
@code{foo} has a numeric value at first, and a string value later on:
@example
+@group
foo = 1
print foo
+@end group
+@group
foo = "bar"
print foo
+@end group
@end example
@noindent
@@ -12021,16 +12070,20 @@ righthand expression. For example:
@cindex Rankin, Pat
@example
+@group
# Thanks to Pat Rankin for this example
BEGIN @{
foo[rand()] += 5
for (x in foo)
print x, foo[x]
+@end group
+@group
bar[rand()] = bar[rand()] + 5
for (x in bar)
print x, bar[x]
@}
+@end group
@end example
@cindex operators, assignment, evaluation order
@@ -12816,10 +12869,12 @@ leave off one of the @samp{=} characters. The result is still valid
@command{awk} code, but the program does not do what is intended:
@example
+@group
if (a = b) # oops! should be a == b
@dots{}
else
@dots{}
+@end group
@end example
@noindent
@@ -13771,8 +13826,10 @@ $ @kbd{awk '! /li/' mail-list}
@print{} Bill 555-1675 bill.drowning@@hotmail.com A
@print{} Camilla 555-2912 camilla.infusarum@@skynet.be R
@print{} Fabius 555-1234 fabius.undevicesimus@@ucb.edu F
+@group
@print{} Martin 555-6480 martin.codicibus@@hotmail.com A
@print{} Jean-Paul 555-2127 jeanpaul.campanorum@@nyu.edu R
+@end group
@end example
@cindex @code{BEGIN} pattern, Boolean patterns and
@@ -14163,10 +14220,12 @@ the variable's value into the program inside the script.
For example, consider the following program:
@example
+@group
printf "Enter search pattern: "
read pattern
awk "/$pattern/ "'@{ nmatches++ @}
END @{ print nmatches, "found" @}' /path/to/data
+@end group
@end example
@noindent
@@ -14355,10 +14414,12 @@ the null string; otherwise, the condition is true.
Refer to the following:
@example
+@group
if (x % 2 == 0)
print "x is even"
else
print "x is odd"
+@end group
@end example
In this example, if the expression @samp{x % 2 == 0} is true (i.e.,
@@ -14680,6 +14741,7 @@ finds the smallest divisor of any integer, and also identifies prime
numbers:
@example
+@group
# find smallest divisor of num
@{
num = $1
@@ -14687,11 +14749,14 @@ numbers:
if (num % divisor == 0)
break
@}
+@end group
+@group
if (num % divisor == 0)
printf "Smallest divisor of %d is %d\n", num, divisor
else
printf "%d is prime\n", num
@}
+@end group
@end example
When the remainder is zero in the first @code{if} statement, @command{awk}
@@ -14999,14 +15064,18 @@ using an @code{exit} statement with a nonzero argument, as shown
in the following example:
@example
+@group
BEGIN @{
if (("date" | getline date_now) <= 0) @{
print "Can't get system date" > "/dev/stderr"
exit 1
@}
+@end group
+@group
print "current date is", date_now
close("date")
@}
+@end group
@end example
@quotation NOTE
@@ -15304,6 +15373,7 @@ Unlike most @command{awk} arrays,
In the following example:
@example
+@group
$ @kbd{awk 'BEGIN @{}
> @kbd{for (i = 0; i < ARGC; i++)}
> @kbd{print ARGV[i]}
@@ -15311,6 +15381,7 @@ $ @kbd{awk 'BEGIN @{}
@print{} awk
@print{} inventory-shipped
@print{} mail-list
+@end group
@end example
@noindent
@@ -15735,12 +15806,14 @@ points out that it effectively gives @command{awk} data pointers. Consider his
example:
@example
+@group
# Indirect multiply of any variable by amount, return result
function multiply(variable, amount)
@{
return SYMTAB[variable] *= amount
@}
+@end group
@end example
@noindent
@@ -15858,6 +15931,7 @@ presented the following program describing the information contained in @code{AR
and @code{ARGV}:
@example
+@group
$ @kbd{awk 'BEGIN @{}
> @kbd{for (i = 0; i < ARGC; i++)}
> @kbd{print ARGV[i]}
@@ -15865,6 +15939,7 @@ $ @kbd{awk 'BEGIN @{}
@print{} awk
@print{} inventory-shipped
@print{} mail-list
+@end group
@end example
@noindent
@@ -16461,8 +16536,10 @@ For example, this statement tests whether the array @code{frequencies}
contains the index @samp{2}:
@example
+@group
if (2 in frequencies)
print "Subscript 2 is present."
+@end group
@end example
Note that this is @emph{not} a test of whether the array
@@ -16472,8 +16549,10 @@ There is no way to do that except to scan all the elements. Also, this
(incorrect) alternative does:
@example
+@group
if (frequencies[2] != "")
print "Subscript 2 is present."
+@end group
@end example
@node Assigning Elements
@@ -16530,6 +16609,7 @@ all the lines.
When this program is run with the following input:
@example
+@group
@c file eg/misc/arraymax.data
5 I am the Five man
2 Who are you? The new number two!
@@ -16537,17 +16617,20 @@ When this program is run with the following input:
1 Who is number one?
3 I three you.
@c endfile
+@end group
@end example
@noindent
Its output is:
@example
+@group
1 Who is number one?
2 Who are you? The new number two!
3 I three you.
4 . . . And four on the floor
5 I am the Five man
+@end group
@end example
If a line number is repeated, the last line with a given number overrides
@@ -16556,11 +16639,13 @@ Gaps in the line numbers can be handled with an easy improvement to the
program's @code{END} rule, as follows:
@example
+@group
END @{
for (x = 1; x <= max; x++)
if (x in arr)
print arr[x]
@}
+@end group
@end example
@node Scanning an Array
@@ -16580,8 +16665,10 @@ So @command{awk} has a special kind of @code{for} statement for scanning
an array:
@example
+@group
for (@var{var} in @var{array})
@var{body}
+@end group
@end example
@noindent
@@ -16602,12 +16689,15 @@ such words.
for more information on the built-in function @code{length()}.
@example
+@group
# Record a 1 for each word that is used at least once
@{
for (i = 1; i <= NF; i++)
used[$i] = 1
@}
+@end group
+@group
# Find number of distinct words more than 10 characters long
END @{
for (x in used) @{
@@ -16618,6 +16708,7 @@ END @{
@}
print num_long_words, "words longer than 10 characters"
@}
+@end group
@end example
@noindent
@@ -17005,9 +17096,11 @@ same as assigning it a null value (the empty string, @code{""}).
For example:
@example
+@group
foo[4] = ""
if (4 in foo)
print "This is printed, even though foo[4] is empty"
+@end group
@end example
@cindex lint checking, array elements
@@ -17162,22 +17255,26 @@ END @{
When given the input:
@example
+@group
1 2 3 4 5 6
2 3 4 5 6 1
3 4 5 6 1 2
4 5 6 1 2 3
+@end group
@end example
@noindent
the program produces the following output:
@example
+@group
4 3 2 1
5 4 3 2
6 5 4 3
1 6 5 4
2 1 6 5
3 2 1 6
+@end group
@end example
@node Multiscanning
@@ -17357,15 +17454,19 @@ you can often devise workarounds using control statements. For example,
the following code prints the elements of our main array @code{a}:
@example
+@group
for (i in a) @{
for (j in a[i]) @{
if (j == 3) @{
for (k in a[i][j])
print a[i][j][k]
+@end group
+@group
@} else
print a[i][j]
@}
@}
+@end group
@end example
@noindent
@@ -17815,9 +17916,11 @@ asort(a)
results in the following contents of @code{a}:
@example
+@group
a[1] = "cul"
a[2] = "de"
a[3] = "sac"
+@end group
@end example
The @code{asorti()} function works similarly to @code{asort()}; however,
@@ -18906,6 +19009,9 @@ a file or pipe that was opened for reading (such as with @code{getline}),
or if @var{filename} is not an open file, pipe, or coprocess.
In such a case, @code{fflush()} returns @minus{}1, as well.
+@c end the table to let the sidebar take up the full width of the page.
+@end table
+
@cindex sidebar, Interactive Versus Noninteractive Buffering
@ifdocbook
@docbook
@@ -19006,6 +19112,7 @@ it is all buffered and sent down the pipe to @command{cat} in one shot.
@end cartouche
@end ifnotdocbook
+@table @asis
@item @code{system(@var{command})}
@cindexawkfunc{system}
@cindex invoke shell command
@@ -19798,7 +19905,7 @@ that illustrates the use of these functions:
@example
@group
@c file eg/lib/bits2str.awk
-# bits2str --- turn a byte into readable ones and zeros
+# bits2str --- turn an integer into readable ones and zeros
function bits2str(bits, data, mask)
@{
@@ -19820,7 +19927,7 @@ function bits2str(bits, data, mask)
@c this is a hack to make testbits.awk self-contained
@ignore
@c file eg/prog/testbits.awk
-# bits2str --- turn a byte into readable 1's and 0's
+# bits2str --- turn an integer into readable ones and zeros
function bits2str(bits, data, mask)
@{
@@ -19861,7 +19968,8 @@ $ @kbd{gawk -f testbits.awk}
@print{} 123 = 01111011
@print{} 0123 = 01010011
@print{} 0x99 = 10011001
-@print{} compl(0x99) = 0x3fffffffffff66 = 00111111111111111111111111111111111111111111111101100110
+@print{} compl(0x99) = 0x3fffffffffff66 =
+@print{} 00111111111111111111111111111111111111111111111101100110
@print{} lshift(0x99, 2) = 0x264 = 0000001001100100
@print{} rshift(0x99, 2) = 0x26 = 00100110
@end example
@@ -20200,10 +20308,12 @@ entire program before starting to execute any of it.
The definition of a function named @var{name} looks like this:
@display
+@group
@code{function} @var{name}@code{(}[@var{parameter-list}]@code{)}
@code{@{}
@var{body-of-function}
@code{@}}
+@end group
@end display
@cindex names, functions
@@ -20371,11 +20481,13 @@ This function deletes all the elements in an array (recall that the
extra whitespace signifies the start of the local variable list):
@example
+@group
function delarray(a, i)
@{
for (i in a)
delete a[i]
@}
+@end group
@end example
When working with arrays, it is often necessary to delete all the elements
@@ -20582,10 +20694,12 @@ In addition, recursive calls create new arrays.
Consider this example:
@example
+@group
function some_func(p1, a)
@{
if (p1++ > 3)
return
+@end group
a[p1] = p1
@@ -20649,12 +20763,14 @@ this has no effect on any other variables. Thus, if @code{myfunc()}
does this:
@example
+@group
function myfunc(str)
@{
print str
str = "zzz"
print str
@}
+@end group
@end example
@noindent
@@ -20810,11 +20926,13 @@ function maxelt(vec, i, ret)
return ret
@}
+@group
# Load all fields of each record into nums.
@{
for(i = 1; i <= NF; i++)
nums[NR, i] = $i
@}
+@end group
END @{
print maxelt(nums)
@@ -21108,12 +21226,14 @@ first thing to do is write some comparison functions:
@example
@c file eg/prog/indirectcall.awk
+@group
# num_lt --- do a numeric less than comparison
function num_lt(left, right)
@{
return ((left + 0) < (right + 0))
@}
+@end group
# num_ge --- do a numeric greater than or equal to comparison
@@ -21162,19 +21282,23 @@ names of the two comparison functions:
@example
@c file eg/prog/indirectcall.awk
+@group
# sort --- sort the data in ascending order and return it as a string
function sort(first, last)
@{
return do_sort(first, last, "num_lt")
@}
+@end group
+@group
# rsort --- sort the data in descending order and return it as a string
function rsort(first, last)
@{
return do_sort(first, last, "num_ge")
@}
+@end group
@c endfile
@end example
@@ -21674,6 +21798,7 @@ been true but was not, and then it kills the program. In C, using
@code{assert()} looks this:
@example
+@group
#include <assert.h>
int myfunc(int a, double b)
@@ -21681,6 +21806,7 @@ int myfunc(int a, double b)
assert(a <= 5 && b >= 17.1);
@dots{}
@}
+@end group
@end example
If the assertion fails, the program prints a message similar to this:
@@ -21838,9 +21964,10 @@ function round(x, ival, aval, fraction)
@}
@c endfile
@c don't include test harness in the file that gets installed
-
+@group
# test harness
# @{ print $0, round($0) @}
+@end group
@end example
@node Cliff Random Function
@@ -22246,7 +22373,7 @@ if (length(contents) == 0)
@end example
This tests the result to see if it is empty or not. An equivalent
-test would be @samp{contents == ""}.
+test would be @samp{@w{contents == ""}}.
@xref{Extension Sample Readfile} for an extension function that
also reads an entire file into memory.
@@ -22577,8 +22704,10 @@ $ @kbd{gawk -f rewind.awk -f test.awk data }
@print{} data 1 a
@print{} data 2 b
@print{} data 3 c
+@group
@print{} data 4 d
@print{} data 5 e
+@end group
@end example
@node File Checking
@@ -23793,8 +23922,10 @@ function getgrent()
_gr_init()
if (++_gr_count in _gr_bycount)
return _gr_bycount[_gr_count]
+@group
return ""
@}
+@end group
@c endfile
@end example
@@ -24324,10 +24455,12 @@ list of fields or characters:
if (by_fields == 0 && by_chars == 0)
by_fields = 1 # default
+@group
if (fieldlist == "") @{
print "cut: needs list for -c or -f" > "/dev/stderr"
exit 1
@}
+@end group
if (by_fields)
set_fieldlist()
@@ -24668,8 +24801,10 @@ function endfile(file)
print fcount
@}
+@group
total += fcount
@}
+@end group
@c endfile
@end example
@@ -24826,11 +24961,15 @@ BEGIN @{
pw = getpwuid(uid)
pr_first_field(pw)
+@group
if (euid != uid) @{
printf(" euid=%d", euid)
pw = getpwuid(euid)
+@end group
+@group
pr_first_field(pw)
@}
+@end group
printf(" gid=%d", gid)
pw = getgrgid(gid)
@@ -24958,14 +25097,17 @@ BEGIN @{
# test argv in case reading from stdin instead of file
if (i in ARGV)
i++ # skip datafile name
+@group
if (i in ARGV) @{
outfile = ARGV[i]
ARGV[i] = ""
@}
-
+@end group
+@group
s1 = s2 = "a"
out = (outfile s1 s2)
@}
+@end group
@c endfile
@end example
@@ -25121,11 +25263,15 @@ line into each file on the command line, and then to the standard output:
It is also possible to write the loop this way:
@example
+@group
for (i in copy)
if (append)
print >> copy[i]
+@end group
+@group
else
print > copy[i]
+@end group
@end example
@noindent
@@ -25276,10 +25422,12 @@ BEGIN @{
usage()
@}
+@group
if (ARGV[Optind] ~ /^\+[[:digit:]]+$/) @{
charcount = substr(ARGV[Optind], 2) + 0
Optind++
@}
+@end group
for (i = 1; i < Optind; i++)
ARGV[i] = ""
@@ -25313,10 +25461,12 @@ strings are then compared and @code{are_equal()} returns the result:
@example
@c file eg/prog/uniq.awk
+@group
function are_equal( n, m, clast, cline, alast, aline)
@{
if (fcount == 0 && charcount == 0)
return (last == $0)
+@end group
if (fcount > 0) @{
n = split(last, alast)
@@ -25331,9 +25481,11 @@ function are_equal( n, m, clast, cline, alast, aline)
clast = substr(clast, charcount + 1)
cline = substr(cline, charcount + 1)
@}
+@group
return (clast == cline)
@}
+@end group
@c endfile
@end example
@@ -25392,11 +25544,13 @@ NR == 1 @{
END @{
if (do_count)
printf("%4d %s\n", count, last) > outputfile
+@group
else if ((repeated_only && count > 1) ||
(non_repeated_only && count == 1))
print last > outputfile
close(outputfile)
@}
+@end group
@c endfile
@end example
@@ -26191,10 +26345,12 @@ At first glance, a program like this would seem to do the job:
freq[$i]++
@}
+@group
END @{
for (word in freq)
printf "%s\t%d\n", word, freq[word]
@}
+@end group
@end example
The program relies on @command{awk}'s default field-splitting
@@ -26584,9 +26740,11 @@ line. That line is then printed to the output file:
i++
@}
@}
+@group
print join(a, 1, n, SUBSEP) > curfile
@}
@}
+@end group
@c endfile
@end example
@@ -26672,10 +26830,12 @@ function usage()
exit 1
@}
+@group
BEGIN @{
# validate arguments
if (ARGC < 3)
usage()
+@end group
RS = ARGV[1]
ORS = ARGV[2]
@@ -27069,13 +27229,11 @@ the program is done:
continue
@}
fpath = pathto($2)
-@group
if (fpath == "") @{
printf("igawk: %s:%d: cannot find %s\n",
input[stackptr], FNR, $2) > "/dev/stderr"
continue
@}
-@end group
if (! (fpath in processed)) @{
processed[fpath] = input[stackptr]
input[++stackptr] = fpath # push onto stack
@@ -27332,10 +27490,12 @@ notice and this notice are preserved.
Here is the program:
@example
+@group
awk 'BEGIN@{O="~"~"~";o="=="=="==";o+=+o;x=O""O;while(X++<=x+o+o)c=c"%c";
printf c,(x-O)*(x-O),x*(x-o)-o,x*(x-O)+x-O-o,+x*(x-O)-x+o,X*(o*o+O)+x-O,
X*(X-x)-o*o,(x+X)*o*o+o,x*(X-x)-O-O,x-O+(O+o+X+x)*(o+O),X*X-X*(x-O)-x+O,
O+X*(o*(o+O)+O),+x+O+X*o,x*(x-o),(o+X+x)*o*o-(x-O-O),O+(X-x)*(X+O),x-O@}'
+@end group
@end example
@cindex Johansen, Chris
@@ -27823,11 +27983,13 @@ Our first comparison function can be used to scan an array in
numerical order of the indices:
@example
+@group
function cmp_num_idx(i1, v1, i2, v2)
@{
# numerical index comparison, ascending order
return (i1 - i2)
@}
+@end group
@end example
Our second function traverses an array based on the string order of
@@ -27932,10 +28094,13 @@ function cmp_field(i1, v1, i2, v2)
a[NR][i] = $i
@}
+@group
END @{
PROCINFO["sorted_in"] = "cmp_field"
+@end group
if (POS < 1 || POS > NF)
POS = 1
+
for (i in a) @{
for (j = 1; j <= NF; j++)
printf("%s%c", a[i][j], j < NF ? ":" : "")
@@ -27992,6 +28157,7 @@ function cmp_numeric(i1, v1, i2, v2)
return (v1 != v2) ? (v2 - v1) : (i2 - i1)
@}
+@group
function cmp_string(i1, v1, i2, v2)
@{
# string value (and index) comparison, descending order
@@ -27999,6 +28165,7 @@ function cmp_string(i1, v1, i2, v2)
v2 = v2 i2
return (v1 > v2) ? -1 : (v1 != v2)
@}
+@end group
@end example
@c Avoid using the term ``stable'' when describing the unpredictable behavior
@@ -28152,11 +28319,13 @@ The following example demonstrates the use of a comparison function with
both values to lowercase in order to compare them ignoring case.
@example
+@group
# case_fold_compare --- compare as strings, ignoring case
function case_fold_compare(i1, v1, i2, v2, l, r)
@{
l = tolower(v1)
+@end group
r = tolower(v2)
if (l < r)
@@ -29513,8 +29682,10 @@ This is somewhat counterintuitive.
and those with positional specifiers in the same string:
@example
+@group
$ @kbd{gawk 'BEGIN @{ printf "%d %3$s\n", 1, 2, "hi" @}'}
@error{} gawk: cmd. line:1: fatal: must use `count$' on all formats or none
+@end group
@end example
@quotation NOTE
@@ -30139,8 +30310,10 @@ be inside this function. To investigate further, we must begin
@samp{n} (for ``next''):
@example
+@group
gawk> @kbd{n}
@print{} 66 if (fcount > 0) @{
+@end group
@end example
This tells us that @command{gawk} is now ready to execute line 66, which
@@ -30909,10 +31082,12 @@ partial dump of Davide Brini's obfuscated code
@c FIXME: This will need updating if num-handler branch is ever merged in.
@smallexample
+@group
gawk> @kbd{dump}
@print{} # BEGIN
@print{}
@print{} [ 1:0xfcd340] Op_rule : [in_rule = BEGIN] [source_file = brini.awk]
+@end group
@print{} [ 1:0xfcc240] Op_push_i : "~" [MALLOC|STRING|STRCUR]
@print{} [ 1:0xfcc2a0] Op_push_i : "~" [MALLOC|STRING|STRCUR]
@print{} [ 1:0xfcc280] Op_match :
@@ -30945,18 +31120,18 @@ gawk> @kbd{dump}
@print{} [ :0xfcc660] Op_no_op :
@print{} [ 1:0xfcc520] Op_assign_concat : c
@print{} [ :0xfcc620] Op_jmp : [target_jmp = 0xfcc440]
-@print{}
@dots{}
-@print{}
@print{} [ 2:0xfcc5a0] Op_K_printf : [expr_count = 17] [redir_type = ""]
@print{} [ :0xfcc140] Op_no_op :
@print{} [ :0xfcc1c0] Op_atexit :
@print{} [ :0xfcc640] Op_stop :
@print{} [ :0xfcc180] Op_no_op :
@print{} [ :0xfcd150] Op_after_beginfile :
+@group
@print{} [ :0xfcc160] Op_no_op :
@print{} [ :0xfcc1a0] Op_after_endfile :
gawk>
+@end group
@end smallexample
@cindex @code{exit} debugger command
@@ -31311,6 +31486,7 @@ In computer systems, integer arithmetic is exact, but the possible
range of values is limited. Integer arithmetic is generally faster than
floating-point arithmetic.
+@cindex floating-point, numbers
@item Floating-point arithmetic
Floating-point numbers represent what were called in school ``real''
numbers (i.e., those that have a fractional part, such as 3.1415927).
@@ -31322,6 +31498,12 @@ Modern systems support floating-point arithmetic in hardware, with a
limited range of values. There are software libraries that allow
the use of arbitrary-precision floating-point calculations.
+@cindex floating-point, numbers@comma{} single-precision
+@cindex floating-point, numbers@comma{} double-precision
+@cindex floating-point, numbers@comma{} arbitrary-precision
+@cindex single-precision
+@cindex double-precision
+@cindex arbitrary-precision
POSIX @command{awk} uses @dfn{double-precision} floating-point numbers, which
can hold more digits than @dfn{single-precision} floating-point numbers.
@command{gawk} has facilities for performing arbitrary-precision
@@ -31331,29 +31513,48 @@ floating-point arithmetic, which we describe in more detail shortly.
Computers work with integer and floating-point values of different
ranges. Integer values are usually either 32 or 64 bits in size.
Single-precision floating-point values occupy 32 bits, whereas double-precision
-floating-point values occupy 64 bits. Floating-point values are always
-signed. The possible ranges of values are shown in @ref{table-numeric-ranges}.
+floating-point values occupy 64 bits.
+(Quadruple-precision floating point values also exist. They occupy 128 bits,
+but such numbers are not available in @command{awk}.)
+Floating-point values are always
+signed. The possible ranges of values are shown in @ref{table-numeric-ranges}
+and @ref{table-floating-point-ranges}.
@float Table,table-numeric-ranges
-@caption{Value ranges for different numeric representations}
+@caption{Value ranges for integer representations}
@multitable @columnfractions .34 .33 .33
-@headitem Numeric representation @tab Minimum value @tab Maximum value
+@headitem Representation @tab Minimum value @tab Maximum value
@item 32-bit signed integer @tab @minus{}2,147,483,648 @tab 2,147,483,647
@item 32-bit unsigned integer @tab 0 @tab 4,294,967,295
@item 64-bit signed integer @tab @minus{}9,223,372,036,854,775,808 @tab 9,223,372,036,854,775,807
@item 64-bit unsigned integer @tab 0 @tab 18,446,744,073,709,551,615
+@end multitable
+@end float
+
+@float Table,table-floating-point-ranges
+@caption{Approximate value ranges for floating-point number representations}
+@multitable @columnfractions .38 .22 .22 .23
@iftex
-@item Single-precision floating point (approximate) @tab @math{1.175494^{-38}} @tab @math{3.402823^{38}}
-@item Double-precision floating point (approximate) @tab @math{2.225074^{-308}} @tab @math{1.797693^{308}}
+@headitem Representation @tab @w{Minimum positive} @w{nonzero value} @tab Minimum @w{finite value} @tab Maximum @w{finite value}
+@end iftex
+@ifnottex
+@headitem Representation @tab Minimum positive nonzero value @tab Minimum finite value @tab Maximum finite value
+@end ifnottex
+@iftex
+@item @w{Single-precision floating-point} @tab @math{1.175494 @cdot 10^{-38}} @tab @math{-3.402823 @cdot 10^{38}} @tab @math{3.402823 @cdot 10^{38}}
+@item @w{Double-precision floating-point} @tab @math{2.225074 @cdot 10^{-308}} @tab @math{-1.797693 @cdot 10^{308}} @tab @math{1.797693 @cdot 10^{308}}
+@item @w{Quadruple-precision floating-point} @tab @math{3.362103 @cdot 10^{-4932}} @tab @math{-1.189731 @cdot 10^{4932}} @tab @math{1.189731 @cdot 10^{4932}}
@end iftex
@ifinfo
-@item Single-precision floating point (approximate) @tab 1.175494e-38 @tab 3.402823e38
-@item Double-precision floating point (approximate) @tab 2.225074e-308 @tab 1.797693e308
+@item Single-precision floating-point @tab 1.175494e-38 @tab -3.402823e+38 @tab 3.402823e+38
+@item Double-precision floating-point @tab 2.225074e-308 @tab -1.797693e+308 @tab 1.797693e+308
+@item Quadruple-precision floating-point @tab 3.362103e-4932 @tab -1.189731e+4932 @tab 1.189731e+4932
@end ifinfo
@ifnottex
@ifnotinfo
-@item Single-precision floating point (approximate) @tab 1.175494@sup{-38} @tab 3.402823@sup{38}
-@item Double-precision floating point (approximate) @tab 2.225074@sup{-308} @tab 1.797693@sup{308}
+@item Single-precision floating-point @tab 1.175494*10@sup{-38} @tab -3.402823*10@sup{38} @tab 3.402823*10@sup{38}
+@item Double-precision floating-point @tab 2.225074*10@sup{-308} @tab -1.797693*10@sup{308} @tab 1.797693*10@sup{308}
+@item Quadruple-precision floating-point @tab 3.362103*10@sup{-4932} @tab -1.189731*10@sup{4932} @tab 1.189731*10@sup{4932}
@end ifnotinfo
@end ifnottex
@end multitable
@@ -31622,12 +31823,14 @@ You have to decide how small a delta is important to you. Code to do
this looks something like the following:
@example
+@group
delta = 0.00001 # for example
difference = abs(a) - abs(b) # subtract the two values
if (difference < delta)
# all ok
else
# not ok
+@end group
@end example
@noindent
@@ -32097,6 +32300,7 @@ choose to set:
@example
@c file eg/prog/pi.awk
+@group
# pi.awk --- compute the digits of pi
@c endfile
@c endfile
@@ -32112,6 +32316,7 @@ choose to set:
BEGIN @{
digits = 100000
two = 2 * 10 ^ digits
+@end group
pi = two
for (m = digits * 4; m > 0; --m) @{
d = m * 2 + 1
@@ -33078,6 +33283,7 @@ of the function using the macro.
For example, you might allocate a string value like so:
@example
+@group
awk_value_t result;
char *message;
const char greet[] = "Don't Panic!";
@@ -33085,8 +33291,10 @@ const char greet[] = "Don't Panic!";
emalloc(message, char *, sizeof(greet), "myfunc");
strcpy(message, greet);
make_malloced_string(message, strlen(message), & result);
+@end group
@end example
+@sp 2
@item #define ezalloc(pointer, type, size, message) @dots{}
This is like @code{emalloc()}, but it calls @code{gawk_calloc()}
instead of @code{gawk_malloc()}.
@@ -33222,6 +33430,7 @@ registering parts of your extension with @command{gawk}.
Extension functions are described by the following record:
@example
+@group
typedef struct awk_ext_func @{
@ @ @ @ const char *name;
@ @ @ @ awk_value_t *(*const function)(int num_actual_args,
@@ -33232,6 +33441,7 @@ typedef struct awk_ext_func @{
@ @ @ @ awk_bool_t suppress_lint;
@ @ @ @ void *data; /* opaque pointer to any extra state */
@} awk_ext_func_t;
+@end group
@end example
The fields are:
@@ -33427,12 +33637,14 @@ Your extension should package these functions inside an
@code{awk_input_parser_t}, which looks like this:
@example
+@group
typedef struct awk_input_parser @{
const char *name; /* name of parser */
awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf);
awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf);
awk_const struct awk_input_parser *awk_const next; /* for gawk */
@} awk_input_parser_t;
+@end group
@end example
The fields are:
@@ -34185,6 +34397,7 @@ to a global variable or array. It is an optimization that
avoids looking up variables in @command{gawk}'s symbol table every time
access is needed. This was discussed earlier, in @ref{General Data Types}.
+@need 1500
The following functions let you work with scalar cookies:
@table @code
@@ -34247,12 +34460,14 @@ your extension's variable in @command{gawk}'s symbol table using
using @code{sym_lookup()}:
@example
+@group
static awk_scalar_t magic_var_cookie; /* cookie for MAGIC_VAR */
static void
my_extension_init()
@{
awk_value_t value;
+@end group
/* install initial value */
sym_update("MAGIC_VAR", make_number(42.0, & value));
@@ -34756,10 +34971,12 @@ Finally, because everything was successful, the function sets the
return value to success, and returns:
@example
+@group
make_number(1.0, result);
out:
return result;
@}
+@end group
@end example
Here is the output from running this part of the test:
@@ -34971,7 +35188,7 @@ BEGIN @{
Here is the result of running the script:
@example
-$ @kbd{AWKLIBPATH=$PWD ./gawk -f subarray.awk}
+$ @kbd{AWKLIBPATH=$PWD gawk -f subarray.awk}
@print{} new_array["subarray"]["foo"] = bar
@print{} new_array["hello"] = world
@print{} new_array["answer"] = 42
@@ -35110,7 +35327,7 @@ It is up to the extension to decide if there are API incompatibilities.
Typically, a check like this is enough:
@example
-if (api->major_version != GAWK_API_MAJOR_VERSION
+if ( api->major_version != GAWK_API_MAJOR_VERSION
|| api->minor_version < GAWK_API_MINOR_VERSION) @{
fprintf(stderr, "foo_extension: version mismatch with gawk!\n");
fprintf(stderr, "\tmy version (%d, %d), gawk version (%d, %d)\n",
@@ -35211,10 +35428,12 @@ as described here. The boilerplate needed is also provided in comments
in the @file{gawkapi.h} header file:
@example
+@group
/* Boilerplate code: */
int plugin_is_GPL_compatible;
static gawk_api_t *const api;
+@end group
static awk_ext_id_t ext_id;
static const char *ext_version = NULL; /* or @dots{} = "some string" */
@@ -35615,10 +35834,12 @@ The second is a pointer to an @code{awk_value_t} structure, usually named
@code{result}:
@example
+@group
/* do_chdir --- provide dynamically loaded chdir() function for gawk */
static awk_value_t *
do_chdir(int nargs, awk_value_t *result, struct awk_ext_func *unused)
+@end group
@{
awk_value_t newdir;
int ret = -1;
@@ -35745,7 +35966,7 @@ fill_stat_array(const char *name, awk_array_t array, struct stat *sbuf)
#endif
#ifdef S_IFDOOR /* Solaris weirdness */
@{ S_IFDOOR, "door" @},
-#endif /* S_IFDOOR */
+#endif
@};
int j, k;
@end example
@@ -35788,9 +36009,11 @@ certain members and/or the type of the file. It then returns zero,
for success:
@example
+@group
#ifdef HAVE_STRUCT_STAT_ST_BLKSIZE
array_set_numeric(array, "blksize", sbuf->st_blksize);
-#endif /* HAVE_STRUCT_STAT_ST_BLKSIZE */
+#endif
+@end group
pmode = format_mode(sbuf->st_mode);
array_set(array, "pmode", make_const_string(pmode, strlen(pmode),
@@ -35879,20 +36102,24 @@ Next, it gets the information for the file. If the called function
/* stat the file; if error, set ERRNO and return */
ret = statfunc(name, & sbuf);
+@group
if (ret < 0) @{
update_ERRNO_int(errno);
return make_number(ret, result);
@}
+@end group
@end example
The tedious work is done by @code{fill_stat_array()}, shown
earlier. When done, the function returns the result from @code{fill_stat_array()}:
@example
+@group
ret = fill_stat_array(name, array, & sbuf);
return make_number(ret, result);
@}
+@end group
@end example
Finally, it's necessary to provide the ``glue'' that loads the
@@ -41580,14 +41807,24 @@ like this: @code{""}.
Humans are used to working in decimal; i.e., base 10. In base 10,
digits go from 0 to 9, and then ``roll over'' into the next
+@iftex
+column. (Remember grade school? @math{42 = 4\times 10 + 2}.)
+@end iftex
+@ifnottex
column. (Remember grade school? 42 = 4 x 10 + 2.)
+@end ifnottex
There are other number bases though. Computers commonly use base 2
or @dfn{binary}, base 8 or @dfn{octal}, and base 16 or @dfn{hexadecimal}.
In binary, each column represents two times the value in the column to
its right. Each column may contain either a 0 or a 1.
+@iftex
+Thus, binary 1010 represents @math{(1\times 8) + (0\times 4) + (1\times 2) + (0\times 1)}, or decimal 10.
+@end iftex
+@ifnottex
Thus, binary 1010 represents (1 x 8) + (0 x 4) + (1 x 2)
+ (0 x 1), or decimal 10.
+@end ifnottex
Octal and hexadecimal are discussed more in
@ref{Nondecimal-numbers}.
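The column rule for binary can be sketched in C as follows. Each step doubles the running total (shifting every earlier digit one column to the left) and then adds the new digit. The function name @code{binary_to_decimal} and its error convention of returning -1 are assumptions made for this illustration. They are not part of @command{gawk}.

```c
#include <stddef.h>

/*
 * binary_to_decimal --- hypothetical helper, not part of gawk.
 * Return the value of a string of '0' and '1' digits,
 * or -1 if the string is empty or contains any other character.
 */
long
binary_to_decimal(const char *digits)
{
    long value = 0;
    size_t i;

    if (digits[0] == '\0')
        return -1;

    for (i = 0; digits[i] != '\0'; i++) {
        if (digits[i] != '0' && digits[i] != '1')
            return -1;
        /* each column is worth twice the column to its right */
        value = value * 2 + (digits[i] - '0');
    }

    return value;
}
```

For instance, @code{binary_to_decimal("1010")} yields 10, matching the expansion (1 x 8) + (0 x 4) + (1 x 2) + (0 x 1).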
@@ -41727,7 +41964,12 @@ electronic circuitry works ``naturally'' in base 2 (just think of Off/On),
everything inside a computer is calculated using base 2. Each digit
represents the presence (or absence) of a power of 2 and is called a
@dfn{bit}. So, for example, the base-two number @code{10101} is
+@iftex
+the same as decimal 21 (@math{(1\times 16) + (1\times 4) + (1\times 1)}).
+@end iftex
+@ifnottex
the same as decimal 21 ((1 x 16) + (1 x 4) + (1 x 1)).
+@end ifnottex
Since base-two numbers quickly become
very long to read and write, they are usually grouped by 3 (i.e., they are
@@ -41898,7 +42140,7 @@ See also ``Interpreter.''
@item Complemented Bracket Expression
The negation of a @dfn{bracket expression}. All that is @emph{not}
described by a given bracket expression. The symbol @samp{^} precedes
-the negated bracket expression. E.g.: @samp{[[^:digit:]}
+the negated bracket expression. E.g.: @samp{[^[:digit:]]}
designates whatever character is not a digit. @samp{[^bad]}
designates whatever character is not one of the letters @samp{b}, @samp{a},
or @samp{d}.
@@ -42167,7 +42409,12 @@ Base 16 notation, where the digits are @code{0}--@code{9} and
@code{A}--@code{F}, with @samp{A}
representing 10, @samp{B} representing 11, and so on, up to @samp{F} for 15.
Hexadecimal numbers are written in C using a leading @samp{0x},
+@iftex
+to indicate their base. Thus, @code{0x12} is 18 (@math{(1\times 16) + 2}).
+@end iftex
+@ifnottex
to indicate their base. Thus, @code{0x12} is 18 ((1 x 16) + 2).
+@end ifnottex
@xref{Nondecimal-numbers}.
@item I/O
@@ -42231,7 +42478,7 @@ meaning. Keywords are reserved and may not be used as variable names.
@code{break},
@code{case},
@code{continue},
-@code{default}
+@code{default},
@code{delete},
@code{do@dots{}while},
@code{else},
@@ -42317,7 +42564,12 @@ Ancient @command{awk} implementations used single precision floating-point.
@item Octal
Base-eight notation, where the digits are @code{0}--@code{7}.
Octal numbers are written in C using a leading @samp{0},
+@iftex
+to indicate their base. Thus, @code{013} is 11 (@math{(1\times 8) + 3}).
+@end iftex
+@ifnottex
to indicate their base. Thus, @code{013} is 11 ((1 x 8) + 3).
+@end ifnottex
@xref{Nondecimal-numbers}.
@item Output Record