Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- | doc/gawk.texi | 312 |
1 files changed, 282 insertions, 30 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi index 54c4f913..dec51695 100644 --- a/doc/gawk.texi +++ b/doc/gawk.texi @@ -1527,9 +1527,11 @@ default @command{awk} utility. A more modern @command{awk} lives in if you try the test program: @example +@group $ @kbd{awk 1 /dev/null} @error{} awk: syntax error near line 1 @error{} awk: bailing out near line 1 +@end group @end example @noindent @@ -2991,10 +2993,12 @@ for the single- and double-quote characters, like so: @example +@group $ @kbd{awk 'BEGIN @{ print "Here is a single quote <\47>" @}'} @print{} Here is a single quote <'> $ @kbd{awk 'BEGIN @{ print "Here is a double quote <\42>" @}'} @print{} Here is a double quote <"> +@end group @end example @noindent @@ -3266,8 +3270,10 @@ action---so it uses the default action, printing the record. Print the length of the longest input line: @example +@group awk '@{ if (length($0) > max) max = length($0) @} END @{ print max @}' data +@end group @end example The code associated with @code{END} executes after all @@ -3584,11 +3590,13 @@ starts a comment, it ignores @emph{everything} on the rest of the line. For example: @example +@group $ @kbd{gawk 'BEGIN @{ print "dont panic" # a friendly \} > @kbd{ BEGIN rule} > @kbd{@}'} @error{} gawk: cmd. line:2: BEGIN rule @error{} gawk: cmd. line:2: ^ syntax error +@end group @end example @noindent @@ -4785,10 +4793,12 @@ The files to be included may be nested; e.g., given a third script, namely @file{test3}: @example +@group @@include "test2" BEGIN @{ print "This is script test3." @} +@end group @end example @noindent @@ -4875,8 +4885,10 @@ $ @kbd{gawk '@@load "ordchr"; BEGIN @{print chr(65)@}'} This is equivalent to the following example: @example +@group $ @kbd{gawk -lordchr 'BEGIN @{print chr(65)@}'} @print{} A +@end group @end example @noindent @@ -6499,8 +6511,10 @@ with each @samp{u} changed to a newline. 
Here are the results of running the program on @file{mail-list}: @example +@group $ @kbd{awk 'BEGIN @{ RS = "u" @}} > @kbd{@{ print $0 @}' mail-list} +@end group @print{} Amelia 555-5553 amelia.zodiac @print{} sq @print{} e@@gmail.com F @@ -6657,9 +6671,11 @@ matches either a newline or a series of one or more uppercase letters with optional leading and/or trailing whitespace: @example +@group $ @kbd{echo record 1 AAAA record 2 BBBB record 3 |} > @kbd{gawk 'BEGIN @{ RS = "\n|( *[[:upper:]]+ *)" @}} > @kbd{@{ print "Record =", $0,"and RT = [" RT "]" @}'} +@end group @print{} Record = record 1 and RT = [ AAAA ] @print{} Record = record 2 and RT = [ BBBB ] @print{} Record = record 3 and RT = [ @@ -7100,8 +7116,10 @@ values of the fields and @code{OFS}. To do this, use the seemingly innocuous assignment: @example +@group $1 = $1 # force record to be reconstituted print $0 # or whatever else with $0 +@end group @end example @noindent @@ -7997,16 +8015,20 @@ Putting this to use, here is a simple program to parse the data: @example @c file eg/misc/simple-csv.awk +@group BEGIN @{ FPAT = "([^,]+)|(\"[^\"]+\")" @} +@end group +@group @{ print "NF = ", NF for (i = 1; i <= NF; i++) @{ printf("$%d = <%s>\n", i, $i) @} @} +@end group @c endfile @end example @@ -8447,6 +8469,7 @@ read-a-line-and-check-each-rule loop of @command{awk} never sees it. 
The following example swaps every two lines of input: @example +@group @{ if ((getline tmp) > 0) @{ print tmp @@ -8454,6 +8477,7 @@ The following example swaps every two lines of input: @} else print $0 @} +@end group @end example @noindent @@ -8596,6 +8620,7 @@ lines that begin with @samp{@@execute}, which are replaced by the output produced by running the rest of the line as a shell command: @example +@group @{ if ($1 == "@@execute") @{ tmp = substr($0, 10) # Remove "@@execute" @@ -8605,6 +8630,7 @@ produced by running the rest of the line as a shell command: @} else print @} +@end group @end example @noindent @@ -8908,12 +8934,14 @@ For example, a TCP client can decide to give up on receiving any response from the server after a certain amount of time: @example +@group Service = "/inet/tcp/0/localhost/daytime" PROCINFO[Service, "READ_TIMEOUT"] = 100 if ((Service |& getline) > 0) print $0 else if (ERRNO != "") print ERRNO +@end group @end example Here is how to read interactively from the user@footnote{This assumes @@ -9255,10 +9283,12 @@ newlines: @end ifnotinfo @example +@group $ @kbd{awk 'BEGIN @{ print "line one\nline two\nline three" @}'} @print{} line one @print{} line two @print{} line three +@end group @end example @cindex fields, printing @@ -9510,12 +9540,14 @@ The output separator variables @code{OFS} and @code{ORS} have no effect on @code{printf} statements. For example: @example +@group $ @kbd{awk 'BEGIN @{} > @kbd{ORS = "\nOUCH!\n"; OFS = "+"} > @kbd{msg = "Don\47t Panic!"} > @kbd{printf "%s\n", msg} > @kbd{@}'} @print{} Don't Panic! +@end group @end example @noindent @@ -10038,9 +10070,11 @@ alone for now and let's hope no-one notices. 
@end ignore @example +@group awk '@{ print $1 > "names.unsorted" command = "sort -r > names.sorted" print $1 | command @}' mail-list +@end group @end example The unsorted list is written with an ordinary redirection, while @@ -10375,7 +10409,7 @@ The @var{protocol} is one of @samp{tcp} or @samp{udp}, and the other fields represent the other essential pieces of information for making a networking connection. These @value{FN}s are used with the @samp{|&} operator for communicating -with a coprocess +with @w{a coprocess} (@pxref{Two-way I/O}). This is an advanced feature, mentioned here only for completeness. Full discussion is delayed until @@ -10474,10 +10508,14 @@ it is good practice to use a variable to store the @value{FN} or command. The previous example becomes the following: @example +@group sortcom = "sort -r names" sortcom | getline foo +@end group +@group @dots{} close(sortcom) +@end group @end example @noindent @@ -10625,7 +10663,7 @@ if it fails. @float Table,table-close-pipe-return-values @caption{Return values from @code{close()} of a pipe} -@multitable @columnfractions .40 .60 +@multitable @columnfractions .50 .50 @headitem Situation @tab Return value from @code{close()} @item Normal exit of command @tab Command's exit status @item Death by signal of command @tab 256 + number of murderous signal @@ -10691,7 +10729,7 @@ if it fails. @float Table,table-close-pipe-return-values @caption{Return values from @code{close()} of a pipe} -@multitable @columnfractions .40 .60 +@multitable @columnfractions .50 .50 @headitem Situation @tab Return value from @code{close()} @item Normal exit of command @tab Command's exit status @item Death by signal of command @tab 256 + number of murderous signal @@ -10721,7 +10759,8 @@ disk) is a fatal error. @example $ @kbd{gawk 'BEGIN @{ print "hi" > "/no/such/file" @}'} -@error{} gawk: cmd. line:1: fatal: can't redirect to `/no/such/file' (No such file or directory) +@error{} gawk: cmd. 
line:1: fatal: can't redirect to `/no/such/file' (No +@error{} such file or directory) @end example @command{gawk} makes it possible to detect that an error has @@ -11181,6 +11220,7 @@ confusion can arise when attempting to use regexp constants as arguments to user-defined functions (@pxref{User-defined}). For example: @example +@group function mysub(pat, repl, str, global) @{ if (global) @@ -11189,13 +11229,16 @@ function mysub(pat, repl, str, global) sub(pat, repl, str) return str @} +@end group +@group @{ @dots{} text = "hi! hi yourself!" mysub(/hi/, "howdy", text, 1) @dots{} @} +@end group @end example @c @cindex automatic warnings @@ -11443,8 +11486,10 @@ is performed. If numeric values appear in string concatenation, they are converted to strings. Consider the following: @example +@group two = 2; three = 3 print (two three) + 4 +@end group @end example @noindent @@ -11946,10 +11991,14 @@ to it. In the following program fragment, the variable @code{foo} has a numeric value at first, and a string value later on: @example +@group foo = 1 print foo +@end group +@group foo = "bar" print foo +@end group @end example @noindent @@ -12021,16 +12070,20 @@ righthand expression. For example: @cindex Rankin, Pat @example +@group # Thanks to Pat Rankin for this example BEGIN @{ foo[rand()] += 5 for (x in foo) print x, foo[x] +@end group +@group bar[rand()] = bar[rand()] + 5 for (x in bar) print x, bar[x] @} +@end group @end example @cindex operators, assignment, evaluation order @@ -12816,10 +12869,12 @@ leave off one of the @samp{=} characters. The result is still valid @command{awk} code, but the program does not do what is intended: @example +@group if (a = b) # oops! should be a == b @dots{} else @dots{} +@end group @end example @noindent @@ -13771,8 +13826,10 @@ $ @kbd{awk '! 
/li/' mail-list} @print{} Bill 555-1675 bill.drowning@@hotmail.com A @print{} Camilla 555-2912 camilla.infusarum@@skynet.be R @print{} Fabius 555-1234 fabius.undevicesimus@@ucb.edu F +@group @print{} Martin 555-6480 martin.codicibus@@hotmail.com A @print{} Jean-Paul 555-2127 jeanpaul.campanorum@@nyu.edu R +@end group @end example @cindex @code{BEGIN} pattern, Boolean patterns and @@ -14163,10 +14220,12 @@ the variable's value into the program inside the script. For example, consider the following program: @example +@group printf "Enter search pattern: " read pattern awk "/$pattern/ "'@{ nmatches++ @} END @{ print nmatches, "found" @}' /path/to/data +@end group @end example @noindent @@ -14355,10 +14414,12 @@ the null string; otherwise, the condition is true. Refer to the following: @example +@group if (x % 2 == 0) print "x is even" else print "x is odd" +@end group @end example In this example, if the expression @samp{x % 2 == 0} is true (i.e., @@ -14680,6 +14741,7 @@ finds the smallest divisor of any integer, and also identifies prime numbers: @example +@group # find smallest divisor of num @{ num = $1 @@ -14687,11 +14749,14 @@ numbers: if (num % divisor == 0) break @} +@end group +@group if (num % divisor == 0) printf "Smallest divisor of %d is %d\n", num, divisor else printf "%d is prime\n", num @} +@end group @end example When the remainder is zero in the first @code{if} statement, @command{awk} @@ -14999,14 +15064,18 @@ using an @code{exit} statement with a nonzero argument, as shown in the following example: @example +@group BEGIN @{ if (("date" | getline date_now) <= 0) @{ print "Can't get system date" > "/dev/stderr" exit 1 @} +@end group +@group print "current date is", date_now close("date") @} +@end group @end example @quotation NOTE @@ -15304,6 +15373,7 @@ Unlike most @command{awk} arrays, In the following example: @example +@group $ @kbd{awk 'BEGIN @{} > @kbd{for (i = 0; i < ARGC; i++)} > @kbd{print ARGV[i]} @@ -15311,6 +15381,7 @@ $ @kbd{awk 'BEGIN 
@{} @print{} awk @print{} inventory-shipped @print{} mail-list +@end group @end example @noindent @@ -15735,12 +15806,14 @@ points out that it effectively gives @command{awk} data pointers. Consider his example: @example +@group # Indirect multiply of any variable by amount, return result function multiply(variable, amount) @{ return SYMTAB[variable] *= amount @} +@end group @end example @noindent @@ -15858,6 +15931,7 @@ presented the following program describing the information contained in @code{AR and @code{ARGV}: @example +@group $ @kbd{awk 'BEGIN @{} > @kbd{for (i = 0; i < ARGC; i++)} > @kbd{print ARGV[i]} @@ -15865,6 +15939,7 @@ $ @kbd{awk 'BEGIN @{} @print{} awk @print{} inventory-shipped @print{} mail-list +@end group @end example @noindent @@ -16461,8 +16536,10 @@ For example, this statement tests whether the array @code{frequencies} contains the index @samp{2}: @example +@group if (2 in frequencies) print "Subscript 2 is present." +@end group @end example Note that this is @emph{not} a test of whether the array @@ -16472,8 +16549,10 @@ There is no way to do that except to scan all the elements. Also, this (incorrect) alternative does: @example +@group if (frequencies[2] != "") print "Subscript 2 is present." +@end group @end example @node Assigning Elements @@ -16530,6 +16609,7 @@ all the lines. When this program is run with the following input: @example +@group @c file eg/misc/arraymax.data 5 I am the Five man 2 Who are you? The new number two! @@ -16537,17 +16617,20 @@ When this program is run with the following input: 1 Who is number one? 3 I three you. @c endfile +@end group @end example @noindent Its output is: @example +@group 1 Who is number one? 2 Who are you? The new number two! 3 I three you. 4 . . . 
And four on the floor 5 I am the Five man +@end group @end example If a line number is repeated, the last line with a given number overrides @@ -16556,11 +16639,13 @@ Gaps in the line numbers can be handled with an easy improvement to the program's @code{END} rule, as follows: @example +@group END @{ for (x = 1; x <= max; x++) if (x in arr) print arr[x] @} +@end group @end example @node Scanning an Array @@ -16580,8 +16665,10 @@ So @command{awk} has a special kind of @code{for} statement for scanning an array: @example +@group for (@var{var} in @var{array}) @var{body} +@end group @end example @noindent @@ -16602,12 +16689,15 @@ such words. for more information on the built-in function @code{length()}. @example +@group # Record a 1 for each word that is used at least once @{ for (i = 1; i <= NF; i++) used[$i] = 1 @} +@end group +@group # Find number of distinct words more than 10 characters long END @{ for (x in used) @{ @@ -16618,6 +16708,7 @@ END @{ @} print num_long_words, "words longer than 10 characters" @} +@end group @end example @noindent @@ -17005,9 +17096,11 @@ same as assigning it a null value (the empty string, @code{""}). For example: @example +@group foo[4] = "" if (4 in foo) print "This is printed, even though foo[4] is empty" +@end group @end example @cindex lint checking, array elements @@ -17162,22 +17255,26 @@ END @{ When given the input: @example +@group 1 2 3 4 5 6 2 3 4 5 6 1 3 4 5 6 1 2 4 5 6 1 2 3 +@end group @end example @noindent the program produces the following output: @example +@group 4 3 2 1 5 4 3 2 6 5 4 3 1 6 5 4 2 1 6 5 3 2 1 6 +@end group @end example @node Multiscanning @@ -17357,15 +17454,19 @@ you can often devise workarounds using control statements. 
For example, the following code prints the elements of our main array @code{a}: @example +@group for (i in a) @{ for (j in a[i]) @{ if (j == 3) @{ for (k in a[i][j]) print a[i][j][k] +@end group +@group @} else print a[i][j] @} @} +@end group @end example @noindent @@ -17815,9 +17916,11 @@ asort(a) results in the following contents of @code{a}: @example +@group a[1] = "cul" a[2] = "de" a[3] = "sac" +@end group @end example The @code{asorti()} function works similarly to @code{asort()}; however, @@ -18906,6 +19009,9 @@ a file or pipe that was opened for reading (such as with @code{getline}), or if @var{filename} is not an open file, pipe, or coprocess. In such a case, @code{fflush()} returns @minus{}1, as well. +@c end the table to let the sidebar take up the full width of the page. +@end table + @cindex sidebar, Interactive Versus Noninteractive Buffering @ifdocbook @docbook @@ -19006,6 +19112,7 @@ it is all buffered and sent down the pipe to @command{cat} in one shot. @end cartouche @end ifnotdocbook +@table @asis @item @code{system(@var{command})} @cindexawkfunc{system} @cindex invoke shell command @@ -19798,7 +19905,7 @@ that illustrates the use of these functions: @example @group @c file eg/lib/bits2str.awk -# bits2str --- turn a byte into readable ones and zeros +# bits2str --- turn an integer into readable ones and zeros function bits2str(bits, data, mask) @{ @@ -19820,7 +19927,7 @@ function bits2str(bits, data, mask) @c this is a hack to make testbits.awk self-contained @ignore @c file eg/prog/testbits.awk -# bits2str --- turn a byte into readable 1's and 0's +# bits2str --- turn an integer into readable ones and zeros function bits2str(bits, data, mask) @{ @@ -19861,7 +19968,8 @@ $ @kbd{gawk -f testbits.awk} @print{} 123 = 01111011 @print{} 0123 = 01010011 @print{} 0x99 = 10011001 -@print{} compl(0x99) = 0x3fffffffffff66 = 00111111111111111111111111111111111111111111111101100110 +@print{} compl(0x99) = 0x3fffffffffff66 = +@print{} 
00111111111111111111111111111111111111111111111101100110 @print{} lshift(0x99, 2) = 0x264 = 0000001001100100 @print{} rshift(0x99, 2) = 0x26 = 00100110 @end example @@ -20200,10 +20308,12 @@ entire program before starting to execute any of it. The definition of a function named @var{name} looks like this: @display +@group @code{function} @var{name}@code{(}[@var{parameter-list}]@code{)} @code{@{} @var{body-of-function} @code{@}} +@end group @end display @cindex names, functions @@ -20371,11 +20481,13 @@ This function deletes all the elements in an array (recall that the extra whitespace signifies the start of the local variable list): @example +@group function delarray(a, i) @{ for (i in a) delete a[i] @} +@end group @end example When working with arrays, it is often necessary to delete all the elements @@ -20582,10 +20694,12 @@ In addition, recursive calls create new arrays. Consider this example: @example +@group function some_func(p1, a) @{ if (p1++ > 3) return +@end group a[p1] = p1 @@ -20649,12 +20763,14 @@ this has no effect on any other variables. Thus, if @code{myfunc()} does this: @example +@group function myfunc(str) @{ print str str = "zzz" print str @} +@end group @end example @noindent @@ -20810,11 +20926,13 @@ function maxelt(vec, i, ret) return ret @} +@group # Load all fields of each record into nums. 
@{ for(i = 1; i <= NF; i++) nums[NR, i] = $i @} +@end group END @{ print maxelt(nums) @@ -21108,12 +21226,14 @@ first thing to do is write some comparison functions: @example @c file eg/prog/indirectcall.awk +@group # num_lt --- do a numeric less than comparison function num_lt(left, right) @{ return ((left + 0) < (right + 0)) @} +@end group # num_ge --- do a numeric greater than or equal to comparison @@ -21162,19 +21282,23 @@ names of the two comparison functions: @example @c file eg/prog/indirectcall.awk +@group # sort --- sort the data in ascending order and return it as a string function sort(first, last) @{ return do_sort(first, last, "num_lt") @} +@end group +@group # rsort --- sort the data in descending order and return it as a string function rsort(first, last) @{ return do_sort(first, last, "num_ge") @} +@end group @c endfile @end example @@ -21674,6 +21798,7 @@ been true but was not, and then it kills the program. In C, using @code{assert()} looks this: @example +@group #include <assert.h> int myfunc(int a, double b) @@ -21681,6 +21806,7 @@ int myfunc(int a, double b) assert(a <= 5 && b >= 17.1); @dots{} @} +@end group @end example If the assertion fails, the program prints a message similar to this: @@ -21838,9 +21964,10 @@ function round(x, ival, aval, fraction) @} @c endfile @c don't include test harness in the file that gets installed - +@group # test harness # @{ print $0, round($0) @} +@end group @end example @node Cliff Random Function @@ -22246,7 +22373,7 @@ if (length(contents) == 0) @end example This tests the result to see if it is empty or not. An equivalent -test would be @samp{contents == ""}. +test would be @samp{@w{contents == ""}}. @xref{Extension Sample Readfile} for an extension function that also reads an entire file into memory. 
@@ -22577,8 +22704,10 @@ $ @kbd{gawk -f rewind.awk -f test.awk data } @print{} data 1 a @print{} data 2 b @print{} data 3 c +@group @print{} data 4 d @print{} data 5 e +@end group @end example @node File Checking @@ -23793,8 +23922,10 @@ function getgrent() _gr_init() if (++_gr_count in _gr_bycount) return _gr_bycount[_gr_count] +@group return "" @} +@end group @c endfile @end example @@ -24324,10 +24455,12 @@ list of fields or characters: if (by_fields == 0 && by_chars == 0) by_fields = 1 # default +@group if (fieldlist == "") @{ print "cut: needs list for -c or -f" > "/dev/stderr" exit 1 @} +@end group if (by_fields) set_fieldlist() @@ -24668,8 +24801,10 @@ function endfile(file) print fcount @} +@group total += fcount @} +@end group @c endfile @end example @@ -24826,11 +24961,15 @@ BEGIN @{ pw = getpwuid(uid) pr_first_field(pw) +@group if (euid != uid) @{ printf(" euid=%d", euid) pw = getpwuid(euid) +@end group +@group pr_first_field(pw) @} +@end group printf(" gid=%d", gid) pw = getgrgid(gid) @@ -24958,14 +25097,17 @@ BEGIN @{ # test argv in case reading from stdin instead of file if (i in ARGV) i++ # skip datafile name +@group if (i in ARGV) @{ outfile = ARGV[i] ARGV[i] = "" @} - +@end group +@group s1 = s2 = "a" out = (outfile s1 s2) @} +@end group @c endfile @end example @@ -25121,11 +25263,15 @@ line into each file on the command line, and then to the standard output: It is also possible to write the loop this way: @example +@group for (i in copy) if (append) print >> copy[i] +@end group +@group else print > copy[i] +@end group @end example @noindent @@ -25276,10 +25422,12 @@ BEGIN @{ usage() @} +@group if (ARGV[Optind] ~ /^\+[[:digit:]]+$/) @{ charcount = substr(ARGV[Optind], 2) + 0 Optind++ @} +@end group for (i = 1; i < Optind; i++) ARGV[i] = "" @@ -25313,10 +25461,12 @@ strings are then compared and @code{are_equal()} returns the result: @example @c file eg/prog/uniq.awk +@group function are_equal( n, m, clast, cline, alast, aline) @{ if (fcount == 0 && 
charcount == 0) return (last == $0) +@end group if (fcount > 0) @{ n = split(last, alast) @@ -25331,9 +25481,11 @@ function are_equal( n, m, clast, cline, alast, aline) clast = substr(clast, charcount + 1) cline = substr(cline, charcount + 1) @} +@group return (clast == cline) @} +@end group @c endfile @end example @@ -25392,11 +25544,13 @@ NR == 1 @{ END @{ if (do_count) printf("%4d %s\n", count, last) > outputfile +@group else if ((repeated_only && count > 1) || (non_repeated_only && count == 1)) print last > outputfile close(outputfile) @} +@end group @c endfile @end example @@ -26191,10 +26345,12 @@ At first glance, a program like this would seem to do the job: freq[$i]++ @} +@group END @{ for (word in freq) printf "%s\t%d\n", word, freq[word] @} +@end group @end example The program relies on @command{awk}'s default field-splitting @@ -26584,9 +26740,11 @@ line. That line is then printed to the output file: i++ @} @} +@group print join(a, 1, n, SUBSEP) > curfile @} @} +@end group @c endfile @end example @@ -26672,10 +26830,12 @@ function usage() exit 1 @} +@group BEGIN @{ # validate arguments if (ARGC < 3) usage() +@end group RS = ARGV[1] ORS = ARGV[2] @@ -27069,13 +27229,11 @@ the program is done: continue @} fpath = pathto($2) -@group if (fpath == "") @{ printf("igawk: %s:%d: cannot find %s\n", input[stackptr], FNR, $2) > "/dev/stderr" continue @} -@end group if (! (fpath in processed)) @{ processed[fpath] = input[stackptr] input[++stackptr] = fpath # push onto stack @@ -27332,10 +27490,12 @@ notice and this notice are preserved. 
Here is the program: @example +@group awk 'BEGIN@{O="~"~"~";o="=="=="==";o+=+o;x=O""O;while(X++<=x+o+o)c=c"%c"; printf c,(x-O)*(x-O),x*(x-o)-o,x*(x-O)+x-O-o,+x*(x-O)-x+o,X*(o*o+O)+x-O, X*(X-x)-o*o,(x+X)*o*o+o,x*(X-x)-O-O,x-O+(O+o+X+x)*(o+O),X*X-X*(x-O)-x+O, O+X*(o*(o+O)+O),+x+O+X*o,x*(x-o),(o+X+x)*o*o-(x-O-O),O+(X-x)*(X+O),x-O@}' +@end group @end example @cindex Johansen, Chris @@ -27823,11 +27983,13 @@ Our first comparison function can be used to scan an array in numerical order of the indices: @example +@group function cmp_num_idx(i1, v1, i2, v2) @{ # numerical index comparison, ascending order return (i1 - i2) @} +@end group @end example Our second function traverses an array based on the string order of @@ -27932,10 +28094,13 @@ function cmp_field(i1, v1, i2, v2) a[NR][i] = $i @} +@group END @{ PROCINFO["sorted_in"] = "cmp_field" +@end group if (POS < 1 || POS > NF) POS = 1 + for (i in a) @{ for (j = 1; j <= NF; j++) printf("%s%c", a[i][j], j < NF ? ":" : "") @@ -27992,6 +28157,7 @@ function cmp_numeric(i1, v1, i2, v2) return (v1 != v2) ? (v2 - v1) : (i2 - i1) @} +@group function cmp_string(i1, v1, i2, v2) @{ # string value (and index) comparison, descending order @@ -27999,6 +28165,7 @@ function cmp_string(i1, v1, i2, v2) v2 = v2 i2 return (v1 > v2) ? -1 : (v1 != v2) @} +@end group @end example @c Avoid using the term ``stable'' when describing the unpredictable behavior @@ -28152,11 +28319,13 @@ The following example demonstrates the use of a comparison function with both values to lowercase in order to compare them ignoring case. @example +@group # case_fold_compare --- compare as strings, ignoring case function case_fold_compare(i1, v1, i2, v2, l, r) @{ l = tolower(v1) +@end group r = tolower(v2) if (l < r) @@ -29513,8 +29682,10 @@ This is somewhat counterintuitive. and those with positional specifiers in the same string: @example +@group $ @kbd{gawk 'BEGIN @{ printf "%d %3$s\n", 1, 2, "hi" @}'} @error{} gawk: cmd. 
line:1: fatal: must use `count$' on all formats or none +@end group @end example @quotation NOTE @@ -30139,8 +30310,10 @@ be inside this function. To investigate further, we must begin @samp{n} (for ``next''): @example +@group gawk> @kbd{n} @print{} 66 if (fcount > 0) @{ +@end group @end example This tells us that @command{gawk} is now ready to execute line 66, which @@ -30909,10 +31082,12 @@ partial dump of Davide Brini's obfuscated code @c FIXME: This will need updating if num-handler branch is ever merged in. @smallexample +@group gawk> @kbd{dump} @print{} # BEGIN @print{} @print{} [ 1:0xfcd340] Op_rule : [in_rule = BEGIN] [source_file = brini.awk] +@end group @print{} [ 1:0xfcc240] Op_push_i : "~" [MALLOC|STRING|STRCUR] @print{} [ 1:0xfcc2a0] Op_push_i : "~" [MALLOC|STRING|STRCUR] @print{} [ 1:0xfcc280] Op_match : @@ -30945,18 +31120,18 @@ gawk> @kbd{dump} @print{} [ :0xfcc660] Op_no_op : @print{} [ 1:0xfcc520] Op_assign_concat : c @print{} [ :0xfcc620] Op_jmp : [target_jmp = 0xfcc440] -@print{} @dots{} -@print{} @print{} [ 2:0xfcc5a0] Op_K_printf : [expr_count = 17] [redir_type = ""] @print{} [ :0xfcc140] Op_no_op : @print{} [ :0xfcc1c0] Op_atexit : @print{} [ :0xfcc640] Op_stop : @print{} [ :0xfcc180] Op_no_op : @print{} [ :0xfcd150] Op_after_beginfile : +@group @print{} [ :0xfcc160] Op_no_op : @print{} [ :0xfcc1a0] Op_after_endfile : gawk> +@end group @end smallexample @cindex @code{exit} debugger command @@ -31311,6 +31486,7 @@ In computer systems, integer arithmetic is exact, but the possible range of values is limited. Integer arithmetic is generally faster than floating-point arithmetic. +@cindex floating-point, numbers @item Floating-point arithmetic Floating-point numbers represent what were called in school ``real'' numbers (i.e., those that have a fractional part, such as 3.1415927). @@ -31322,6 +31498,12 @@ Modern systems support floating-point arithmetic in hardware, with a limited range of values. 
There are software libraries that allow the use of arbitrary-precision floating-point calculations. +@cindex floating-point, numbers@comma{} single-precision +@cindex floating-point, numbers@comma{} double-precision +@cindex floating-point, numbers@comma{} arbitrary-precision +@cindex single-precision +@cindex double-precision +@cindex arbitrary-precision POSIX @command{awk} uses @dfn{double-precision} floating-point numbers, which can hold more digits than @dfn{single-precision} floating-point numbers. @command{gawk} has facilities for performing arbitrary-precision @@ -31331,29 +31513,48 @@ floating-point arithmetic, which we describe in more detail shortly. Computers work with integer and floating-point values of different ranges. Integer values are usually either 32 or 64 bits in size. Single-precision floating-point values occupy 32 bits, whereas double-precision -floating-point values occupy 64 bits. Floating-point values are always -signed. The possible ranges of values are shown in @ref{table-numeric-ranges}. +floating-point values occupy 64 bits. +(Quadruple-precision floating point values also exist. They occupy 128 bits, +but such numbers are not available in @command{awk}.) +Floating-point values are always +signed. The possible ranges of values are shown in @ref{table-numeric-ranges} +and @ref{table-floating-point-ranges}. 
@float Table,table-numeric-ranges -@caption{Value ranges for different numeric representations} +@caption{Value ranges for integer representations} @multitable @columnfractions .34 .33 .33 -@headitem Numeric representation @tab Minimum value @tab Maximum value +@headitem Representation @tab Minimum value @tab Maximum value @item 32-bit signed integer @tab @minus{}2,147,483,648 @tab 2,147,483,647 @item 32-bit unsigned integer @tab 0 @tab 4,294,967,295 @item 64-bit signed integer @tab @minus{}9,223,372,036,854,775,808 @tab 9,223,372,036,854,775,807 @item 64-bit unsigned integer @tab 0 @tab 18,446,744,073,709,551,615 +@end multitable +@end float + +@float Table,table-floating-point-ranges +@caption{Approximate value ranges for floating-point number representations} +@multitable @columnfractions .38 .22 .22 .23 @iftex -@item Single-precision floating point (approximate) @tab @math{1.175494^{-38}} @tab @math{3.402823^{38}} -@item Double-precision floating point (approximate) @tab @math{2.225074^{-308}} @tab @math{1.797693^{308}} +@headitem Representation @tab @w{Minimum positive} @w{nonzero value} @tab Minimum @w{finite value} @tab Maximum @w{finite value} +@end iftex +@ifnottex +@headitem Representation @tab Minimum positive nonzero value @tab Minimum finite value @tab Maximum finite value +@end ifnottex +@iftex +@item @w{Single-precision floating-point} @tab @math{1.175494 @cdot 10^{-38}} @tab @math{-3.402823 @cdot 10^{38}} @tab @math{3.402823 @cdot 10^{38}} +@item @w{Double-precision floating-point} @tab @math{2.225074 @cdot 10^{-308}} @tab @math{-1.797693 @cdot 10^{308}} @tab @math{1.797693 @cdot 10^{308}} +@item @w{Quadruple-precision floating-point} @tab @math{3.362103 @cdot 10^{-4932}} @tab @math{-1.189731 @cdot 10^{4932}} @tab @math{1.189731 @cdot 10^{4932}} @end iftex @ifinfo -@item Single-precision floating point (approximate) @tab 1.175494e-38 @tab 3.402823e38 -@item Double-precision floating point (approximate) @tab 2.225074e-308 @tab 1.797693e308 +@item 
Single-precision floating-point @tab 1.175494e-38 @tab -3.402823e+38 @tab 3.402823e+38 +@item Double-precision floating-point @tab 2.225074e-308 @tab -1.797693e+308 @tab 1.797693e+308 +@item Quadruple-precision floating-point @tab 3.362103e-4932 @tab -1.189731e+4932 @tab 1.189731e+4932 @end ifinfo @ifnottex @ifnotinfo -@item Single-precision floating point (approximate) @tab 1.175494@sup{-38} @tab 3.402823@sup{38} -@item Double-precision floating point (approximate) @tab 2.225074@sup{-308} @tab 1.797693@sup{308} +@item Single-precision floating-point @tab 1.175494*10@sup{-38} @tab -3.402823*10@sup{38} @tab 3.402823*10@sup{38} +@item Double-precision floating-point @tab 2.225074*10@sup{-308} @tab -1.797693*10@sup{308} @tab 1.797693*10@sup{308} +@item Quadruple-precision floating-point @tab 3.362103*10@sup{-4932} @tab -1.189731*10@sup{4932} @tab 1.189731*10@sup{4932} @end ifnotinfo @end ifnottex @end multitable @@ -31622,12 +31823,14 @@ You have to decide how small a delta is important to you. Code to do this looks something like the following: @example +@group delta = 0.00001 # for example difference = abs(a) - abs(b) # subtract the two values if (difference < delta) # all ok else # not ok +@end group @end example @noindent @@ -32097,6 +32300,7 @@ choose to set: @example @c file eg/prog/pi.awk +@group # pi.awk --- compute the digits of pi @c endfile @c endfile @@ -32112,6 +32316,7 @@ choose to set: BEGIN @{ digits = 100000 two = 2 * 10 ^ digits +@end group pi = two for (m = digits * 4; m > 0; --m) @{ d = m * 2 + 1 @@ -33078,6 +33283,7 @@ of the function using the macro. 
For example, you might allocate a string value like so: @example +@group awk_value_t result; char *message; const char greet[] = "Don't Panic!"; @@ -33085,8 +33291,10 @@ const char greet[] = "Don't Panic!"; emalloc(message, char *, sizeof(greet), "myfunc"); strcpy(message, greet); make_malloced_string(message, strlen(message), & result); +@end group @end example +@sp 2 @item #define ezalloc(pointer, type, size, message) @dots{} This is like @code{emalloc()}, but it calls @code{gawk_calloc()} instead of @code{gawk_malloc()}. @@ -33222,6 +33430,7 @@ registering parts of your extension with @command{gawk}. Extension functions are described by the following record: @example +@group typedef struct awk_ext_func @{ @ @ @ @ const char *name; @ @ @ @ awk_value_t *(*const function)(int num_actual_args, @@ -33232,6 +33441,7 @@ typedef struct awk_ext_func @{ @ @ @ @ awk_bool_t suppress_lint; @ @ @ @ void *data; /* opaque pointer to any extra state */ @} awk_ext_func_t; +@end group @end example The fields are: @@ -33427,12 +33637,14 @@ Your extension should package these functions inside an @code{awk_input_parser_t}, which looks like this: @example +@group typedef struct awk_input_parser @{ const char *name; /* name of parser */ awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf); awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf); awk_const struct awk_input_parser *awk_const next; /* for gawk */ @} awk_input_parser_t; +@end group @end example The fields are: @@ -34185,6 +34397,7 @@ to a global variable or array. It is an optimization that avoids looking up variables in @command{gawk}'s symbol table every time access is needed. This was discussed earlier, in @ref{General Data Types}. 
+@need 1500
 The following functions let you work with scalar cookies:
 @table @code
@@ -34247,12 +34460,14 @@ your extension's variable in @command{gawk}'s symbol table using using @code{sym_lookup()}:
 @example
+@group
 static awk_scalar_t magic_var_cookie; /* cookie for MAGIC_VAR */
 static void
 my_extension_init()
 @{
     awk_value_t value;
+@end group
     /* install initial value */
     sym_update("MAGIC_VAR", make_number(42.0, & value));
@@ -34756,10 +34971,12 @@ Finally, because everything was successful, the function sets the return value to success, and returns:
 @example
+@group
     make_number(1.0, result);
 out:
     return result;
 @}
+@end group
 @end example
 Here is the output from running this part of the test:
@@ -34971,7 +35188,7 @@ BEGIN @{
 Here is the result of running the script:
 @example
-$ @kbd{AWKLIBPATH=$PWD ./gawk -f subarray.awk}
+$ @kbd{AWKLIBPATH=$PWD gawk -f subarray.awk}
 @print{} new_array["subarray"]["foo"] = bar
 @print{} new_array["hello"] = world
 @print{} new_array["answer"] = 42
@@ -35110,7 +35327,7 @@ It is up to the extension to decide if there are API incompatibilities. Typically, a check like this is enough:
 @example
-if (api->major_version != GAWK_API_MAJOR_VERSION
+if ( api->major_version != GAWK_API_MAJOR_VERSION
     || api->minor_version < GAWK_API_MINOR_VERSION) @{
     fprintf(stderr, "foo_extension: version mismatch with gawk!\n");
     fprintf(stderr, "\tmy version (%d, %d), gawk version (%d, %d)\n",
@@ -35211,10 +35428,12 @@ as described here.
 The boilerplate needed is also provided in comments in the @file{gawkapi.h} header file:
 @example
+@group
 /* Boilerplate code: */
 int plugin_is_GPL_compatible;
 static gawk_api_t *const api;
+@end group
 static awk_ext_id_t ext_id;
 static const char *ext_version = NULL; /* or @dots{} = "some string" */
@@ -35615,10 +35834,12 @@ The second is a pointer to an @code{awk_value_t} structure, usually named @code{result}:
 @example
+@group
 /* do_chdir --- provide dynamically loaded chdir() function for gawk */
 static awk_value_t *
 do_chdir(int nargs, awk_value_t *result, struct awk_ext_func *unused)
+@end group
 @{
     awk_value_t newdir;
     int ret = -1;
@@ -35745,7 +35966,7 @@ fill_stat_array(const char *name, awk_array_t array, struct stat *sbuf)
 #endif
 #ifdef S_IFDOOR /* Solaris weirdness */
     @{ S_IFDOOR, "door" @},
-#endif /* S_IFDOOR */
+#endif
     @};
     int j, k;
 @end example
@@ -35788,9 +36009,11 @@ certain members and/or the type of the file. It then returns zero, for success:
 @example
+@group
 #ifdef HAVE_STRUCT_STAT_ST_BLKSIZE
     array_set_numeric(array, "blksize", sbuf->st_blksize);
-#endif /* HAVE_STRUCT_STAT_ST_BLKSIZE */
+#endif
+@end group
     pmode = format_mode(sbuf->st_mode);
     array_set(array, "pmode", make_const_string(pmode, strlen(pmode),
@@ -35879,20 +36102,24 @@ Next, it gets the information for the file. If the called function
     /* stat the file; if error, set ERRNO and return */
     ret = statfunc(name, & sbuf);
+@group
     if (ret < 0) @{
         update_ERRNO_int(errno);
         return make_number(ret, result);
     @}
+@end group
 @end example
 The tedious work is done by @code{fill_stat_array()}, shown earlier. When done, the function returns the result from @code{fill_stat_array()}:
 @example
+@group
     ret = fill_stat_array(name, array, & sbuf);
     return make_number(ret, result);
 @}
+@end group
 @end example
 Finally, it's necessary to provide the ``glue'' that loads the
@@ -41580,14 +41807,24 @@ like this: @code{""}. Humans are used to working in decimal; i.e., base 10.
 In base 10, numbers go from 0 to 9, and then ``roll over'' into the next
+@iftex
+column. (Remember grade school? @math{42 = 4\times 10 + 2}.)
+@end iftex
+@ifnottex
 column. (Remember grade school? 42 = 4 x 10 + 2.)
+@end ifnottex
 There are other number bases though. Computers commonly use base 2 or @dfn{binary}, base 8 or @dfn{octal}, and base 16 or @dfn{hexadecimal}. In binary, each column represents two times the value in the column to its right. Each column may contain either a 0 or a 1.
+@iftex
+Thus, binary 1010 represents @math{(1\times 8) + (0\times 4) + (1\times 2) + (0\times 1)}, or decimal 10.
+@end iftex
+@ifnottex
 Thus, binary 1010 represents (1 x 8) + (0 x 4) + (1 x 2) + (0 x 1), or decimal 10.
+@end ifnottex
 Octal and hexadecimal are discussed more in @ref{Nondecimal-numbers}.
@@ -41727,7 +41964,12 @@ electronic circuitry works ``naturally'' in base 2 (just think of Off/On), everything inside a computer is calculated using base 2. Each digit represents the presence (or absence) of a power of 2 and is called a @dfn{bit}. So, for example, the base-two number @code{10101} is
+@iftex
+the same as decimal 21, (@math{(1\times 16) + (1\times 4) + (1\times 1)}).
+@end iftex
+@ifnottex
 the same as decimal 21, ((1 x 16) + (1 x 4) + (1 x 1)).
+@end ifnottex
 Since base-two numbers quickly become very long to read and write, they are usually grouped by 3 (i.e., they are
@@ -41898,7 +42140,7 @@ See also ``Interpreter.''
 @item Complemented Bracket Expression
 The negation of a @dfn{bracket expression}. All that is @emph{not} described by a given bracket expression. The symbol @samp{^} precedes
-the negated bracket expression. E.g.: @samp{[[^:digit:]}
+the negated bracket expression. E.g.: @samp{[^[:digit:]]}
 designates whatever character is not a digit. @samp{[^bad]} designates whatever character is not one of the letters @samp{b}, @samp{a}, or @samp{d}.
@@ -42167,7 +42409,12 @@ Base 16 notation, where the digits are @code{0}--@code{9} and @code{A}--@code{F}, with @samp{A} representing 10, @samp{B} representing 11, and so on, up to @samp{F} for 15. Hexadecimal numbers are written in C using a leading @samp{0x},
+@iftex
+to indicate their base. Thus, @code{0x12} is 18 (@math{(1\times 16) + 2}).
+@end iftex
+@ifnottex
 to indicate their base. Thus, @code{0x12} is 18 ((1 x 16) + 2).
+@end ifnottex
 @xref{Nondecimal-numbers}.
 @item I/O
@@ -42231,7 +42478,7 @@ meaning. Keywords are reserved and may not be used as variable names.
 @code{break},
 @code{case},
 @code{continue},
-@code{default}
+@code{default},
 @code{delete},
 @code{do@dots{}while},
 @code{else},
@@ -42317,7 +42564,12 @@ Ancient @command{awk} implementations used single precision floating-point.
 @item Octal
 Base-eight notation, where the digits are @code{0}--@code{7}. Octal numbers are written in C using a leading @samp{0},
+@iftex
+to indicate their base. Thus, @code{013} is 11 (@math{(1\times 8) + 3}).
+@end iftex
+@ifnottex
 to indicate their base. Thus, @code{013} is 11 ((1 x 8) + 3).
+@end ifnottex
 @xref{Nondecimal-numbers}.
 @item Output Record