diff options
Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- | doc/gawk.texi | 819 |
1 files changed, 764 insertions, 55 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi index f6114b4e..b711949b 100644 --- a/doc/gawk.texi +++ b/doc/gawk.texi @@ -562,6 +562,7 @@ particular records in a file and perform operations upon them. * Computed Regexps:: Using Dynamic Regexps. * GNU Regexp Operators:: Operators specific to GNU software. * Case-sensitivity:: How to do case-insensitive matching. +* Strong Regexp Constants:: Strongly typed regexp constants. * Regexp Summary:: Regular expressions summary. * Records:: Controlling how data is split into records. @@ -604,6 +605,7 @@ particular records in a file and perform operations upon them. @code{getline}. * Getline Summary:: Summary of @code{getline} Variants. * Read Timeout:: Reading input with a timeout. +* Retrying Input:: Retrying input after certain errors. * Command-line directories:: What happens if you put a directory on the command line. * Input Summary:: Input summary. @@ -633,6 +635,7 @@ particular records in a file and perform operations upon them. * Special Caveats:: Things to watch out for. * Close Files And Pipes:: Closing Input and Output Files and Pipes. +* Nonfatal:: Enabling Nonfatal Output. * Output Summary:: Output summary. * Output Exercises:: Exercises. * Values:: Constants, Variables, and Regular @@ -944,6 +947,7 @@ particular records in a file and perform operations upon them. * Array Functions:: Functions for working with arrays. * Flattening Arrays:: How to flatten arrays. * Creating Arrays:: How to create and populate arrays. +* Redirection API:: How to access and manipulate redirections. * Extension API Variables:: Variables provided by the API. * Extension Versioning:: API Version information. * Extension API Informational Variables:: Variables providing information about @@ -1002,6 +1006,7 @@ particular records in a file and perform operations upon them. * Unix Installation:: Installing @command{gawk} under various versions of Unix. * Quick Installation:: Compiling @command{gawk} under Unix. +* Shell Startup Files:: Shell convenience functions. * Additional Configuration Options:: Other compile-time options. * Configuration Philosophy:: How it's all supposed to work. * Non-Unix Installation:: Installation on Other Operating @@ -4084,10 +4089,8 @@ No space is allowed between the @option{-o} and @var{file}, if @var{file} is supplied. @quotation NOTE -Due to the way @command{gawk} has evolved, with this option -your program still executes. This will change in the -next major release, such that @command{gawk} will only -pretty-print the program and not run it. +In the past, this option would also execute your program. +This is no longer the case. @end quotation @item @option{-O} @@ -4133,11 +4136,6 @@ restrictions apply: @cindex newlines @cindex whitespace, newlines as @item -Newlines do not act as whitespace to separate fields when @code{FS} is -equal to a single space -(@pxref{Fields}). - -@item Newlines are not allowed after @samp{?} or @samp{:} (@pxref{Conditional Exp}). @@ -4488,6 +4486,9 @@ searches first in the current directory and then in @file{/usr/local/share/awk}. In practice, this means that you will rarely need to change the value of @env{AWKPATH}. +@xref{Shell Startup Files}, for information on functions that help to +manipulate the @env{AWKPATH} variable. + @command{gawk} places the value of the search path that it used into @code{ENVIRON["AWKPATH"]}. This provides access to the actual search path value from within an @command{awk} program. @@ -4519,6 +4520,9 @@ an empty value, @command{gawk} uses a default path; this is typically @samp{/usr/local/lib/gawk}, although it can vary depending upon how @command{gawk} was built. +@xref{Shell Startup Files}, for information on functions that help to +manipulate the @env{AWKLIBPATH} variable. + @command{gawk} places the value of the search path that it used into @code{ENVIRON["AWKLIBPATH"]}. This provides access to the actual search path value from within an @command{awk} program. @@ -4546,6 +4550,8 @@ wait for input before returning with an error. Controls the number of times @command{gawk} attempts to retry a two-way TCP/IP (socket) connection before giving up. @xref{TCP/IP Networking}. +Note that when nonfatal I/O is enabled (@pxref{Nonfatal}), +@command{gawk} only tries to open a TCP/IP socket once. @item POSIXLY_CORRECT Causes @command{gawk} to switch to POSIX-compatibility @@ -4595,14 +4601,6 @@ two regexp matchers that @command{gawk} uses internally. (There aren't supposed to be differences, but occasionally theory and practice don't coordinate with each other.) -@item GAWK_NO_PP_RUN -When @command{gawk} is invoked with the @option{--pretty-print} option, -it will not run the program if this environment variable exists. - -@quotation CAUTION -This variable will not survive into the next major release. -@end quotation - @item GAWK_STACKSIZE This specifies the amount by which @command{gawk} should grow its internal evaluation stack, when needed. @@ -4900,6 +4898,32 @@ Similarly, you may use @code{print} or @code{printf} statements in the @var{init} and @var{increment} parts of a @code{for} loop. This is another long-undocumented ``feature'' of Unix @command{awk}. +@command{gawk} lets you use the names of built-in functions that are +@command{gawk} extensions as the names of parameters in user-defined functions. +This is intended to ``future-proof'' old code that happens to use +function names added by @command{gawk} after the code was written. +Standard @command{awk} built-in functions, such as @code{sin()} or +@code{substr()} are @emph{not} shadowed in this way. + +The @code{PROCINFO["argv"]} array contains all of the command-line arguments +(after glob expansion and redirection processing on platforms where that must +be done manually by the program) with subscripts ranging from 0 through +@code{argc} @minus{} 1. For example, @code{PROCINFO["argv"][0]} will contain +the name by which @command{gawk} was invoked. Here is an example of how this +feature may be used: + +@example +awk ' +BEGIN @{ + for (i = 0; i < length(PROCINFO["argv"]); i++) + print i, PROCINFO["argv"][i] +@}' +@end example + +Please note that this differs from the standard @code{ARGV} array which does +not include command-line arguments that have already been processed by +@command{gawk} (@pxref{ARGC and ARGV}). + @end ignore @node Invoking Summary @@ -4992,6 +5016,7 @@ regular expressions work, we present more complicated instances. * Computed Regexps:: Using Dynamic Regexps. * GNU Regexp Operators:: Operators specific to GNU software. * Case-sensitivity:: How to do case-insensitive matching. +* Strong Regexp Constants:: Strongly typed regexp constants. * Regexp Summary:: Regular expressions summary. @end menu @@ -5182,17 +5207,21 @@ between @samp{0} and @samp{7}. For example, the code for the ASCII ESC @item \x@var{hh}@dots{} The hexadecimal value @var{hh}, where @var{hh} stands for a sequence of hexadecimal digits (@samp{0}--@samp{9}, and either @samp{A}--@samp{F} -or @samp{a}--@samp{f}). Like the same construct -in ISO C, the escape sequence continues until the first nonhexadecimal -digit is seen. @value{COMMONEXT} -However, using more than two hexadecimal digits produces -undefined results. (The @samp{\x} escape sequence is not allowed in -POSIX @command{awk}.) +or @samp{a}--@samp{f}). A maximum of two digts are allowed after +the @samp{\x}. Any further hexadecimal digits are treated as simple +letters or numbers. @value{COMMONEXT} +(The @samp{\x} escape sequence is not allowed in POSIX awk.) @quotation CAUTION -The next major release of @command{gawk} will change, such -that a maximum of two hexadecimal digits following the -@samp{\x} will be used. +In ISO C, the escape sequence continues until the first nonhexadecimal +digit is seen. +For many years, @command{gawk} would continue incorporating +hexadecimal digits into the value until a non-hexadecimal digit +or the end of the string was encountered. +However, using more than two hexadecimal digits produced +undefined results. +As of @value{PVERSION} 4.2, only two digits +are processed. @end quotation @cindex @code{\} (backslash), @code{\/} escape sequence @@ -6235,6 +6264,89 @@ The value of @code{IGNORECASE} has no effect if @command{gawk} is in compatibility mode (@pxref{Options}). Case is always significant in compatibility mode. +@node Strong Regexp Constants +@section Strongly Typed Regexp Constants + +This @value{SECTION} describes a @command{gawk}-specific feature. + +Regexp constants (@code{/@dots{}/}) hold a strange position in the +@command{awk} language. In most contexts, they act like an expression: +@samp{$0 ~ /@dots{}/}. In other contexts, they denote only a regexp to +be matched. In no case are they really a ``first class citizen'' of the +language. That is, you cannot define a scalar variable whose type is +``regexp'' in the same sense that you can define a variable to be a +number or a string: + +@example +num = 42 @ii{Numeric variable} +str = "hi" @ii{String variable} +re = /foo/ @ii{Wrong!} re @ii{is the result of} $0 ~ /foo/ +@end example + +For a number of more advanced use cases (described later on in this +@value{DOCUMENT}), it would be nice to have regexp constants that +are @dfn{strongly typed}; in other words, that denote a regexp useful +for matching, and not an expression. + +@command{gawk} provides this feature. A strongly typed regexp constant +looks almost like a regular regexp constant, except that it is preceded +by an @samp{@@} sign: + +@example +re = @@/foo/ @ii{Regexp variable} +@end example + +Strongly typed regexp constants @emph{cannot} be used eveywhere that a +regular regexp constant can, because this would make the language even more +confusing. Instead, you may use them only in certain contexts: + +@itemize @bullet +@item +On the righthand side of the @samp{~} and @samp{!~} operators: @samp{some_var ~ @@/foo/} +(@pxref{Regexp Usage}). + +@item +In the @code{case} part of a @code{switch} statement +(@pxref{Switch Statement}). + +@item +As an argument to one of the built-in functions that accept regexp constants: +@code{gensub()}, +@code{gsub()}, +@code{match()}, +@code{patsplit()}, +@code{split()}, +and +@code{sub()} +(@pxref{String Functions}). + +@item +As a parameter in a call to a user-defined function +(@pxref{User-defined}). + +@item +On the righthand side of an assignment to a variable: @samp{some_var = @@/foo/}. +In this case, the type of @code{some_var} is regexp. Additionally, @code{some_var} +can be used with @samp{~} and @samp{!~}, passed to one of the built-in functions +listed above, or passed as a parameter to a user-defined function. +@end itemize + +You may use the @code{typeof()} built-in function +(@pxref{Type Functions}) +to determine if a variable or function parameter is +a regexp variable. + +The true power of this feature comes from the ability to create variables that +have regexp type. Such variables can be passed on to user-defined functions, +without the confusing aspects of computed regular expressions created from +strings or string constants. They may also be passed through indirect function +calls (@pxref{Indirect Calls}) +onto the built-in functions that accept regexp constants. + +When used in numeric conversions, strongly typed regexp variables convert +to zero. When used in string conversions, they convert to the string +value of the original regexp text. + @node Regexp Summary @section Summary @@ -6278,6 +6390,11 @@ treated as regular expressions). case sensitivity of regexp matching. In other @command{awk} versions, use @code{tolower()} or @code{toupper()}. +@item +Strongly typed regexp constants (@code{@@/.../}) enable +certain advanced use cases to be described later on in the +@value{DOCUMENT}. + @end itemize @@ -6325,6 +6442,7 @@ used with it do not have to be named on the @command{awk} command line * Getline:: Reading files under explicit program control using the @code{getline} function. * Read Timeout:: Reading input with a timeout. +* Retrying Input:: Retrying input after certain errors. * Command-line directories:: What happens if you put a directory on the command line. * Input Summary:: Input summary. @@ -6699,16 +6817,12 @@ Readfile} for another option. @cindex fields @cindex accessing fields @cindex fields, examining -@cindex POSIX @command{awk}, field separators and -@cindex field separators, POSIX and -@cindex separators, field, POSIX and When @command{awk} reads an input record, the record is automatically @dfn{parsed} or separated by the @command{awk} utility into chunks called @dfn{fields}. By default, fields are separated by @dfn{whitespace}, like words in a line. Whitespace in @command{awk} means any string of one or more spaces, -TABs, or newlines;@footnote{In POSIX @command{awk}, newlines are not -considered whitespace for separating fields.} other characters +TABs, or newlines; other characters that are considered whitespace by other languages (such as formfeed, vertical tab, etc.) are @emph{not} considered whitespace by @command{awk}. @@ -7153,7 +7267,6 @@ can massage it first with a separate @command{awk} program.) @node Default Field Splitting @subsection Whitespace Normally Separates Fields -@cindex newlines, as field separators @cindex whitespace, as field separators Fields are normally separated by whitespace sequences (spaces, TABs, and newlines), not by single spaces. Two spaces in a row do not @@ -8116,6 +8229,13 @@ a record, such as a file that cannot be opened, then @code{getline} returns @minus{}1. In this case, @command{gawk} sets the variable @code{ERRNO} to a string describing the error that occurred. +If @code{ERRNO} indicates that the I/O operation may be +retried, and @code{PROCINFO["@var{input}", "RETRY"]} is set, +then @code{getline} returns @minus{}2 +instead of @minus{}1, and further calls to @code{getline} +may be attemped. @DBXREF{Retrying Input} for further information about +this feature. + In the following examples, @var{command} stands for a string value that represents a shell command. @@ -8770,7 +8890,8 @@ on a per-command or per-connection basis. the attempt to read from the underlying device may succeed in a later attempt. This is a limitation, and it also means that you cannot use this to multiplex input from -two or more sources. +two or more sources. @DBXREF{Retrying Input} for a way to enable +later I/O attempts to succeed. Assigning a timeout value prevents read operations from blocking indefinitely. But bear in mind that there are other ways @@ -8780,6 +8901,36 @@ a connection before it can start reading any data, or the attempt to open a FIFO special file for reading can block indefinitely until some other process opens it for writing. +@node Retrying Input +@section Retrying Reads After Certain Input Errors +@cindex retrying input + +@cindex differences in @command{awk} and @command{gawk}, retrying input +This @value{SECTION} describes a feature that is specific to @command{gawk}. + +When @command{gawk} encounters an error while reading input, by +default @code{getline} returns @minus{}1, and subsequent attempts to +read from that file result in an end-of-file indication. However, you +may optionally instruct @command{gawk} to allow I/O to be retried when +certain errors are encountered by setting a special element in +the @code{PROCINFO} array (@pxref{Auto-set}): + +@example +PROCINFO["@var{input_name}", "RETRY"] = 1 +@end example + +When this element exists, @command{gawk} checks the value of the system +(C language) +@code{errno} variable when an I/O error occurs. If @code{errno} indicates +a subsequent I/O attempt may succeed, @code{getline} instead returns +@minus{}2 and +further calls to @code{getline} may succeed. This applies to the @code{errno} +values @code{EAGAIN}, @code{EWOULDBLOCK}, @code{EINTR}, or @code{ETIMEDOUT}. + +This feature is useful in conjunction with +@code{PROCINFO["@var{input_name}", "READ_TIMEOUT"]} or situations where a file +descriptor has been configured to behave in a non-blocking fashion. + @node Command-line directories @section Directories on the Command Line @cindex differences in @command{awk} and @command{gawk}, command-line directories @@ -8941,6 +9092,7 @@ and discusses the @code{close()} built-in function. @command{gawk} allows access to inherited file descriptors. * Close Files And Pipes:: Closing Input and Output Files and Pipes. +* Nonfatal:: Enabling Nonfatal Output. * Output Summary:: Output summary. * Output Exercises:: Exercises. @end menu @@ -10446,6 +10598,70 @@ when closing a pipe. @end cartouche @end ifnotdocbook +@node Nonfatal +@section Enabling Nonfatal Output + +This @value{SECTION} describes a @command{gawk}-specific feature. + +In standard @command{awk}, output with @code{print} or @code{printf} +to a nonexistent file, or some other I/O error (such as filling up the +disk) is a fatal error. + +@example +$ @kbd{gawk 'BEGIN @{ print "hi" > "/no/such/file" @}'} +@error{} gawk: cmd. line:1: fatal: can't redirect to `/no/such/file' (No such file or directory) +@end example + +@command{gawk} makes it possible to detect that an error has +occurred, allowing you to possibly recover from the error, or +at least print an error message of your choosing before exiting. +You can do this in one of two ways: + +@itemize @bullet +@item +For all output files, by assigning any value to @code{PROCINFO["NONFATAL"]}. + +@item +On a per-file basis, by assigning any value to +@code{PROCINFO[@var{filename}, "NONFATAL"]}. +Here, @var{filename} is the name of the file to which +you wish output to be nonfatal. +@end itemize + +Once you have enabled nonfatal output, you must check @code{ERRNO} +after every relevant @code{print} or @code{printf} statement to +see if something went wrong. It is also a good idea to initialize +@code{ERRNO} to zero before attempting the output. For example: + +@example +$ @kbd{gawk '} +> @kbd{BEGIN @{} +> @kbd{ PROCINFO["NONFATAL"] = 1} +> @kbd{ ERRNO = 0} +> @kbd{ print "hi" > "/no/such/file"} +> @kbd{ if (ERRNO) @{} +> @kbd{ print("Output failed:", ERRNO) > "/dev/stderr"} +> @kbd{ exit 1} +> @kbd{ @}} +> @kbd{@}'} +@error{} Output failed: No such file or directory +@end example + +Here, @command{gawk} did not produce a fatal error; instead +it let the @command{awk} program code detect the problem and handle it. + +This mechanism works also for standard output and standard error. +For standard output, you may use @code{PROCINFO["-", "NONFATAL"]} +or @code{PROCINFO["/dev/stdout", "NONFATAL"]}. For standard error, use +@code{PROCINFO["/dev/stderr", "NONFATAL"]}. + +When attempting to open a TCP/IP socket (@pxref{TCP/IP Networking}), +@command{gawk} tries multiple times. The @env{GAWK_SOCK_RETRIES} +environment variable (@pxref{Other Environment Variables}) allows you to +override @command{gawk}'s builtin default number of attempts. However, +once nonfatal I/O is enabled for a given socket, @command{gawk} only +retries once, relying on @command{awk}-level code to notice that there +was a problem. @node Output Summary @section Summary @@ -10475,6 +10691,12 @@ Use @code{close()} to close open file, pipe, and coprocess redirections. For coprocesses, it is possible to close only one direction of the communications. +@item +Normally errors with @code{print} or @code{printf} are fatal. +@command{gawk} lets you make output errors be nonfatal either for +all files or on a per-file basis. You must then check for errors +after every relevant output statement. + @end itemize @c EXCLUDE START @@ -14579,12 +14801,11 @@ specify the behavior when @code{FS} is the null string. Nonetheless, some other versions of @command{awk} also treat @code{""} specially.) -@cindex POSIX @command{awk}, @code{FS} variable and The default value is @w{@code{" "}}, a string consisting of a single -space. As a special exception, this value means that any -sequence of spaces, TABs, and/or newlines is a single separator.@footnote{In -POSIX @command{awk}, newline does not count as whitespace.} It also causes -spaces, TABs, and newlines at the beginning and end of a record to be ignored. +space. As a special exception, this value means that any sequence of +spaces, TABs, and/or newlines is a single separator. It also causes +spaces, TABs, and newlines at the beginning and end of a record to +be ignored. You can set the value of @code{FS} on the command line using the @option{-F} option: @@ -14808,10 +15029,24 @@ opens the next file. An associative array containing the values of the environment. The array indices are the environment variable names; the elements are the values of the particular environment variables. For example, -@code{ENVIRON["HOME"]} might be @code{"/home/arnold"}. Changing this array -does not affect the environment passed on to any programs that -@command{awk} may spawn via redirection or the @code{system()} function. -(In a future version of @command{gawk}, it may do so.) +@code{ENVIRON["HOME"]} might be @code{/home/arnold}. + +For POSIX @command{awk}, changing this array does not affect the +environment passed on to any programs that @command{awk} may spawn via +redirection or the @code{system()} function. + +However, beginning with version 4.2, if not in POSIX +compatibility mode, @command{gawk} does update its own environment when +@code{ENVIRON} is changed, thus changing the environment seen by programs +that it creates. You should therefore be especially careful if you +modify @code{ENVIRON["PATH"]}, which is the search path for finding +executable programs. + +This can also affect the running @command{gawk} program, since some of the +built-in functions may pay attention to certain environment variables. +The most notable instance of this is @code{mktime()} (@pxref{Time +Functions}), which pays attention the value of the @env{TZ} environment +variable on many systems. Some operating systems may not have environment variables. On such systems, the @code{ENVIRON} array is empty (except for @@ -14845,6 +15080,11 @@ value to be meaningful when an I/O operation returns a failure value, such as @code{getline} returning @minus{}1. You are, of course, free to clear it yourself before doing an I/O operation. +If the value of @code{ERRNO} corresponds to a system error in the C +@code{errno} variable, then @code{PROCINFO["errno"]} will be set to the value +of @code{errno}. For non-system errors, @code{PROCINFO["errno"]} will +be zero. + @cindex @code{FILENAME} variable @cindex dark corner, @code{FILENAME} variable @item @code{FILENAME} @@ -14913,6 +15153,10 @@ are guaranteed to be available: @item PROCINFO["egid"] The value of the @code{getegid()} system call. +@item PROCINFO["errno"] +The value of the C @code{errno} variable when @code{ERRNO} is set to +the associated error message. + @item PROCINFO["euid"] @cindex effective user ID of @command{gawk} user The value of the @code{geteuid()} system call. @@ -15036,6 +15280,14 @@ to test for these elements The following elements allow you to change @command{gawk}'s behavior: @table @code +@item PROCINFO["NONFATAL"] +If this element exists, then I/O errors for all output redirections become nonfatal. +@DBXREF{Nonfatal}. + +@item PROCINFO["@var{output_name}", "NONFATAL"] +Make output errors for @var{output_name} be nonfatal. +@DBXREF{Nonfatal}. + @item PROCINFO["@var{command}", "pty"] For two-way communication to @var{command}, use a pseudo-tty instead of setting up a two-way pipe. @@ -16958,6 +17210,23 @@ truncated toward zero. For example, @code{int(3)} is 3, @code{int(3.9)} is 3, @code{int(-3.9)} is @minus{}3, and @code{int(-3)} is @minus{}3 as well. +@item @code{intdiv(@var{numerator}, @var{denominator}, @var{result})} +@cindexawkfunc{intdiv} +@cindex intdiv +Perform integer division, similar to the standard C function of the +same name. First, truncate @code{numerator} and @code{denominator} +towards zero, creating integer values. Clear the @code{result} +array, and then set @code{result["quotient"]} to the result of +@samp{numerator / denominator}, truncated towards zero to an integer, +and set @code{result["remainder"]} to the result of @samp{numerator % +denominator}, truncated towards zero to an integer. This function is +primarily intended for use with arbitrary length integers; it avoids +creating MPFR arbitrary precision floating-point values (@pxref{Arbitrary +Precision Integers}). + +This function is a @code{gawk} extension. It is not available in +compatibility mode (@pxref{Options}). + @item @code{log(@var{x})} @cindexawkfunc{log} @cindex logarithm @@ -19209,16 +19478,70 @@ results of the @code{compl()}, @code{lshift()}, and @code{rshift()} functions. @node Type Functions @subsection Getting Type Information -@command{gawk} provides a single function that lets you distinguish -an array from a scalar variable. This is necessary for writing code +@command{gawk} provides two functions that lets you distinguish +the type of a variable. +This is necessary for writing code that traverses every element of an array of arrays -(@pxref{Arrays of Arrays}). +(@pxref{Arrays of Arrays}), and in other contexts. @table @code @cindexgawkfunc{isarray} @cindex scalar or array @item isarray(@var{x}) Return a true value if @var{x} is an array. Otherwise, return false. + +@cindexgawkfunc{typeof} +@cindex variable type +@cindex type, of variable +@item typeof(@var{x}) +Return one of the following strings, depending upon the type of @var{x}: + +@c nested table +@table @code +@item "array" +@var{x} is an array. + +@item "regexp" +@var{x} is a strongly typed regexp (@pxref{Strong Regexp Constants}). + +@item "number" +@var{x} is a number. + +@item "string" +@var{x} is a string. + +@item "strnum" +@var{x} is a string that might be a number, such as a field or +the result of calling @code{split()}. (I.e., @var{x} has the STRNUM +attribute; @pxref{Variable Typing}.) + +@item "unassigned" +@var{x} is a scalar variable that has not been assigned a value yet. +For example: + +@example +BEGIN @{ + a[1] # creates a[1] but it has no assigned value + print typeof(a[1]) # scalar_u +@} +@end example + +@item "untyped" +@var{x} has not yet been used yet at all; it can become a scalar or an +array. +For example: + +@example +BEGIN @{ + print typeof(x) # x never used --> untyped + mk_arr(x) + print typeof(x) # x now an array --> array +@} + +function mk_arr(a) @{ a[1] = 1 @} +@end example + +@end table @end table @code{isarray()} is meant for use in two circumstances. The first is when @@ -19236,6 +19559,14 @@ that has not been previously used to @code{isarray()}, @command{gawk} ends up turning it into a scalar. @end quotation +The @code{typeof()} function is general; it allows you to determine +if a variable or function parameter is a scalar, an array, or a strongly +typed regexp. + +@code{isarray()} is deprecated; you should use @code{typeof()} instead. +You should replace any existing uses of @samp{isarray(var)} in your +code with @samp{typeof(var) == "array"}. + @node I18N Functions @subsection String-Translation Functions @cindex @command{gawk}, string-translation functions @@ -27768,8 +28099,7 @@ The profiled version of your program may not look exactly like what you typed when you wrote it. This is because @command{gawk} creates the profiled version by ``pretty-printing'' its internal representation of the program. The advantage to this is that @command{gawk} can produce -a standard representation. The disadvantage is that all source code -comments are lost. +a standard representation. Also, things such as: @example @@ -27863,10 +28193,26 @@ When called this way, @command{gawk} ``pretty-prints'' the program into @file{awkprof.out}, without any execution counts. @quotation NOTE -The @option{--pretty-print} option still runs your program. -This will change in the next major release. +Once upon a time, the @option{--pretty-print} option would also run +your program. This is is no longer the case. @end quotation +There is a significant difference between the output created when +profiling, and that created when pretty-printing. Pretty-printed output +preserves the original comments that were in the program, although their +placement may not correspond exactly to their original locations in the +source code. + +However, as a deliberate design decision, profiling output @emph{omits} +the original program's comments. This allows you to focus on the +execution count data and helps you avoid the temptation to use the +profiler for pretty-printing. + +Additionally, pretty-printed output does not have the leading indentation +that the profiling output does. This makes it easy to pretty-print your +code once development is completed, and then use the result as the final +version of your program. + @node Advanced Features Summary @section Summary @@ -30078,6 +30424,65 @@ executing, short programs. The @command{gawk} debugger only accepts source code supplied with the @option{-f} option. @end itemize +One other point is worth disucssing. Conventional debuggers run in a +separate process (and thus address space) from the programs that they +debug (the @dfn{debuggee}, if you will). + +The @command{gawk} debugger is different; it is an integrated part +of @command{gawk} itself. This makes it possible, in rare cases, +for @command{gawk} to become an excellent demonstrator of Heisenburg +Uncertainty physics, where the mere act of observing something can change +it. Consider the following:@footnote{Thanks to Hermann Peifer for +this example.} + +@example +$ @kbd{cat test.awk} +@print{} @{ print typeof($1), typeof($2) @} +$ @kbd{cat test.data} +@print{} abc 123 +$ @kbd{gawk -f test.awk test.data} +@print{} strnum strnum +@end example + +This is all as expected: field data has the STRNUM attribute +(@pxref{Variable Typing}). Now watch what happens when we run +this program under the debugger: + +@example +$ @kbd{gawk -D -f test.awk test.data} +gawk> @kbd{w $1} @ii{Set watchpoint on} $1 +@print{} Watchpoint 1: $1 +gawk> @kbd{w $2} @ii{Set watchpoint on} $2 +@print{} Watchpoint 2: $2 +gawk> @kbd{r} @ii{Start the program} +@print{} Starting program: +@print{} Stopping in Rule ... +@print{} Watchpoint 1: $1 @ii{Watchpoint fires} +@print{} Old value: "" +@print{} New value: "abc" +@print{} main() at `test.awk':1 +@print{} 1 @{ print typeof($1), typeof($2) @} +gawk> @kbd{n} @ii{Keep going @dots{}} +@print{} Watchpoint 2: $2 @ii{Watchpoint fires} +@print{} Old value: "" +@print{} New value: "123" +@print{} main() at `test.awk':1 +@print{} 1 @{ print typeof($1), typeof($2) @} +gawk> @kbd{n} @ii{Get result from} typeof() +@print{} strnum string @ii{Result for} $2 @ii{isn't right} +@print{} Program exited normally with exit value: 0 +gawk> @kbd{quit} +@end example + +In this case, the act of comparing the new value of @code{$2} +with the old one caused @command{gawk} to evaluate it and determine that it +is indeed a string, and this is reflected in the result of +@code{typeof()}. + +Cases like this where the debugger is not transparent to the program's +execution should be rare. If you encounter one, please report it +(@pxref{Bugs}). + @ignore Look forward to a future release when these and other missing features may be added, and of course feel free to try to add them yourself! @@ -30114,6 +30519,10 @@ If the GNU Readline library is available when @command{gawk} is compiled, it is used by the debugger to provide command-line history and editing. +@item +Usually, the debugger does not not affect the +program being debugged, but occasionally it can. + @end itemize @node Arbitrary Precision Arithmetic @@ -30946,6 +31355,122 @@ to just use the following: gawk -M 'BEGIN @{ n = 13; print n % 2 @}' @end example +When dividing two arbitrary precision integers with either +@samp{/} or @samp{%}, the result is typically an arbitrary +precision floating point value (unless the denominator evenly +divides into the numerator). In order to do integer division +or remainder with arbitrary precision integers, use the built-in +@code{intdiv()} function (@pxref{Numeric Functions}). + +You can simulate the @code{intdiv()} function in standard @command{awk} +using this user-defined function: + +@example +@c file eg/lib/intdiv.awk +# intdiv --- do integer division + +@c endfile +@ignore +@c file eg/lib/intdiv.awk +# +# Arnold Robbins, arnold@@skeeve.com, Public Domain +# July, 2014 +# +# Name changed from div() to intdiv() +# April, 2015 + +@c endfile + +@end ignore +@c file eg/lib/intdiv.awk +function intdiv(numerator, denominator, result) +@{ + split("", result) + + numerator = int(numerator) + denominator = int(denominator) + result["quotient"] = int(numerator / denominator) + result["remainder"] = int(numerator % denominator) + + return 0.0 +@} +@c endfile +@end example + +The following example program, contributed by Katie Wasserman, +uses @code{intdiv()} to +compute the digits of @value{PI} to as many places as you +choose to set: + +@example +@c file eg/prog/pi.awk +# pi.awk --- compute the digits of pi +@c endfile +@c endfile +@ignore +@c file eg/prog/pi.awk +# +# Katie Wasserman, katie@@wass.net +# August 2014 +@c endfile +@end ignore +@c file eg/prog/pi.awk + +BEGIN @{ + digits = 100000 + two = 2 * 10 ^ digits + pi = two + for (m = digits * 4; m > 0; --m) @{ + d = m * 2 + 1 + x = pi * m + intdiv(x, d, result) + pi = result["quotient"] + pi = pi + two + @} + print pi +@} +@c endfile +@end example + +@ignore +Date: Wed, 20 Aug 2014 10:19:11 -0400 +To: arnold@skeeve.com +From: Katherine Wasserman <katie@wass.net> +Subject: Re: computation of digits of pi? + +Arnold, + +>The program that you sent to compute the digits of pi using div(). Is +>that some standard algorithm that every math student knows? If so, +>what's it called? + +It's not that well known but it's not that obscure either + +It's Euler's modification to Newton's method for calculating pi. + +Take a look at lines (23) - (25) here: http://mathworld.wolfram.com/PiFormulas.htm + +The algorithm I wrote simply expands the multiply by 2 and works from the innermost expression outwards. I used this to program HP calculators because it's quite easy to modify for tiny memory devices with smallish word sizes. + +http://www.hpmuseum.org/cgi-sys/cgiwrap/hpmuseum/articles.cgi?read=899 + +-Katie +@end ignore + +When asked about the algorithm used, Katie replied: + +@quotation +It's not that well known but it's not that obscure either. +It's Euler's modification to Newton's method for calculating pi. +Take a look at lines (23) - (25) here: @uref{http://mathworld.wolfram.com/PiFormulas.html}. + +The algorithm I wrote simply expands the multiply by 2 and works from +the innermost expression outwards. I used this to program HP calculators +because it's quite easy to modify for tiny memory devices with smallish +word sizes. See +@uref{http://www.hpmuseum.org/cgi-sys/cgiwrap/hpmuseum/articles.cgi?read=899}. +@end quotation + @node POSIX Floating Point Problems @section Standards Versus Existing Practice @@ -31345,6 +31870,7 @@ This (rather large) @value{SECTION} describes the API in detail. * Symbol Table Access:: Functions for accessing global variables. * Array Manipulation:: Functions for working with arrays. +* Redirection API:: How to access and manipulate redirections. * Extension API Variables:: Variables provided by the API. * Extension API Boilerplate:: Boilerplate code for using the API. @end menu @@ -31420,6 +31946,10 @@ Clearing an array @item Flattening an array for easy C-style looping over all its indices and elements @end itemize + +@item +Accessing and manipulating redirections. + @end itemize Some points about using the API: @@ -33390,6 +33920,75 @@ $ @kbd{AWKLIBPATH=$PWD ./gawk -f subarray.awk} (@DBXREF{Finding Extensions} for more information on the @env{AWKLIBPATH} environment variable.) +@node Redirection API +@subsection Accessing and Manipulating Redirections + +The following function allows extensions to access and manipulate redirections. + +@table @code +@item awk_bool_t get_file(const char *name, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ size_t name_len, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const char *filetype, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ int fd, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_input_buf_t **ibufp, +@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_output_buf_t **obufp); +Look up a file in @command{gawk}'s internal redirection table. +If @code{name} is @code{NULL} or @code{name_len} is zero, return +data for the currently open input file corresponding to @code{FILENAME}. +(This does not access the @code{filetype} argument, so that may be undefined). +If the file is not already open, attempt to open it. +The @code{filetype} argument must be zero-terminated and should be one of: + +@table @code +@item ">" +A file opened for output. + +@item ">>" +A file opened for append. + +@item "<" +A file opened for input. + +@item "|>" +A pipe opened for output. + +@item "|<" +A pipe opened for input. + +@item "|&" +A two-way coprocess. +@end table + +On error, return a @code{false} value. Otherwise, return +@code{true}, and return additional information about the redirection +in the @code{ibufp} and @code{obufp} pointers. For input +redirections, the @code{*ibufp} value should be non-@code{NULL}, +and @code{*obufp} should be @code{NULL}. For output redirections, +the @code{*obufp} value should be non-@code{NULL}, and @code{*ibufp} +should be @code{NULL}. For two-way coprocesses, both values should +be non-@code{NULL}. + +In the usual case, the extension is interested in @code{(*ibufp)->fd} +and/or @code{fileno((*obufp)->fp)}. If the file is not already +open, and the @code{fd} argument is non-negative, @command{gawk} +will use that file descriptor instead of opening the file in the +usual way. If @code{fd} is non-negative, but the file exists already, +@command{gawk} ignores @code{fd} and returns the existing file. It is +the caller's responsibility to notice that neither the @code{fd} in +the returned @code{awk_input_buf_t} nor the @code{fd} in the returned +@code{awk_output_buf_t} matches the requested value. + +Note that supplying a file descriptor is currently @emph{not} supported +for pipes. However, supplying a file descriptor should work for input, +output, append, and two-way (coprocess) sockets. If @code{filetype} +is two-way, @command{gawk} assumes that it is a socket! Note that in +the two-way case, the input and output file descriptors may differ. +To check for success, you must check whether either matches. +@end table + +It is anticipated that this API function will be used to implement I/O +multiplexing and a socket library. + @node Extension API Variables @subsection API Variables @@ -34954,18 +35553,21 @@ As of this writing, there are seven extensions: GD graphics library extension @item +MPFR library extension +(this provides access to a number of MPFR functions that @command{gawk}'s +native MPFR support does not) + +@item PDF extension @item PostgreSQL extension @item -MPFR library extension -(this provides access to a number of MPFR functions that @command{gawk}'s -native MPFR support does not) +Redis extension @item -Redis extension +Select extension @item XML parser extension, using the @uref{http://expat.sourceforge.net, Expat} @@ -35594,6 +36196,10 @@ Indirect function calls @item Directories on the command line produce a warning and are skipped (@pxref{Command-line directories}) + +@item +Output with @code{print} and @code{printf} need not be fatal +(@pxref{Nonfatal}) @end itemize @item @@ -35681,6 +36287,11 @@ The @code{isarray()} function to check if a variable is an array or not The @code{bindtextdomain()}, @code{dcgettext()}, and @code{dcngettext()} functions for internationalization (@pxref{Programmer i18n}) + +@item +The @code{intdiv()} function for doing integer +division and remainder +(@pxref{Numeric Functions}) @end itemize @item @@ -35813,6 +36424,16 @@ for @command{gawk} @value{PVERSION} 4.1: Ultrix @end itemize +@item +Support for the following systems was removed from the code +for @command{gawk} @value{PVERSION} 4.2: + +@c nested table +@itemize @value{MINUS} +@item +MirBSD +@end itemize + @end itemize @c XXX ADD MORE STUFF HERE @@ -36439,6 +37060,49 @@ Support for Ultrix was removed. @end itemize +Version 4.2 introduced the following changes: + +@itemize @bullet +@item +Changes to @code{ENVIRON} are reflected into @command{gawk}'s +environment and that of programs that it runs. +@xref{Auto-set}. + +@item +The @option{--pretty-print} option no longer runs the @command{awk} +program too. +@xref{Options}. + +@item +The @command{igawk} program and its manual page are no longer +installed when @command{gawk} is built. +@xref{Igawk Program}. + +@item +The @code{intdiv()} function. +@xref{Numeric Functions}. + +@item +The maximum number of hexadecimal digits in @samp{\x} escapes +is now two. +@xref{Escape Sequences}. + +@item +Nonfatal output with @code{print} and @code{printf}. +@xref{Nonfatal}. + +@item +For many years, POSIX specified that default field splitting +only allowed spaces and tabs to separate fields, and this was +how @command{gawk} behaved with @option{--posix}. As of 2013, +the standard restored historical behavior, and now default +field splitting with @option{--posix} also allows newlines to +separate fields. + +@item +Support for MirBSD was removed. +@end itemize + @c XXX ADD MORE STUFF HERE @end ifclear @@ -37106,6 +37770,8 @@ The generated Info file for The @command{troff} source for a manual page describing the @command{igawk} program presented in @ref{Igawk Program}. +(Since @command{gawk} can do its own @code{@@include} processing, +neither @command{igawk} nor @file{igawk.1} are installed.) @item doc/Makefile.in The input file used during the configuration process to generate the @@ -37150,8 +37816,6 @@ source file for this @value{DOCUMENT}. It also contains a @file{Makefile.in} fil @file{Makefile.am} is used by GNU Automake to create @file{Makefile.in}. The library functions from @ref{Library Functions}, -and the @command{igawk} program from -@DBREF{Igawk Program} are included as ready-to-use files in the @command{gawk} distribution. They are installed as part of the installation process. The rest of the programs in this @value{DOCUMENT} are available in appropriate @@ -37162,6 +37826,12 @@ The source code, manual pages, and infrastructure files for the sample extensions included with @command{gawk}. @xref{Dynamic Extensions}, for more information. +@item extras/* +Additional non-essential files. Currently, this directory contains some shell +startup files to be installed in @file{/etc/profile.d} to aid in manipulating +the @env{AWKPATH} and @env{AWKLIBPATH} environment variables. +@xref{Shell Startup Files}, for more information. + @item posix/* Files needed for building @command{gawk} on POSIX-compliant systems. @@ -37193,6 +37863,7 @@ to configure @command{gawk} for your system yourself. @menu * Quick Installation:: Compiling @command{gawk} under Unix. +* Shell Startup Files:: Shell convenience functions. * Additional Configuration Options:: Other compile-time options. * Configuration Philosophy:: How it's all supposed to work. @end menu @@ -37273,6 +37944,44 @@ is likely that you will be asked for your password, and you will have to have been set up previously as a user who is allowed to run the @command{sudo} command. +@node Shell Startup Files +@appendixsubsec Shell Startup Files + +The distribution contains shell startup files @file{gawk.sh} and +@file{gawk.csh} containing functions to aid in manipulating +the @env{AWKPATH} and @env{AWKLIBPATH} environment variables. +On a Fedora system, these files should be installed in @file{/etc/profile.d}; +on other platforms, the appropriate location may be different. + +@table @command + +@cindex @command{gawkpath_default} shell function +@item gawkpath_default +Reset the @env{AWKPATH} environment variable to its default value. + +@cindex @command{gawkpath_prepend} shell function +@item gawkpath_prepend +Add the argument to the front of the @env{AWKPATH} environment variable. + +@cindex @command{gawkpath_append} shell function +@item gawkpath_append +Add the argument to the end of the @env{AWKPATH} environment variable. + +@cindex @command{gawklibpath_default} shell function +@item gawklibpath_default +Reset the @env{AWKLIBPATH} environment variable to its default value. + +@cindex @command{gawklibpath_prepend} shell function +@item gawklibpath_prepend +Add the argument to the front of the @env{AWKLIBPATH} environment variable. + +@cindex @command{gawklibpath_append} shell function +@item gawklibpath_append +Add the argument to the end of the @env{AWKLIBPATH} environment variable. + +@end table + + @node Additional Configuration Options @appendixsubsec Additional Configuration Options @cindex @command{gawk}, configuring, options |