1 files changed, 764 insertions, 55 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index f6114b4e..b711949b 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -562,6 +562,7 @@ particular records in a file and perform operations upon them.
 * Computed Regexps::                    Using Dynamic Regexps.
 * GNU Regexp Operators::                Operators specific to GNU software.
 * Case-sensitivity::                    How to do case-insensitive matching.
+* Strong Regexp Constants::             Strongly typed regexp constants.
 * Regexp Summary::                      Regular expressions summary.
 * Records::                             Controlling how data is split into
                                         records.
@@ -604,6 +605,7 @@ particular records in a file and perform operations upon them.
                                         @code{getline}.
 * Getline Summary::                     Summary of @code{getline} Variants.
 * Read Timeout::                        Reading input with a timeout.
+* Retrying Input::                      Retrying input after certain errors.
 * Command-line directories::            What happens if you put a directory on
                                         the command line.
 * Input Summary::                       Input summary.
@@ -633,6 +635,7 @@ particular records in a file and perform operations upon them.
 * Special Caveats::                     Things to watch out for.
 * Close Files And Pipes::               Closing Input and Output Files and
                                         Pipes.
+* Nonfatal::                            Enabling Nonfatal Output.
 * Output Summary::                      Output summary.
 * Output Exercises::                    Exercises.
 * Values::                              Constants, Variables, and Regular
@@ -944,6 +947,7 @@ particular records in a file and perform operations upon them.
 * Array Functions::                     Functions for working with arrays.
 * Flattening Arrays::                   How to flatten arrays.
 * Creating Arrays::                     How to create and populate arrays.
+* Redirection API::                     How to access and manipulate redirections.
 * Extension API Variables::             Variables provided by the API.
 * Extension Versioning::                API Version information.
 * Extension API Informational Variables:: Variables providing information about
@@ -1002,6 +1006,7 @@ particular records in a file and perform operations upon them.
 * Unix Installation::                   Installing @command{gawk} under
                                         various versions of Unix.
 * Quick Installation::                  Compiling @command{gawk} under Unix.
+* Shell Startup Files::                 Shell convenience functions.
 * Additional Configuration Options::    Other compile-time options.
 * Configuration Philosophy::            How it's all supposed to work.
 * Non-Unix Installation::               Installation on Other Operating
@@ -4084,10 +4089,8 @@ No space is allowed between the @option{-o} and @var{file}, if
 @var{file} is supplied.
 
 @quotation NOTE
-Due to the way @command{gawk} has evolved, with this option
-your program still executes.  This will change in the
-next major release, such that @command{gawk} will only
-pretty-print the program and not run it.
+In the past, this option would also execute your program.
+This is no longer the case.
 @end quotation
 
 @item @option{-O}
@@ -4133,11 +4136,6 @@ restrictions apply:
 @cindex newlines
 @cindex whitespace, newlines as
 @item
-Newlines do not act as whitespace to separate fields when @code{FS} is
-equal to a single space
-(@pxref{Fields}).
-
-@item
 Newlines are not allowed after @samp{?} or @samp{:}
 (@pxref{Conditional Exp}).
 
@@ -4488,6 +4486,9 @@ searches first in the current directory and then in @file{/usr/local/share/awk}.
 In practice, this means that you will rarely need to change the
 value of @env{AWKPATH}.
 
+@xref{Shell Startup Files}, for information on functions that help to
+manipulate the @env{AWKPATH} variable.
+
 @command{gawk} places the value of the search path that it used into
 @code{ENVIRON["AWKPATH"]}. This provides access to the actual search
 path value from within an @command{awk} program.
@@ -4519,6 +4520,9 @@ an empty value, @command{gawk} uses a default path; this
 is typically @samp{/usr/local/lib/gawk}, although it can vary depending
 upon how @command{gawk} was built.
 
+@xref{Shell Startup Files}, for information on functions that help to
+manipulate the @env{AWKLIBPATH} variable.
+
 @command{gawk} places the value of the search path that it used into
 @code{ENVIRON["AWKLIBPATH"]}. This provides access to the actual search
 path value from within an @command{awk} program.
@@ -4546,6 +4550,8 @@ wait for input before returning with an error.
 Controls the number of times @command{gawk} attempts to
 retry a two-way TCP/IP (socket) connection before giving up.
 @xref{TCP/IP Networking}.
+Note that when nonfatal I/O is enabled (@pxref{Nonfatal}),
+@command{gawk} only tries to open a TCP/IP socket once.
 
 @item POSIXLY_CORRECT
 Causes @command{gawk} to switch to POSIX-compatibility
@@ -4595,14 +4601,6 @@ two regexp matchers that @command{gawk} uses internally. (There aren't
 supposed to be differences, but occasionally theory and practice don't
 coordinate with each other.)
 
-@item GAWK_NO_PP_RUN
-When @command{gawk} is invoked with the @option{--pretty-print} option,
-it will not run the program if this environment variable exists.
-
-@quotation CAUTION
-This variable will not survive into the next major release.
-@end quotation
-
 @item GAWK_STACKSIZE
 This specifies the amount by which @command{gawk} should grow its
 internal evaluation stack, when needed.
@@ -4900,6 +4898,32 @@ Similarly, you may use @code{print} or @code{printf} statements in the
 @var{init} and @var{increment} parts of a @code{for} loop.  This is another
 long-undocumented ``feature'' of Unix @command{awk}.
 
+@command{gawk} lets you use the names of built-in functions that are
+@command{gawk} extensions as the names of parameters in user-defined functions.
+This is intended to ``future-proof'' old code that happens to use
+function names added by @command{gawk} after the code was written.
+Standard @command{awk} built-in functions, such as @code{sin()} or
+@code{substr()}, are @emph{not} shadowed in this way.
+
+The @code{PROCINFO["argv"]} array contains all of the command-line arguments
+(after glob expansion and redirection processing on platforms where that must
+be done manually by the program) with subscripts ranging from 0 through
+@code{argc} @minus{} 1.  For example, @code{PROCINFO["argv"][0]} will contain
+the name by which @command{gawk} was invoked.  Here is an example of how this
+feature may be used:
+
+@example
+awk '
+BEGIN @{
+        for (i = 0; i < length(PROCINFO["argv"]); i++)
+                print i, PROCINFO["argv"][i]
+@}'
+@end example
+
+Please note that this differs from the standard @code{ARGV} array which does
+not include command-line arguments that have already been processed by
+@command{gawk} (@pxref{ARGC and ARGV}).
+
 @end ignore
 
 @node Invoking Summary
@@ -4992,6 +5016,7 @@ regular expressions work, we present more complicated instances.
 * Computed Regexps::            Using Dynamic Regexps.
 * GNU Regexp Operators::        Operators specific to GNU software.
 * Case-sensitivity::            How to do case-insensitive matching.
+* Strong Regexp Constants::     Strongly typed regexp constants.
 * Regexp Summary::              Regular expressions summary.
 @end menu
 
@@ -5182,17 +5207,21 @@ between @samp{0} and @samp{7}.  For example, the code for the ASCII ESC
 @item \x@var{hh}@dots{}
 The hexadecimal value @var{hh}, where @var{hh} stands for a sequence
 of hexadecimal digits (@samp{0}--@samp{9}, and either @samp{A}--@samp{F}
-or @samp{a}--@samp{f}).  Like the same construct
-in ISO C, the escape sequence continues until the first nonhexadecimal
-digit is seen. @value{COMMONEXT}
-However, using more than two hexadecimal digits produces
-undefined results. (The @samp{\x} escape sequence is not allowed in
-POSIX @command{awk}.)
+or @samp{a}--@samp{f}).  A maximum of two digits are allowed after
+the @samp{\x}. Any further hexadecimal digits are treated as simple
+letters or numbers.  @value{COMMONEXT}
+(The @samp{\x} escape sequence is not allowed in POSIX @command{awk}.)
 
 @quotation CAUTION
-The next major release of @command{gawk} will change, such
-that a maximum of two hexadecimal digits following the
-@samp{\x} will be used.
+In ISO C, the escape sequence continues until the first nonhexadecimal
+digit is seen.
+For many years, @command{gawk} would continue incorporating
+hexadecimal digits into the value until a non-hexadecimal digit
+or the end of the string was encountered.
+However, using more than two hexadecimal digits produced
+undefined results.
+As of @value{PVERSION} 4.2, only two digits
+are processed.
 @end quotation
 
 @cindex @code{\} (backslash), @code{\/} escape sequence
@@ -6235,6 +6264,89 @@ The value of @code{IGNORECASE} has no effect if @command{gawk} is in
 compatibility mode (@pxref{Options}).
 Case is always significant in compatibility mode.
 
+@node Strong Regexp Constants
+@section Strongly Typed Regexp Constants
+
+This @value{SECTION} describes a @command{gawk}-specific feature.
+
+Regexp constants (@code{/@dots{}/}) hold a strange position in the
+@command{awk} language. In most contexts, they act like an expression:
+@samp{$0 ~ /@dots{}/}. In other contexts, they denote only a regexp to
+be matched. In no case are they really a ``first class citizen'' of the
+language. That is, you cannot define a scalar variable whose type is
+``regexp'' in the same sense that you can define a variable to be a
+number or a string:
+
+@example
+num = 42        @ii{Numeric variable}
+str = "hi"      @ii{String variable}
+re = /foo/      @ii{Wrong!} re @ii{is the result of} $0 ~ /foo/
+@end example
+
+For a number of more advanced use cases (described later on in this
+@value{DOCUMENT}), it would be nice to have regexp constants that
+are @dfn{strongly typed}; in other words, that denote a regexp useful
+for matching, and not an expression.
+
+@command{gawk} provides this feature.  A strongly typed regexp constant
+looks almost like a regular regexp constant, except that it is preceded
+by an @samp{@@} sign:
+
+@example
+re = @@/foo/     @ii{Regexp variable}
+@end example
+
+Strongly typed regexp constants @emph{cannot} be used everywhere that a
+regular regexp constant can, because this would make the language even more
+confusing.  Instead, you may use them only in certain contexts:
+
+@itemize @bullet
+@item
+On the righthand side of the @samp{~} and @samp{!~} operators: @samp{some_var ~ @@/foo/}
+(@pxref{Regexp Usage}).
+
+@item
+In the @code{case} part of a @code{switch} statement
+(@pxref{Switch Statement}).
+
+@item
+As an argument to one of the built-in functions that accept regexp constants:
+@code{gensub()},
+@code{gsub()},
+@code{match()},
+@code{patsplit()},
+@code{split()},
+and
+@code{sub()}
+(@pxref{String Functions}).
+
+@item
+As a parameter in a call to a user-defined function
+(@pxref{User-defined}).
+
+@item
+On the righthand side of an assignment to a variable: @samp{some_var = @@/foo/}.
+In this case, the type of @code{some_var} is regexp. Additionally, @code{some_var}
+can be used with @samp{~} and @samp{!~}, passed to one of the built-in functions
+listed above, or passed as a parameter to a user-defined function.
+@end itemize
+
+You may use the @code{typeof()} built-in function
+(@pxref{Type Functions})
+to determine if a variable or function parameter is
+a regexp variable.
+
+The true power of this feature comes from the ability to create variables that
+have regexp type. Such variables can be passed on to user-defined functions,
+without the confusing aspects of computed regular expressions created from
+strings or string constants. They may also be passed through indirect function
+calls (@pxref{Indirect Calls})
+onto the built-in functions that accept regexp constants.
+
+When used in numeric conversions, strongly typed regexp variables convert
+to zero. When used in string conversions, they convert to the string
+value of the original regexp text.
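+
+As a minimal sketch of these rules, the following program stores a
+strongly typed regexp in a variable and passes it to a user-defined
+function.  The function name @code{is_match()} and the sample data are
+purely illustrative:
+
+@example
+# is_match: hypothetical helper; re is expected to hold a regexp value
+function is_match(str, re)
+@{
+    return str ~ re
+@}
+
+BEGIN @{
+    re = @@/fo+/
+    print typeof(re)                # regexp
+    if (is_match("seafood", re))
+        print "matched"
+@}
+@end example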
+
 @node Regexp Summary
 @section Summary
 
@@ -6278,6 +6390,11 @@ treated as regular expressions).
 case sensitivity of regexp matching.  In other @command{awk}
 versions, use @code{tolower()} or @code{toupper()}.
 
+@item
+Strongly typed regexp constants (@code{@@/.../}) enable
+certain advanced use cases to be described later on in the
+@value{DOCUMENT}.
+
 @end itemize
 
 
@@ -6325,6 +6442,7 @@ used with it do not have to be named on the @command{awk} command line
 * Getline::                     Reading files under explicit program control
                                 using the @code{getline} function.
 * Read Timeout::                Reading input with a timeout.
+* Retrying Input::              Retrying input after certain errors.
 * Command-line directories::    What happens if you put a directory on the
                                 command line.
 * Input Summary::               Input summary.
@@ -6699,16 +6817,12 @@ Readfile} for another option.
 @cindex fields
 @cindex accessing fields
 @cindex fields, examining
-@cindex POSIX @command{awk}, field separators and
-@cindex field separators, POSIX and
-@cindex separators, field, POSIX and
 When @command{awk} reads an input record, the record is
 automatically @dfn{parsed} or separated by the @command{awk} utility into chunks
 called @dfn{fields}.  By default, fields are separated by @dfn{whitespace},
 like words in a line.
 Whitespace in @command{awk} means any string of one or more spaces,
-TABs, or newlines;@footnote{In POSIX @command{awk}, newlines are not
-considered whitespace for separating fields.} other characters
+TABs, or newlines; other characters
 that are considered whitespace by other languages
 (such as formfeed, vertical tab, etc.) are @emph{not} considered
 whitespace by @command{awk}.
@@ -7153,7 +7267,6 @@ can massage it first with a separate @command{awk} program.)
 @node Default Field Splitting
 @subsection Whitespace Normally Separates Fields
 
-@cindex newlines, as field separators
 @cindex whitespace, as field separators
 Fields are normally separated by whitespace sequences
 (spaces, TABs, and newlines), not by single spaces.  Two spaces in a row do not
@@ -8116,6 +8229,13 @@ a record, such as a file that cannot be opened, then @code{getline}
 returns @minus{}1.  In this case, @command{gawk} sets the variable
 @code{ERRNO} to a string describing the error that occurred.
 
+If @code{ERRNO} indicates that the I/O operation may be
+retried, and @code{PROCINFO["@var{input}", "RETRY"]} is set,
+then @code{getline} returns @minus{}2
+instead of @minus{}1, and further calls to @code{getline}
+may be attempted.  @DBXREF{Retrying Input} for further information about
+this feature.
+
 In the following examples, @var{command} stands for a string value that
 represents a shell command.
 
@@ -8770,7 +8890,8 @@ on a per-command or per-connection basis.
 the attempt to read from the underlying device may
 succeed in a later attempt. This is a limitation, and it also
 means that you cannot use this to multiplex input from
-two or more sources.
+two or more sources.  @DBXREF{Retrying Input} for a way to enable
+later I/O attempts to succeed.
 
 Assigning a timeout value prevents read operations from
 blocking indefinitely. But bear in mind that there are other ways
@@ -8780,6 +8901,36 @@ a connection before it can start reading any data,
 or the attempt to open a FIFO special file for reading can block
 indefinitely until some other process opens it for writing.
 
+@node Retrying Input
+@section Retrying Reads After Certain Input Errors
+@cindex retrying input
+
+@cindex differences in @command{awk} and @command{gawk}, retrying input
+This @value{SECTION} describes a feature that is specific to @command{gawk}.
+
+When @command{gawk} encounters an error while reading input, by
+default @code{getline} returns @minus{}1, and subsequent attempts to
+read from that file result in an end-of-file indication.  However, you
+may optionally instruct @command{gawk} to allow I/O to be retried when
+certain errors are encountered by setting a special element in
+the @code{PROCINFO} array (@pxref{Auto-set}):
+
+@example
+PROCINFO["@var{input_name}", "RETRY"] = 1
+@end example
+
+When this element exists, @command{gawk} checks the value of the system
+(C language)
+@code{errno} variable when an I/O error occurs.  If @code{errno} indicates
+a subsequent I/O attempt may succeed, @code{getline} instead returns
+@minus{}2 and
+further calls to @code{getline} may succeed.  This applies to the @code{errno}
+values @code{EAGAIN}, @code{EWOULDBLOCK}, @code{EINTR}, or @code{ETIMEDOUT}.
+
+This feature is useful in conjunction with
+@code{PROCINFO["@var{input_name}", "READ_TIMEOUT"]} or situations where a file
+descriptor has been configured to behave in a non-blocking fashion.
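+
+The following sketch shows one way a read loop might combine a read
+timeout with retrying.  The command string and the timeout value
+(in milliseconds) are assumptions chosen only for illustration:
+
+@example
+BEGIN @{
+    cmd = "sleep 2; echo done"          # hypothetical slow command
+    PROCINFO[cmd, "READ_TIMEOUT"] = 500 # give up on each read after 500 ms
+    PROCINFO[cmd, "RETRY"] = 1          # make such failures retryable
+
+    while ((rc = (cmd | getline line)) != 0) @{
+        if (rc == -2)
+            continue                    # retryable error; try again
+        if (rc < 0) @{                   # any other error is final
+            print "read failed:", ERRNO > "/dev/stderr"
+            break
+        @}
+        print line
+    @}
+    close(cmd)
+@}
+@end example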
+
 @node Command-line directories
 @section Directories on the Command Line
 @cindex differences in @command{awk} and @command{gawk}, command-line directories
@@ -8941,6 +9092,7 @@ and discusses the @code{close()} built-in function.
                                 @command{gawk} allows access to inherited file
                                 descriptors.
 * Close Files And Pipes::       Closing Input and Output Files and Pipes.
+* Nonfatal::                    Enabling Nonfatal Output.
 * Output Summary::              Output summary.
 * Output Exercises::            Exercises.
 @end menu
@@ -10446,6 +10598,70 @@ when closing a pipe.
 @end cartouche
 @end ifnotdocbook
 
+@node Nonfatal
+@section Enabling Nonfatal Output
+
+This @value{SECTION} describes a @command{gawk}-specific feature.
+
+In standard @command{awk}, output with @code{print} or @code{printf}
+to a nonexistent file, or some other I/O error (such as filling up the
+disk) is a fatal error.
+
+@example
+$ @kbd{gawk 'BEGIN @{ print "hi" > "/no/such/file" @}'}
+@error{} gawk: cmd. line:1: fatal: can't redirect to `/no/such/file' (No such file or directory)
+@end example
+
+@command{gawk} makes it possible to detect that an error has
+occurred, allowing you to possibly recover from the error, or
+at least print an error message of your choosing before exiting.
+You can do this in one of two ways:
+
+@itemize @bullet
+@item
+For all output files, by assigning any value to @code{PROCINFO["NONFATAL"]}.
+
+@item
+On a per-file basis, by assigning any value to
+@code{PROCINFO[@var{filename}, "NONFATAL"]}.
+Here, @var{filename} is the name of the file to which
+you wish output to be nonfatal.
+@end itemize
+
+Once you have enabled nonfatal output, you must check @code{ERRNO}
+after every relevant @code{print} or @code{printf} statement to
+see if something went wrong.  It is also a good idea to initialize
+@code{ERRNO} to zero before attempting the output. For example:
+
+@example
+$ @kbd{gawk '}
+> @kbd{BEGIN @{}
+> @kbd{    PROCINFO["NONFATAL"] = 1}
+> @kbd{    ERRNO = 0}
+> @kbd{    print "hi" > "/no/such/file"}
+> @kbd{    if (ERRNO) @{}
+> @kbd{        print("Output failed:", ERRNO) > "/dev/stderr"}
+> @kbd{        exit 1}
+> @kbd{    @}}
+> @kbd{@}'}
+@error{} Output failed: No such file or directory
+@end example
+
+Here, @command{gawk} did not produce a fatal error; instead
+it let the @command{awk} program code detect the problem and handle it.
+
+This mechanism works also for standard output and standard error.
+For standard output, you may use @code{PROCINFO["-", "NONFATAL"]}
+or @code{PROCINFO["/dev/stdout", "NONFATAL"]}.  For standard error, use
+@code{PROCINFO["/dev/stderr", "NONFATAL"]}.
+
+When attempting to open a TCP/IP socket (@pxref{TCP/IP Networking}),
+@command{gawk} tries multiple times. The @env{GAWK_SOCK_RETRIES}
+environment variable (@pxref{Other Environment Variables}) allows you to
+override @command{gawk}'s builtin default number of attempts.  However,
+once nonfatal I/O is enabled for a given socket, @command{gawk} only
+retries once, relying on @command{awk}-level code to notice that there
+was a problem.
 
 @node Output Summary
 @section Summary
@@ -10475,6 +10691,12 @@ Use @code{close()} to close open file, pipe, and coprocess redirections.
 For coprocesses, it is possible to close only one direction of the
 communications.
 
+@item
+Normally errors with @code{print} or @code{printf} are fatal.
+@command{gawk} lets you make output errors be nonfatal either for
+all files or on a per-file basis. You must then check for errors
+after every relevant output statement.
+
 @end itemize
 
 @c EXCLUDE START
@@ -14579,12 +14801,11 @@ specify the behavior when @code{FS} is the null string.
 Nonetheless, some other versions of @command{awk} also treat
 @code{""} specially.)
 
-@cindex POSIX @command{awk}, @code{FS} variable and
 The default value is @w{@code{" "}}, a string consisting of a single
-space.  As a special exception, this value means that any
-sequence of spaces, TABs, and/or newlines is a single separator.@footnote{In
-POSIX @command{awk}, newline does not count as whitespace.}  It also causes
-spaces, TABs, and newlines at the beginning and end of a record to be ignored.
+space.  As a special exception, this value means that any sequence of
+spaces, TABs, and/or newlines is a single separator.  It also causes
+spaces, TABs, and newlines at the beginning and end of a record to
+be ignored.
 
 You can set the value of @code{FS} on the command line using the
 @option{-F} option:
@@ -14808,10 +15029,24 @@ opens the next file.
 An associative array containing the values of the environment.  The array
 indices are the environment variable names; the elements are the values of
 the particular environment variables.  For example,
-@code{ENVIRON["HOME"]} might be @code{"/home/arnold"}.  Changing this array
-does not affect the environment passed on to any programs that
-@command{awk} may spawn via redirection or the @code{system()} function.
-(In a future version of @command{gawk}, it may do so.)
+@code{ENVIRON["HOME"]} might be @code{/home/arnold}.
+
+For POSIX @command{awk}, changing this array does not affect the
+environment passed on to any programs that @command{awk} may spawn via
+redirection or the @code{system()} function.
+
+However, beginning with version 4.2, if not in POSIX
+compatibility mode, @command{gawk} does update its own environment when
+@code{ENVIRON} is changed, thus changing the environment seen by programs
+that it creates.  You should therefore be especially careful if you
+modify @code{ENVIRON["PATH"]}, which is the search path for finding
+executable programs.
+
+This can also affect the running @command{gawk} program, since some of the
+built-in functions may pay attention to certain environment variables.
+The most notable instance of this is @code{mktime()} (@pxref{Time
+Functions}), which pays attention to the value of the @env{TZ} environment
+variable on many systems.
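+
+As a small illustration, assuming @command{gawk} 4.2 or later running
+outside POSIX compatibility mode, a child process should see a value
+assigned through @code{ENVIRON}.  The variable name @env{GREETING} is
+hypothetical:
+
+@example
+BEGIN @{
+    ENVIRON["GREETING"] = "hello"
+    system("echo \"$GREETING\"")    # the shell reads the updated environment
+@}
+@end example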
 
 Some operating systems may not have environment variables.
 On such systems, the @code{ENVIRON} array is empty (except for
@@ -14845,6 +15080,11 @@ value to be meaningful when an I/O operation returns a failure value,
 such as @code{getline} returning @minus{}1.  You are, of course, free
 to clear it yourself before doing an I/O operation.
 
+If the value of @code{ERRNO} corresponds to a system error in the C
+@code{errno} variable, then @code{PROCINFO["errno"]} will be set to the value
+of @code{errno}.  For non-system errors, @code{PROCINFO["errno"]} will
+be zero.
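+
+For instance, a program might report both values after a failed read.
+This is only a sketch; the @value{FN} is assumed not to exist:
+
+@example
+BEGIN @{
+    if ((getline line < "/no/such/file") < 0)
+        printf "%s (errno %d)\n", ERRNO, PROCINFO["errno"] > "/dev/stderr"
+@}
+@end example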
+
 @cindex @code{FILENAME} variable
 @cindex dark corner, @code{FILENAME} variable
 @item @code{FILENAME}
@@ -14913,6 +15153,10 @@ are guaranteed to be available:
 @item PROCINFO["egid"]
 The value of the @code{getegid()} system call.
 
+@item PROCINFO["errno"]
+The value of the C @code{errno} variable when @code{ERRNO} is set to
+the associated error message.
+
 @item PROCINFO["euid"]
 @cindex effective user ID of @command{gawk} user
 The value of the @code{geteuid()} system call.
@@ -15036,6 +15280,14 @@ to test for these elements
 The following elements allow you to change @command{gawk}'s behavior:
 
 @table @code
+@item PROCINFO["NONFATAL"]
+If this element exists, then I/O errors for all output redirections become nonfatal.
+@DBXREF{Nonfatal}.
+
+@item PROCINFO["@var{output_name}", "NONFATAL"]
+Make output errors for @var{output_name} be nonfatal.
+@DBXREF{Nonfatal}.
+
 @item PROCINFO["@var{command}", "pty"]
 For two-way communication to @var{command}, use a pseudo-tty instead
 of setting up a two-way pipe.
@@ -16958,6 +17210,23 @@ truncated toward zero.
 For example, @code{int(3)} is 3, @code{int(3.9)} is 3, @code{int(-3.9)}
 is @minus{}3, and @code{int(-3)} is @minus{}3 as well.
 
+@item @code{intdiv(@var{numerator}, @var{denominator}, @var{result})}
+@cindexawkfunc{intdiv}
+@cindex intdiv
+Perform integer division, similar to the standard C @code{div()}
+function.  First, truncate @code{numerator} and @code{denominator}
+towards zero, creating integer values.  Clear the @code{result}
+array, and then set @code{result["quotient"]} to the result of
+@samp{numerator / denominator}, truncated towards zero to an integer,
+and set @code{result["remainder"]} to the result of @samp{numerator %
+denominator}, truncated towards zero to an integer.  This function is
+primarily intended for use with arbitrary length integers; it avoids
+creating MPFR arbitrary precision floating-point values (@pxref{Arbitrary
+Precision Integers}).
+
+This function is a @command{gawk} extension.  It is not available in
+compatibility mode (@pxref{Options}).
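+
+A brief usage sketch, assuming a @command{gawk} that provides
+@code{intdiv()} as described here; the operands are arbitrary:
+
+@example
+BEGIN @{
+    intdiv(17, 5, result)
+    # result["quotient"] is 3 and result["remainder"] is 2
+    print result["quotient"], result["remainder"]
+@}
+@end example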
+
 @item @code{log(@var{x})}
 @cindexawkfunc{log}
 @cindex logarithm
@@ -19209,16 +19478,70 @@ results of the @code{compl()}, @code{lshift()}, and @code{rshift()} functions.
 @node Type Functions
 @subsection Getting Type Information
 
-@command{gawk} provides a single function that lets you distinguish
-an array from a scalar variable.  This is necessary for writing code
+@command{gawk} provides two functions that let you distinguish
+the type of a variable.
+This is necessary for writing code
 that traverses every element of an array of arrays
-(@pxref{Arrays of Arrays}).
+(@pxref{Arrays of Arrays}), and in other contexts.
 
 @table @code
 @cindexgawkfunc{isarray}
 @cindex scalar or array
 @item isarray(@var{x})
 Return a true value if @var{x} is an array. Otherwise, return false.
+
+@cindexgawkfunc{typeof}
+@cindex variable type
+@cindex type, of variable
+@item typeof(@var{x})
+Return one of the following strings, depending upon the type of @var{x}:
+
+@c nested table
+@table @code
+@item "array"
+@var{x} is an array.
+
+@item "regexp"
+@var{x} is a strongly typed regexp (@pxref{Strong Regexp Constants}).
+
+@item "number"
+@var{x} is a number.
+
+@item "string"
+@var{x} is a string.
+
+@item "strnum"
+@var{x} is a string that might be a number, such as a field or
+the result of calling @code{split()}. (I.e., @var{x} has the STRNUM
+attribute; @pxref{Variable Typing}.)
+
+@item "unassigned"
+@var{x} is a scalar variable that has not been assigned a value yet.
+For example:
+
+@example
+BEGIN @{
+    a[1]                # creates a[1] but it has no assigned value
+    print typeof(a[1])  # unassigned
+@}
+@end example
+
+@item "untyped"
+@var{x} has not yet been used at all; it can become a scalar or an
+array.
+For example:
+
+@example
+BEGIN @{
+    print typeof(x)     # x never used --> untyped
+    mk_arr(x)
+    print typeof(x)     # x now an array --> array
+@}
+
+function mk_arr(a) @{ a[1] = 1 @}
+@end example
+
+@end table
 @end table
 
 @code{isarray()} is meant for use in two circumstances. The first is when
@@ -19236,6 +19559,14 @@ that has not been previously used to @code{isarray()}, @command{gawk}
 ends up turning it into a scalar.
 @end quotation
 
+The @code{typeof()} function is general; it allows you to determine
+if a variable or function parameter is a scalar, an array, or a strongly
+typed regexp.
+
+@code{isarray()} is deprecated; you should use @code{typeof()} instead.
+You should replace any existing uses of @samp{isarray(var)} in your
+code with @samp{typeof(var) == "array"}.
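+
+As a sketch of that replacement, a recursive traversal of an array of
+arrays might test each element with @code{typeof()}.  The function name
+@code{walk()} and its parameters are hypothetical:
+
+@example
+function walk(arr, name,    i)
+@{
+    for (i in arr) @{
+        if (typeof(arr[i]) == "array")
+            walk(arr[i], name "[" i "]")    # descend into the subarray
+        else
+            printf "%s[%s] = %s\n", name, i, arr[i]
+    @}
+@}
+@end example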
+
 @node I18N Functions
 @subsection String-Translation Functions
 @cindex @command{gawk}, string-translation functions
@@ -27768,8 +28099,7 @@ The profiled version of your program may not look exactly like what you
 typed when you wrote it.  This is because @command{gawk} creates the
 profiled version by ``pretty-printing'' its internal representation of
 the program.  The advantage to this is that @command{gawk} can produce
-a standard representation.  The disadvantage is that all source code
-comments are lost.
+a standard representation.
 Also, things such as:
 
 @example
@@ -27863,10 +28193,26 @@ When called this way, @command{gawk} ``pretty-prints'' the program into
 @file{awkprof.out}, without any execution counts.
 
 @quotation NOTE
-The @option{--pretty-print} option still runs your program.
-This will change in the next major release.
+Once upon a time, the @option{--pretty-print} option would also run
+your program.  This is no longer the case.
 @end quotation
 
+There is a significant difference between the output created when
+profiling, and that created when pretty-printing.  Pretty-printed output
+preserves the original comments that were in the program, although their
+placement may not correspond exactly to their original locations in the
+source code.
+
+However, as a deliberate design decision, profiling output @emph{omits}
+the original program's comments. This allows you to focus on the
+execution count data and helps you avoid the temptation to use the
+profiler for pretty-printing.
+
+Additionally, pretty-printed output does not have the leading indentation
+that the profiling output does. This makes it easy to pretty-print your
+code once development is completed, and then use the result as the final
+version of your program.
+
 @node Advanced Features Summary
 @section Summary
 
@@ -30078,6 +30424,65 @@ executing, short programs.
 The @command{gawk} debugger only accepts source code supplied with the @option{-f} option.
 @end itemize
 
+One other point is worth discussing.  Conventional debuggers run in a
+separate process (and thus address space) from the programs that they
+debug (the @dfn{debuggee}, if you will).
+
+The @command{gawk} debugger is different; it is an integrated part
+of @command{gawk} itself.  This makes it possible, in rare cases,
+for @command{gawk} to become an excellent demonstrator of Heisenberg
+Uncertainty physics, where the mere act of observing something can change
+it. Consider the following:@footnote{Thanks to Hermann Peifer for
+this example.}
+
+@example
+$ @kbd{cat test.awk}
+@print{} @{ print typeof($1), typeof($2) @}
+$ @kbd{cat test.data}
+@print{} abc 123
+$ @kbd{gawk -f test.awk test.data}
+@print{} strnum strnum
+@end example
+
+This is all as expected: field data has the STRNUM attribute
+(@pxref{Variable Typing}).  Now watch what happens when we run
+this program under the debugger:
+
+@example
+$ @kbd{gawk -D -f test.awk test.data}
+gawk> @kbd{w $1}                        @ii{Set watchpoint on} $1
+@print{} Watchpoint 1: $1
+gawk> @kbd{w $2}                        @ii{Set watchpoint on} $2
+@print{} Watchpoint 2: $2
+gawk> @kbd{r}                           @ii{Start the program}
+@print{} Starting program:
+@print{} Stopping in Rule ...
+@print{} Watchpoint 1: $1               @ii{Watchpoint fires}
+@print{}   Old value: ""
+@print{}   New value: "abc"
+@print{} main() at `test.awk':1
+@print{} 1       @{ print typeof($1), typeof($2) @}
+gawk> @kbd{n}                           @ii{Keep going @dots{}}
+@print{} Watchpoint 2: $2               @ii{Watchpoint fires}
+@print{}   Old value: ""
+@print{}   New value: "123"
+@print{} main() at `test.awk':1
+@print{} 1       @{ print typeof($1), typeof($2) @}
+gawk> @kbd{n}                           @ii{Get result from} typeof()
+@print{} strnum string                  @ii{Result for} $2 @ii{isn't right}
+@print{} Program exited normally with exit value: 0
+gawk> @kbd{quit}
+@end example
+
+In this case, the act of comparing the new value of @code{$2}
+with the old one caused @command{gawk} to evaluate it and determine that it
+is indeed a string, and this is reflected in the result of
+@code{typeof()}.
+
+Cases like this, where the debugger is not transparent to the program's
+execution, should be rare. If you encounter one, please report it
+(@pxref{Bugs}).
+
 @ignore
 Look forward to a future release when these and other missing features may
 be added, and of course feel free to try to add them yourself!
@@ -30114,6 +30519,10 @@ If the GNU Readline library is available when @command{gawk} is
 compiled, it is used by the debugger to provide command-line history
 and editing.
 
+@item
+Usually, the debugger does not affect the
+program being debugged, but occasionally it can.
+
 @end itemize
 
 @node Arbitrary Precision Arithmetic
@@ -30946,6 +31355,122 @@ to just use the following:
 gawk -M 'BEGIN @{ n = 13; print n % 2 @}'
 @end example
 
+When dividing two arbitrary precision integers with either
+@samp{/} or @samp{%}, the result is typically an arbitrary
+precision floating point value (unless the denominator evenly
+divides into the numerator).  In order to do integer division
+or remainder with arbitrary precision integers, use the built-in
+@code{intdiv()} function (@pxref{Numeric Functions}).
+
+You can simulate the @code{intdiv()} function in standard @command{awk}
+using this user-defined function:
+
+@example
+@c file eg/lib/intdiv.awk
+# intdiv --- do integer division
+
+@c endfile
+@ignore
+@c file eg/lib/intdiv.awk
+#
+# Arnold Robbins, arnold@@skeeve.com, Public Domain
+# July, 2014
+#
+# Name changed from div() to intdiv()
+# April, 2015
+
+@c endfile
+
+@end ignore
+@c file eg/lib/intdiv.awk
+function intdiv(numerator, denominator, result)
+@{
+    split("", result)
+
+    numerator = int(numerator)
+    denominator = int(denominator)
+    result["quotient"] = int(numerator / denominator)
+    result["remainder"] = int(numerator % denominator)
+
+    return 0.0
+@}
+@c endfile
+@end example
+
+The following example program, contributed by Katie Wasserman,
+uses @code{intdiv()} to
+compute the digits of @value{PI} to as many places as you
+choose to set:
+
+@example
+@c file eg/prog/pi.awk
+# pi.awk --- compute the digits of pi
+@c endfile
+@c endfile
+@ignore
+@c file eg/prog/pi.awk
+#
+# Katie Wasserman, katie@@wass.net
+# August 2014
+@c endfile
+@end ignore
+@c file eg/prog/pi.awk
+
+BEGIN @{
+    digits = 100000
+    two = 2 * 10 ^ digits
+    pi = two
+    for (m = digits * 4; m > 0; --m) @{
+        d = m * 2 + 1
+        x = pi * m
+        intdiv(x, d, result)
+        pi = result["quotient"]
+        pi = pi + two
+    @}
+    print pi
+@}
+@c endfile
+@end example
+
+@ignore
+Date: Wed, 20 Aug 2014 10:19:11 -0400
+To: arnold@skeeve.com
+From: Katherine Wasserman <katie@wass.net>
+Subject: Re: computation of digits of pi?
+
+Arnold,
+
+>The program that you sent to compute the digits of pi using div(). Is
+>that some standard algorithm that every math student knows? If so,
+>what's it called?
+
+It's not that well known but it's not that obscure either
+
+It's Euler's modification to Newton's method for calculating pi.
+
+Take a look at lines (23) - (25)  here: http://mathworld.wolfram.com/PiFormulas.htm
+
+The algorithm I wrote simply expands the multiply by 2 and works from the innermost expression outwards.  I used this to program HP calculators because it's quite easy to modify for tiny memory devices with smallish word sizes.   
+
+http://www.hpmuseum.org/cgi-sys/cgiwrap/hpmuseum/articles.cgi?read=899
+
+-Katie
+@end ignore
+
+When asked about the algorithm used, Katie replied:
+
+@quotation
+It's not that well known but it's not that obscure either.
+It's Euler's modification to Newton's method for calculating pi.
+Take a look at lines (23) - (25) here: @uref{http://mathworld.wolfram.com/PiFormulas.html}.
+
+The algorithm I wrote simply expands the multiply by 2 and works from
+the innermost expression outwards.  I used this to program HP calculators
+because it's quite easy to modify for tiny memory devices with smallish
+word sizes. See
+@uref{http://www.hpmuseum.org/cgi-sys/cgiwrap/hpmuseum/articles.cgi?read=899}.
+@end quotation
+
 @node POSIX Floating Point Problems
 @section Standards Versus Existing Practice
 
@@ -31345,6 +31870,7 @@ This (rather large) @value{SECTION} describes the API in detail.
 * Symbol Table Access::                  Functions for accessing global
                                          variables.
 * Array Manipulation::                   Functions for working with arrays.
+* Redirection API::                      How to access and manipulate redirections.
 * Extension API Variables::              Variables provided by the API.
 * Extension API Boilerplate::            Boilerplate code for using the API.
 @end menu
@@ -31420,6 +31946,10 @@ Clearing an array
 @item
 Flattening an array for easy C-style looping over all its indices and elements
 @end itemize
+
+@item
+Accessing and manipulating redirections.
+
 @end itemize
 
 Some points about using the API:
@@ -33390,6 +33920,75 @@ $ @kbd{AWKLIBPATH=$PWD ./gawk -f subarray.awk}
 (@DBXREF{Finding Extensions} for more information on the
 @env{AWKLIBPATH} environment variable.)
 
+@node Redirection API
+@subsection Accessing and Manipulating Redirections
+
+The following function allows extensions to access and manipulate redirections.
+
+@table @code
+@item awk_bool_t get_file(const char *name,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ size_t name_len,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const char *filetype,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ int fd,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_input_buf_t **ibufp,
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_output_buf_t **obufp);
+Look up a file in @command{gawk}'s internal redirection table.
+If @code{name} is @code{NULL} or @code{name_len} is zero, return
+data for the currently open input file corresponding to @code{FILENAME}.
+(This does not access the @code{filetype} argument, so that may be undefined.)
+If the file is not already open, attempt to open it.
+The @code{filetype} argument must be zero-terminated and should be one of:
+
+@table @code
+@item ">"
+A file opened for output.
+
+@item ">>"
+A file opened for append.
+
+@item "<"
+A file opened for input.
+
+@item "|>"
+A pipe opened for output.
+
+@item "|<"
+A pipe opened for input.
+
+@item "|&"
+A two-way coprocess.
+@end table
+
+On error, return a @code{false} value.  Otherwise, return
+@code{true}, and return additional information about the redirection
+in the @code{ibufp} and @code{obufp} pointers.  For input
+redirections, the @code{*ibufp} value should be non-@code{NULL},
+and @code{*obufp} should be @code{NULL}.  For output redirections,
+the @code{*obufp} value should be non-@code{NULL}, and @code{*ibufp}
+should be @code{NULL}.  For two-way coprocesses, both values should
+be non-@code{NULL}.
+
+In the usual case, the extension is interested in @code{(*ibufp)->fd}
+and/or @code{fileno((*obufp)->fp)}.  If the file is not already
+open, and the @code{fd} argument is non-negative, @command{gawk}
+will use that file descriptor instead of opening the file in the
+usual way.  If @code{fd} is non-negative, but the file is already open,
+@command{gawk} ignores @code{fd} and returns the existing file.  It is
+the caller's responsibility to notice that neither the @code{fd} in
+the returned @code{awk_input_buf_t} nor the @code{fd} in the returned
+@code{awk_output_buf_t} matches the requested value.
+
+Note that supplying a file descriptor is currently @emph{not} supported
+for pipes.  However, supplying a file descriptor should work for input,
+output, append, and two-way (coprocess) sockets.  If @code{filetype}
+is two-way, @command{gawk} assumes that it is a socket!  Note that in
+the two-way case, the input and output file descriptors may differ.
+To check for success, you must check whether either matches.
+@end table
+
+It is anticipated that this API function will be used to implement I/O
+multiplexing and a socket library.
+
 @node Extension API Variables
 @subsection API Variables
 
@@ -34954,18 +35553,21 @@ As of this writing, there are seven extensions:
 GD graphics library extension
 
 @item
+MPFR library extension
+(this provides access to a number of MPFR functions that @command{gawk}'s
+native MPFR support does not)
+
+@item
 PDF extension
 
 @item
 PostgreSQL extension
 
 @item
-MPFR library extension
-(this provides access to a number of MPFR functions that @command{gawk}'s
-native MPFR support does not)
+Redis extension
 
 @item
-Redis extension
+Select extension
 
 @item
 XML parser extension, using the @uref{http://expat.sourceforge.net, Expat}
@@ -35594,6 +36196,10 @@ Indirect function calls
 @item
 Directories on the command line produce a warning and are skipped
 (@pxref{Command-line directories})
+
+@item
+Errors on output with @code{print} and @code{printf} need not be fatal
+(@pxref{Nonfatal})
 @end itemize
 
 @item
@@ -35681,6 +36287,11 @@ The @code{isarray()} function to check if a variable is an array or not
 The @code{bindtextdomain()}, @code{dcgettext()}, and @code{dcngettext()}
 functions for internationalization
 (@pxref{Programmer i18n})
+
+@item
+The @code{intdiv()} function for doing integer
+division and remainder
+(@pxref{Numeric Functions})
 @end itemize
 
 @item
@@ -35813,6 +36424,16 @@ for @command{gawk} @value{PVERSION} 4.1:
 Ultrix
 @end itemize
 
+@item
+Support for the following systems was removed from the code
+for @command{gawk} @value{PVERSION} 4.2:
+
+@c nested table
+@itemize @value{MINUS}
+@item
+MirBSD
+@end itemize
+
 @end itemize
 
 @c XXX ADD MORE STUFF HERE
@@ -36439,6 +37060,49 @@ Support for Ultrix was removed.
 
 @end itemize
 
+Version 4.2 introduced the following changes:
+
+@itemize @bullet
+@item
+Changes to @code{ENVIRON} are reflected into @command{gawk}'s
+environment and that of programs that it runs.
+@xref{Auto-set}.
+
+@item
+The @option{--pretty-print} option no longer also runs the @command{awk}
+program.
+@xref{Options}.
+
+@item
+The @command{igawk} program and its manual page are no longer
+installed when @command{gawk} is built.
+@xref{Igawk Program}.
+
+@item
+The @code{intdiv()} function.
+@xref{Numeric Functions}.
+
+@item
+The maximum number of hexadecimal digits in @samp{\x} escapes
+is now two.
+@xref{Escape Sequences}.
+
+@item
+Nonfatal output with @code{print} and @code{printf}.
+@xref{Nonfatal}.
+
+@item
+For many years, POSIX specified that default field splitting
+only allowed spaces and tabs to separate fields, and this was
+how @command{gawk} behaved with @option{--posix}. As of 2013,
+the standard restored historical behavior, and now default
+field splitting with @option{--posix} also allows newlines to
+separate fields.
+
+@item
+Support for MirBSD was removed.
+@end itemize
+
 @c XXX ADD MORE STUFF HERE
 @end ifclear
 
@@ -37106,6 +37770,8 @@ The generated Info file for
 The @command{troff} source for a manual page describing the @command{igawk}
 program presented in
 @ref{Igawk Program}.
+(Since @command{gawk} can do its own @code{@@include} processing,
+neither @command{igawk} nor @file{igawk.1} is installed.)
 
 @item doc/Makefile.in
 The input file used during the configuration process to generate the
@@ -37150,8 +37816,6 @@ source file for this @value{DOCUMENT}. It also contains a @file{Makefile.in} fil
 @file{Makefile.am} is used by GNU Automake to create @file{Makefile.in}.
 The library functions from
 @ref{Library Functions},
-and the @command{igawk} program from
-@DBREF{Igawk Program}
 are included as ready-to-use files in the @command{gawk} distribution.
 They are installed as part of the installation process.
 The rest of the programs in this @value{DOCUMENT} are available in appropriate
@@ -37162,6 +37826,12 @@ The source code, manual pages, and infrastructure files for
 the sample extensions included with @command{gawk}.
 @xref{Dynamic Extensions}, for more information.
 
+@item extras/*
+Additional non-essential files.  Currently, this directory contains some shell
+startup files to be installed in @file{/etc/profile.d} to aid in manipulating
+the @env{AWKPATH} and @env{AWKLIBPATH} environment variables.
+@xref{Shell Startup Files}, for more information.
+
 @item posix/*
 Files needed for building @command{gawk} on POSIX-compliant systems.
 
@@ -37193,6 +37863,7 @@ to configure @command{gawk} for your system yourself.
 
 @menu
 * Quick Installation::               Compiling @command{gawk} under Unix.
+* Shell Startup Files::              Shell convenience functions.
 * Additional Configuration Options:: Other compile-time options.
 * Configuration Philosophy::         How it's all supposed to work.
 @end menu
@@ -37273,6 +37944,44 @@ is likely that you will be asked for your password, and you will have
 to have been set up previously as a user who is allowed to run the
 @command{sudo} command.
 
+@node Shell Startup Files
+@appendixsubsec Shell Startup Files
+
+The distribution contains shell startup files @file{gawk.sh} and
+@file{gawk.csh} containing functions to aid in manipulating
+the @env{AWKPATH} and @env{AWKLIBPATH} environment variables.
+On a Fedora system, these files should be installed in @file{/etc/profile.d};
+on other platforms, the appropriate location may be different.
+
+@table @command
+
+@cindex @command{gawkpath_default} shell function
+@item gawkpath_default
+Reset the @env{AWKPATH} environment variable to its default value.
+
+@cindex @command{gawkpath_prepend} shell function
+@item gawkpath_prepend
+Add the argument to the front of the @env{AWKPATH} environment variable.
+
+@cindex @command{gawkpath_append} shell function
+@item gawkpath_append
+Add the argument to the end of the @env{AWKPATH} environment variable.
+
+@cindex @command{gawklibpath_default} shell function
+@item gawklibpath_default
+Reset the @env{AWKLIBPATH} environment variable to its default value.
+
+@cindex @command{gawklibpath_prepend} shell function
+@item gawklibpath_prepend
+Add the argument to the front of the @env{AWKLIBPATH} environment variable.
+
+@cindex @command{gawklibpath_append} shell function
+@item gawklibpath_append
+Add the argument to the end of the @env{AWKLIBPATH} environment variable.
+
+@end table
+
+
 @node Additional Configuration Options
 @appendixsubsec Additional Configuration Options
 @cindex @command{gawk}, configuring, options