Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- doc/gawk.texi 76
1 files changed, 47 insertions, 29 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index a809bd0d..148032aa 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -37,11 +37,13 @@
@ifnotdocbook
@set BULLET @bullet{}
@set MINUS @minus{}
+@set NUL @sc{nul}
@end ifnotdocbook
@ifdocbook
@set BULLET
@set MINUS
+@set NUL NUL
@end ifdocbook
@set xref-automatic-section-title
@@ -5277,10 +5279,10 @@ with @samp{A}.
@cindex POSIX @command{awk}, period (@code{.})@comma{} using
In strict POSIX mode (@pxref{Options}),
-@samp{.} does not match the @sc{nul}
+@samp{.} does not match the @value{NUL}
character, which is a character with all bits equal to zero.
-Otherwise, @sc{nul} is just another character. Other versions of @command{awk}
-may not be able to match the @sc{nul} character.
+Otherwise, @value{NUL} is just another character. Other versions of @command{awk}
+may not be able to match the @value{NUL} character.
@cindex @code{[]} (square brackets), regexp operator
@cindex square brackets (@code{[]}), regexp operator
@@ -6429,7 +6431,7 @@ a value that you know doesn't occur in the input file. This is hard
to do in a general way, such that a program always works for arbitrary
input files.
-You might think that for text files, the @sc{nul} character, which
+You might think that for text files, the @value{NUL} character, which
consists of a character with all bits equal to zero, is a good
value to use for @code{RS} in this case:
@@ -6438,23 +6440,23 @@ BEGIN @{ RS = "\0" @} # whole file becomes one record?
@end example
@cindex differences in @command{awk} and @command{gawk}, strings, storing
-@command{gawk} in fact accepts this, and uses the @sc{nul}
+@command{gawk} in fact accepts this, and uses the @value{NUL}
character for the record separator.
This works for certain special files, such as @file{/proc/environ} on
-GNU/Linux systems, where the @sc{nul} character is in fact the record separator.
+GNU/Linux systems, where the @value{NUL} character is in fact the record separator.
However, this usage is @emph{not} portable
to most other @command{awk} implementations.
@cindex dark corner, strings, storing
Almost all other @command{awk} implementations@footnote{At least that we know
about.} store strings internally as C-style strings. C strings use the
-@sc{nul} character as the string terminator. In effect, this means that
+@value{NUL} character as the string terminator. In effect, this means that
@samp{RS = "\0"} is the same as @samp{RS = ""}.
@value{DARKCORNER}
-It happens that recent versions of @command{mawk} can use the @sc{nul}
+It happens that recent versions of @command{mawk} can use the @value{NUL}
character as a record separator. However, this is a special case:
-@command{mawk} does not allow embedded @sc{nul} characters in strings.
+@command{mawk} does not allow embedded @value{NUL} characters in strings.
@cindex records, treating files as
@cindex treating files, as single records
@@ -6479,7 +6481,7 @@ a value that you know doesn't occur in the input file. This is hard
to do in a general way, such that a program always works for arbitrary
input files.
-You might think that for text files, the @sc{nul} character, which
+You might think that for text files, the @value{NUL} character, which
consists of a character with all bits equal to zero, is a good
value to use for @code{RS} in this case:
@@ -6488,23 +6490,23 @@ BEGIN @{ RS = "\0" @} # whole file becomes one record?
@end example
@cindex differences in @command{awk} and @command{gawk}, strings, storing
-@command{gawk} in fact accepts this, and uses the @sc{nul}
+@command{gawk} in fact accepts this, and uses the @value{NUL}
character for the record separator.
This works for certain special files, such as @file{/proc/environ} on
-GNU/Linux systems, where the @sc{nul} character is in fact the record separator.
+GNU/Linux systems, where the @value{NUL} character is in fact the record separator.
However, this usage is @emph{not} portable
to most other @command{awk} implementations.
@cindex dark corner, strings, storing
Almost all other @command{awk} implementations@footnote{At least that we know
about.} store strings internally as C-style strings. C strings use the
-@sc{nul} character as the string terminator. In effect, this means that
+@value{NUL} character as the string terminator. In effect, this means that
@samp{RS = "\0"} is the same as @samp{RS = ""}.
@value{DARKCORNER}
-It happens that recent versions of @command{mawk} can use the @sc{nul}
+It happens that recent versions of @command{mawk} can use the @value{NUL}
character as a record separator. However, this is a special case:
-@command{mawk} does not allow embedded @sc{nul} characters in strings.
+@command{mawk} does not allow embedded @value{NUL} characters in strings.
@cindex records, treating files as
@cindex treating files, as single records
@@ -10425,7 +10427,7 @@ double-quotation marks. For example:
@cindex strings, length limitations
represents the string whose contents are @samp{parrot}. Strings in
@command{gawk} can be of any length, and they can contain any of the possible
-eight-bit ASCII characters including ASCII @sc{nul} (character code zero).
+eight-bit ASCII characters including ASCII @value{NUL} (character code zero).
Other @command{awk}
implementations may have difficulty with some character codes.
@@ -10709,7 +10711,11 @@ on the @command{awk} command line.
Variables let you give names to values and refer to them later. Variables
have already been used in many of the examples. The name of a variable
must be a sequence of letters, digits, or underscores, and it may not begin
-with a digit. Case is significant in variable names; @code{a} and @code{A}
+with a digit.
+Here, a @dfn{letter} is any one of the 52 upper- and lowercase
+English letters. Other characters that may be defined as letters
+in non-English locales are not valid in variable names.
+Case is significant in variable names; @code{a} and @code{A}
are distinct variables.
A variable name is a valid expression by itself; it represents the
@@ -14128,9 +14134,8 @@ the beginning, in the following manner:
@example
NF != 4 @{
- err = sprintf("%s:%d: skipped: NF != 4\n", FILENAME, FNR)
- print err > "/dev/stderr"
- next
+ printf("%s:%d: skipped: NF != 4\n", FILENAME, FNR) > "/dev/stderr"
+ next
@}
@end example
@@ -14772,8 +14777,11 @@ or @code{"FPAT"} if field matching with @code{FPAT} is in effect.
@item PROCINFO["identifiers"]
@cindex program identifiers
-A subarray, indexed by the names of all identifiers used in the
-text of the AWK program. For each identifier, the value of the element is one of the following:
+A subarray, indexed by the names of all identifiers used in the text of
+the AWK program. An @dfn{identifier} is simply the name of a variable
+(be it scalar or array), built-in function, user-defined function, or
+extension function. For each identifier, the value of the element is
+one of the following:
@table @code
@item "array"
@@ -19192,6 +19200,8 @@ The definition of a function named @var{name} looks like this:
Here, @var{name} is the name of the function to define. A valid function
name is like a valid variable name: a sequence of letters, digits, and
underscores that doesn't start with a digit.
+Here too, only the 52 upper- and lowercase English letters may
+be used in a function name.
Within a single @command{awk} program, any particular name can only be
used as a variable, array, or function.
@@ -20532,8 +20542,8 @@ function mystrtonum(str, ret, n, i, k, c)
ret = 0
for (i = 1; i <= n; i++) @{
c = substr(str, i, 1)
- # index() returns 0 if c not in string,
- # includes c == "0"
+ # index() returns 0 if c not in string,
+ # includes c == "0"
k = index("1234567", c)
ret = ret * 8 + k
@@ -20546,8 +20556,8 @@ function mystrtonum(str, ret, n, i, k, c)
for (i = 1; i <= n; i++) @{
c = substr(str, i, 1)
c = tolower(c)
- # index() returns 0 if c not in string,
- # includes c == "0"
+ # index() returns 0 if c not in string,
+ # includes c == "0"
k = index("123456789abcdef", c)
ret = ret * 16 + k
@@ -31300,7 +31310,7 @@ and is managed by @command{gawk} from then on.
The API defines several simple @code{struct}s that map values as seen
from @command{awk}. A value can be a @code{double}, a string, or an
array (as in multidimensional arrays, or when creating a new array).
-String values maintain both pointer and length since embedded @sc{nul}
+String values maintain both pointer and length since embedded @value{NUL}
characters are allowed.
@quotation NOTE
@@ -31432,7 +31442,7 @@ Scalar values in @command{awk} are either numbers or strings. The
indicates what is in the @code{union}.
Representing numbers is easy---the API uses a C @code{double}. Strings
-require more work. Since @command{gawk} allows embedded @sc{nul} bytes
+require more work. Since @command{gawk} allows embedded @value{NUL} bytes
in string values, a string must be represented as a pair containing a
data-pointer and length. This is the @code{awk_string_t} type.
@@ -38678,7 +38688,7 @@ the derived files, because that keeps the repository less cluttered,
and it is easier to see the substantive changes when comparing versions
and trying to understand what changed between commits.
-However, there are two reasons why the @command{gawk} maintainer
+However, there are several reasons why the @command{gawk} maintainer
likes to have everything in the repository.
First, because it is then easy to reproduce any given version completely,
@@ -38747,6 +38757,14 @@ the maintainer is no different than Jane User who wants to try to build
Thus, the maintainer thinks that it's not just important, but critical,
that for any given branch, the above incantation @emph{just works}.
+@c Added 9/2014:
+A third reason to have all the files is that without them, using @samp{git
+bisect} to try to find the commit that introduced a bug is exceedingly
+difficult. The maintainer tried to do that on another project that
+requires running bootstrapping scripts just to create @command{configure}
+and so on; it was really painful. When the repository is self-contained,
+using @command{git bisect} in it is very easy.
+
@c So - that's my reasoning and philosophy.
What are some of the consequences and/or actions to take?