Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- | doc/gawk.texi | 76 |
1 files changed, 47 insertions, 29 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index a809bd0d..148032aa 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -37,11 +37,13 @@
 @ifnotdocbook
 @set BULLET @bullet{}
 @set MINUS @minus{}
+@set NUL @sc{nul}
 @end ifnotdocbook
 
 @ifdocbook
 @set BULLET
 @set MINUS
+@set NUL NUL
 @end ifdocbook
 
 @set xref-automatic-section-title
@@ -5277,10 +5279,10 @@ with @samp{A}.
 
 @cindex POSIX @command{awk}, period (@code{.})@comma{} using
 In strict POSIX mode (@pxref{Options}),
-@samp{.} does not match the @sc{nul}
+@samp{.} does not match the @value{NUL}
 character, which is a character with all bits equal to zero.
-Otherwise, @sc{nul} is just another character. Other versions of @command{awk}
-may not be able to match the @sc{nul} character.
+Otherwise, @value{NUL} is just another character. Other versions of @command{awk}
+may not be able to match the @value{NUL} character.
 
 @cindex @code{[]} (square brackets), regexp operator
 @cindex square brackets (@code{[]}), regexp operator
@@ -6429,7 +6431,7 @@ a value that you know doesn't occur in the input file.
 This is hard to do in a general way, such that a program always works for
 arbitrary input files.
 
-You might think that for text files, the @sc{nul} character, which
+You might think that for text files, the @value{NUL} character, which
 consists of a character with all bits equal to zero, is a good
 value to use for @code{RS} in this case:
 
@@ -6438,23 +6440,23 @@ BEGIN @{ RS = "\0" @} # whole file becomes one record?
 @end example
 
 @cindex differences in @command{awk} and @command{gawk}, strings, storing
-@command{gawk} in fact accepts this, and uses the @sc{nul}
+@command{gawk} in fact accepts this, and uses the @value{NUL}
 character for the record separator.
 This works for certain special files, such as @file{/proc/environ} on
-GNU/Linux systems, where the @sc{nul} character is in fact the record separator.
+GNU/Linux systems, where the @value{NUL} character is in fact the record separator.
 However, this usage is @emph{not} portable
 to most other @command{awk} implementations.
 
 @cindex dark corner, strings, storing
 Almost all other @command{awk} implementations@footnote{At least that we know
 about.} store strings internally as C-style strings. C strings use the
-@sc{nul} character as the string terminator. In effect, this means that
+@value{NUL} character as the string terminator. In effect, this means that
 @samp{RS = "\0"} is the same as @samp{RS = ""}.
 @value{DARKCORNER}
 
-It happens that recent versions of @command{mawk} can use the @sc{nul}
+It happens that recent versions of @command{mawk} can use the @value{NUL}
 character as a record separator. However, this is a special case:
-@command{mawk} does not allow embedded @sc{nul} characters in strings.
+@command{mawk} does not allow embedded @value{NUL} characters in strings.
 
 @cindex records, treating files as
 @cindex treating files, as single records
@@ -6479,7 +6481,7 @@ a value that you know doesn't occur in the input file.
 This is hard to do in a general way, such that a program always works for
 arbitrary input files.
 
-You might think that for text files, the @sc{nul} character, which
+You might think that for text files, the @value{NUL} character, which
 consists of a character with all bits equal to zero, is a good
 value to use for @code{RS} in this case:
 
@@ -6488,23 +6490,23 @@ BEGIN @{ RS = "\0" @} # whole file becomes one record?
 @end example
 
 @cindex differences in @command{awk} and @command{gawk}, strings, storing
-@command{gawk} in fact accepts this, and uses the @sc{nul}
+@command{gawk} in fact accepts this, and uses the @value{NUL}
 character for the record separator.
 This works for certain special files, such as @file{/proc/environ} on
-GNU/Linux systems, where the @sc{nul} character is in fact the record separator.
+GNU/Linux systems, where the @value{NUL} character is in fact the record separator.
 However, this usage is @emph{not} portable
 to most other @command{awk} implementations.
 
 @cindex dark corner, strings, storing
 Almost all other @command{awk} implementations@footnote{At least that we know
 about.} store strings internally as C-style strings. C strings use the
-@sc{nul} character as the string terminator. In effect, this means that
+@value{NUL} character as the string terminator. In effect, this means that
 @samp{RS = "\0"} is the same as @samp{RS = ""}.
 @value{DARKCORNER}
 
-It happens that recent versions of @command{mawk} can use the @sc{nul}
+It happens that recent versions of @command{mawk} can use the @value{NUL}
 character as a record separator. However, this is a special case:
-@command{mawk} does not allow embedded @sc{nul} characters in strings.
+@command{mawk} does not allow embedded @value{NUL} characters in strings.
 
 @cindex records, treating files as
 @cindex treating files, as single records
@@ -10425,7 +10427,7 @@ double-quotation marks. For example:
 @cindex strings, length limitations
 represents the string whose contents are @samp{parrot}. Strings in
 @command{gawk} can be of any length, and they can contain any of the possible
-eight-bit ASCII characters including ASCII @sc{nul} (character code zero).
+eight-bit ASCII characters including ASCII @value{NUL} (character code zero).
 Other @command{awk}
 implementations may have difficulty with some character codes.
 
@@ -10709,7 +10711,11 @@ on the @command{awk} command line.
 Variables let you give names to values and refer to them later. Variables have
 already been used in many of the examples. The name of a variable
 must be a sequence of letters, digits, or underscores, and it may not begin
-with a digit. Case is significant in variable names; @code{a} and @code{A}
+with a digit.
+Here, a @dfn{letter} is any one of the 52 upper- and lowercase
+English letters. Other characters that may be defined as letters
+in non-English locales are not valid in variable names.
+Case is significant in variable names; @code{a} and @code{A}
 are distinct variables.
 
 A variable name is a valid expression by itself; it represents the
@@ -14128,9 +14134,8 @@ the beginning, in the following manner:
 
 @example
 NF != 4 @{
- err = sprintf("%s:%d: skipped: NF != 4\n", FILENAME, FNR)
- print err > "/dev/stderr"
- next
+ printf("%s:%d: skipped: NF != 4\n", FILENAME, FNR) > "/dev/stderr"
+ next
 @}
 @end example
 
@@ -14772,8 +14777,11 @@ or @code{"FPAT"} if field matching with @code{FPAT} is in effect.
 
 @item PROCINFO["identifiers"]
 @cindex program identifiers
-A subarray, indexed by the names of all identifiers used in the
-text of the AWK program. For each identifier, the value of the element is one of the following:
+A subarray, indexed by the names of all identifiers used in the text of
+the AWK program. An @dfn{identifier} is simply the name of a variable
+(be it scalar or array), built-in function, user-defined function, or
+extension function. For each identifier, the value of the element is
+one of the following:
 
 @table @code
 @item "array"
@@ -19192,6 +19200,8 @@ The definition of a function named @var{name} looks like this:
 Here, @var{name} is the name of the function to define. A valid function
 name is like a valid variable name: a sequence of letters, digits, and
 underscores that doesn't start with a digit.
+Here too, only the 52 upper- and lowercase English letters may
+be used in a function name.
 Within a single @command{awk} program, any particular name can only be
 used as a variable, array, or function.
 
@@ -20532,8 +20542,8 @@ function mystrtonum(str, ret, n, i, k, c)
 ret = 0
 for (i = 1; i <= n; i++) @{
 c = substr(str, i, 1)
- # index() returns 0 if c not in string,
- # includes c == "0"
+ # index() returns 0 if c not in string,
+ # includes c == "0"
 k = index("1234567", c)
 
 ret = ret * 8 + k
@@ -20546,8 +20556,8 @@ function mystrtonum(str, ret, n, i, k, c)
 for (i = 1; i <= n; i++) @{
 c = substr(str, i, 1)
 c = tolower(c)
- # index() returns 0 if c not in string,
- # includes c == "0"
+ # index() returns 0 if c not in string,
+ # includes c == "0"
 k = index("123456789abcdef", c)
 
 ret = ret * 16 + k
@@ -31300,7 +31310,7 @@ and is managed by @command{gawk} from then on.
 The API defines several simple @code{struct}s that map values as seen
 from @command{awk}. A value can be a @code{double}, a string, or an
 array (as in multidimensional arrays, or when creating a new array).
-String values maintain both pointer and length since embedded @sc{nul}
+String values maintain both pointer and length since embedded @value{NUL}
 characters are allowed.
 
 @quotation NOTE
@@ -31432,7 +31442,7 @@ Scalar values in @command{awk} are either numbers or strings. The
 indicates what is in the @code{union}.
 
 Representing numbers is easy---the API uses a C @code{double}. Strings
-require more work. Since @command{gawk} allows embedded @sc{nul} bytes
+require more work. Since @command{gawk} allows embedded @value{NUL} bytes
 in string values, a string must be represented as a pair containing a
 data-pointer and length. This is the @code{awk_string_t} type.
 
@@ -38678,7 +38688,7 @@ the derived files, because that keeps the repository less cluttered,
 and it is easier to see the substantive changes when comparing versions
 and trying to understand what changed between commits.
 
-However, there are two reasons why the @command{gawk} maintainer
+However, there are several reasons why the @command{gawk} maintainer
 likes to have everything in the repository.
 
 First, because it is then easy to reproduce any given version completely,
@@ -38747,6 +38757,14 @@ the maintainer is no different than Jane User who wants to try to build
 Thus, the maintainer thinks that it's not just important, but critical,
 that for any given branch, the above incantation @emph{just works}.
 
+@c Added 9/2014:
+A third reason to have all the files is that without them, using @samp{git
+bisect} to try to find the commit that introduced a bug is exceedingly
+difficult. The maintainer tried to do that on another project that
+requires running bootstrapping scripts just to create @command{configure}
+and so on; it was really painful. When the repository is self-contained,
+using @command{git bisect} in it is very easy.
+
 @c So - that's my reasoning and philosophy.
 
 What are some of the consequences and/or actions to take?
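
Reviewer note: the naming rule this commit adds to the manual (hunks at -10709 and -19192) says that a variable or function name is a sequence of letters, digits, and underscores that does not start with a digit, where "letter" means only the 52 upper- and lowercase English letters. The rule can be sketched as a small check. This is an illustrative Python sketch of the documented rule, not gawk's implementation; the helper name `is_valid_awk_name` is hypothetical, and the check does not consider reserved words or built-in function names.

```python
import re

# Only the 52 English letters count as letters, so the character classes
# are spelled out as ASCII ranges rather than using \w, which would also
# accept letters from other scripts and locales.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_awk_name(name: str) -> bool:
    """Return True if name satisfies the documented naming rule.

    Hypothetical helper: it checks only the character rule, not whether
    the name collides with a keyword or built-in function.
    """
    return _NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["a", "A", "_tmp1", "1abc", "caf\u00e9", ""]:
        print(repr(candidate), is_valid_awk_name(candidate))
```

Spelling out the ASCII ranges is deliberate: it mirrors the manual's statement that characters defined as letters in non-English locales are not valid in names.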