path: root/doc/gawk.texi
author Arnold D. Robbins <arnold@skeeve.com> 2015-06-19 12:42:37 +0300
committer Arnold D. Robbins <arnold@skeeve.com> 2015-06-19 12:42:37 +0300
commit ec58524cb5a671c18c4af1b893e599eb04c7760a
tree 1d1c3d298ec82caa03c0cf5caeb0dd14b08ce247 /doc/gawk.texi
parent 76e1f5bfee032dbcb5c19b3e4e92f96aa05731c3
parent f7cd8a03c09a00c4cb520f881bbe838cf76e718f
Merge branch 'master' into feature/cmake
Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- doc/gawk.texi 156
1 files changed, 151 insertions, 5 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index d61a47de..7552f164 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -562,6 +562,7 @@ particular records in a file and perform operations upon them.
* Computed Regexps:: Using Dynamic Regexps.
* GNU Regexp Operators:: Operators specific to GNU software.
* Case-sensitivity:: How to do case-insensitive matching.
+* Strong Regexp Constants:: Strongly typed regexp constants.
* Regexp Summary:: Regular expressions summary.
* Records:: Controlling how data is split into
records.
@@ -5013,6 +5014,7 @@ regular expressions work, we present more complicated instances.
* Computed Regexps:: Using Dynamic Regexps.
* GNU Regexp Operators:: Operators specific to GNU software.
* Case-sensitivity:: How to do case-insensitive matching.
+* Strong Regexp Constants:: Strongly typed regexp constants.
* Regexp Summary:: Regular expressions summary.
@end menu
@@ -6260,6 +6262,89 @@ The value of @code{IGNORECASE} has no effect if @command{gawk} is in
compatibility mode (@pxref{Options}).
Case is always significant in compatibility mode.
+@node Strong Regexp Constants
+@section Strongly Typed Regexp Constants
+
+This @value{SECTION} describes a @command{gawk}-specific feature.
+
+Regexp constants (@code{/@dots{}/}) hold a strange position in the
+@command{awk} language. In most contexts, they act like an expression:
+@samp{$0 ~ /@dots{}/}. In other contexts, they denote only a regexp to
+be matched. In no case are they really a ``first class citizen'' of the
+language. That is, you cannot define a scalar variable whose type is
+``regexp'' in the same sense that you can define a variable to be a
+number or a string:
+
+@example
+num = 42 @ii{Numeric variable}
+str = "hi" @ii{String variable}
+re = /foo/ @ii{Wrong!} re @ii{is the result of} $0 ~ /foo/
+@end example
+
+For a number of more advanced use cases (described later on in this
+@value{DOCUMENT}), it would be nice to have regexp constants that
+are @dfn{strongly typed}; in other words, that denote a regexp useful
+for matching, and not an expression.
+
+@command{gawk} provides this feature. A strongly typed regexp constant
+looks almost like a regular regexp constant, except that it is preceded
+by an @samp{@@} sign:
+
+@example
+re = @@/foo/ @ii{Regexp variable}
+@end example
+
+Strongly typed regexp constants @emph{cannot} be used everywhere that a
+regular regexp constant can, because this would make the language even more
+confusing. Instead, you may use them only in certain contexts:
+
+@itemize @bullet
+@item
+On the righthand side of the @samp{~} and @samp{!~} operators: @samp{some_var ~ @@/foo/}
+(@pxref{Regexp Usage}).
+
+@item
+In the @code{case} part of a @code{switch} statement
+(@pxref{Switch Statement}).
+
+@item
+As an argument to one of the built-in functions that accept regexp constants:
+@code{gensub()},
+@code{gsub()},
+@code{match()},
+@code{patsplit()},
+@code{split()},
+and
+@code{sub()}
+(@pxref{String Functions}).
+
+@item
+As a parameter in a call to a user-defined function
+(@pxref{User-defined}).
+
+@item
+On the righthand side of an assignment to a variable: @samp{some_var = @@/foo/}.
+In this case, the type of @code{some_var} is regexp. Additionally, @code{some_var}
+can be used with @samp{~} and @samp{!~}, passed to one of the built-in functions
+listed above, or passed as a parameter to a user-defined function.
+@end itemize
+
+You may use the @code{typeof()} built-in function
+(@pxref{Type Functions})
+to determine if a variable or function parameter is
+a regexp variable.
+
+The true power of this feature comes from the ability to create variables that
+have regexp type. Such variables can be passed on to user-defined functions,
+without the confusing aspects of computed regular expressions created from
+strings or string constants. They may also be passed through indirect function
+calls (@pxref{Indirect Calls})
+onto the built-in functions that accept regexp constants.
+
+When used in numeric conversions, strongly typed regexp variables convert
+to zero. When used in string conversions, they convert to the string
+value of the original regexp text.
+
@node Regexp Summary
@section Summary
@@ -6303,6 +6388,11 @@ treated as regular expressions).
case sensitivity of regexp matching. In other @command{awk}
versions, use @code{tolower()} or @code{toupper()}.
+@item
+Strongly typed regexp constants (@code{@@/.../}) enable
+certain advanced use cases to be described later on in the
+@value{DOCUMENT}.
+
@end itemize
@@ -19387,16 +19477,41 @@ results of the @code{compl()}, @code{lshift()}, and @code{rshift()} functions.
@node Type Functions
@subsection Getting Type Information
-@command{gawk} provides a single function that lets you distinguish
-an array from a scalar variable. This is necessary for writing code
+@command{gawk} provides two functions that let you determine
+the type of a variable.
+This is necessary for writing code
that traverses every element of an array of arrays
-(@pxref{Arrays of Arrays}).
+(@pxref{Arrays of Arrays}), and in other contexts.
@table @code
@cindexgawkfunc{isarray}
@cindex scalar or array
@item isarray(@var{x})
Return a true value if @var{x} is an array. Otherwise, return false.
+
+@cindexgawkfunc{typeof}
+@cindex variable type
+@cindex type, of variable
+@item typeof(@var{x})
+Return one of the following strings, depending upon the type of @var{x}:
+
+@c nested table
+@table @code
+@item "array"
+@var{x} is an array.
+
+@item "regexp"
+@var{x} is a strongly typed regexp (@pxref{Strong Regexp Constants}).
+
+@item "scalar_n"
+@var{x} is a number.
+
+@item "scalar_s"
+@var{x} is a string.
+
+@item "untyped"
+@var{x} has not yet been given a type.
+@end table
@end table
@code{isarray()} is meant for use in two circumstances. The first is when
@@ -19414,6 +19529,14 @@ that has not been previously used to @code{isarray()}, @command{gawk}
ends up turning it into a scalar.
@end quotation
+The @code{typeof()} function is general; it allows you to determine
+if a variable or function parameter is a scalar, an array, or a strongly
+typed regexp.
+
+@code{isarray()} is deprecated; you should use @code{typeof()} instead.
+You should replace any existing uses of @samp{isarray(var)} in your
+code with @samp{typeof(var) == "array"}.
+
@node I18N Functions
@subsection String-Translation Functions
@cindex @command{gawk}, string-translation functions
@@ -34975,17 +35098,31 @@ properly:
# Please set INPLACE_SUFFIX to make a backup copy. For example, you may
# want to set INPLACE_SUFFIX to .bak on the command line or in a BEGIN rule.
+# By default, each file named on the command line is edited in place.
+# But you can selectively disable this by adding an inplace=0 argument
+# prior to files that you do not want to process this way. You can then
+# re-enable it later on the command line by putting inplace=1 before files
+# that you wish to be subject to in-place editing.
+
# N.B. We call inplace_end() in the BEGINFILE and END rules so that any
# actions in an ENDFILE rule will be redirected as expected.
+BEGIN @{
+ inplace = 1 # enabled by default
+@}
+
BEGINFILE @{
if (_inplace_filename != "")
inplace_end(_inplace_filename, INPLACE_SUFFIX)
- inplace_begin(_inplace_filename = FILENAME, INPLACE_SUFFIX)
+ if (inplace)
+ inplace_begin(_inplace_filename = FILENAME, INPLACE_SUFFIX)
+ else
+ _inplace_filename = ""
@}
END @{
- inplace_end(FILENAME, INPLACE_SUFFIX)
+ if (_inplace_filename != "")
+ inplace_end(_inplace_filename, INPLACE_SUFFIX)
@}
@end group
@c endfile
@@ -34999,6 +35136,11 @@ If @code{INPLACE_SUFFIX} is not an empty string, the original file is
linked to a backup @value{FN} created by appending that suffix. Finally,
the temporary file is renamed to the original @value{FN}.
+Note that the use of this feature can be controlled by placing @samp{inplace=0}
+on the command line prior to listing files that should not be processed this
+way. You can re-enable in-place editing by adding an @samp{inplace=1} argument
+prior to files that should be subject to in-place editing.
+
The @code{_inplace_filename} variable serves to keep track of the
current filename so as to not invoke @code{inplace_end()} before
processing the first file.
@@ -35019,6 +35161,10 @@ $ @kbd{gawk -i inplace -v INPLACE_SUFFIX=.bak '@{ gsub(/foo/, "bar") @}}
> @kbd{@{ print @}' file1 file2 file3}
@end example
+Please note that, while the extension does attempt to preserve ownership and permissions, it makes no attempt to copy the ACLs from the original file.
+
+If the program dies prematurely, as might happen if an unhandled signal is received, a temporary file may be left behind.
+
@node Extension Sample Ord
@subsection Character and Numeric values: @code{ord()} and @code{chr()}