path: root/doc/gawk.texi
author Arnold D. Robbins <arnold@skeeve.com> 2016-02-20 21:07:13 +0200
committer Arnold D. Robbins <arnold@skeeve.com> 2016-02-20 21:07:13 +0200
commit 090687d94ad8a411c5e2cc434345e843ad381082
tree 36f02d720c26513123b9c77c1540765af4b98dd7 /doc/gawk.texi
parent 7744707de0e95e1e0009204a7d4886d69db24530
Doc update: Unicode in bracket expressions.
Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- doc/gawk.texi | 9
1 files changed, 9 insertions, 0 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 211a0a7e..dcd49e6e 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -5618,6 +5618,15 @@ set. For example, @samp{[0-9]} is equivalent to @samp{[0123456789]}.
standard and @command{gawk} have changed over time. This is mainly
of historical interest.)
+With the increasing popularity of the
+@uref{http://www.unicode.org, Unicode character standard},
+there is an additional wrinkle to consider. Octal and hexadecimal
+escape sequences inside bracket expressions are taken to represent
+only single-byte characters (characters whose values fit within
+the range 0--255). To match a range of characters where the endpoints
+of the range are larger than 255, enter the multibyte encodings of
+the characters directly.
+
@cindex @code{\} (backslash), in bracket expressions
@cindex backslash (@code{\}), in bracket expressions
@cindex @code{^} (caret), in bracket expressions
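
Note on the added paragraph: the point about single-byte escapes can be checked outside of gawk. The following is a minimal Python sketch, not part of the commit, that inspects the UTF-8 bytes of two example endpoints (Greek alpha and omega, chosen here only for illustration) and builds a bracket expression from the literal characters. The helper names `fits_single_byte` and `bracket_range` are hypothetical, and the resulting pattern is only printed, not run through gawk.

```python
def fits_single_byte(ch: str) -> bool:
    """True if the code point is in 0-255, the most a single octal or
    hexadecimal escape can express."""
    return ord(ch) <= 0xFF


def bracket_range(lo: str, hi: str) -> str:
    """Build a bracket-expression range from literal characters.

    Hypothetical helper for illustration only: it places the characters
    themselves between the brackets and does no escaping of characters
    that are special inside brackets (such as ']' or '\\').
    """
    if len(lo) != 1 or len(hi) != 1:
        raise ValueError("endpoints must be single characters")
    if ord(lo) > ord(hi):
        raise ValueError("range endpoints are out of order")
    return "[" + lo + "-" + hi + "]"


# Assumed example endpoints: GREEK SMALL LETTER ALPHA and OMEGA.
alpha, omega = "\u03b1", "\u03c9"

# Each endpoint takes two bytes in UTF-8, so no single byte-valued
# escape can stand for the whole character.
alpha_bytes = alpha.encode("utf-8")
omega_bytes = omega.encode("utf-8")

# Enter the characters directly, as the documentation text recommends.
pattern = bracket_range(alpha, omega)

print(fits_single_byte("9"), fits_single_byte(alpha))
print(alpha_bytes, omega_bytes)
print(pattern)
```

In a UTF-8 locale, the printed pattern is the kind of bracket expression the paragraph describes: the endpoints appear as the characters themselves rather than as escape sequences.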