path: root/doc/gawk.texi
author     Arnold D. Robbins <arnold@skeeve.com>  2012-07-20 12:26:59 +0300
committer  Arnold D. Robbins <arnold@skeeve.com>  2012-07-20 12:26:59 +0300
commit     7bfc288d27bacb715ff63dbf71be53304917685a (patch)
tree       f575046eebd32bb710198e45072ec30e71255e7f /doc/gawk.texi
parent     4fe1f4ac1aa0e4b99c9abb26794fc0d10ebb77c6 (diff)
Fix doc on ranges and locales.
Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- doc/gawk.texi 26
1 files changed, 20 insertions, 6 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index fb17b716..bf30d012 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -66,6 +66,15 @@
@set DARKCORNER (d.c.)
@set COMMONEXT (c.e.)
@end ifdocbook
+@ifxml
+@set DOCUMENT book
+@set CHAPTER chapter
+@set APPENDIX appendix
+@set SECTION section
+@set SUBSECTION subsection
+@set DARKCORNER (d.c.)
+@set COMMONEXT (c.e.)
+@end ifxml
@ifplaintext
@set DOCUMENT book
@set CHAPTER chapter
@@ -27062,7 +27071,7 @@ Almost all introductory Unix literature explained range expressions
as working in this fashion, and in particular, would teach that the
``correct'' way to match lowercase letters was with @samp{[a-z]}, and
that @samp{[A-Z]} was the ``correct'' way to match uppercase letters.
-And indeed, this was true.
+And indeed, this was true.@footnote{And Life was good.}
The 1993 POSIX standard introduced the idea of locales (@pxref{Locales}).
Since many locales include other letters besides the plain twenty-six
@@ -27080,12 +27089,14 @@ But outside those locales, the ordering was defined to be based on
In many locales, @samp{A} and @samp{a} are both less than @samp{B}.
In other words, these locales sort characters in dictionary order,
and @samp{[a-dx-z]} is typically not equivalent to @samp{[abcdxyz]};
-instead it might be equivalent to @samp{[aBbCcdXxYyz]}, for example.
+instead it might be equivalent to @samp{[aBbCcDdXxYyZz]}, for example.
+(And to make things worse, on other systems, it might be equivalent to
+@samp{[aAbBcCdDxXyYz]}.)
This point needs to be emphasized: Much literature teaches that you should
use @samp{[a-z]} to match a lowercase character. But on systems with
non-ASCII locales, this also matched all of the uppercase characters
-except @samp{Z}! This was a continuous cause of confusion, even well
+except @samp{A} or @samp{Z}! This was a continuous cause of confusion, even well
into the twenty-first century.
To demonstrate these issues, the following example uses the @code{sub()}
@@ -27121,13 +27132,16 @@ the @command{gawk} maintainer grew weary of trying to explain that
@command{gawk} was being nicely standards-compliant, and that the issue
was in the user's locale. During the development of version 4.0,
he modified @command{gawk} to always treat ranges in the original,
-pre-POSIX fashion, unless @option{--posix} was used (@pxref{Options}).
+pre-POSIX fashion, unless @option{--posix} was used (@pxref{Options}).@footnote{And
+thus was born the Campaign for Rational Range Interpretation (or RRI). A number
+of GNU tools, such as @command{grep} and @command{sed}, have either
+implemented this change, or will soon. Thanks to Karl Berry for coining the phrase
+``Rational Range Interpretation.''}
Fortunately, shortly before the final release of @command{gawk} 4.0,
the maintainer learned that the 2008 standard had changed the
definition of ranges, such that outside the @code{"C"} and @code{"POSIX"}
-locales, the meaning of range expressions was
-@emph{undefined}.@footnote{See
+locales, the meaning of range expressions was @emph{undefined}.@footnote{See
@uref{http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05, the standard}
and
@uref{http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap09.html#tag_21_09_03_05, its rationale}.}
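To make the collation discussion concrete, here is a minimal Python sketch that models a bracket range as "every character from the first endpoint through the second in some collation order." It is a toy model, not how any real regex engine or C library works: `UPPER_FIRST` and `LOWER_FIRST` are hypothetical letter-only orderings standing in for locale dictionary order (real locales define collation with far more elaborate rules), and the helper names `expand_range` and `expand_bracket` are illustrative only. `ASCII_ORDER` is plain code-point order, which corresponds to the pre-POSIX ("C" locale) reading of ranges that the text describes.

```python
import string

# Code-point order, as in the "C" locale: a range covers consecutive code points.
ASCII_ORDER = "".join(chr(i) for i in range(128))

# Hypothetical dictionary orders for letters only (not taken from any real locale):
# "AaBbCc..." puts each uppercase letter just before its lowercase partner,
# "aAbBcC..." puts it just after.
UPPER_FIRST = "".join(u + l for u, l in zip(string.ascii_uppercase, string.ascii_lowercase))
LOWER_FIRST = "".join(l + u for u, l in zip(string.ascii_uppercase, string.ascii_lowercase))


def expand_range(lo: str, hi: str, order: str) -> str:
    """Characters from lo through hi, inclusive, in the given collation order."""
    start, end = order.index(lo), order.index(hi)
    return order[start:end + 1]


def expand_bracket(ranges: list[tuple[str, str]], order: str) -> str:
    """Expand a bracket expression such as [a-dx-z], given as (lo, hi) pairs."""
    return "".join(expand_range(lo, hi, order) for lo, hi in ranges)


if __name__ == "__main__":
    a_d_x_z = [("a", "d"), ("x", "z")]
    for name, order in [("code point", ASCII_ORDER),
                        ("AaBb...", UPPER_FIRST),
                        ("aAbB...", LOWER_FIRST)]:
        az = expand_range("a", "z", order)
        upper = "".join(c for c in az if c.isupper())
        print(f"{name:10}  [a-dx-z] -> {expand_bracket(a_d_x_z, order):14}  "
              f"uppercase in [a-z]: {upper or '(none)'}")
```

Under this model, code-point order gives `[a-dx-z]` as `abcdxyz` and `[a-z]` contains no uppercase letters. The "AaBb..." ordering makes `[a-z]` match every uppercase letter except `A`, and the "aAbB..." ordering every uppercase letter except `Z`, which is the effect the text warns about. The exact members of `[a-dx-z]` depend entirely on the ordering assumed, so they need not match the manual's illustrative sets letter for letter.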