Diffstat (limited to 'README_d/README.multibyte')
-rw-r--r-- README_d/README.multibyte | 22
1 files changed, 22 insertions, 0 deletions
diff --git a/README_d/README.multibyte b/README_d/README.multibyte
new file mode 100644
index 00000000..6bc973a6
--- /dev/null
+++ b/README_d/README.multibyte
@@ -0,0 +1,22 @@
+Wed Jun 18 16:47:31 IDT 2003
+============================
+
+Multibyte locales can cause occasional weirdness, in particular with
+ranges inside brackets: /[....]/. A program that works great for ASCII
+can fail in, e.g., en_US.UTF-8. One such program is test/gsubtst5.awk.
+
+By default, the test suite runs with LC_ALL=C and LANG=C. You
+can change this by doing (from a Bourne-style shell):
+
+ $ GAWKLOCALE=some_locale make check
+
+Then the test suite will set LC_ALL and LANG to the given locale.
+
+As of this writing, this works for en_US.UTF-8, and all tests
+pass except gsubtst5.
+
+For the normal case of RS = "\n", the locale is largely irrelevant.
+For other single-byte record separators, using LC_ALL=C will give you
+much better performance when reading records. Otherwise, gawk has to
+make several function calls *per input character* to find the record
+terminator. You have been warned.
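
The advice about single-byte record separators can be tried out with a small sketch like the one below. It is an illustration, not part of the patch: the function name count_records and the AWK override variable are hypothetical, and it assumes some POSIX awk is on the PATH (set AWK=gawk to exercise gawk specifically). It forces LC_ALL=C for the one awk invocation only, so the rest of the shell session keeps its locale.

```shell
# Count records split on a single-byte RS, with the locale forced to C
# for this one command only.
# count_records and AWK are hypothetical names used for illustration.
count_records() {
    # $1 = single-character record separator; input is read from stdin
    LC_ALL=C "${AWK:-awk}" -v RS="$1" '{ n++ } END { print n + 0 }'
}

# Three records: "a", "b" and "c"
printf 'a;b;c' | count_records ';'
```

To compare against a multibyte locale, the same command can be timed on a large input with LC_ALL set to, e.g., en_US.UTF-8 instead of C; no timings are claimed here.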