Diffstat (limited to 'README_d/README.multibyte')
-rw-r--r-- | README_d/README.multibyte | 22 |
1 files changed, 22 insertions, 0 deletions
diff --git a/README_d/README.multibyte b/README_d/README.multibyte
new file mode 100644
index 00000000..6bc973a6
--- /dev/null
+++ b/README_d/README.multibyte
@@ -0,0 +1,22 @@
+Wed Jun 18 16:47:31 IDT 2003
+============================
+
+Multibyte locales can cause occasional weirdness, in particular with
+ranges inside brackets: /[....]/. Something that works great for ASCII
+will choke for, e.g., en_US.UTF-8. One such program is test/gsubtst5.awk.
+
+By default, the test suite runs with LC_ALL=C and LANG=C. You
+can change this by doing (from a Bourne-style shell):
+
+	$ GAWKLOCALE=some_locale make check
+
+Then the test suite will set LC_ALL and LANG to the given locale.
+
+As of this writing, this works for en_US.UTF-8, and all tests
+pass except gsubtst5.
+
+For the normal case of RS = "\n", the locale is largely irrelevant.
+For other single byte record separators, using LC_ALL=C will give you
+much better performance when reading records. Otherwise, gawk has to
+make several function calls, *per input character* to find the record
+terminator. You have been warned.
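
Note on the GAWKLOCALE behaviour described in the README above: the diff does not show how the test suite actually picks up the variable, so the following is only a minimal sketch of one way a Bourne-style harness could do it. The function name pick_locale is hypothetical and is not taken from the gawk sources; the only facts drawn from the README are the GAWKLOCALE variable, the default of C, and that both LC_ALL and LANG are set.

```shell
#!/bin/sh
# Hypothetical helper (not from the gawk Makefile): print the locale the
# tests should run under, falling back to "C" when the argument is empty.
pick_locale() {
    printf '%s\n' "${1:-C}"
}

# Use GAWKLOCALE if the caller set it, otherwise stay in the C locale,
# then export both variables so child processes (e.g. gawk) inherit them.
LC_ALL=$(pick_locale "${GAWKLOCALE-}")
LANG=$LC_ALL
export LC_ALL LANG
```

With a setup like this, running the harness with GAWKLOCALE unset keeps the default C locale, while GAWKLOCALE=en_US.UTF-8 would switch both variables to that locale, which matches the behaviour the README describes.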