From 35c1d682049dd5bbc1a594cf00806439170da64c Mon Sep 17 00:00:00 2001
From: Kaz Kylheku <kaz@kylheku.com>
Date: Fri, 9 Apr 2021 06:53:47 -0700
Subject: doc: more details in string literals section.

* txr.1: advise user that numeric escapes in string literals
are not byte-wise, but specify code points.
---
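Reviewer note: the encoding rule described in the added paragraph can be
modeled with a short sketch. This is Python, not TXR Lisp, and it is not
taken from the TXR sources; the function name encode_txr_like is
hypothetical. It only illustrates the stated rule: code points U+DC00
through U+DCFF become one raw byte each (the low-order eight bits), while
every other code point is encoded as ordinary UTF-8.

```python
def encode_txr_like(text: str) -> bytes:
    """Model of the rule in the patch text (an assumption, not TXR's code)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0xDC00 <= cp <= 0xDCFF:
            # Special range: emit a single byte, the low-order eight bits.
            out.append(cp & 0xFF)
        else:
            # Any other code point: ordinary UTF-8. Python raises
            # UnicodeEncodeError for surrogates outside the special range;
            # what TXR does with those is not covered by this sketch.
            out += ch.encode("utf-8")
    return bytes(out)


# An escape in the 8-bit range names a code point, so U+00FF encodes as
# two UTF-8 bytes, whereas U+DCFF yields the single raw byte 0xFF.
two_bytes = encode_txr_like("\u00ff")
one_byte = encode_txr_like("\udcff")
```

Under this model, a program that wants the raw byte 0xFF in its output has
to write the code point U+DCFF, not U+00FF.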
 txr.1 | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/txr.1 b/txr.1
index f1033d7c..ba0ad124 100644
--- a/txr.1
+++ b/txr.1
@@ -3011,6 +3011,20 @@ as a delimiter. Thus,
 represents
 .strn "!;" .
 
+Note: strings in \*(TX consist of Unicode code points, not UTF-8 bytes;
+therefore the elements of a string literal notation cannot specify individual
+bytes.  Each instance of hexadecimal or octal escape specifies a code point,
+even if its value lies in the 8-bit range.
+However, when a \*(TX string is encoded to UTF-8,
+every code point in the range U+DC00 through U+DCFF is converted to a
+single byte, by taking the low-order eight bits of its value. By manipulating
+code points in this special range, \*(TX programs can output arbitrary binary
+data into text streams. Also note that the
+.code \eu
+escape sequence for specifying code points found in some languages is
+unnecessary and absent.  More detailed information is given in the section
+Character Handling and International Characters.
+
 If the line ends in the middle of a literal, it is an error, unless the
 last character is a backslash. This backslash is a special escape which does
 not denote a character; rather, it indicates that the string literal continues
-- 
cgit v1.2.3