From 745ed52e240e9f994f8dfc8c093ff43775efe497 Mon Sep 17 00:00:00 2001
From: Kaz Kylheku
Date: Sat, 7 Mar 2020 19:40:52 -0800
Subject: strings: bugfix: broken inequality comparisons.

Inequality comparisons of strings and symbols are broken due to
assuming that cmp_str returns -1 or 1.

cmp_str uses the C library function wcscmp, and is exposed as
the Lisp function cmp-str. That is correctly documented as
returning a negative or positive value. But a few functions in
lib.c assume otherwise.

On newer glibc's, at least on x86, it seems that wcscmp does
return 1, 0 or -1 consistently; perhaps the newer optimized
assembly routines are ensuring this. The bug shows up on older
glibc installations where the C version just returns the
difference between the mismatching characters.

* lib.c (cmp_str): Now returns -1, 0 or 1.

* txr.1: Specify the stronger requirements on the cmp-str return
value, adding a note that older versions conform to a weaker
requirement.
---
 lib.c |  5 ++++-
 txr.1 | 14 ++++++++++----
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/lib.c b/lib.c
index dbeb20a0..add50d30 100644
--- a/lib.c
+++ b/lib.c
@@ -4467,7 +4467,10 @@ val cmp_str(val astr, val bstr)
   case TYPE_PAIR(STR, STR):
   case TYPE_PAIR(LIT, STR):
   case TYPE_PAIR(STR, LIT):
-    return num(wcscmp(c_str(astr), c_str(bstr)));
+    {
+      int cmp = wcscmp(c_str(astr), c_str(bstr));
+      return if3(cmp < 0, negone, if3(cmp > 0, one, zero));
+    }
   case TYPE_PAIR(LSTR, LIT):
   case TYPE_PAIR(LSTR, STR):
   case TYPE_PAIR(LIT, LSTR):
diff --git a/txr.1 b/txr.1
index 9875d439..fcccdf43 100644
--- a/txr.1
+++ b/txr.1
@@ -23695,12 +23695,12 @@ that length.
 .desc
 The
 .code cmp-str
-function returns a negative integer if
+function returns -1 if
 .meta left-string
 is lexicographically prior to
-.metn right-string ,
-and a positive integer
-if the reverse situation is the case. Otherwise the strings are equal
+.metn right-string .
+If the reverse relationship holds, it returns 1.
+Otherwise the strings are equal
 and zero is returned.
 
 If either or both of the strings are lazy, then they are only forced to the
@@ -23712,6 +23712,12 @@ The lexicographic ordering is naive, based on the character
 code point values in Unicode taken as integers, without regard for
 locale-specific collation orders.
 
+Note: in \*(TX 232 and earlier versions,
+.code cmp-str
+conforms to a weaker requirements: any negative integer value
+may be returned rather than -1, and any positive integer value
+can be returned instead of 1.
+
 .coNP Functions @, str= @, str< @, str> @ str>= and @ str<=
 .synb
 .mets (str= < left-string << right-string )
--
cgit v1.2.3
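
The normalization this patch applies inside cmp_str can be sketched in isolation. The following is a minimal standalone illustration, not code from TXR: the helper names sign_of, wcscmp_sign and wstr_less are hypothetical, and the sketch relies only on the ISO C guarantee that wcscmp returns a value less than, equal to, or greater than zero.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not in TXR): collapse any integer to -1, 0 or 1. */
static int sign_of(int value)
{
  return (value > 0) - (value < 0);
}

/* Hypothetical wrapper: a wcscmp whose result is always exactly -1, 0 or 1,
   regardless of whether the C library returns a raw character difference. */
static int wcscmp_sign(const wchar_t *left, const wchar_t *right)
{
  assert(left != NULL && right != NULL);
  return sign_of(wcscmp(left, right));
}

/* A "less than" predicate that tests only the sign of the result, so it is
   correct under both the weaker and the stronger return-value contract. */
static int wstr_less(const wchar_t *left, const wchar_t *right)
{
  return wcscmp(left, right) < 0;
}
```

Callers that compare the result against -1 or 1 need something like wcscmp_sign; callers written in the style of wstr_less work with either contract. The patch normalizes at the source, which presumably avoids having to audit every caller.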