txr - TXR: A data munging language.

	Commit message	Author	Age	Files	Lines
*	compiler: rename local variable in optimizer.	Kaz Kylheku	2025-06-16	1	-7/+7
	* stdlib/optimize.tl (basic-blocks merge-jump-thunks): A local variable named bb is used for walking a list of basic blocks, shadowing the self object, which is also named bb. It should be called bl.
*	compiler: bug: bad slot ref in optimizer.	Kaz Kylheku	2025-06-16	1	-1/+1
\| \| \| \| \| \| \| \|	* stdlib/optimize.tl (basic-blocks do-peephole-block): Update the links slot of the correct object, the basic block bl, not the basic blocks graph bb. This indicates that the code was never run hitherto. Some compiler changes I'm making revealed it.
*	autoload: bug: not clearing expand-hook correctly.	Kaz Kylheku	2025-06-16	1	-1/+1
	* autoload.c (autoload_try): We must bind the symbol name expand_hook_s, not its value from the expand_hook macro. This bug causes an interaction with ifx. If a form expanded under ifx hits autoload, the infix expansion hook is in effect for that autoloaded file. Users of the compiled TXR will not experience any ill effects, but when compiled files are removed, triggering fallback on source code, bad things happen.
*	compiler: missing wasteful register move elimination.	Kaz Kylheku	2025-06-16	1	-0/+2
\| \| \| \| \| \| \| \| \|	* stdlib/optimize.tl (basic-blocks do-peephole-block): Adding a case to remove a (mov X X) instruction, moving any register to itself. It's astonishing that this is missing. I'm seeing it happen in tail call cases now because when a tail call passes an unchanging argument, that becomes a self-assignment.
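As an illustration of the self-move case described above, here is a minimal sketch in C rather than TXR Lisp. The instruction struct, the opcode names and the function drop_self_moves are hypothetical stand-ins; the real peephole pass in stdlib/optimize.tl works on Lisp instruction lists such as (mov X X).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified instruction representation. The real VM
   code is a Lisp list such as (mov dst src); these names are only
   for illustration. */
enum opcode { OP_MOV, OP_ADD, OP_END };

struct insn {
    enum opcode op;
    int dst;    /* destination register number */
    int src;    /* source register number */
};

/* Compact the instruction array in place, dropping every mov whose
   source and destination are the same register. Returns the number
   of instructions that remain. */
size_t drop_self_moves(struct insn *code, size_t n)
{
    size_t out = 0;

    assert(code != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (code[i].op == OP_MOV && code[i].dst == code[i].src)
            continue;           /* (mov X X) has no effect: skip it */
        code[out++] = code[i];
    }
    return out;
}
```

A tail call that passes an unchanging argument would produce exactly such a mov from a register to itself, which this kind of pass then deletes.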
*	compiler: forgotten not/null reductions in if.	Kaz Kylheku	2025-06-16	1	-0/+6
\| \| \| \| \| \| \|	* stdlib/compiler.tl (compiler comp-if): Recognize cases like (if (not <expr>) <then> <else>) and convert to (if <expr> <else> <then>). Also the test (true <expr>) is reduced to <expr>.
*	hash: unused variables.	Kaz Kylheku	2025-06-15	1	-2/+0
\| \| \| \|	* hash.c (hash_isec): Remove unused variables h1 and h2.
*	compiler: value is not optional in fbind/lbind.	Kaz Kylheku	2025-06-15	1	-1/+1
	This should have been part of the May 26, 2025 commit d70b55a0023eda8d776f18d224e4487f5e0d484e. * stdlib/compiler.tl (compiler comp-fbind): The form is not optional in fbind/lbind bindings; the syntax is (sym form); we don't have to use optional binding syntax.
*	compiler: prepare tail call identification context.	Kaz Kylheku	2025-06-15	1	-2/+27
	* stdlib/compiler.tl (tail-fun-info): New struct type. The tail-fun special will be bound to instances of this. (compiler compile): Handle sys:rt-defun specially, via new comp-rt-defun. (compiler comp-return-from): Adjustment here; tail-fun does not carry the name, but a context structure with a name slot. (compiler comp-fbind): When compiling lbind, and thus potentially recursive functions, bind tail-fun to a new tail-fun-info context object carrying the name and lambda function. The env will be filled in later during the compilation of the lambda. (compiler comp-lambda-impl): When compiling exactly that lambda expression that is indicated by the tail-fun structure, store the parameter environment object into that structure, and also bind tail-pos to indicate that the body of the lambda is in the tail position. (compiler comp-rt-defun): New method, which destructures the (sys:rt-defun ...) call to extract the name and lambda, and uses those to wrap a tail-fun-info context around the compilation, similarly to what is done for local functions in comp-fbind.
*	compiler: tidiness issue in top dispatcher.	Kaz Kylheku	2025-06-15	1	-4/+4
\| \| \| \| \| \| \|	* stdlib/compiler.tl (compiler compile): Move the compiler-let case into the "compiler-only special operators" group. Consolidate the group of specially handled functions.
*	compiler: immediately called lambda: code gen tweak.	Kaz Kylheku	2025-06-15	1	-11/+11
	This patch addresses some irregularities in the output of lambda-apply-transform, to make its output easier to destructure and use in tail recursion logic, in which the inner bindings will be turned into assignments of existing variables. * stdlib/compiler.tl (lambda-apply-transform): Move the binding of the al-val gensym from the inner let* block to the outer let/let where other gensyms are bound. Replace the ign-1 and ign-2 temporaries by a single gensym. Ensure that this gensym is bound.
*	ffi: remove dud elements array of ffi types.	Kaz Kylheku	2025-06-14	1	-27/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	We compute an array of ffi_type for aggregates, but never actually use it for anything; we don't give it to libffi. Let's get rid of it. * ffi.c (struct txr_ffi_type): Remove elements member. (ffi_type_struct_destroy_op): Remove freeing of elements and assignment to zero. (ffi_struct_calcft, ffi_union_calcft, ffi_array_calcft): Remove allocation and calculation of elements. (make_ffi_type_struct, make_ffi_type_union): No need to free elements when we are replacing the existing type.
*	lib: optimize set functions with hashes.	Kaz Kylheku	2025-06-13	2	-10/+67
	* lib.c (diff, symdiff, isec, isecp, uni): When both sequences have 50 items or more, abandon the current approach, build hash tables and use a hash operation. This avoids impractically quadratic behavior on large inputs. * txr.1: Remove wording which states that the diff implementation de facto preserves orders of items from the left argument, like the obsolete set-diff function. Adjusted other wording.
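The switch from a quadratic scan to a hash-based one can be sketched as follows. This is an illustration over plain int arrays, not the lib.c code: the function names and the open-addressing table are assumptions, and only the cutoff of 50 items comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define HASH_THRESHOLD 50   /* cutoff taken from the commit message */

/* Quadratic scan: adequate for short inputs. */
static bool isecp_naive(const int *a, size_t na, const int *b, size_t nb)
{
    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            if (a[i] == b[j])
                return true;
    return false;
}

/* Hash-based scan: insert every item of a into an open-addressing
   table, then probe with each item of b. Expected linear time. */
static bool isecp_hashed(const int *a, size_t na, const int *b, size_t nb)
{
    size_t cap = 1;
    bool found = false;

    while (cap < 2 * na)
        cap *= 2;                   /* power of two, load factor <= 0.5 */

    int *slot = malloc(cap * sizeof *slot);
    bool *used = calloc(cap, sizeof *used);

    if (slot == NULL || used == NULL) {     /* no memory: fall back */
        free(slot);
        free(used);
        return isecp_naive(a, na, b, nb);
    }

    for (size_t i = 0; i < na; i++) {
        size_t h = ((unsigned) a[i] * 2654435761u) & (cap - 1);
        while (used[h] && slot[h] != a[i])
            h = (h + 1) & (cap - 1);        /* linear probing */
        used[h] = true;
        slot[h] = a[i];
    }

    for (size_t j = 0; j < nb && !found; j++) {
        size_t h = ((unsigned) b[j] * 2654435761u) & (cap - 1);
        while (used[h] && slot[h] != b[j])
            h = (h + 1) & (cap - 1);
        found = used[h];            /* occupied slot means a match */
    }

    free(slot);
    free(used);
    return found;
}

/* True if the two arrays share at least one element. */
bool isecp(const int *a, size_t na, const int *b, size_t nb)
{
    assert((a != NULL || na == 0) && (b != NULL || nb == 0));

    if (na >= HASH_THRESHOLD && nb >= HASH_THRESHOLD)
        return isecp_hashed(a, na, b, nb);
    return isecp_naive(a, na, b, nb);
}
```

Below the threshold the nested loop avoids the cost of building a table; above it, the table turns each membership test into an expected constant-time probe.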
*	hash: new functions hash-seq and hash-isecp.	Kaz Kylheku	2025-06-13	4	-3/+91
	* hash.c (hash_seq, hash_isecp): New functions. (hash_init): hash-seq and hash-isecp intrinsics registered. * hash.h (hash_seq, hash_isecp): Declared. * tests/010/hash.tl: New tests. * txr.1: Documented.
*	compiler: track tail positions.	Kaz Kylheku	2025-06-13	1	-30/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* stdlib/compiler.tl (ntp): New macro. (tail-pos, tail-fun): New special variables. (compiler (comp-setq, comp-lisp1-setq, comp-setqf, comp-if, comp-ift, comp-switch, comp-unwind-protect, comp-block, comp-catch, comp-let, comp-fbind, comp-lambda-impl, comp-fun, comp-for)): Identify non-tail-position expressions and turn off the tail position flag for recursing over those. (compiler comp-return-from): The returned expression is in the tail position if this is the block for the current function, otherwise not. (compiler (comp-progn, comp-or)): Positions other than the last are non-tail. (compiler comp-prog1): Nothing is tail position in a prog1 that has two or more expressions. (usr:compile-toplevel): For a new compile job, bind tail-pos to nil. There is no tail position until we are compiling a named function (not yet implemented).
*	listener: ignore_eof_count must be volatile.	Kaz Kylheku	2025-06-03	1	-1/+1
	* parser.c (repl): Due to the longjmp-like non-local control transfers taking place, ignore_eof_count must be volatile. The reason is that we change it after saving the context, and then examine it after catching an exception. I'm seeing it have a bad value after an exception is caught, resulting in the "** EOF ignored by user preference" message even though I configured an integer value.
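The rule behind this fix is the standard C one for setjmp and longjmp: a non-volatile automatic variable that is modified after setjmp has an indeterminate value once longjmp returns control to that point. A minimal sketch, with hypothetical names and a plain longjmp standing in for TXR's exception unwinding:

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf context;

static void fail(void)
{
    longjmp(context, 1);    /* stands in for a thrown exception */
}

/* Decrements a counter after the context is saved, jumps back, and
   returns the value observed after the jump. */
int count_after_jump(int start)
{
    /* Modified between setjmp and longjmp, then read afterward.
       Without volatile, the value read after the jump would be
       indeterminate. */
    volatile int count = start;

    if (setjmp(context) == 0) {
        count--;            /* changed after the context was saved */
        fail();
    }

    return count;           /* reliably start - 1, thanks to volatile */
}
```

With the qualifier removed, an optimizing compiler may keep count in a register that longjmp restores to its old contents, which matches the symptom of a bad value after the exception is caught.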
*	listener: bugfix: evaluate listener-ignore-eof after rcfile.	Kaz Kylheku	2025-06-03	1	-1/+3
\| \| \| \| \| \| \| \|	* parser.c (repl): Our first sampling of listener-ignore-eof must occur after we load the rcfile, where it is typically configured, otherwise we pick up a nil value. If Ctrl-D is used on the very first command of a session, TXR will then quit in spite of the user having configured the variable.
*	infix: no phony infix over lambdas and such.	Kaz Kylheku	2025-06-03	1	-1/+2
\| \| \| \| \| \| \|	* stdlib/infix.tl (funp): Do not recognize list forms as functions, such as lambda expressions or (meth ...) syntax. It causes surprisingly wrong transformations.
*	listener: new listener-ignore-eof variable.	Kaz Kylheku	2025-06-03	2	-3/+56
\| \| \| \| \| \| \| \| \| \| \| \| \|	* parser.c (listener_ignore_eof_s): New symbol variable. (repl): Copy the value of listener-ignore-eof into a local variable, which is reloaded after each command evaluation. On EOF, handle the cases involving the variable: positive integers count down, any other integer values quit, any non-nil value prevents quitting. (parse_init): Initialize listener_ignore_eof_s with interned symbol, and register the listener-ignore-eof variable. * txr.1: Documented.
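The three EOF cases listed above can be sketched as a small decision function. The types and names below are hypothetical, and the exact countdown behavior (each ignored EOF consumes one count) is an assumption based on the wording "positive integers count down"; txr.1 is the authority on the real semantics.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the variable's value: nil, an integer,
   or any other non-nil object. */
enum kind { K_NIL, K_INT, K_OTHER };

struct setting {
    enum kind kind;
    long n;                 /* meaningful only when kind == K_INT */
};

enum action { QUIT, IGNORE };

/* Decide what the listener does when it sees EOF. */
enum action on_eof(struct setting *s)
{
    assert(s != NULL);

    switch (s->kind) {
    case K_INT:
        if (s->n > 0) {     /* positive integer: count down, keep going */
            s->n--;
            return IGNORE;
        }
        return QUIT;        /* zero or negative integer: quit */
    case K_OTHER:
        return IGNORE;      /* any other non-nil value: never quit */
    default:
        return QUIT;        /* nil: EOF quits as before */
    }
}
```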
*	listener: deodorize EOF handling.	Kaz Kylheku	2025-06-03	2	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is motivated by seeing a poor behavior, whose manifestation is platform dependent. In the listener if we run, say (get-json) and hit Ctrl-D, then after the get-json function reports failure, the listener will quit as if it received EOF. On older glibc/Linux systems, the listener does not experience EOF. Furthermore, in the EOF situation, this misleading diagnostic is seen: ** error reading interactive input. * parser.c (repl): Call clear_error on in_stream just before calling linenoise. This gets rid of any sticky EOF condition left behind by an input operation. On POSIX systems, if you use stdin to read from a terminal and receive EOF, you must clearerr(stdin) before continuing to read from the terminal. Otherwise input operations on the stream can just return the cached error indication without attempting to perform any input on the file descriptor. Somehow we are getting away without doing this on older systems like Ubuntu 18. Maybe something changed in the glibc stdio implementation. * linenoise/linenoise.c (complete_line): Don't directly return -1 on EOF, just set the stop = 1 variable, so the WEOF value will be returned, similarly to how it is done in history_search. (history_search): Set the error variable on EOF. (edit): Set the error variable on EOF.
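The sticky EOF condition is ordinary stdio behavior and can be shown with a small helper. The function name is hypothetical; the actual fix calls TXR's clear_error on in_stream before invoking linenoise.

```c
#include <assert.h>
#include <stdio.h>

/* Clear any EOF or error indication cached on the stream, so that
   the next input operation really consults the file descriptor.
   Returns 1 if an indication was set, 0 otherwise. */
int clear_sticky_eof(FILE *in)
{
    int was_set;

    assert(in != NULL);

    was_set = feof(in) || ferror(in);
    clearerr(in);
    return was_set;
}
```

Calling this on stdin before each interactive read means a Ctrl-D consumed by an earlier input operation is not reported again as a fresh EOF.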
*	streams: new get-buf function.	Kaz Kylheku	2025-06-03	4	-26/+94
	* stream.c (get_buf): New function. (stream_init): Register get-buf intrinsic. * stream.h (get_buf): Declared. * stdlib/getput.tl (sys:get-buf-common): Function removed. (file-get-buf, command-get-buf, map-command-buf, map-process-buf): Use get-buf instead of sys:get-buf-common. * txr.1: Documented.
*	*-get-buf: bug in skipping non-seekable streams.	Kaz Kylheku	2025-06-02	2	-4/+11
	* stdlib/getput.tl (sys:get-buf-common): Fix incorrect algorithm for skipping forward in a stream that doesn't support seek-stream. The problem is that when the seek amount is greater than 4096, it does nothing but 4096 byte reads, which will overshoot the target position if it isn't divisible by 4096. The last read must be adjusted to the remaining seek amount. * tests/018/getput.tl: New test case using property-based approach to show that the read-based skip in get-buf-common fetches the same data as the seek-based skip.
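The corrected skipping logic can be sketched in C over a stdio stream. The function name is hypothetical and the original is TXR Lisp; only the 4096 byte chunk size and the rule that the last read is trimmed to the remaining amount come from the commit message.

```c
#include <assert.h>
#include <stdio.h>

#define CHUNK 4096

/* Skip `amount` bytes of a stream by reading and discarding them.
   Returns the number of bytes actually skipped, which is less than
   `amount` only on EOF or a read error. */
size_t skip_by_reading(FILE *in, size_t amount)
{
    unsigned char scratch[CHUNK];
    size_t skipped = 0;

    assert(in != NULL);

    while (skipped < amount) {
        size_t remaining = amount - skipped;
        /* The final read is trimmed to the remainder, so the
           position never overshoots the target. */
        size_t want = remaining < CHUNK ? remaining : CHUNK;
        size_t got = fread(scratch, 1, want, in);

        skipped += got;
        if (got < want)
            break;          /* EOF or read error */
    }
    return skipped;
}
```

The buggy version corresponds to always requesting CHUNK bytes, which lands on the next multiple of 4096 past the target whenever the amount is not divisible by 4096.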
*	json: read-bad-json allows single quoted strings.	Kaz Kylheku	2025-06-02	8	-5302/+5366
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It happens in the wild that sometimes JSON-like data must be processed in which strings are delimited by single quotes rather than double quotes. The data is valid Javascript syntax, so JS people don't even notice anything is wrong. * parser.c (struct parser): New member, json_quote_char. This helps the scanner keep track of which closing character it is expecting. * parser.c (parser_common_init): Initialize json_quote_char. * parser.l (JPUNC, NJPUNC): Include single quote (ASCII apostrophe) in JPUNC, and exclude it from NJPUNC. (grammar): When we see either a double quote or single quote in JLIT mode, we return it as itself if that character is the delimiter for the currently scanned string. Otherwise we return it as a LITCHAR, which gets accumulated by the parser into the current string. Include the double. When we see either a double quote or single quote, we transition to the JLIT state. The parser will check whether a single quoted literal is allowed. We allow \' escapes in a single-quote literal unconditionally. We allow them in a double-quoted literal also, but only in read bad JSON mode. * parser.y (json_val): Recognize single-quoted literals, but generate an error unless in read bad JSON mode. Also, error production for unterminated single quote only diagnosed that way in read bad JSON mode, otherwise rejected as invalid JSON. * tests/010/json.tl: New tests. * txr.1: Documented. * lex.yy.c.shipped, y.tab.c.shipped: Regenerated.
*	streams: get-string for string byte input stream.	Kaz Kylheku	2025-06-02	1	-1/+2
\| \| \| \| \| \|	* stream.c (byte_in_ops): Wire get_string operation to generic_get_string, giving the stream get-line and get-string support.
*	Version 300.txr-300	Kaz Kylheku	2025-05-31	7	-1409/+1552
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* RELNOTES: Updated. * configure (txr_ver): Bumped version. * stdlib/ver.tl (lib-version): Bumped. * txr.1: Bumped version and date. * txr.vim, tl.vim: Regenerated. * protsym.c: Regenerated.
*	streams: regression: gc issue in get_string_from_stream.	Kaz Kylheku	2025-05-31	1	-1/+1
\| \| \| \| \| \| \| \| \|	* stream.c (get_string_from_stream_common): The so->buf = 0 assignment must precede the call to string_own(buf), because the string out stream object may already be garbage, and the string_own call will reclaim it. If we don't null out the buffer, the string will get ownership of a freed buffer. This reproduced in the CSV test case on MacOS Lion, 32 bit x86.
*	parser: scan buflit characters faster.	Kaz Kylheku	2025-05-30	2	-3907/+3935
	* parser.l (BUFLIT): Instead of scanning a hexadecimal digit and using strtol, we scan three separate cases, and do a very simple subtraction in each one. TXR Lisp .tlo files are full of large buffer literals, so this affects loading speed. * lex.yy.c.shipped: Regenerated.
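The three-case conversion can be sketched as a plain C function. In the commit the cases are separate scanner rules in parser.l; the functions below are hypothetical and only illustrate the subtraction in each case. The letter cases assume an ASCII-compatible character set, since C guarantees contiguity only for the decimal digits.

```c
#include <assert.h>

/* Value of one hexadecimal digit character, or -1 if ch is not a
   hex digit. One subtraction per case, no library call. */
int hex_digit_value(int ch)
{
    if (ch >= '0' && ch <= '9')
        return ch - '0';
    if (ch >= 'a' && ch <= 'f')
        return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F')
        return ch - 'A' + 10;
    return -1;
}

/* Combine two hex digit characters into one byte; both characters
   must be valid hex digits. */
unsigned char hex_byte(int hi, int lo)
{
    int h = hex_digit_value(hi);
    int l = hex_digit_value(lo);

    assert(h >= 0 && l >= 0);
    return (unsigned char) (h * 16 + l);
}
```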
*	parser: two fixes in buf literals.	Kaz Kylheku	2025-05-30	2	-4/+4
	* parser.y (buflit_items): Here we have length_buf($$) referring to the semantic result value of the rule. It should be referring to $1. It works because the Bison-generated code runs the $$ = $1 logic before all rules. (buflit_item): Let's use num_fast rather than num to produce the byte value. * y.tab.c.shipped: Regenerated.
*	buf: alternative constructor with C type arguments.	Kaz Kylheku	2025-05-30	7	-19/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Many functions call make_buf, having to convert C types to the Lisp arguments using num or unum. Those conversions immediately get undone inside make_buf, and are subject to a wasteful type check. * buf.c (make_buf_fast): New function. * buf.h (make_buf): Misleading parameter renamed. (make_buf_fast): Declared. (sub_buf, buf_list, make_buf_stream, buf_fash, buf_and, buf_trunc): Replace make_buf with make_buf_fast. * lib.c (seq_build_init): Likewise. * ffi.c (ffi_put): Likewise. * stream.c (get_line_as_buf, iobuf_get): Likewise. * parser.y (buflit, buflit_items): Likewise. * y.tab.c.shipped: Regenerated.
*	buf: use C types in buf and buf_strm structures.	Kaz Kylheku	2025-05-30	13	-275/+300
	Using Lisp types for lengths, indices and buffer sizes has been awkward due to all the conversions. The code in buf.c and other code elsewhere that touches buffers, overall improves when we revise this decision. Mostly there are fewer conversions from Lisp to C which require a type check and self symbol for error diagnostics, like c_unum(len, self). In a few places, there are conversions in the other direction that were not needed before, like unum(b->len). These are simpler and faster. * lib.h (struct buf): Members len and size change from val to ucnum (pointer-sized unsigned integer). * gc.c (mark_obj): No need to do anything with BUF any more; it has no Lisp object references. * buf.c (BUF_BORROWED): New preprocessor symbol. Because the allocation size can no longer be nil to indicate that the buffer is borrowed (the buffer object doesn't own the memory) we use this value instead: the highest value of the ucnum type. (buf_check_alloc_size): len parameter changes from cnum to ucnum. Defends against the BUF_BORROWED value. (buf_check_index): Returns ucnum rather than cnum. (err_oflow): NORETURN attribute added to prevent some spurious compiler warnings. (prepare_pattern): Drop c_unum conversion of len. (make_buf): Some locals change from cnum to ucnum. We lose an unnecessary conversion. (init_borrowed_buf): Take len as ucnum rather than val. Use BUF_BORROWED value for allocated size to indicate borrowed status.
(make_borrowed_buf): Take len as ucnum. (make_duplicate_buf, make_owned_buf): Take len as ucnum, and take a self argument. Check for the size being BUF_BORROWED and reject. (make_ubuf): Parameter renamed. Lose a conversion from ucnum to val. (copy_buf): Check for allocated size being BUF_BORROWED to distinguish the two cases, rather than it being nil or not. (buf_shrink): Simplifies: loses a C to Lisp integer conversion, and no longer needs the local variable self. (buf_trim): Reject borrowed buffers by noticing the BUF_BORROWED value. (buf_do_set_len): len param becomes ucnum. Two Lisp-to-C integer conversions disappear; one C-to-Lisp moves elsewhere in the code. (buf_set_length): Use buf_check_len to check and convert incoming len to ucnum. This is an improvement over the previous approach of letting buf_do_set_len just rely on c_num conversions and their generic diagnostics. (buf_free): Check for borrowed buffer by comparing allocated size to BUF_BORROWED constant. (length_buf): ucnum to Lisp conversion now required here. (buf_alloc_size): Check for alloc_size being BUF_BORROWED and convert that to a nil return value. When returning an integer size, we need a conversion to a Lisp integer. (sub_buf): Use self symbol rather than lit("sub") when obtaining buffer handle. Use buf_check_len to validate the length and convert to C type. (replace_buf): Some cnum local variables become ucnum. We need a very careful comparison of w and l because w remains signed while l is unsigned. (buf_list): Substantially rewritten. We don't calculate the length of the sequence upfront, but extend the buffer as we add the elements to it. (buf_move_bytes): size parameter changes from cnum to ucnum. Lisp arithmetic replaced with C arithmetic; conversions eliminated. (buf_put_buf): Conversion eliminated in call to buf_move_bytes. (buf_put_bytes): Function reduced to wrapper for buf_move_bytes, since it is almost identical.
The only difference is that it performs memcpy rather than memmove which is not worth a separate function. (buf_put_i8, buf_put_u8, buf_put_char, buf_put_uchar, buf_get_i8, buf_get_u8): Simplified with C arithmetic and fewer conversions; cnum use replaced with ucnum. (buf_get_bytes): size parameter goes from cnum to ucnum. Overflow check for p + size addition added. (buf_print): Two conversions removed. (buf_str_sep): Conversion removed. (struct buf_strm): pos member changes from val to ucnum. (buf_strm_mark): Do not mark p->pos, no longer a Lisp object. (buf_strm_put_byte_callback): Lisp arithmetic removed, but a unum conversion is needed now in calling buf_put_uchar. That could be eliminated by not using the public interface. (buf_strm_get_byte_callback): Eliminate buf_check_index to validate the stream position; we simply check it against b->len. Becomes simple one liner. (buf_strm_get_char): Local variable index renamed to pos. Two conversions from and to Lisp eliminated, leaving no conversions. (buf_strm_unget_byte): Local variable p renamed to pos and changes from cnum to ucnum. Two conversions eliminated leaving no conversions. (buf_strm_fill_buf): Conversions eliminated. Check for the allocated size being BUF_BORROWED, in which case we fall back on using the length. Lisp arithmetic eliminated. (buf_strm_seek): Offset calculation done with C arithmetic and bounds checks. (buf_strm_truncate): Check incoming len with buf_check_len and convert to ucnum. Lisp arithmetic and conversions eliminated; buf_do_set_len used instead of public interface buf_set_length. (buf_strm_get_error): Use C comparison rather than ge function, and convert to t or nil result. (buf_strm_get_error_str): Bug: do not call errno_to_string since buffers don't talk to an operating system API that uses errno. The only error condition is eof. Thus, return either "eof" or "no error". (make_buf_stream): Initialize pos to 0 rather than Lisp zero.
(swap32, buf_str, str_buf, buf_int, buf_uint, int_buf, uint_buf): Conversions eliminated; int_buf and uint_buf use C multiplication by 8. We know this doesn't overflow because the MPI bignums restrict the number of bits to something countable by a word. (buf_compress, buf_decompress, str_compress, str_decompress): Conversions eliminated. (buf_ash, buf_fash, buf_and, buf_test, buf_or, buf_xor, buf_not, buf_trunc, buf_bitset, buf_bit, buf_zero_p, buf_count_ones, binary_width, buf_xor_pattern): Make necessary adjustments, adding and/or eliminating conversions. * buf.h (make_borrowed_buf, init_borrowed_buf, make_owned_buf, make_duplicate_buf, buf_put_bytes): Declarations updated. * lib.c (equal, less): Conversions eliminated in BUF cases. * eval.c (map_common): Add self argument to make_owned_buf call. * chksum.c (chksum_ensure_buf): len param changes from cnum to ucnum. Conversions eliminated and use of lt() switches to C less-than operator. (sha1_stream, sha1_buf, sha1, sha1_hash, sha1_end, sha256_stream, sha256_buf, sha256, sha256_hash, sha256_end, md5_stream, md5_buf, md5, md5_hash, md5_end): Adjustments: conversions eliminated. (crc32_buf): Conversion eliminated. * genchksum.txr: Changes to chksum.c actually made here. * ffi.c (ffi_buf_in, ffi_buf_get, ffi_buf_d_in, ffi_buf_d_get, buf_carray, put_carray, fill_carray, put_obj, get_obj): Simplified with removal of conversions. (fill_obj): Necessary adjustments, leaving same number of conversions. * hash.c (equal_hash): Remove conversion from BUF case. * rand.c (make_random_state): Remove conversion of seed to Lisp integer. (random_buf): Pass self to make_owned_buf. * strudel.c (strudel_unget_byte, strudel_fill_buf): Conversions removed, streamlining code. * stream.c (iobuf_get, iobuf_put): We cannot overload the len field with serving as a linked list since it's no longer a pointer. We instead use the struct any union member, which has a next pointer for this purpose.
Because "t.next" overlaps with "b.size", and we must not clobber the size field, we save "b.size" by copying it into "b.len". When pulling buffers from the iobuf_free_list, we restore b.size from b.len. For good measure, we add a bug_unless assertion that the size is the expected one. I ran into a test case failure while working on this due to the size being clobbered to zero, and subsequent I/O with that zero-sized buffer being interpreted as EOF.
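The sentinel scheme described in this commit can be sketched in isolation. Only the names BUF_BORROWED, ucnum, len and size come from the message; the struct layout and every function below are simplified, hypothetical stand-ins for the real code in buf.c and lib.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t ucnum;            /* pointer-sized unsigned integer */

/* Highest ucnum value: never a real allocation size, so it can mark
   a buffer that does not own its memory. */
#define BUF_BORROWED UINTPTR_MAX

struct buf {
    unsigned char *data;
    ucnum len;                      /* bytes in use */
    ucnum size;                     /* allocated bytes, or BUF_BORROWED */
};

/* Wrap memory owned by someone else; it must never be freed here. */
void init_borrowed(struct buf *b, unsigned char *mem, ucnum len)
{
    assert(b != NULL);
    b->data = mem;
    b->len = len;
    b->size = BUF_BORROWED;
}

/* Allocate an owned, zero-filled buffer. The sentinel is rejected so
   an owned buffer can never look borrowed. Returns false on failure. */
bool init_owned(struct buf *b, ucnum len)
{
    ucnum size = len ? len : 1;

    assert(b != NULL);
    if (len == BUF_BORROWED)
        return false;
    b->data = calloc(size, 1);
    if (b->data == NULL)
        return false;
    b->len = len;
    b->size = size;
    return true;
}

bool is_borrowed(const struct buf *b)
{
    return b->size == BUF_BORROWED;
}

/* Free the storage only if this buffer owns it. */
void buf_release(struct buf *b)
{
    if (!is_borrowed(b))
        free(b->data);
    b->data = NULL;
    b->len = 0;
    b->size = 0;
}

/* Bounds check for `size` bytes at offset `pos`, arranged so that no
   unsigned addition can wrap around. */
bool range_ok(const struct buf *b, ucnum pos, ucnum size)
{
    return pos <= b->len && size <= b->len - pos;
}
```

The range check shows why unsigned sizes need care: writing it as pos + size <= len would wrap for very large sizes, which is the kind of overflow the message mentions for the p + size addition in buf_get_bytes.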
*	buf: bug in sub_buf.	Kaz Kylheku	2025-05-30	1	-1/+1
\| \| \| \| \| \|	* buf.c (sub_buf): Wrong comparison of from to 0 gets interpreted as a null pointer check (is from nil); we want zero at that point in the code.
*	buf: stream: switch approach for unget_char.	Kaz Kylheku	2025-05-29	2	-18/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We switch to the method used in string streams for ungetting characters, whereby we UTF-8 encode the pushed back character and push back the subsequent bytes, thereby unifying character and byte pushback. * buf.c (struct buf_strm): Remove member unget_c. (buf_strm_mark): Remove reference to unget_c. (strm_get_char): Remove code for obtaining previously pushed back character from s->unget_c stack. (buf_strm_unget_char): Rewrite using the approach of using utf8_encode to write the multi-byte representation of the character into utf8_tiny_buf, and then pushing back the bytes. (make_buf_stream): Don't initialize removed unget_c. * tests/018/streams.tl: New tests.
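The unified pushback approach can be sketched as follows: the character is encoded as UTF-8 and its bytes are pushed onto the byte pushback stack in reverse, so that the lead byte is read back first. All names here are hypothetical; the real code uses utf8_encode with struct utf8_tiny_buf, and this sketch makes no attempt to reject surrogate code points.

```c
#include <assert.h>
#include <stddef.h>

#define PUSHBACK_MAX 16

/* Hypothetical byte source reduced to its LIFO pushback stack. */
struct byte_src {
    unsigned char stack[PUSHBACK_MAX];
    size_t depth;
};

/* Encode a code point as UTF-8. Returns the byte count (1 to 4),
   or 0 if cp is beyond U+10FFFF. */
static size_t utf8_encode_cp(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    } else if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    } else if (cp < 0x10000) {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    } else if (cp < 0x110000) {
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

void unget_byte(struct byte_src *s, unsigned char b)
{
    assert(s->depth < PUSHBACK_MAX);
    s->stack[s->depth++] = b;
}

/* Next pushed-back byte, or -1 if the stack is empty. */
int get_byte(struct byte_src *s)
{
    return s->depth > 0 ? s->stack[--s->depth] : -1;
}

/* Push back a whole character: encode it, then push the bytes in
   reverse order so the lead byte comes out first. */
void unget_char(struct byte_src *s, unsigned long cp)
{
    unsigned char enc[4];
    size_t n = utf8_encode_cp(cp, enc);

    while (n > 0)
        unget_byte(s, enc[--n]);
}
```

Because pushed-back characters end up as ordinary bytes, byte-level and character-level pushback share one stack and can be freely mixed.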
*	utf8: move duplicated code from parser and stream.	Kaz Kylheku	2025-05-28	4	-31/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* stream.c (struct byte_input_ungetch): Removed. (byte_in_unget_char_callback): Removed. (byte_in_unget_char): Use struct utf8_tiny_buf instead of struct byte_input_ungetch, and use utf8_tiny_buf_putc instead of byte_in_unget_char_callback. * parser.c (struct shadow_ungetch): Removed. (shadow_unget_char_callback): Removed. (shadow_unget_char): Use struct utf8_tiny_buf instead of struct shadow_ungetch, and use utf8_tiny_buf_putc instead of shadow_unget_char_callback. * utf8.c (utf8_tiny_buf_putc): New function, identical to shadow_unget_char_callback and byte_in_unget_char_callback. * utf8.h (struct utf8_tiny_buf): New struct type, identical to removed struct shadow_ungetch and struct byte_input_ungetch. (utf8_tiny_buf_putc): Declared.
*	buf: implement fill_buf operation for buf streams.	Kaz Kylheku	2025-05-28	2	-1/+38
\| \| \| \| \| \| \|	* buf.c (buf_strm_fill_buf): New function. (buf_strm_ops): Wire in buf_strm_fill_buf operation. * tests/018/streams.tl: New tests.
*	streams: bugfix in string input seek error diagnostic.	Kaz Kylheku	2025-05-28	1	-1/+1
	* stream.c (string_in_seek): Update the value of len with the calculated value so that the out-of-bounds seek diagnostic shows the string length rather than nil.
*	streams: seek operation for string byte input stream.	Kaz Kylheku	2025-05-28	2	-2/+51
\| \| \| \| \| \| \|	* stream.c (byte_in_seek): New function. (byte_in_ops): Wire in byte_in_seek. * tests/018/streams.tl: New tests.
*	doc: revise description of truncation and seeking.	Kaz Kylheku	2025-05-27	1	-2/+51
\| \| \| \| \| \| \| \|	* txr.1: Fix text saying that positioning operations are not supported on string input streams, and that seeking beyond the end is not allowed. Extend descriptions of seek-stream and truncate-stream to mention behaviors and which streams support them.
*	buf: fix seek and truncate operations.	Kaz Kylheku	2025-05-27	3	-3/+66
	* buf.c (buf_strm_seek): Fix incorrect from-end calculation which must be an addition, not subtraction. Throw a file-error, not generic error. When the seek size exceeds the buffer size, extend it with zeros. (buf_strm_truncate): Completely revised. When the requested length lies beyond the current position, the buffer's length is set to that position, which may truncate or extend it. When the requested length lies below the current position, the buffer is truncated only to the current position, not below. The bytes below the position, down to the truncation position, are obliterated to zero. Added missing check for negative offset. * tests/018/streams.tl: New tests. * txr.1: Documentation added.
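The from-end fix comes down to treating all three seek origins uniformly: the target is the base position plus the signed offset, so seeking backward from the end uses a negative offset. A hedged sketch of that calculation follows; the enum, the function name and the boolean error convention are assumptions, whereas buf_strm_seek itself throws a file-error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum whence { FROM_START, FROM_CURRENT, FROM_END };

/* Compute the absolute target position of a seek. Returns false if
   the target would be negative or would not fit in size_t. A target
   beyond len is allowed; the caller would extend the buffer with
   zeros in that case. */
bool seek_target(size_t len, size_t pos, long long off,
                 enum whence w, size_t *out)
{
    size_t base = (w == FROM_START) ? 0
                : (w == FROM_CURRENT) ? pos
                : len;

    assert(out != NULL);

    if (off < 0) {
        /* Magnitude of off, computed without overflow on LLONG_MIN. */
        unsigned long long back = (unsigned long long) -(off + 1) + 1;

        if (back > base)
            return false;           /* would land before the start */
        *out = base - (size_t) back;
    } else {
        if ((unsigned long long) off > SIZE_MAX - base)
            return false;           /* would overflow size_t */
        *out = base + (size_t) off; /* addition for every origin */
    }
    return true;
}
```

For example, with a 10 byte buffer a from-end offset of -3 gives position 7, and a from-end offset of 5 gives position 15, past the end.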
*	buf: buf stream don't need to type check the buf obj.	Kaz Kylheku	2025-05-27	1	-5/+12
\| \| \| \| \| \| \| \| \| \| \|	* buf.c (us_buf_handle): New inline function. (buf_strm_get_byte_callback, buf_strm_unget_byte, buf_strm_seek, buf_strm_get_error): Use us_buf_handle instead of buf_handle. A few functions don't need the self variable because of this. (make_buf_stream): Call buf_handle here to force a type check. After this, we are sure the buf_strm holds a buffer.
*	buf: obtaining a buf handle doesn't require cast.	Kaz Kylheku	2025-05-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	* buf.c (buf_handle): We don't need to coerce here, since we can just take the address of the union member. This function was written that way back in 2017. Why? I may have had a different implementation going in which the Lisp cell was pointing to a separately allocated structure, but then changed it to all fit into the cell (except for the buffer itself) before making the first commit.
*	txr: garbage in debug traces for failed pattern functions.	Kaz Kylheku	2025-05-27	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	This was broken by a regression on Aug 23, 2015, by commit a6fa35d2877745ba0b285093c40c1a3aad82a0e8, subject line: "Use of new args for function calls in interpreter." * match.c (h_fun, v_fun): Remove args only wrongly referenced in a debugf call. The debugf call has not changed since 2011, and originally referenced a more lexically outer args variable of type val, which was wrongly shadowed. This causes garbage to print in the debug traces like #<bad-float> instead of the arguments.
*	streams: seek-stream must fail on extracted string out stream.	Kaz Kylheku	2025-05-26	1	-0/+3
	* stream.c (string_out_seek): Do the extracted error check so that we fail if the data has already been removed from the stream. There is a test case already for this, which is failing.
*	compiler: function bindings syntax cannot be atom.	Kaz Kylheku	2025-05-26	1	-3/+2
	* stdlib/compiler.tl (compiler comp-fbind): We don't have to normalize the function binding syntax of a sys:fbind or sys:lbind. This code was copied and pasted from (compiler comp-let). These bindings are always (name lambda) pairs and are machine-generated that way. If a name occurred, it would not be correct to rewrite it to (name).
*	repl: regression: not completing keywords.	Kaz Kylheku	2025-05-25	1	-1/+1
\| \| \| \| \| \| \| \| \|	This was broken on May 18, 2021 by commit ade5f33b12edce9b707a6038bf630d459f78212, subject line: "listener: don't complete on unbound symbols". * parser.c (find_matching_syms): If the symbol is a keyword, do not require it to have binding.
*	streams: seek and truncate ops for string streams.	Kaz Kylheku	2025-05-25	4	-28/+260
	This patch makes seek-stream and truncate-stream work for string output streams, and seek-stream for string input streams. * stream.c (string_in_seek): New static function. (string_in_ops): Wire seek operation to string_in_seek. (struct string_out): New member, len. Keeps track of the length of the string, so that fill can be freely positioned. (string_out_put_string): We can no longer add the string with a null terminator, because the put operation could be happening at any position. We only add the null terminator when we are writing the data at the end. This function also now supports buffer extension: the seek operation can seek beyond the current string. The seek operation then calls string_out_put_string with a null string. This function then grows the buffer as needed. In that case there is a need to fill the space with space characters. (string_out_truncate, string_out_seek): New static functions. (string_out_ops): Wire in string_out_seek and string_out_truncate. (make_string_output_stream): Initialize new so->len member to zero. (get_string_from_stream_common): New function, renamed from get_string_from_stream, and taking a parameter to optionally request non-destructive readout. (stream_init): Update registration of get-string-from-stream to get_string_from_stream_common. * stream.h (get_string_from_stream_common): Declared. (get_string_from_stream): Becomes inline function which calls get_string_from_stream_common, defaulting the argument. The reason I didn't add the argument to get_string_from_stream is to avoid having to edit numerous calls to get_string_from_stream throughout the code base. * tests/018/streams.tl: New tests. * txr.1: Documentation updated to correct text claiming that string streams don't support truncate-stream and seek-stream, and describe the support in detail.
*	streams: implement get_string using get_string virtual op.	Kaz Kylheku	2025-05-24	2	-13/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* stream.c (get_string): Replace inefficient loop that pushes characters into a string output stream with a call to ops->get_string. Because that virtual can return nil at end-of-stream or if nchars is zero, we check for nil and convert it to an empty string return, since get-string never returns nil. * txr.1: Clarify that get-string never returns nil, and also document the conditions under which get-delimited-string (the most transparent wrapper around the get_string virtual) does return nil.
*	streams: replace get_line virtual with new interface.	Kaz Kylheku	2025-05-24	10	-53/+239
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* stream.h (struct strm_ops): The simple get_line virtual is replaced by get_string, which takes a character limit and a delimiting stop character. (strm_ops_init): Rename get_line parameter to get_string. (get_string_s): Declared. (generic_get_line): Declaration removed. (generic_get_string, get_delimited_string): Declared. * stream.c (get_string_s): New symbol variable. (unimpl_get_line): Function removed. (unimpl_get_string): New function. (null_get_line): Function removed. (null_get_string): New function. (fill_stream_ops): Configure ops->get_string rather than ops->get_line. (null_ops): Wire null_get_string in place of null_get_line. (generic_get_line): Renamed to generic_get_string. (generic_get_string): Implement the limit and stop_char parameters. (get_line_limited_check): New static function. (stdio_ops): Wire in generic_get_string instead of generic_get_line. (tail_get_line): Replaced by tail_get_string. (tail_get_string): Call generic_get_string instead of generic_get_line, and pass the limit and stop_char arguments down. (tail_ops): Wire in tail_get_string instead of tail_get_line. (pipe_ops): Wire generic_get_string instead of generic_get_line. (dir_get_line): Renamed to dir_get_string. (dir_get_string): Use get_line_limited_check to defend against unhandled argument values. (dir_ops): Wire dir_get_string instead of dir_get_line. (string_in_get_line): Replaced by string_in_get_string. (string_in_get_string): Implement limit and stop_char parameters. (string_in_ops): Wire string_in_get_string instead of string_in_get_line. (strlist_in_get_line): Replaced with strlist_in_get_string. (strlist_in_get_string): Use get_line_limited_check to defend against unsupported arguments. 
(strlist_in_ops): Wire in strlist_in_get_string instead of strlist_in_get_line. (cat_get_line): Replaced by cat_get_string. (cat_get_string): Rather than recursing into the get_line public interface, we fetch the stream's get_string virtual and pass all arguments to it. (cat_stream_ops): Wire cat_get_string instead of cat_get_line. (record_adapter_get_line): Replaced by record_adapter_get_string. (record_adapter_get_string): Use get_line_limited_check to guard against unsupported arguments. (record_adapter_ops): Wire record_adapter_get_string instead of record_adapter_get_line. (get_line): Implement using get_string virtual now. We pass UINT_PTR_MAX as limit, which means no character limit, and '\n' as the delimiter for reading a line. (get_delimited_string): New function, which exposes the full semantics of the get_string virtual. (stream_init): Initialize get_string_s. Register get-delimited-string function. Use get_string_s symbol in registration of get-string. * strudel.c (strudel_get_line): Replaced by strudel_get_string. (strudel_get_string): Look up the get-string method and pass all arguments to it, encoded into Lisp values in the right way, nil indicating not present. (strudel_ops): Wire strudel_get_string in place of strudel_get_line. * parser.c (shadow_ops_template): Replace generic_get_line with generic_get_string. * buf.c (buf_strm_ops): Likewise. * socket.c (dgram_strm_ops): Likewise. * gzio.c (gzio_ops_rd): Likewise. * stdlib/stream-wrap.tl (stream-wrap get-line): Method replaced by (stream-wrap get-string). This calls get-delimited-string rather than get-line. * tests/018/streams.tl: New tests, mainly concerned with the new logic in the string input stream which has its own implementation of get_string with several cases. * txr.1: Document new get-delimited-string function, and the get-string method of the delegate stream, removing the documentation for removed get-line method.
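The shape of the new interface, one read operation taking a character limit and a stop character with line reading expressed on top of it, can be sketched as follows. This is an illustration only, not TXR's implementation: struct sin, sin_get_string, sin_get_line, NO_LIMIT and NO_STOP are hypothetical names. The sketch assumes that the stop character is consumed but left out of the result and that it does not count against the limit; the commit message does not spell out either detail. The NULL return stands in for nil at end of input or with a zero limit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical input cursor over a C string; not TXR's stream type. */
struct sin {
    const char *str;  /* underlying text */
    size_t pos;       /* read position */
};

#define NO_LIMIT SIZE_MAX  /* plays the role of "no character limit" */
#define NO_STOP (-1)       /* no delimiter: read purely by count */

/* Read at most `limit` characters, stopping early at stop_ch.
   Assumption: the stop character is consumed but not returned.
   Returns a malloc'd string, or NULL at end of input or if limit
   is zero. */
char *sin_get_string(struct sin *in, size_t limit, int stop_ch)
{
    size_t avail = strlen(in->str + in->pos);
    size_t n = 0;
    int hit_stop = 0;
    char *out;

    if (avail == 0 || limit == 0)
        return NULL;

    while (n < avail && n < limit) {
        if (stop_ch != NO_STOP && in->str[in->pos + n] == (char) stop_ch) {
            hit_stop = 1;
            break;
        }
        n++;
    }

    out = malloc(n + 1);
    if (out == NULL)
        abort();  /* keep NULL reserved for "nothing to read" */
    memcpy(out, in->str + in->pos, n);
    out[n] = '\0';
    in->pos += n + hit_stop;  /* skip over the delimiter if one was seen */
    return out;
}

/* Line reading becomes a special case: no limit, newline delimiter. */
char *sin_get_line(struct sin *in)
{
    return sin_get_string(in, NO_LIMIT, '\n');
}
```

With this shape, an empty line comes back as an empty string, which stays distinct from the NULL that signals end of input.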
*	parser: json: bugfix: nuke lookahead token.	Kaz Kylheku	2025-05-24	2	-1/+8
\| \| \| \| \| \| \| \| \| \|	The get-json function leaves a lookahead token in the parser, which is interpreted by the next call to the parser. * parser.c (prime_parser_post): Obliterate the yy_char token in the JSON case, just like we do for iread. * tests/018/streams.tl: New test case.
*	streams: fill_buf for string byte input streams.	Kaz Kylheku	2025-05-24	1	-1/+20
\| \| \| \| \| \|	* stream.c (byte_in_fill_buf): New function. (byte_in_ops): Wire byte_in_fill_buf in place of generic_fill_buf.
*	parser: implement fill_buf for shadow stream.	Kaz Kylheku	2025-05-24	1	-1/+25
\| \| \| \| \| \|	* parser.c (shadow_fill_buf): New function. (shadow_ops_template): Wire shadow_fill_buf in place of generic_fill_buf.
*	parser: shadow stream: get bytes without copying.	Kaz Kylheku	2025-05-24	4	-34/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We use Flex buffer cycling to avoid a memcpy. * parser.h (struct shadow_context): Moved here out of parser.c. New members bs and scanner: it will keep track of the flex buffer directly rather than copying it. (scanner_get_buffered_bytes): Declaration updated. (scanner_free_buffer_bytes): Declared. * parser.l (scanner_get_buffered_bytes): Reimplemented with different interface. We switch the scanner to a new, empty buffer, which liberates the previous one, allowing us to take ownership. We store the scanner and that buffer into the context, and set up the buf, index and size to reference into the buffer. We no longer have to mess with yy_hold_char; it is restored into the buffer by the yy_switch_to_buffer operation. (scanner_free_buffer_bytes): New function. * parser.c (struct shadow_context): Removed from here. (shadow_detach): Call scanner_free_buffered_bytes. * lex.yy.c: Regenerated.