txr - TXR: A data munging language.

	Commit message	Author	Age	Files	Lines
...
*	infix: revise auto-detection.	Kaz Kylheku	2025-04-05	1	-17/+20
	* stdlib/infix.tl (parse-infix): Drop usr: package prefix; autoload.c interns this symbol in the usr package.
	(detect-infix): New function, whose single responsibility is determining whether the argument expression should be treated via parse-infix.
	(infix-expand-hook): Simplified by using detect-infix function.
*	infix: whitespace fix.	Kaz Kylheku	2025-04-05	1	-1/+1
	* stdlib/infix.tl (infix-error): Remove trailing whitespace.
*	infix: define = operator mapping to identity	Kaz Kylheku	2025-04-05	1	-0/+2
	* stdlib/infix.tl (toplevel): New prefix operator = at 0 precedence. This is useful for specifying an infix formula that is not being autodetected by ifx nicely. For instance an expression containing only array references can be obtained as (= a[i][j]).
*	infix: dynamic precedence algorithm	Kaz Kylheku	2025-04-04	1	-3/+8
	We implement a dynamic precedence algorithm whereby when an infix operator is immediately followed by a clump of one or more consecutive prefix operators, the infix operator's precedence is lowered to one less than the lowest one of the prefix operators. This creates nice handling for situations like (sqrt x + y - sqrt z + w) whose visual symmetry parses into (- (sqrt (+ x y)) (sqrt (+ z w))) rather than subordinating the second sqrt to the first one.
	* stdlib/infix.tl (parse-infix): Before processing an infix operator, calculate the prefix of the rest of the input that consists of nothing but consecutive prefix operators, and if it is nonempty, then use it to adjust the effective precedence used for the infix operator. This algorithm must only ever lower the precedence, never raise it.
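The rule described in this commit can be modeled with a small precedence-climbing parser. This is only a sketch in Python, not the TXR Lisp code in stdlib/infix.tl: the operator tables, their precedence numbers, and the names `parse`, `effective_prec` and `to_sexp` are assumptions made for illustration.

```python
# Hypothetical operator tables; a higher number binds tighter.
# These values are illustrative and are not taken from stdlib/infix.tl.
INFIX = {"+": (10, "left"), "-": (10, "left"), "*": (20, "left")}
PREFIX = {"sqrt": 1, "cos": 1}


def parse(tokens, dynamic=True):
    """Parse a flat token list into a nested prefix-form list."""
    pos = 0

    def effective_prec(i):
        # Precedence of the infix operator at tokens[i], possibly lowered.
        prec = INFIX[tokens[i]][0]
        if not dynamic:
            return prec
        # Scan the clump of consecutive prefix operators that follows.
        j = i + 1
        lowest = None
        while j < len(tokens) and tokens[j] in PREFIX:
            p = PREFIX[tokens[j]]
            lowest = p if lowest is None else min(lowest, p)
            j += 1
        if lowest is not None:
            # Only ever lower the precedence, never raise it.
            prec = min(prec, lowest - 1)
        return prec

    def expr(min_prec):
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok in PREFIX:
            left = [tok, expr(PREFIX[tok])]
        else:
            left = tok
        while pos < len(tokens) and tokens[pos] in INFIX:
            op = tokens[pos]
            prec = effective_prec(pos)
            if prec < min_prec:
                break
            pos += 1
            assoc = INFIX[op][1]
            right = expr(prec + 1 if assoc == "left" else prec)
            left = [op, left, right]
        return left

    return expr(-1000)


def to_sexp(tree):
    """Render the nested list as a Lisp-style string."""
    if isinstance(tree, list):
        return "(" + " ".join(to_sexp(t) for t in tree) + ")"
    return tree


if __name__ == "__main__":
    toks = "sqrt x + y - sqrt z + w".split()
    print(to_sexp(parse(toks, dynamic=True)))
    print(to_sexp(parse(toks, dynamic=False)))
```

With `dynamic=False` the second sqrt ends up inside the argument of the first one, which is the outcome the commit message says it wants to avoid.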
*	infix: assignment must be right associative	Kaz Kylheku	2025-04-04	1	-1/+1
	* stdlib/infix.tl (toplevel): The := operator must be assoc :right so a := b := c becomes (set a (set b c)) and not (set (set a b) c).
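The difference between the two groupings can be shown with a right fold and a left fold over the operands. This is a Python sketch with hypothetical helper names (`fold_right_assoc`, `fold_left_assoc`); it is not how parse-infix is written.

```python
def fold_right_assoc(tokens):
    """Group 'a := b := c' as (set a (set b c))."""
    operands = tokens[0::2]          # every other token is an operand
    result = operands[-1]
    for lhs in reversed(operands[:-1]):
        result = ["set", lhs, result]
    return result


def fold_left_assoc(tokens):
    """Group 'a := b := c' as (set (set a b) c), the wrong shape for assignment."""
    operands = tokens[0::2]
    result = operands[0]
    for rhs in operands[1:]:
        result = ["set", result, rhs]
    return result


if __name__ == "__main__":
    toks = "a := b := c".split()
    print(fold_right_assoc(toks))
    print(fold_left_assoc(toks))
```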
*	infix: adjust operator expected diagnostic.	Kaz Kylheku	2025-04-04	1	-1/+1
	* stdlib/infix.tl (parse-infix): The operator expected diagnostic can occur not just before an operand, but also before a prefix operator. For instance "a cos b". An operator is expected between a and cos. We don't want to say "before operand cos" because cos is an operator.
*	Initial implementation of infix expressions.	Kaz Kylheku	2025-04-03	2	-0/+196
	The infix module provides a macro called ifx. Forms (evaluated expressions) enclosed inside ifx at any nesting level, which are not special operator or macro forms, are subject to automatic detection of an infix notation, which is transformed into regular Lisp. The notation is based on Lisp atoms; no read syntax is introduced. Infix may be freely mixed with ordinary Lisp.
	* autoload.c (infix_set_entries, infix_instantiate): New static functions.
	(autoload_init): Register new infix module for autoload.
	* stdlib/infix.tl: New file.
*	expander: expand arguments after hook processing.	Kaz Kylheku	2025-04-02	1	-14/+16
	* eval.c (do_expand): Do not expand the arguments of a function call prior to checking for and dispatching the expand hook. The expand hook may rewrite the entire form and the arguments, so that those expansions are then thrown away. Not only is it wasteful to calculate them but possibly wrong. A form that is rewritten by a hook may have strange syntax, such that the hook itself will get confused if it is unleashed recursively on the constituent fragments of the syntax.
*	signal: consider SIGSYS as synchronous signal.	Kaz Kylheku	2025-04-01	1	-3/+3
	* signal.c (is_cpu_exception): Function renamed to is_synchronous, and also checks for SIGSYS. SIGSYS isn't a CPU exception, but is exception-like in that the program is immediately informed that it did something wrong.
	(sig_handler): Follow above rename.
*	expand-hook-combine: bugfix.	Kaz Kylheku	2025-04-01	3	-15/+45
	Here we fix bugs in expand-hook-combine, improve the tests and make different recommendations in the manual about hook order.
	* eval.c (expand_hook_combine_fun): Fix incorrect tests which cause the next function to be ignored.
	* tests/011/exphook.tl (pico-style-expand-hook): Needs tweak to evaluate constantp using standard expansion (without pico-style), so that pico-style can nest with ifx in either order.
	(pico-style): Now when we call expand-hook-combine we give the new hook first, and the existing one next. This behavior makes more sense as a default go-to strategy because it gives priority to the innermost hook-based macro, closest to the code.
	(infix-expand-hook, ifx): Add test cases which test nesting of hook-based macros.
	* txr.1: Opposite recommendation made about chaining of expand hooks: new first, fall back on old. Example adjusted.
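The "new first, fall back on old" ordering can be pictured with a small combinator. This Python sketch makes two assumptions that the commit message does not confirm: hooks are modeled as one-argument functions (real TXR expand hooks receive more arguments), and a hook signals "not handled" by returning the very same form object. The hook names are hypothetical.

```python
def expand_hook_combine(first, nxt):
    """Return a hook that tries `first`, then falls back on `nxt`."""
    def combined(form):
        result = first(form)
        if result is not form:       # first hook rewrote the form
            return result
        return nxt(form)             # otherwise give the older hook a chance
    return combined


def old_hook(form):
    # Hypothetical pre-existing hook: rewrites every (greet ...) form.
    if isinstance(form, list) and form[:1] == ["greet"]:
        return ["print", "hello from old"]
    return form


def new_hook(form):
    # Hypothetical innermost hook: handles only (greet loudly).
    if form == ["greet", "loudly"]:
        return ["print", "HELLO FROM NEW"]
    return form


# The new hook is given first, so it wins where both hooks match.
hook = expand_hook_combine(new_hook, old_hook)

if __name__ == "__main__":
    print(hook(["greet", "loudly"]))
    print(hook(["greet"]))
```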
*	constantp: muffle all expander warnings.	Kaz Kylheku	2025-04-01	1	-2/+13
	This fixes nuisance diagnostics from constantp, such as when invalid forms are given. An example is (constantp '(1)).
	* eval.c (no_ub_warn_expand): New name for previous no_warn_expand function.
	(no_warn_expand): Rewritten to muffle all warnings, and call no_ub_warn_expand. The constantp function continues to call no_warn_expand.
	(eval_init): Retarget expand intrinsic to no_ub_warn_expand, to preserve its documented behavior. The only function which currently uses no_warn_expand is constantp.
*	New function: expand-hook-combine.	Kaz Kylheku	2025-04-01	3	-0/+128
	This function provides a functional combinator that takes the responsibility of combining expand hooks.
	* eval.c (expand_hook_combine_fun, expand_hook_combine): New static functions.
	(eval_init): Register expand-hook-combine intrinsic.
	* tests/011/exphook.tl: New file.
	* txr.1: Documented.
*	match: new pattern matching macro, match-tuple-case.	Kaz Kylheku	2025-04-01	4	-2/+98
	* autoload.c (match_set_entries): Autoload match module on match-tuple-case.
	* match.tl (match-tuple-case): New macro.
	* tests/011/patmatch.tl: New tests. The macro is trivial; if lambda-match works, the macro works.
	* txr.1: Documented.
*	New feature: expand-hook.	Kaz Kylheku	2025-03-31	3	-1/+150
	* eval.c (expand_hook_s): New symbol variable.
	(do_expand): Check for expand hook in several places and call it: symbol macros, macros, functions, and forms that are not confirmed function calls.
	(eval_init): Initialize new symbol, and register the expand-hook special variable.
	* eval.h (expand_hook_s): Declared.
	(expand_hook): New macro.
	* txr.1: Documented.
*	New function keep: generalized keepqual.	Kaz Kylheku	2025-03-28	5	-0/+18
	* eval.c (eval_init): Register keep intrinsic.
	* lib.[ch] (keep): New function.
	* stdlib/compiler.tl (compiler comp-fun-form): Transform two argument keep to keepqual.
	* txr.1: Documented.
*	compiler: reduce some equal-based sequence functions.	Kaz Kylheku	2025-03-28	1	-0/+5
	* stdlib/compiler.tl (compiler comp-fun-form): Recognize two-argument forms of remove, count, pos, member and subst. When these don't specify test, key or map functions, they are equivalent to remqual, countqual, posqual, memqual and subqual. These functions are a bit faster because they have no arguments to default and some of their C implementations call the equal function either directly or via a pointer, rather than going via funcall. The exceptions are posqual and subqual which actually call pos; but even for these it is still slightly advantageous to convert to the fixed arity function, because funcall2 doesn't have to default the optional arguments with colon_k.
*	New function: remove.	Kaz Kylheku	2025-03-27	5	-5/+105
	We need a remove function that doesn't have an equality suffix, analogous to member, pos, count.
	* eval.c (eval_init): Register remove intrinsic.
	* lib.[ch] (remov): New function. Named this way to avoid clashing with the ISO C remove function in <stdio.h>.
	* tests/012/seq.tl: New tests.
	* txr.1: Documented.
*	case synonyms for more readable case macros.	Kaz Kylheku	2025-03-24	2	-9/+41
	There is a general trend in TXR Lisp to use a nice name for the variant of a function which uses equal equality. With the case macros, we have to use casequal, casequal*, ecasequal or ecasequal*. Let's introduce synonyms for these: case, case*, ecase and ecase*.
	* eval.c (case_s, case_star_s): New symbol variables.
	(me_case): Check for case_star_s also to determine whether we have a "star" macro.
	(eval_init): Initialize symbol variables, and register case, case*, ecase and ecase* intrinsics.
	* txr.1: Documented.
*	rand: rearrange code to test fixnump first.	Kaz Kylheku	2025-03-23	1	-46/+46
	* rand.c (random): The fixnump is going to come up more often than bignum, and is also fast to test for since fixnums are identified by just the tag in the value word.
*	rand: use PRNG bits more economically for small moduli.	Kaz Kylheku	2025-03-21	4	-61/+225
	The rand function always calls rand32 at least once for each call, even for small moduli that need only a few pseudo-random bits. For instance when the modulus is 2, the function requires only one pseudo-random bit from the PRNG, yet it takes 32 and throws away 31. With this commit, that changes. For moduli of 65536 or smaller, the bits are used more efficiently; and for a modulus of 2, the function can satisfy 32 calls using the bits of a single rand32_t word: one stepping of the WELL512a PRNG.
	* rand.c (struct rand_state): New member, shift. Holds the shift register for rand/random to take bits from, replenished from rand32() when it runs out. The shift register detects when it runs out of bits in a clever way, without any additional variable. The register is regarded as being 33 bits wide, with a top bit that is always 1. When the register is empty, a 32 bit word is taken from the PRNG. The required random bits are taken from the word, and it is then shifted to the right. (We take only power-of-two amounts out of the shift register: 1, 2, 4, 8 or 16 bits). Even the smallest shift produces enough room that the 33rd bit can be added to the word, into its shifted position. After that, the shift register is considered not to have enough bits for a given modulus if its value is less than or equal to the mask. I.e., if we were to take bits from it, we would be including the unconditional signaling bit. At that point we clobber the shift register with a new set of 32 bits from the PRNG, take the random bits we need, shift it to the right and add the signaling bit.
	(opt_noshift): New static variable; indicates whether we are in compatibility mode, requiring the shift register optimization to be defeated.
	(make_random_state): Initialize shift register to 0 in several places.
	(random): Implement various small modulus cases. There are specific cases for moduli that are exactly 65536, 256, 16, 4, 3 and 2. The in-between cases are handled by shifting the bits in the same amounts as the next higher power of two from this list of sizes: 16, 8, or 4 bits. For these cases, we calculate the smallest Mersenne modulus which covers the bits of the actual modulus and use that for rejecting potential values, just as we do in the general large modulus case. For instance if the modulus is 60 (range 0 to 59), that lands into the 8 bit shift range: we pull 8 bits at a time from the shift register. But the modulus 60 is covered by the six bit mask 63. We mask each 8 bit value with 63, and if it is in the required range 0 to 59, we accept it, otherwise draw another 8 bits.
	(rand_compat_fixup): Initialize opt_noshift to 1 if the requested compat version is 299 or less.
	* tests/012/sort.tl: Fix one test case involving shuffled data. The shuffle function uses rand with small moduli, so its behavior changes for the same PRNG sequence.
	* tests/013/maze.expected: Likewise, the generated pseudo-random maze in the maze test case is different now; we must update to the new expected output.
	* txr.1: Document that a value of 299 or less of the compatibility -C option has an effect on rand.
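The shift register with its always-1 signaling bit, and the mask-and-reject step for in-between moduli, can be modeled as follows. This is a Python sketch of the description above, not a transcription of rand.c: the names `BitPool`, `take`, `draws` and `rand_small` are hypothetical, Python's own generator stands in for WELL512a, and the dedicated special cases (such as the one for modulus 3) are folded into the generic path.

```python
import random


class BitPool:
    """Models the shift register: data bits below an always-1 sentinel bit."""

    def __init__(self, rand32):
        self.rand32 = rand32     # callable returning a 32-bit integer
        self.shift = 0           # 0 means empty, as after initialization
        self.draws = 0           # how many 32-bit words were consumed

    def take(self, nbits):
        """Take nbits (1, 2, 4, 8 or 16) pseudo-random bits."""
        mask = (1 << nbits) - 1
        if self.shift <= mask:
            # Taking nbits would include the sentinel: refill from the PRNG.
            word = self.rand32() & 0xFFFFFFFF
            self.draws += 1
            bits = word & mask
            # Shift right, then plant the sentinel just above the data bits.
            self.shift = (word >> nbits) | (1 << (32 - nbits))
            return bits
        bits = self.shift & mask
        self.shift >>= nbits
        return bits


def rand_small(pool, modulus):
    """Uniform integer in [0, modulus) for 1 <= modulus <= 65536."""
    if not 1 <= modulus <= 65536:
        raise ValueError("modulus out of range for this sketch")
    # Chunk size: the next power-of-two bit count that covers the modulus.
    for chunk in (1, 2, 4, 8, 16):
        if modulus <= (1 << chunk):
            break
    # Smallest all-ones mask covering modulus - 1, e.g. 63 for modulus 60.
    mask = (1 << (modulus - 1).bit_length()) - 1
    while True:
        value = pool.take(chunk) & mask
        if value < modulus:      # rejection sampling keeps the result uniform
            return value


if __name__ == "__main__":
    rng = random.Random(0)
    pool = BitPool(lambda: rng.getrandbits(32))
    flips = [rand_small(pool, 2) for _ in range(32)]
    print(flips, "words used:", pool.draws)
```

In this model, 32 calls with modulus 2 consume a single 32-bit word, matching the claim in the commit message.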
*	rand: eliminate small static function called only once.	Kaz Kylheku	2025-03-21	1	-8/+2
	* rand.c (make_state): Static function removed.
	(make_random_state): make_state logic integrated into this one and only caller.
*	New macro: letrec.	Kaz Kylheku	2025-03-21	3	-1/+211
	* eval.c (me_letrec): New function.
	(eval_init): Register letrec intrinsic macro.
	* tests/012/let.tl: New file.
	* txr.1: Documented, and also referenced from mlet.
*	doc: document that let allows (var).	Kaz Kylheku	2025-03-21	1	-4/+12
	TXR Lisp's let and let* support var, (var) and (var init-form), exactly like Common Lisp, and this has been the case for a very long time. The (var) variant is now documented.
	* txr.1: Show the init-form in square brackets, making it clear that it's optional. Mention the (sym) variant in the documentation below.
*	place: fix bad indentation.	Kaz Kylheku	2025-03-11	1	-4/+4
	* stdlib/place.tl (sys:placelet-1): Fix a misindented call-update-expander function call. Also indent its arguments in function style.
*	Expose brace expansion bexp function.	Kaz Kylheku	2025-03-09	4	-175/+229
	* autoload.c (glob_set_entries): Remove autoload on sys:brace-expand. Add usr:bexp.
	* stdlib/glob.tl (brace-expand): Renamed to usr:bexp.
	(glob): Call bexp rather than brace-expand.
	* tests/018/glob.tl: Rename references to sys:brace-expand to bexp.
	* txr.1: Add section describing the bexp function. Move brace expansion documentation from glob* to this new section, adjusting the wording a little bit, mainly to avoid referring to "patterns". Point glob* documentation to bexp, which also in turn references glob*.
*	glob*: add string and integer ranges to brace expansion.	Kaz Kylheku	2025-03-08	3	-19/+229
	* stdlib/glob.tl (bexp-parse): Recognize .. as a token.
	(bexp-parse-brace): If a brace expansion doesn't contain commas, then check whether it contains .. and that its elements are all strings. In that case it is a possible range expansion and we thus transform it to a (- ...) node, subject to more validation in bexp-expand.
	(bexp-expand): Add cases to handle range expansion, taking care that invalid forms translate to verbatim syntax.
	* tests/018/glob.tl: New tests.
	* txr.1: Documented.
*	repl: fix abort: tab completion over nonexistent package.	Kaz Kylheku	2025-03-07	1	-1/+1
	This is a problem caused by b7a7370a921bbc02b54d87600ff74d5cb9efde28, on Feb 4, 2020, which adds unwinding logic into the provide_completions function. Two return statements are correctly turned into "goto out", but one is overlooked. When this stray return statement is executed, it leaves dangling unwind frames in the unwind stack, causing an assertion when control returns to the repl function which calls uw_pop_frame to remove an unwind frame that it pushed.
	Steps to reproduce:
	1. complete a symbol with a nonexistent package
	   1> (foo:bar[Tab]
	2. Backspace over it (or use Ctrl-U) and hit Enter:
	   1>
	   txr: unwind.c:296: uw_pop_frame: Assertion `fr == uw_stack' failed.
	   Aborted (core dumped)
	* parser.c (provide_completions): When find_package fails, return by executing goto out, rather than return, so that the catch frame is removed.
*	New feature: range iteration with skip.	Kaz Kylheku	2025-03-07	4	-8/+122
	The notation X..Y..Z now denotes an iterable range, if X..Y is a valid iterable range on its own, and Z is a positive integer. Z gives a step size: 1 takes every element, 2 every other and so on.
	* lib.c (seq_iter_get_skip, seq_iter_peek_skip): New static functions.
	(si_skip_ops): New static structure.
	(iter_dynamic): Function relocated earlier in file to avoid forward declaration.
	(seq_iter_init_with_info): When the iterated object is a range, check for the to element itself being a range. If so, it is potentially a skip iteration. Validate it and implement via a skip iterator referencing a dynamic range iterator.
	* lib.h (struct seq_iter): New sub-union member, ul.skip. We could use an existing member of type cnum; this is for naming clarity.
	* tests/012/iter.tl: New tests.
	* txr.1: Documented.
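The idea of a skip iterator layered on top of another iterator can be sketched in Python. The function name `skip_iter` is hypothetical, and the example deliberately uses Python's own `range`, so it says nothing about how TXR treats the endpoints of X..Y.

```python
_END = object()   # private marker meaning "underlying iterator is exhausted"


def skip_iter(base, step):
    """Yield every step-th item of base: step 1 is every item, 2 every other."""
    if not isinstance(step, int) or step < 1:
        raise ValueError("step must be a positive integer")

    def generate():
        it = iter(base)              # the wrapped, underlying iterator
        for item in it:
            yield item
            # Discard the step - 1 items between two yielded ones.
            for _ in range(step - 1):
                if next(it, _END) is _END:
                    return

    return generate()


if __name__ == "__main__":
    print(list(skip_iter(range(10), 3)))
    print(list(skip_iter("abcdef", 2)))
```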
*	New function: iterp.	Kaz Kylheku	2025-03-07	5	-31/+139
	* eval.c (eval_init): Register iterp intrinsic.
	* lib.[ch] (iterp): New function.
	* tests/012/iter.tl: New tests.
	* txr.1: Document iterp. Update documentation for iter-more, iter-item and iter-step to more precisely identify which objects are valid arguments in terms of iterp and additional conditions, and that other objects throw a type-error exception. Fix wrong references to iter-more under documentation for iter-item. Removed obsolete text specifying that iter-step uses car on list-like sequences, a dubious behavior removed in the previous commit.
*	iterator API: reject objects that don't make sense.	Kaz Kylheku	2025-03-06	2	-14/+45
	* lib.c (iter_more): Do not return t for unrecognized objects, but throw an exception. Do return t for conses, which is necessary since they are iterators for lists. Also, as a special case, we return t for struct objects that don't have an iter-more method. This is needed for the documented fast protocol. Iterator objects implementing the fast protocol still get iter-more invoked on them. The client usually doesn't know that the iterator implements the fast protocol, and so calls iter-more, which unconditionally has to return true.
	(iter_item): Do not fall back on car(iter) for all unhandled objects. Only conses are handled via car. All unrecognized objects trigger an exception.
	(iter_step): Do not try to handle list-like objects via cdr, only lists. Improve the diagnostic for hitting the end of an improper list: diagnostic shows the cons cell rather than just the terminating atom.
	* tests/012/iter.tl: Some test cases validating that the functions error out for strings and vectors. Much more coverage is possible here but doesn't seem worth it; e.g. that the functions reject a buffer, regex, function, ...
*	build: don't create .build_id file for non-users of build_id.	Kaz Kylheku	2025-02-25	1	-0/+4
	* Makefile: Put the current build_id into the .build_id file only if it is nonblank. If it is blank, then delete the file if it exists. This handles the case when the build_id user removes the build_id. A rebuild of txr.o is forced one last time, and the .build_id is removed. Users who don't know about build_id or don't use it will no longer see a blank .build_id file being created.
*	configure: fix bad gcc version check.	Kaz Kylheku	2025-02-18	1	-9/+24
	* configure: The way the version is represented in the output of gcc varies. A vendor build version may be indicated in parentheses, and that may precede or follow the version number by itself. I don't know all the variations. What I'm implementing here is looping over the white-space separated output of gcc --version, looking for an item of the form number.number.number where number is any decimal string from 0 to 99 with no leading zero, except for zero itself. If we find this item, we assume that is the gcc version, and break it up and process it as before, terminating the loop. We print the parsed gcc version in parentheses to help with spotting problems.
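The token scan described here is done in the configure shell script; the following Python sketch only mirrors the logic. The function name `find_gcc_version` and the sample version strings are illustrative assumptions, not captured output.

```python
import re

# One component: 0, or 1 to 99 without a leading zero.
_NUM = r"(0|[1-9][0-9]?)"
_VERSION = re.compile(rf"{_NUM}\.{_NUM}\.{_NUM}")


def find_gcc_version(version_output):
    """Return (major, minor, patch) from the first matching item, else None."""
    for item in version_output.split():      # whitespace-separated items
        match = _VERSION.fullmatch(item)
        if match:
            return tuple(int(part) for part in match.groups())
    return None


if __name__ == "__main__":
    # Hypothetical examples of vendor text before or after the version.
    print(find_gcc_version("gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0"))
    print(find_gcc_version("gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)"))
```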
*	Version 299. (tag: txr-299)	Kaz Kylheku	2025-02-16	6	-1198/+1257
	* RELNOTES: Updated.
	* configure (txr_ver): Bumped version.
	* stdlib/ver.tl (lib-version): Bumped.
	* txr.1: Bumped version and date.
	* txr.vim, tl.vim: Regenerated.
*	ffi: rework endian-type rput/rget routines on big endian.	Kaz Kylheku	2025-02-15	1	-24/+40
	* ffi.c (ffi_be_i16_rput, ffi_be_i16_rget, ffi_be_u16_rput, ffi_be_u16_rget, ffi_be_i32_rput, ffi_be_i32_rget, ffi_be_u32_rput, ffi_be_u32_rget, ffi_le_i16_rput, ffi_le_i16_rget, ffi_le_u16_rput, ffi_le_u16_rget, ffi_le_i32_rput, ffi_le_i32_rget, ffi_le_u32_rput, ffi_le_u32_rget): Rewritten to avoid memory clearing, memsets, pointer arithmetic and use of helper functions. The big endian rput and rget functions just wrap the non-endian versions. The ones which need byte swapping work in terms of a full ffi_arg word. For instance to prepare a 16 bit big endian unsigned return value we byte swap the uint16_t, then convert to ffi_arg.
*	ffi: big endian: broken be-int16 closure return.	Kaz Kylheku	2025-02-13	1	-1/+1
	* ffi.c (ffi_be_i16_rput): We need to memset the remaining parts of the 64 bit word to 0, like in all the other ffi_be_xxx_put functions that are less than 64 bits wide. Also, the (void) tft cast is removed since tft is used.
*	ffi: big endian: broken int8 and uint8 return values.	Kaz Kylheku	2025-02-13	1	-4/+6
	* ffi.c (ffi_i8_rput, ffi_i8_rget, ffi_u8_rput, ffi_u8_rget): These functions are not doing the correct job; they are just casting the pointer to the target type, like on little endian. The big endian rget must fetch the entire 64 bit word (ffi_arg) and convert its value to the target type. If it's a character value, the actual bits are found at (src + 7) not at src. The rput function must do the reverse; convert the value to the 64 bit ffi_arg and store that.
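Why reading the first byte works on little endian but not on big endian can be seen by laying out a 64 bit word both ways. This Python sketch uses the standard struct module; the helper names are hypothetical and only model the byte layout, not the ffi.c functions.

```python
import struct


def store_as_arg(value, byteorder):
    """Store a small value as a full 64-bit word in the given byte order."""
    fmt = ">Q" if byteorder == "big" else "<Q"
    return struct.pack(fmt, value)


def naive_get_u8(buf):
    """The broken approach: treat the start of the word as the 8-bit value."""
    return buf[0]


def word_get_u8(buf, byteorder):
    """The correct approach: fetch the whole word, then narrow its value."""
    fmt = ">Q" if byteorder == "big" else "<Q"
    return struct.unpack(fmt, buf)[0] & 0xFF


if __name__ == "__main__":
    little = store_as_arg(0x41, "little")
    big = store_as_arg(0x41, "big")
    print(naive_get_u8(little), naive_get_u8(big))   # differs between layouts
    print(word_get_u8(big, "big"), big[7])           # value sits at offset 7
```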
*	vm: missed cases of signal check in backwards branch	Kaz Kylheku	2025-02-07	1	-6/+15
	Only the JMP instruction is checking for a backwards branch and calling sig_check_fast() so that a loop can be interrupted by Ctrl-C. The compiler can optimize that so that a backwards jump is performed by an instruction in the IF family.
	* vm.c (vm_if, vm_ifq, vm_ifql): Check for a backwards branch and call sig_check_fast. Also, eliminate the redundant call to vm_insn_bigop, which is just a masking macro. The ip variable is already the result of vm_insn_bigop.
*	read-until-match: streamline get_char calls.	Kaz Kylheku	2025-01-30	1	-3/+7
	Similarly to what was done in get_csv, we optimize the use of get_char and unget_char. I see a 7.5% speed improvement in a simple benchmark of the awk macro with the record separator rs set to #/\n/.
	* regex.c (scan_until_common): Obtain the strm_ops operations of the stream, and pull the low level get_char and unget_char virtual operations from it. Call these directly in the loop. Thereby, we avoid all the type checking overhead in these functions.
*	get-csv: further get-char optimization.	Kaz Kylheku	2025-01-30	1	-15/+6
	Another 5-6% gained from this.
	* stream.c (us_get_char, us_unget_char): Static functions removed.
	(get_csv): Retrieve the get_char and unget_char pointers from the strm_ops structure outside of the loop, and then just call these pointers. Careful: the unget_char virtual has reversed parameters compared to the global function.
*	get-csv: use unsafe version string-extend.	Kaz Kylheku	2025-01-30	3	-11/+49
	Another almost 16% speedup.
	* lib.c (us_length_STR): New static function.
	(string_extend): Use us_length_STR, since we know the object is of type STR.
	(us_string_extend_STR_CHR): New function.
	(length_str): Handle STR case via us_length_STR.
	* lib.h (us_string_extend_STR_CHR): Declared.
	* stream.c (get_csv): Use us_string_extend_STR_CHR instead of string_extend.
*	string-extend: don't use set macro to update length.	Kaz Kylheku	2025-01-30	1	-1/+1
	* lib.c (string_extend): We know that num_fast + delta is in the fixnum range, because we checked this condition. So we can just assign it without informing the garbage collector. This yields about a 16% speedup in get-csv.
*	awk: add CSV support.	Kaz Kylheku	2025-01-30	3	-3/+82
	* stdlib/awk (awk-state upd-rec-to-f): Handle a new case of fs being the keyword symbol :csv, producing a field-splitting lambda that calls get-csv.
	* tests/015/awk-basic.tl: Several new test cases for this CSV feature.
	* txr.1: Documented.
*	get-csv: speed up with unsafe get-char.	Kaz Kylheku	2025-01-30	1	-4/+17
	I'm seeing about a 6% improvement in get-csv from this.
	* stream.c (us_get_char, us_unget_char): New static functions, which assume all arguments have correct type.
	(get_csv): If we use source_opt, validate that it's a stream with class_check. Use us_get_char and us_unget_char.
*	cobj: optimize subclass checks based on depth 1 assumption	Kaz Kylheku	2025-01-30	1	-10/+6
	Nowhere in the image do we have cobj_class inheritance deeper than one. No class has a superclass which itself has a superclass. Based on this, we can eliminate loops coded to handle the general case.
	* lib.c (subtypep, cobjclassp): Do not iterate to chase the chain of super pointers. Do the subclass check based on the assumption that there is at most a super pointer to a class which itself then has no super.
	(cobj_register_super): Assert if the situation occurs that a class is registered with a super that is not a root. All these calls take place on startup, so if the assumption is wrong, the assert will be 100% reproducible.
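The depth-one assumption turns the subclass test into two pointer comparisons, guarded by a check at registration time. A Python sketch with hypothetical names (`CobjClass`, `is_subclass_depth1`, and the class names in the demo) follows; the real code is C in lib.c.

```python
class CobjClass:
    """A class record with at most one level of inheritance."""

    def __init__(self, name, super_=None):
        # Registration-time check: a super must itself be a root class.
        if super_ is not None and super_.super_ is not None:
            raise ValueError("super class must be a root class")
        self.name = name
        self.super_ = super_


def is_subclass_depth1(cls, target):
    """True if cls is target or directly inherits from it; no loop needed."""
    return cls is target or cls.super_ is target


if __name__ == "__main__":
    base = CobjClass("base")                 # hypothetical root class
    derived = CobjClass("derived", base)     # hypothetical direct subclass
    print(is_subclass_depth1(derived, base))
    print(is_subclass_depth1(base, derived))
```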
*	vector: ensure minimum alloc size.	Kaz Kylheku	2025-01-29	1	-6/+7
	Like in a recent commit for mkstring, we impose a minimum allocation size of 6 for vectors, which means 8 cells together with the two information words at the base of the vector.
	* lib.c (vec_own): Take an alloc parameter in addition to the length, which is stored in v[vec_alloc].
	(vector): Impose a minimum alloc size of 6.
	(copy_vec, nested_vec_of_v): Pass alloc parameter to vec_own which is the same as the length parameter; i.e. no behavior change for these functions.
*	string-extend: grow faster.	Kaz Kylheku	2025-01-29	1	-2/+2
	* lib.c (string_extend): When more space is needed in the string, grow by 50% rather than 25%. This speeds up code at the expense of some wasted space. Wasted space can be dealt with by the final flag in programs where it matters.
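The trade-off between the two growth factors can be estimated by counting reallocations while a string is built one character at a time. This Python model is an assumption-laden sketch: `count_reallocs` is a hypothetical helper, and the exact growth arithmetic in string_extend may differ.

```python
def count_reallocs(final_len, percent, initial_alloc=8):
    """Count reallocations when appending one character at a time.

    Returns (number of reallocations, final allocation size).
    """
    alloc = initial_alloc
    reallocs = 0
    for length in range(1, final_len + 1):
        needed = length + 1                  # characters plus terminator
        if needed > alloc:
            alloc = max(needed, alloc + alloc * percent // 100)
            reallocs += 1
    return reallocs, alloc


if __name__ == "__main__":
    for percent in (25, 50):
        reallocs, alloc = count_reallocs(100000, percent)
        print(f"{percent}% growth: {reallocs} reallocations, {alloc} allocated")
```

Larger growth steps mean fewer reallocations but potentially more unused space at the end.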
*	mkstring: minimum 7 char alloc size.	Kaz Kylheku	2025-01-29	1	-2/+3
	* lib.c (mkstring): Do not allocate less than 8 characters, including null terminator, to the string. This speeds up code which builds up strings from empty, one character at a time.
*	build: remove HAVE_MALLOC_USABLE_SIZE.	Kaz Kylheku	2025-01-29	3	-66/+0
	The malloc_usable_size use in the STR type actually makes operations like string_extend substantially slower. It is faster to store the allocated size locally. Originally, on platforms that have malloc_usable_size, we were able to use the word of memory reclaimed from the string type to store a cached hash code. But that logic was reverted in 2022, so there is no such benefit.
	* configure (have_malloc_usable_size): Variable removed. Test for the malloc_usable_size function removed.
	(HAVE_MALLOC_USABLE_SIZE, HAVE_MALLOC_NP): Do not define these preprocessor symbols.
	* lib.c (HAVE_MALLOC_NP_H): Do not test for this variable to include <malloc_np.h>.
	(string_own, string, string_utf8, mkstring, mkustring, string_extend, string_finish, string_set_code, string_get_code, length_str): Eliminate #ifdefs on HAVE_MALLOC_USABLE_SIZE.
	* lib.h (struct wstring): Eliminate #ifdef on MALLOC_USABLE_SIZE, so alloc member is unconditionally defined on all platforms.
*	awk: use prepared lambdas for field separation.	Kaz Kylheku	2025-01-28	3	-64/+126
	Handle field separations with lambdas, similarly to record separation. The idea is that we replace the rec-to-f method, which contains a cond statement checking the variables for which field separation discipline applies, with a lambda which is updated whenever any of those variables change.
	* awk.tl (awk-state): New instance slot, rec-to-f.
	(awk-state :postinit): Call new upd-rec-to-f method so that rec-to-f is populated with the default field separating lambda.
	(awk-state rec-to-f): Method removed.
	(awk-state upd-rec-to-f): New method, based on rec-to-f. This doesn't perform the field separation, but returns a lambda which will perform it.
	(awk-state loop): We must call upd-rec-to-f whenever we change par-mode, because it influences field separation.
	(awk-mac-let): Replace the symbol macros fs, ft, fw and kfs with new implementations that use the reactive slot mechanism provided by rslot. Whenever the awk macro assigns any of these, the upd-rec-to-f method will be called.
	* tests/015/awk-basic.tl: New file. These basic tests of field separation pass before and after this change.
	* tests/common.tl (otest, motest): New macros.
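The pattern of rebuilding a prepared splitter whenever a separator variable is assigned can be sketched with a Python property. This is only an analogy to the rslot mechanism: the class `AwkState`, its single `fs` setting, and the `rebuilds` counter are hypothetical, and the real awk-state handles several variables and disciplines.

```python
class AwkState:
    """Keeps a prepared field splitter in sync with the fs setting."""

    def __init__(self):
        self._fs = None
        self.rebuilds = 0
        self._upd_rec_to_f()         # install the default splitter

    @property
    def fs(self):
        return self._fs

    @fs.setter
    def fs(self, value):
        # Assignment is the trigger: rebuild the splitter right away.
        self._fs = value
        self._upd_rec_to_f()

    def _upd_rec_to_f(self):
        # Decide the discipline once here, not on every record.
        self.rebuilds += 1
        if self._fs is None:
            self.rec_to_f = lambda rec: rec.split()
        else:
            sep = self._fs
            self.rec_to_f = lambda rec: rec.split(sep)


if __name__ == "__main__":
    state = AwkState()
    print(state.rec_to_f("a b  c"))
    state.fs = ":"
    print(state.rec_to_f("a:b c:d"))
```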
*	doc: print-flo-format: show string value in quotes.	Kaz Kylheku	2025-01-25	1	-1/+1
	* txr.1: The example possible value "~3,4f" should be shown as a string literal in quotes.