What is egawk?
egawk is Enhanced GNU Awk. It is a fork of GNU Awk with some
enhancements designed and implemented by Kaz Kylheku.
NOTE: If you have problems with egawk or questions about it, please do
not contact the GNU Awk maintainers or the bug-gawk mailing list at
gnu.org. Only contact those people if you have a bug that you can
reproduce with the mainline GNU Awk. Always remember to try to produce
a minimal sample of code, and any required input, to reproduce the problem.
The @let statement.
The @let statement in Enhanced GNU Awk provides block-scoped lexical
variables. The syntax looks like this:
@let (x = 1, y = 3, z = f(x, y))
print z
The token sequence @ and let introduces the statement. This is
followed by a list of variable bindings in parentheses. That list is
then followed by a statement.
The statement is executed in a scope in which the variables are visible.
As the above example shows, the bindings are established sequentially, which is
why z can be initialized using an expression which depends on x and y.
A @let variable need not have an initializer:
@let (a, b)
print a == 0 && b == "" # prints 1
Variables without an initializer are reliably initialized to the Awk
null value: the same value that is exhibited by ordinary Awk variables
that have not been assigned. This value compares equal to both 0 and
the empty string "" under the == operator.
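The null value described above can be observed in plain Awk, without @let, using an ordinary variable that has never been assigned. The following is a minimal sketch; the names is_null and never_assigned are hypothetical and chosen only for illustration.

```awk
# Returns 1 when v behaves like the Awk null value:
# equal to 0 in a numeric comparison and equal to "" in a string comparison.
function is_null(v)
{
    return v == 0 && v == ""
}

BEGIN {
    # never_assigned is a hypothetical name; it has no prior assignment.
    print is_null(never_assigned)   # 1

    never_assigned = 0
    print is_null(never_assigned)   # 0: the number 0 is not equal to ""
}
```

A @let variable without an initializer is expected to pass the same check as the unassigned variable in the first call.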
The scope of a @let variable begins immediately after its binding,
including the initializing expression, if any. The following is possible:
function f(x)
{
  @let (x = x + 1)
    return x
}
Here x is initialized with an expression that uses x. That expression
still refers to the previously visible x; the scope of the newly
introduced x begins after that initializing expression.
The new x shadows the previous x.
Restrictions
@let variables may not have the same names as Awk's special variables, such as
NF and FS.
Inside a function, a @let variable must not have the same name as the
function.
Lastly, variables may not use namespace prefixes: foo::bar cannot be used
as a @let variable name.
These restrictions are not new; mainline GNU Awk's function parameters have the same restrictions.
@let may appear inside functions, as well as outside of functions in
the action bodies of patterns, and in the BEGIN and END blocks:
BEGIN { @let (x = 3) ... }
/^id=/ { @let (id = ...) ... }
Rationale
Why not JavaScript-like syntax?
{
let x = 3
...
}
The reason is that this syntax is not friendly toward macros. The motivation
for egawk comes from the cppawk project. With @let, this sort of thing is
possible:
#define repeat(n) @let (__c, __n = (n)) for (__c = 0; __c < __n; __c++)
Here, the expansion of repeat(42) produces the structure
@let (...) for (...)
which just requires the addition of a statement
to produce a complete construct:
repeat(42) { print "hello" }
The JavaScript-style syntax doesn't make it possible. We would have
to rely on the feature of declaring variables inside the for:
for (let __c = 0, __n = (n); __c < __n; __c++)
This is not attractive because it requires us to inject the let
syntax into the phrase structure of every statement type: if, while,
switch. By contrast, the selected design works as a prefix that blends
easily with any statement:
@let (x = 3)
return x
@let (x = c / 2) switch (x) {
}
The @ prefix in @let follows a convention established by GNU Awk.
GNU Awk has extensions like @include for including files, and @fun(arg)
for indirect function calls.
Compatibility
If you have GNU Awk code that uses a variable named let for indirect
function calls, egawk interprets that as the start of a @let statement.
It's possible that no syntax error will take place, only different
behavior. This GNU Awk program produces the output 42, because @let()
means "call the function whose name is stored in the let variable":
function f() { print 42 }
BEGIN { let = "f"; @let(); }
When executed with egawk, it produces no output, because @let();
looks like an empty @let statement. The superfluous semicolon satisfies
its need for a statement and so everything parses.
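One way to sidestep the conflict is to store the function name in a variable that is not called let. The following is a minimal sketch of that workaround, assuming that egawk gives special treatment only to the token let after @, as the description above suggests. The variable name fn is hypothetical; indirect calls are a GNU Awk extension, so this needs gawk or egawk rather than a strictly POSIX awk.

```awk
function f() { return 42 }

BEGIN {
    fn = "f"            # hypothetical name; anything other than "let"
    result = @fn()      # indirect call through fn, not a @let statement
    print result
}
```

Under mainline GNU Awk this prints 42.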
Implementation Notes
The implementation of @let is different inside functions versus outside.
@let statements outside of a function are compiled to code which
uses hidden, global variables. These variables have numbered names similar to
$let0001
. When the GNU Awk -d-
option is used to dump the symbol table,
these names show up in it.
Inside a function, @let is compiled to code which assumes that the variables
are allocated in the function's local frame. Unmodified, upstream GNU Awk
has a parameter frame which is entirely dedicated to parameter passing.
Local variables are simulated by defining additional parameters, which
is a standard Awk idiom. Enhanced GNU Awk separates the frame into a parameter
area and a locals-only area that is off-limits to the parameter passing
mechanism. The compiler extends this locals-only area to accommodate all the
@let variables that occur in the function.
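For readers unfamiliar with the standard idiom mentioned above, here is a minimal sketch in plain Awk. The function name sum_to and its parameter names are hypothetical; the extra spaces in the parameter list are only a common convention for separating real parameters from locals.

```awk
# n is the real parameter. Callers never pass i or total, so each call
# starts them at the null value and they behave as local variables.
function sum_to(n,    i, total)
{
    for (i = 1; i <= n; i++)
        total += i
    return total
}

BEGIN {
    i = "untouched"     # global i, distinct from the i inside sum_to
    print sum_to(4)     # 10
    print i             # untouched
}
```

Because these simulated locals occupy slots in the parameter frame, a caller could pass values for them by accident. The locals-only area described above is off-limits to parameter passing, so @let variables are not exposed in this way.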
Whether inside or outside a function, @let statements allocate variables
in a stack-like fashion. Whenever a @let scope terminates, the compiler
releases the storage locations used for that @let, allowing them to be re-used
for a subsequent @let. Thus this program allocates exactly two hidden
global variables:
BEGIN {
  @let (a, b);
  @let (c, d);
  @let (e, f);
}
This one allocates three:
BEGIN {
  @let (z) {
    @let (a, b);
    @let (c, d);
    @let (e, f);
  }
}
In order to support the dynamic (compile-time) extension of the local frame
with new local variables, I changed the representation of the function
parameter frame. Upstream gawk has it as a dynamic array of NODE objects; I
made it a dynamic array of NODE * pointers to individually allocated NODE
objects. Gawk's fixed array cannot be reallocated to fit the exact size,
because the NODE addresses would change, after they have been inserted into
generated bytecode. I have an idea for solving that, which could restore
the original representation.
Enhanced GNU Awk adds one bytecode instruction called Op_clear_var.
This is necessary to reset lexical variables. Recall from the above
paragraphs that lexical variables whose scopes do not overlap are allocated
in the same storage. This causes several problems. When a new variable
is allocated in the space of an old one, the space contains the prior value.
That "garbage" must be cleared out. (Contrast that with the C language in
which uninitialized block-scope locals appear to have garbage values,
taking on whatever bits happen to be in the memory.) There is another
problem though, which affects even initialized variables. Awk does not
like it when a variable that holds an array is used as a scalar, or vice versa:
x[3] = 42
x = "abc" # error: array used as scalar
y = 3.14
y["foo"] = "bar" # error: scalar used as array
In standard Awk, and in GNU Awk, there is nothing that a program can do
to change the variable x such that it forgets it was an array, or to
change y to forget that it was a scalar and work as an array.
The new Op_clear_var opcode used by the @let implementation in egawk
solves this problem, thanks to its access to the internal representation of
a variable.
Credits
The @let syntax is inspired by Lisp:
(let* ((x 1)
       (y 2))
  ...)