-rw-r--r-- | ChangeLog | 4
-rw-r--r-- | FUTURES | 20
-rw-r--r-- | TODO.xgawk | 11
-rw-r--r-- | doc/ChangeLog | 5
-rw-r--r-- | doc/awkcard.in | 46
-rw-r--r-- | doc/gawk.1 | 29
-rw-r--r-- | doc/gawk.info | 2204
-rw-r--r-- | doc/gawk.texi | 1449
8 files changed, 1504 insertions, 2264 deletions
@@ -1,3 +1,7 @@ +2012-08-10 Arnold D. Robbins <arnold@skeeve.com> + + * FUTURES, TODO.xgawk: Updates. + 2012-08-08 Arnold D. Robbins <arnold@skeeve.com> * configure.ac: Add -DNDEBUG to remove asserts if not developing. @@ -13,17 +13,17 @@ For 4.1 ======= DONE: Merge gawk/pgawk/dgawk into one executable - Consider removing use of and/or need for the protos.h file. - - Consider moving var_value info into Node_var itself - to reduce memory usage. - DONE: Merge xmlgawk -l feature - Merge xmlgawk XML extensions + DONE: Merge xmlgawk XML extensions (via source forge project that + works with new API) DONE: Integrate MPFR to provide high precision arithmetic. + DONE: Implement designed API for loadable modules + + DONE: Redo the loadable modules interface from the awk level. + Continue code reviews / code cleanup Consider making gawk output +nan for NaN values so that it @@ -31,16 +31,16 @@ For 4.1 For 4.2 ======= - Implement designed API for loadable modules - Redo the loadable modules interface from the awk level. + Consider removing use of and/or need for the protos.h file. + + Consider moving var_value info into Node_var itself + to reduce memory usage. Rework management of array index storage. (Partially DONE.) DBM storage of awk arrays. Try to allow multiple dbm packages. - ? Move the loadable modules interface to libtool. - ? Add an optional base to strtonum, allowing 2-36. ? Optional third argument for index indicating where to start the @@ -3,11 +3,6 @@ To-do list for xgawk enhancements: - Attempting to load the same file with -f and -i (or @include) should be a fatal error. -- Review open hook implementation. - * Mostly done. - * Still to go: Rework iop_alloc, interaction with open hooks, and - skipping command line directories. - Low priority: - Enhance extension/fork.c waitpid to allow the caller to specify the options. @@ -140,3 +135,9 @@ Done: - MPFR. This is probably not useful now that MPFR support has been integrated into gawk. 
Are there any users who need this extension? + +- Review open hook implementation. + * Mostly done. + * Still to go: Rework iop_alloc, interaction with open hooks, and + skipping command line directories. + diff --git a/doc/ChangeLog b/doc/ChangeLog index 65907bc1..32ef1a1c 100644 --- a/doc/ChangeLog +++ b/doc/ChangeLog @@ -1,3 +1,8 @@ +2012-08-10 Arnold D. Robbins <arnold@skeeve.com> + + * awkcard.in, gawk.1, gawk.texi: Updated. Mostly for new API stuff + but also some other things. + 2012-08-01 Arnold D. Robbins <arnold@skeeve.com> * Makefile.am (install-data-hook): Install a dgawk.1 link to the diff --git a/doc/awkcard.in b/doc/awkcard.in index d0c1578a..9615b58e 100644 --- a/doc/awkcard.in +++ b/doc/awkcard.in @@ -271,6 +271,8 @@ for localization. .TI "\*(FC\-h\*(FR, \*(FC\-\^\-help\*(FR Print a short summary of the available options on \*(FCstdout\*(FR, then exit zero. +.TI "\*(FC\-i \*(FIfile\*(FR, \*(FC\-\^\-include \*(FIfile\*(FR +Include library AWK code in \*(FIfile\*(FR. .TI "\*(FC\-l \*(FIlib\*(FR, \*(FC\-\^\-load \*(FIlib\*(FR Load dynamic extension \*(FIlib\fP. .TI "\*(FC\-L \*(FR[\*(FC\*(FIvalue\*(FR], \*(FC\-\^\-lint\*(FR[\*(FC=\*(FIvalue\*(FR] @@ -300,13 +302,7 @@ Send profiling data to \*(FIfile\*(FR The profile contains execution counts in the left margin of each statement in the program. .TI "\*(FC\-P\*(FR, \*(FC\-\^\-posix\*(FR -Disable common and GNU extensions. -.TI "\*(FC\-r\*(FR, \*(FC\-\^\-re\-interval\*(FR -Enable \*(FIinterval expressions\*(FR.\*(CB -... in regular -... expression matching (see \fHRegular -... Expressions\fP below). Useful if -... \*(FC\-\^\-traditional\*(FR is specified +Disable common and GNU extensions.\*(CB .in -4n .EB "\s+2\f(HBCOMMAND LINE ARGUMENTS (\*(GK\f(HB)\*(FR\s0" @@ -318,6 +314,12 @@ Enable \*(FIinterval expressions\*(FR.\*(CB .ES .fi .in +4n +.TI "\*(FC\-r\*(FR, \*(FC\-\^\-re\-interval\*(FR +Enable \*(FIinterval expressions\*(FR. +... in regular +... expression matching (see \fHRegular +... 
Expressions\fP below). Useful if +... \*(FC\-\^\-traditional\*(FR is specified .TI "\*(FC\-S\*(FR, \*(FC\-\^\-sandbox\*(FR Disable the \*(FCsystem()\*(FR function, input redirection with \*(FCgetline\*(FR, @@ -342,7 +344,7 @@ options are passed on to the AWK program in \*(FCARGV\*(FR for processing.\*(CB .EB "\s+2\f(HBCOMMAND LINE ARGUMENTS (\*(GK\f(HB)\*(FR\s0" - +.sp .4 .\" .\" .\" --- Command Line Arguments (mawk) @@ -454,7 +456,7 @@ The program text is read as if all the \*(FIprog-file\*(FR(s) \*(CBand command line source texts\*(CD had been concatenated. .sp -\*(GK includes files named on \*(FC@include\*(FR lines. +\*(CB\*(GK includes files named on \*(FC@include\*(FR lines. Nested includes are allowed.\*(CD .sp .5 AWK programs execute in the following order. @@ -1141,7 +1143,10 @@ The default path is If a file name given to the \*(FC\-f\fP option contains a ``/'' character, no path search is performed. .sp .5 -.PP +The variable \*(FCAWKLIBPATH\fP +specifies the search path for dynamic extensions to use +with \*(FC@load\fP and the \*(FC\-l\fP option. +.sp .5 For socket communication, \*(FCGAWK_SOCK_RETRIES\fP controls the number of retries, and @@ -1151,6 +1156,10 @@ The interval is in milliseconds. On systems that do not support \*(FIusleep\fP(3), the value is rounded up to an integral number of seconds. .sp .5 +The value of \*(FCGAWK_READ_TIMEOUT\fP specifies the time, in milliseconds, +for \*(GK to +wait for input before returning with an error. +.sp .5 If \*(FCPOSIXLY_CORRECT\fP exists .\" in the environment, then \*(GK @@ -1845,16 +1854,15 @@ Return the bitwise XOR of the arguments.\*(CB .fi .in +.2i .ti -.2i -\*(CD\*(FCextension(\*(FIlib\*(FC, \*(FIfunc\*(FC)\*(FR +\*(CD\*(FC@load "\*(FIextension\*(FC"\*(FR .br -Dynamically load the shared library -\*(FIlib\*(FR -and call -\*(FIfunc\*(FR -in it to initialize the library. +Dynamically load the named \*(FIextension\*(FR. This adds new built-in functions to \*(GK. 
-It returns the value returned by -\*(FIfunc\*(FR.\*(CB +.\" The extension should use the API defined by the +.\" \*(FCgawkapi.h\*(FR header file, as documented in +.\" the full manual. +The extension is loaded during the parsing of the program. +See the manual for details.\*(CB .in -.2i .EB "\s+2\f(HBDYNAMIC EXTENSIONS (\*(GK\f(HB)\*(FR\s0" .BT @@ -1955,7 +1963,7 @@ maintains it.\*(CX .ES .fi \*(CDCopyright \(co 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, -2007, 2009, 2010, 2011 Free Software Foundation, Inc. +2007, 2009, 2010, 2011, 2012 Free Software Foundation, Inc. .sp .5 Permission is granted to make and distribute verbatim copies of this reference card provided the copyright notice and this permission notice @@ -14,7 +14,7 @@ . if \w'\(rq' .ds rq "\(rq . \} .\} -.TH GAWK 1 "Nov 10 2011" "Free Software Foundation" "Utility Commands" +.TH GAWK 1 "Aug 09 2012" "Free Software Foundation" "Utility Commands" .SH NAME gawk \- pattern scanning and processing language .SH SYNOPSIS @@ -3181,24 +3181,11 @@ may be used in place of .SH DYNAMICALLY LOADING NEW FUNCTIONS You can dynamically add new built-in functions to the running .I gawk -interpreter. +interpreter with the +.B @load +statement. The full details are beyond the scope of this manual page; -see \*(EP for the details. -.PP -.TP 8 -\fBextension(\fIobject\fB, \fIfunction\fB)\fR -Dynamically link the shared object file named by -.IR object , -and invoke -.I function -in that object, to perform initialization. -These should both be provided as strings. -Return the value returned by -.IR function . -.PP -Using this feature at the C level is not pretty, but -it is unlikely to go away. Additional mechanisms may -be added at some point. +see \*(EP. .SH SIGNALS The .I gawk @@ -3727,7 +3714,7 @@ status is 2. On non-POSIX systems, this value may be mapped to .SH VERSION INFORMATION This man page documents .IR gawk , -version 4.0. +version 4.1. 
.SH AUTHORS The original version of \*(UX .I awk @@ -3805,6 +3792,7 @@ While the developers occasionally read this newsgroup, posting bug reports there is an unreliable way to report bugs. Instead, please use the electronic mail addresses given above. +Really. .PP If you're using a GNU/Linux or BSD-based system, you may wish to submit a bug report to the vendor of your distribution. @@ -3824,6 +3812,7 @@ are surprisingly difficult to diagnose in the completely general case, and the effort to do so really is not worth it. .SH SEE ALSO .IR egrep (1), +.IR sed (1), .IR getpid (2), .IR getppid (2), .IR getpgrp (2), @@ -3839,7 +3828,7 @@ Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger, Addison-Wesley, 1988. ISBN 0-201-07981-X. .PP \*(EP, -Edition 4.0, shipped with the +Edition 4.1, shipped with the .I gawk source. The current version of this document is available online at diff --git a/doc/gawk.info b/doc/gawk.info index bcc773d6..65bf903c 100644 --- a/doc/gawk.info +++ b/doc/gawk.info @@ -97,12 +97,14 @@ texts being (a) (see below), and with the Back-Cover Texts being (b) * Sample Programs:: Many `awk' programs with complete explanations. * Debugger:: The `gawk' debugger. +* Dynamic Extensions:: Adding new built-in functions to + `gawk'. * Language History:: The evolution of the `awk' language. * Installation:: Installing `gawk' under various operating systems. -* Notes:: Notes about `gawk' extensions and - possible future work. +* Notes:: Notes about adding things to `gawk' + and possible future work. * Basic Concepts:: A very quick introduction to programming concepts. * Glossary:: An explanation of some unfamiliar terms. @@ -359,21 +361,22 @@ texts being (a) (see below), and with the Back-Cover Texts being (b) * I18N Portability:: `awk'-level portability issues. * I18N Example:: A simple i18n example. * Gawk I18N:: `gawk' is also internationalized. -* Floating-point Programming:: Effective floating-point programming. 
-* Floating-point Representation:: Binary floating-point representation. -* Floating-point Context:: Floating-point context. -* Rounding Mode:: Floating-point rounding mode. -* Arbitrary Precision Floats:: Arbitrary precision floating-point - arithmetic with `gawk'. -* Setting Precision:: Setting the working precision. -* Setting Rounding Mode:: Setting the rounding mode. -* Floating-point Constants:: Representing floating-point constants. -* Changing Precision:: Changing the precision of a number. -* Exact Arithmetic:: Exact arithmetic with floating-point numbers. -* Integer Programming:: Effective integer programming. -* Arbitrary Precision Integers:: Arbitrary precision integer - arithmetic with `gawk'. -* MPFR and GMP Libraries:: Information about the MPFR and GMP libraries. +* Floating-point Programming:: Effective Floating-point Programming. +* Floating-point Representation:: Binary Floating-point Representation. +* Floating-point Context:: Floating-point Context. +* Rounding Mode:: Floating-point Rounding Mode. +* Arbitrary Precision Floats:: Arbitrary Precision Floating-point + Arithmetic with `gawk'. +* Setting Precision:: Setting the Working Precision. +* Setting Rounding Mode:: Setting the Rounding Mode. +* Floating-point Constants:: Representing Floating-point Constants. +* Changing Precision:: Changing the Precision of a Number. +* Exact Arithmetic:: Exact Arithmetic with Floating-point + Numbers. +* Integer Programming:: Effective Integer Programming. +* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with + `gawk'. +* MPFR and GMP Libraries :: * Nondecimal Data:: Allowing nondecimal input data. * Array Sorting:: Facilities for controlling array traversal and sorting arrays. @@ -438,14 +441,14 @@ texts being (a) (see below), and with the Back-Cover Texts being (b) * Anagram Program:: Finding anagrams from a dictionary. * Signature Program:: People do amazing things with too much time on their hands. 
-* Debugging:: Introduction to `gawk' Debugger. +* Debugging:: Introduction to `gawk' debugger. * Debugging Concepts:: Debugging in General. * Debugging Terms:: Additional Debugging Concepts. * Awk Debugging:: Awk Debugging. -* Sample Debugging Session:: Sample Debugging Session. +* Sample Debugging Session:: Sample debugging session. * Debugger Invocation:: How to Start the Debugger. * Finding The Bug:: Finding the Bug. -* List of Debugger Commands:: Main Commands. +* List of Debugger Commands:: Main debugger commands. * Breakpoint Control:: Control of Breakpoints. * Debugger Execution Control:: Control of Execution. * Viewing And Changing Data:: Viewing and Changing Data. @@ -453,8 +456,13 @@ texts being (a) (see below), and with the Back-Cover Texts being (b) * Debugger Info:: Obtaining Information about the Program and the Debugger State. * Miscellaneous Debugger Commands:: Miscellaneous Commands. -* Readline Support:: Readline Support. -* Limitations:: Limitations and Future Plans. +* Readline Support:: Readline support. +* Limitations:: Limitations and future plans. +* Plugin License:: A note about licensing. +* Sample Library:: A example of new functions. +* Internal File Description:: What the new functions will do. +* Internal File Ops:: The code for internal file operations. +* Using Internal File Ops:: How to use an external extension. * V7/SVR3.1:: The major changes between V7 and System V Release 3.1. * SVR4:: Minor changes between System V Releases 3.1 @@ -505,16 +513,6 @@ texts being (a) (see below), and with the Back-Cover Texts being (b) `gawk'. * New Ports:: Porting `gawk' to a new operating system. -* Dynamic Extensions:: Adding new built-in functions to - `gawk'. -* Internals:: A brief look at some `gawk' - internals. -* Plugin License:: A note about licensing. -* Loading Extensions:: How to load dynamic extensions. -* Sample Library:: A example of new functions. -* Internal File Description:: What the new functions will do. 
-* Internal File Ops:: The code for internal file operations. -* Using Internal File Ops:: How to use an external extension. * Future Extensions:: New features that may be implemented one day. * Basic High Level:: The high level view. @@ -883,8 +881,8 @@ non-POSIX systems. It also describes how to report bugs in `gawk' and where to get other freely available `awk' implementations. *note Notes::, describes how to disable `gawk''s extensions, as well -as how to contribute new code to `gawk', how to write extension -libraries, and some possible future directions for `gawk' development. +as how to contribute new code to `gawk', and some possible future +directions for `gawk' development. *note Basic Concepts::, provides some very cursory background material for those who are completely unfamiliar with computer @@ -2594,8 +2592,8 @@ A number of environment variables influence how `gawk' behaves. * AWKPATH Variable:: Searching directories for `awk' programs. -* AWKLIBPATH Variable:: Searching directories for `awk' - shared libraries. +* AWKLIBPATH Variable:: Searching directories for `awk' shared + libraries. * Other Environment Variables:: The environment variables. @@ -3737,7 +3735,6 @@ have to be named on the `awk' command line (*note Getline::). * Getline:: Reading files under explicit program control using the `getline' function. * Read Timeout:: Reading input with a timeout. - * Command line directories:: What happens if you put a directory on the command line. @@ -8520,10 +8517,10 @@ would otherwise be difficult or impossible to perform: entirely. Otherwise, `gawk' exits with the usual fatal error. * If you have written extensions that modify the record handling (by - inserting an "open hook"), you can invoke them at this point, + inserting an "input parser"), you can invoke them at this point, before `gawk' has started processing the file. (This is a _very_ - advanced feature, currently used only by the XMLgawk project - (http://xmlgawk.sourceforge.net).) 
+ advanced feature, currently used only by the `gawkextlib' project + (http://gawkextlib.sourceforge.net).) The `ENDFILE' rule is called when `gawk' has finished processing the last record in an input file. For the last input file, it will be @@ -13771,21 +13768,22 @@ numbers. * Menu: -* Floating-point Programming:: Effective Floating-point Programming. -* Floating-point Representation:: Binary Floating-point Representation. -* Floating-point Context:: Floating-point Context. -* Rounding Mode:: Floating-point Rounding Mode. -* Arbitrary Precision Floats:: Arbitrary Precision Floating-point - Arithmetic with `gawk'. -* Setting Precision:: Setting the Working Precision. -* Setting Rounding Mode:: Setting the Rounding Mode. -* Floating-point Constants:: Representing Floating-point Constants. -* Changing Precision:: Changing the Precision of a Number. -* Exact Arithmetic:: Exact Arithmetic with Floating-point Numbers. -* Integer Programming:: Effective Integer Programming. -* Arbitrary Precision Integers:: Arbitrary Precision Integer - Arithmetic with `gawk'. -* MPFR and GMP Libraries:: Information About the MPFR and GMP Libraries. +* Floating-point Programming:: Effective Floating-point Programming. +* Floating-point Representation:: Binary Floating-point Representation. +* Floating-point Context:: Floating-point Context. +* Rounding Mode:: Floating-point Rounding Mode. +* Arbitrary Precision Floats:: Arbitrary Precision Floating-point + Arithmetic with `gawk'. +* Setting Precision:: Setting the Working Precision. +* Setting Rounding Mode:: Setting the Rounding Mode. +* Floating-point Constants:: Representing Floating-point Constants. +* Changing Precision:: Changing the Precision of a Number. +* Exact Arithmetic:: Exact Arithmetic with Floating-point + Numbers. +* Integer Programming:: Effective Integer Programming. +* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with + `gawk'. 
+* MPFR and GMP Libraries :: ---------- Footnotes ---------- @@ -19689,7 +19687,7 @@ supplies the following copyright terms: We leave it to you to determine what the program does. -File: gawk.info, Node: Debugger, Next: Language History, Prev: Sample Programs, Up: Top +File: gawk.info, Node: Debugger, Next: Dynamic Extensions, Prev: Sample Programs, Up: Top 15 Debugging `awk' Programs *************************** @@ -20741,7 +20739,400 @@ features may be added, and of course feel free to try to add them yourself! -File: gawk.info, Node: Language History, Next: Installation, Prev: Debugger, Up: Top +File: gawk.info, Node: Dynamic Extensions, Next: Language History, Prev: Debugger, Up: Top + +16 Writing Extensions for `gawk' +******************************** + +This chapter is a placeholder, pending a rewrite for the new API. Some +of the old bits remain, since they can be partially reused. + + It is possible to add new built-in functions to `gawk' using +dynamically loaded libraries. This facility is available on systems +(such as GNU/Linux) that support the C `dlopen()' and `dlsym()' +functions. This major node describes how to write and use dynamically +loaded extensions for `gawk'. Experience with programming in C or C++ +is necessary when reading this minor node. + + NOTE: When `--sandbox' is specified, extensions are disabled + (*note Options::. + +* Menu: + +* Plugin License:: A note about licensing. +* Sample Library:: A example of new functions. + + +File: gawk.info, Node: Plugin License, Next: Sample Library, Up: Dynamic Extensions + +16.1 Extension Licensing +======================== + +Every dynamic extension should define the global symbol +`plugin_is_GPL_compatible' to assert that it has been licensed under a +GPL-compatible license. If this symbol does not exist, `gawk' will +emit a fatal error and exit. + + The declared type of the symbol should be `int'. It does not need +to be in any allocated section, though. 
The code merely asserts that +the symbol exists in the global scope. Something like this is enough: + + int plugin_is_GPL_compatible; + + +File: gawk.info, Node: Sample Library, Prev: Plugin License, Up: Dynamic Extensions + +16.2 Example: Directory and File Operation Built-ins +==================================================== + +Two useful functions that are not in `awk' are `chdir()' (so that an +`awk' program can change its directory) and `stat()' (so that an `awk' +program can gather information about a file). This minor node +implements these functions for `gawk' in an external extension library. + +* Menu: + +* Internal File Description:: What the new functions will do. +* Internal File Ops:: The code for internal file operations. +* Using Internal File Ops:: How to use an external extension. + + +File: gawk.info, Node: Internal File Description, Next: Internal File Ops, Up: Sample Library + +16.2.1 Using `chdir()' and `stat()' +----------------------------------- + +This minor node shows how to use the new functions at the `awk' level +once they've been integrated into the running `gawk' interpreter. +Using `chdir()' is very straightforward. It takes one argument, the new +directory to change to: + + ... + newdir = "/home/arnold/funstuff" + ret = chdir(newdir) + if (ret < 0) { + printf("could not change to %s: %s\n", + newdir, ERRNO) > "/dev/stderr" + exit 1 + } + ... + + The return value is negative if the `chdir' failed, and `ERRNO' +(*note Built-in Variables::) is set to a string indicating the error. + + Using `stat()' is a bit more complicated. The C `stat()' function +fills in a structure that has a fair amount of information. 
The right +way to model this in `awk' is to fill in an associative array with the +appropriate information: + + file = "/home/arnold/.profile" + fdata[1] = "x" # force `fdata' to be an array + ret = stat(file, fdata) + if (ret < 0) { + printf("could not stat %s: %s\n", + file, ERRNO) > "/dev/stderr" + exit 1 + } + printf("size of %s is %d bytes\n", file, fdata["size"]) + + The `stat()' function always clears the data array, even if the +`stat()' fails. It fills in the following elements: + +`"name"' + The name of the file that was `stat()''ed. + +`"dev"' +`"ino"' + The file's device and inode numbers, respectively. + +`"mode"' + The file's mode, as a numeric value. This includes both the file's + type and its permissions. + +`"nlink"' + The number of hard links (directory entries) the file has. + +`"uid"' +`"gid"' + The numeric user and group ID numbers of the file's owner. + +`"size"' + The size in bytes of the file. + +`"blocks"' + The number of disk blocks the file actually occupies. This may not + be a function of the file's size if the file has holes. + +`"atime"' +`"mtime"' +`"ctime"' + The file's last access, modification, and inode update times, + respectively. These are numeric timestamps, suitable for + formatting with `strftime()' (*note Built-in::). + +`"pmode"' + The file's "printable mode." This is a string representation of + the file's type and permissions, such as what is produced by `ls + -l'--for example, `"drwxr-xr-x"'. + +`"type"' + A printable string representation of the file's type. The value + is one of the following: + + `"blockdev"' + `"chardev"' + The file is a block or character device ("special file"). + + `"directory"' + The file is a directory. + + `"fifo"' + The file is a named-pipe (also known as a FIFO). + + `"file"' + The file is just a regular file. + + `"socket"' + The file is an `AF_UNIX' ("Unix domain") socket in the + filesystem. + + `"symlink"' + The file is a symbolic link. 
+ + Several additional elements may be present depending upon the +operating system and the type of the file. You can test for them in +your `awk' program by using the `in' operator (*note Reference to +Elements::): + +`"blksize"' + The preferred block size for I/O to the file. This field is not + present on all POSIX-like systems in the C `stat' structure. + +`"linkval"' + If the file is a symbolic link, this element is the name of the + file the link points to (i.e., the value of the link). + +`"rdev"' +`"major"' +`"minor"' + If the file is a block or character device file, then these values + represent the numeric device number and the major and minor + components of that number, respectively. + + +File: gawk.info, Node: Internal File Ops, Next: Using Internal File Ops, Prev: Internal File Description, Up: Sample Library + +16.2.2 C Code for `chdir()' and `stat()' +---------------------------------------- + +Here is the C code for these extensions. They were written for +GNU/Linux. The code needs some more work for complete portability to +other POSIX-compliant systems:(1) + + #include "awk.h" + + #include <sys/sysmacros.h> + + int plugin_is_GPL_compatible; + + /* do_chdir --- provide dynamically loaded chdir() builtin for gawk */ + + static NODE * + do_chdir(int nargs) + { + NODE *newdir; + int ret = -1; + + if (do_lint && nargs != 1) + lintwarn("chdir: called with incorrect number of arguments"); + + newdir = get_scalar_argument(0, FALSE); + + The file includes the `"awk.h"' header file for definitions for the +`gawk' internals. It includes `<sys/sysmacros.h>' for access to the +`major()' and `minor'() macros. + + By convention, for an `awk' function `foo', the function that +implements it is called `do_foo'. The function should take a `int' +argument, usually called `nargs', that represents the number of defined +arguments for the function. The `newdir' variable represents the new +directory to change to, retrieved with `get_scalar_argument()'. 
Note +that the first argument is numbered zero. + + This code actually accomplishes the `chdir()'. It first forces the +argument to be a string and passes the string value to the `chdir()' +system call. If the `chdir()' fails, `ERRNO' is updated. + + (void) force_string(newdir); + ret = chdir(newdir->stptr); + if (ret < 0) + update_ERRNO_int(errno); + + Finally, the function returns the return value to the `awk' level: + + return make_number((AWKNUM) ret); + } + + The `stat()' built-in is more involved. First comes a function that +turns a numeric mode into a printable representation (e.g., 644 becomes +`-rw-r--r--'). This is omitted here for brevity: + + /* format_mode --- turn a stat mode field into something readable */ + + static char * + format_mode(unsigned long fmode) + { + ... + } + + Next comes the `do_stat()' function. It starts with variable +declarations and argument checking: + + /* do_stat --- provide a stat() function for gawk */ + + static NODE * + do_stat(int nargs) + { + NODE *file, *array, *tmp; + struct stat sbuf; + int ret; + NODE **aptr; + char *pmode; /* printable mode */ + char *type = "unknown"; + + if (do_lint && nargs > 2) + lintwarn("stat: called with too many arguments"); + + Then comes the actual work. First, the function gets the arguments. +Then, it always clears the array. The code use `lstat()' (instead of +`stat()') to get the file information, in case the file is a symbolic +link. If there's an error, it sets `ERRNO' and returns: + + /* file is first arg, array to hold results is second */ + file = get_scalar_argument(0, FALSE); + array = get_array_argument(1, FALSE); + + /* empty out the array */ + assoc_clear(array); + + /* lstat the file, if error, set ERRNO and return */ + (void) force_string(file); + ret = lstat(file->stptr, & sbuf); + if (ret < 0) { + update_ERRNO_int(errno); + return make_number((AWKNUM) ret); + } + + Now comes the tedious part: filling in the array. 
Only a few of the +calls are shown here, since they all follow the same pattern: + + /* fill in the array */ + aptr = assoc_lookup(array, tmp = make_string("name", 4)); + *aptr = dupnode(file); + unref(tmp); + + aptr = assoc_lookup(array, tmp = make_string("mode", 4)); + *aptr = make_number((AWKNUM) sbuf.st_mode); + unref(tmp); + + aptr = assoc_lookup(array, tmp = make_string("pmode", 5)); + pmode = format_mode(sbuf.st_mode); + *aptr = make_string(pmode, strlen(pmode)); + unref(tmp); + + When done, return the `lstat()' return value: + + + return make_number((AWKNUM) ret); + } + + Finally, it's necessary to provide the "glue" that loads the new +function(s) into `gawk'. By convention, each library has a routine +named `dl_load()' that does the job. The simplest way is to use the +`dl_load_func' macro in `gawkapi.h'. + + And that's it! As an exercise, consider adding functions to +implement system calls such as `chown()', `chmod()', and `umask()'. + + ---------- Footnotes ---------- + + (1) This version is edited slightly for presentation. See +`extension/filefuncs.c' in the `gawk' distribution for the complete +version. + + +File: gawk.info, Node: Using Internal File Ops, Prev: Internal File Ops, Up: Sample Library + +16.2.3 Integrating the Extensions +--------------------------------- + +Now that the code is written, it must be possible to add it at runtime +to the running `gawk' interpreter. First, the code must be compiled. +Assuming that the functions are in a file named `filefuncs.c', and IDIR +is the location of the `gawk' include files, the following steps create +a GNU/Linux shared library: + + $ gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -IIDIR filefuncs.c + $ ld -o filefuncs.so -shared filefuncs.o + + Once the library exists, it is loaded by calling the `extension()' +built-in function. This function takes two arguments: the name of the +library to load and the name of a function to call when the library is +first loaded. 
This function adds the new functions to `gawk'. It +returns the value returned by the initialization function within the +shared library: + + # file testff.awk + BEGIN { + extension("./filefuncs.so", "dl_load") + + chdir(".") # no-op + + data[1] = 1 # force `data' to be an array + print "Info for testff.awk" + ret = stat("testff.awk", data) + print "ret =", ret + for (i in data) + printf "data[\"%s\"] = %s\n", i, data[i] + print "testff.awk modified:", + strftime("%m %d %y %H:%M:%S", data["mtime"]) + + print "\nInfo for JUNK" + ret = stat("JUNK", data) + print "ret =", ret + for (i in data) + printf "data[\"%s\"] = %s\n", i, data[i] + print "JUNK modified:", strftime("%m %d %y %H:%M:%S", data["mtime"]) + } + + Here are the results of running the program: + + $ gawk -f testff.awk + -| Info for testff.awk + -| ret = 0 + -| data["size"] = 607 + -| data["ino"] = 14945891 + -| data["name"] = testff.awk + -| data["pmode"] = -rw-rw-r-- + -| data["nlink"] = 1 + -| data["atime"] = 1293993369 + -| data["mtime"] = 1288520752 + -| data["mode"] = 33204 + -| data["blksize"] = 4096 + -| data["dev"] = 2054 + -| data["type"] = file + -| data["gid"] = 500 + -| data["uid"] = 500 + -| data["blocks"] = 8 + -| data["ctime"] = 1290113572 + -| testff.awk modified: 10 31 10 12:25:52 + -| + -| Info for JUNK + -| ret = -1 + -| JUNK modified: 01 01 70 02:00:00 + + +File: gawk.info, Node: Language History, Next: Installation, Prev: Dynamic Extensions, Up: Top Appendix A The Evolution of the `awk' Language ********************************************** @@ -21030,9 +21421,6 @@ the current version of `gawk'. - The `bindtextdomain()', `dcgettext()' and `dcngettext()' functions for internationalization (*note Programmer i18n::). - - The `extension()' built-in function and the ability to add - new functions dynamically (*note Dynamic Extensions::). - - The `fflush()' function from Brian Kernighan's version of `awk' (*note I/O Functions::). @@ -21051,11 +21439,13 @@ the current version of `gawk'. 
search for the `-l' command-line option (*note Options::). - The ability to use GNU-style long-named options that start - with `--' and the `--characters-as-bytes', `--compat', - `--dump-variables', `--exec', `--gen-pot', `--lint', - `--lint-old', `--non-decimal-data', `--posix', `--profile', - `--re-interval', `--sandbox', `--source', `--traditional', and - `--use-lc-numeric' options (*note Options::). + with `--' and the `--bignum', `--characters-as-bytes', + `--copyright', `--debug', `--dump-variables', `--exec', + `--gen-pot', `--include', `--lint', `--lint-old', `--load', + `--non-decimal-data', `--optimize', `--posix', + `--pretty-print', `--profile', `--re-interval', `--sandbox', + `--source', `--traditional', and `--use-lc-numeric' options + (*note Options::). * Support for the following obsolete systems was removed from the code and the documentation for `gawk' version 4.0: @@ -21277,7 +21667,7 @@ Info file, in approximate chronological order: various PC platforms. * Christos Zoulas provided the `extension()' built-in function for - dynamically adding new modules. + dynamically adding new modules. (This was removed at `gawk' 4.1.) * Ju"rgen Kahrs contributed the initial version of the TCP/IP networking code and documentation, and motivated the inclusion of @@ -22400,8 +22790,6 @@ and maintainers of `gawk'. Everything in it applies specifically to * Compatibility Mode:: How to disable certain `gawk' extensions. * Additions:: Making Additions To `gawk'. -* Dynamic Extensions:: Adding new built-in functions to - `gawk'. * Future Extensions:: New features that may be implemented one day. @@ -22428,7 +22816,7 @@ for the casual user. It probably has not even been compiled into your version of `gawk', since it slows down execution. 
-File: gawk.info, Node: Additions, Next: Dynamic Extensions, Prev: Compatibility Mode, Up: Notes +File: gawk.info, Node: Additions, Next: Future Extensions, Prev: Compatibility Mode, Up: Notes C.2 Making Additions to `gawk' ============================== @@ -22597,9 +22985,10 @@ possible to include your changes: 7. Submit changes as unified diffs. Use `diff -u -r -N' to compare the original `gawk' source tree with your version. I recommend - using the GNU version of `diff'. Send the output produced by - either run of `diff' to me when you submit your changes. (*Note - Bugs::, for the electronic mail information.) + using the GNU version of `diff', or best of all, `git diff' or + `git format-patch'. Send the output produced by `diff' to me when + you submit your changes. (*Note Bugs::, for the electronic mail + information.) Using this format makes it easy for me to apply your changes to the master version of the `gawk' source code (using `patch'). If I @@ -22698,661 +23087,9 @@ code that is already there. style and brace layout that suits your taste. -File: gawk.info, Node: Dynamic Extensions, Next: Future Extensions, Prev: Additions, Up: Notes - -C.3 Adding New Built-in Functions to `gawk' -=========================================== - - Danger Will Robinson! Danger!! - Warning! Warning! - The Robot - - It is possible to add new built-in functions to `gawk' using -dynamically loaded libraries. This facility is available on systems -(such as GNU/Linux) that support the C `dlopen()' and `dlsym()' -functions. This minor node describes how to write and use dynamically -loaded extensions for `gawk'. Experience with programming in C or C++ -is necessary when reading this minor node. - - CAUTION: The facilities described in this minor node are very much - subject to change in a future `gawk' release. Be aware that you - may have to re-do everything, at some future time. 
- - If you have written your own dynamic extensions, be sure to - recompile them for each new `gawk' release. There is no guarantee - of binary compatibility between different releases, nor will there - ever be such a guarantee. - - NOTE: When `--sandbox' is specified, extensions are disabled - (*note Options::. - -* Menu: - -* Internals:: A brief look at some `gawk' internals. -* Plugin License:: A note about licensing. -* Loading Extensions:: How to load dynamic extensions. -* Sample Library:: A example of new functions. - - -File: gawk.info, Node: Internals, Next: Plugin License, Up: Dynamic Extensions - -C.3.1 A Minimal Introduction to `gawk' Internals ------------------------------------------------- - -The truth is that `gawk' was not designed for simple extensibility. -The facilities for adding functions using shared libraries work, but -are something of a "bag on the side." Thus, this tour is brief and -simplistic; would-be `gawk' hackers are encouraged to spend some time -reading the source code before trying to write extensions based on the -material presented here. Of particular note are the files `awk.h', -`builtin.c', and `eval.c'. Reading `awkgram.y' in order to see how the -parse tree is built would also be of use. - - With the disclaimers out of the way, the following types, structure -members, functions, and macros are declared in `awk.h' and are of use -when writing extensions. The next minor node shows how they are used: - -`AWKNUM' - An `AWKNUM' is the internal type of `awk' floating-point numbers. - Typically, it is a C `double'. - -`NODE' - Just about everything is done using objects of type `NODE'. These - contain both strings and numbers, as well as variables and arrays. - -`AWKNUM force_number(NODE *n)' - This macro forces a value to be numeric. It returns the actual - numeric value contained in the node. It may end up calling an - internal `gawk' function. 
- -`void force_string(NODE *n)' - This macro guarantees that a `NODE''s string value is current. It - may end up calling an internal `gawk' function. It also - guarantees that the string is zero-terminated. - -`void force_wstring(NODE *n)' - Similarly, this macro guarantees that a `NODE''s wide-string value - is current. It may end up calling an internal `gawk' function. - It also guarantees that the wide string is zero-terminated. - -`nargs' - Inside an extension function, this is the actual number of - parameters passed to the current function. - -`n->stptr' -`n->stlen' - The data and length of a `NODE''s string value, respectively. The - string is _not_ guaranteed to be zero-terminated. If you need to - pass the string value to a C library function, save the value in - `n->stptr[n->stlen]', assign `'\0'' to it, call the routine, and - then restore the value. - -`n->wstptr' -`n->wstlen' - The data and length of a `NODE''s wide-string value, respectively. - Use `force_wstring()' to make sure these values are current. - -`n->type' - The type of the `NODE'. This is a C `enum'. Values should be one - of `Node_var', `Node_var_new', or `Node_var_array' for function - parameters. - -`n->vname' - The "variable name" of a node. This is not of much use inside - externally written extensions. - -`void assoc_clear(NODE *n)' - Clears the associative array pointed to by `n'. Make sure that - `n->type == Node_var_array' first. - -`NODE **assoc_lookup(NODE *symbol, NODE *subs)' - Finds, and installs if necessary, array elements. `symbol' is the - array, `subs' is the subscript. This is usually a value created - with `make_string()' (see below). - -`NODE *make_string(char *s, size_t len)' - Take a C string and turn it into a pointer to a `NODE' that can be - stored appropriately. This is permanent storage; understanding of - `gawk' memory management is helpful. 
- -`NODE *make_number(AWKNUM val)' - Take an `AWKNUM' and turn it into a pointer to a `NODE' that can - be stored appropriately. This is permanent storage; understanding - of `gawk' memory management is helpful. - -`NODE *dupnode(NODE *n)' - Duplicate a node. In most cases, this increments an internal - reference count instead of actually duplicating the entire `NODE'; - understanding of `gawk' memory management is helpful. - -`void unref(NODE *n)' - This macro releases the memory associated with a `NODE' allocated - with `make_string()' or `make_number()'. Understanding of `gawk' - memory management is helpful. - -`void make_builtin(const char *name, NODE *(*func)(NODE *), int count)' - Register a C function pointed to by `func' as new built-in - function `name'. `name' is a regular C string. `count' is the - maximum number of arguments that the function takes. The function - should be written in the following manner: - - /* do_xxx --- do xxx function for gawk */ - - NODE * - do_xxx(int nargs) - { - ... - } - -`NODE *get_argument(int i)' - This function is called from within a C extension function to get - the `i'-th argument from the function call. The first argument is - argument zero. - -`NODE *get_actual_argument(int i,' -` int optional, int wantarray);' - This function retrieves a particular argument `i'. `wantarray' is - `TRUE' if the argument should be an array, `FALSE' otherwise. If - `optional' is `TRUE', the argument need not have been supplied. - If it wasn't, the return value is `NULL'. It is a fatal error if - `optional' is `TRUE' but the argument was not provided. - -`get_scalar_argument(i, opt)' - This is a convenience macro that calls `get_actual_argument()'. - -`get_array_argument(i, opt)' - This is a convenience macro that calls `get_actual_argument()'. 
- -`void update_ERRNO_int(int errno_saved)' - This function is called from within a C extension function to set - the value of `gawk''s `ERRNO' variable, based on the error value - provided as the argument. It is provided as a convenience. - -`void update_ERRNO_string(const char *string, enum errno_translate)' - This function is called from within a C extension function to set - the value of `gawk''s `ERRNO' variable to a given string. The - second argument determines whether the string is translated before - being installed into `ERRNO'. It is provided as a convenience. - -`void unset_ERRNO(void)' - This function is called from within a C extension function to set - the value of `gawk''s `ERRNO' variable to a null string. It is - provided as a convenience. - -`void register_deferred_variable(const char *name, NODE *(*load_func)(void))' - This function is called to register a function to be called when a - reference to an undefined variable with the given name is - encountered. The callback function will never be called if the - variable exists already, so, unless the calling code is running at - program startup, it should first check whether a variable of the - given name already exists. The argument function must return a - pointer to a `NODE' containing the newly created variable. This - function is used to implement the builtin `ENVIRON' and `PROCINFO' - arrays, so you can refer to them for examples. - -`void register_open_hook(void *(*open_func)(IOBUF *))' - This function is called to register a function to be called - whenever a new data file is opened, leading to the creation of an - `IOBUF' structure in `iop_alloc()'. After creating the new - `IOBUF', `iop_alloc()' will call (in reverse order of - registration, so the last function registered is called first) - each open hook until one returns non-`NULL'. 
If any hook returns - a non-`NULL' value, that value is assigned to the `IOBUF''s - `opaque' field (which will presumably point to a structure - containing additional state associated with the input processing), - and no further open hooks are called. - - The function called will most likely want to set the `IOBUF''s - `get_record' method to indicate that future input records should - be retrieved by calling that method instead of using the standard - `gawk' input processing. - - And the function will also probably want to set the `IOBUF''s - `close_func' method to be called when the file is closed to clean - up any state associated with the input. - - Finally, hook functions should be prepared to receive an `IOBUF' - structure where the `fd' field is set to `INVALID_HANDLE', meaning - that `gawk' was not able to open the file itself. In this case, - the hook function must be able to successfully open the file and - place a valid file descriptor there. - - Currently, for example, the hook function facility is used to - implement the XML parser shared library extension. For more info, - please look in `awk.h' and in `io.c'. - - An argument that is supposed to be an array needs to be handled with -some extra code, in case the array being passed in is actually from a -function parameter. - - The following boilerplate code shows how to do this: - - NODE *the_arg; - - /* assume need 3rd arg, 0-based */ - the_arg = get_array_argument(2, FALSE); - - Again, you should spend time studying the `gawk' internals; don't -just blindly copy this code. - - -File: gawk.info, Node: Plugin License, Next: Loading Extensions, Prev: Internals, Up: Dynamic Extensions - -C.3.2 Extension Licensing -------------------------- - -Every dynamic extension should define the global symbol -`plugin_is_GPL_compatible' to assert that it has been licensed under a -GPL-compatible license. If this symbol does not exist, `gawk' will -emit a fatal error and exit. 
- - The declared type of the symbol should be `int'. It does not need -to be in any allocated section, though. The code merely asserts that -the symbol exists in the global scope. Something like this is enough: - - int plugin_is_GPL_compatible; - - -File: gawk.info, Node: Loading Extensions, Next: Sample Library, Prev: Plugin License, Up: Dynamic Extensions - -C.3.3 Loading a Dynamic Extension ---------------------------------- - -There are two ways to load a dynamically linked library. The first is -to use the builtin `extension()': - - extension(libname, init_func) - - where `libname' is the library to load, and `init_func' is the name -of the initialization or bootstrap routine to run once loaded. - - The second method for dynamic loading of a library is to use the -command line option `-l': - - $ gawk -l libname -f myprog - - This will work only if the initialization routine is named -`dl_load()'. - - If you use `extension()', the library will be loaded at run time. -This means that the functions are available only to the rest of your -script. If you use the command line option `-l' instead, the library -will be loaded before `gawk' starts compiling the actual program. The -net effect is that you can use those functions anywhere in the program. - - `gawk' has a list of directories where it searches for libraries. -By default, the list includes directories that depend upon how gawk was -built and installed (*note AWKLIBPATH Variable::). If you want `gawk' -to look for libraries in your private directory, you have to tell it. -The way to do it is to set the `AWKLIBPATH' environment variable (*note -AWKLIBPATH Variable::). `gawk' supplies the default shared library -platform suffix if it is not present in the name of the library. If -the name of your library is `mylib.so', you can simply type - - $ gawk -l mylib -f myprog - - and `gawk' will do everything necessary to load in your library, and -then call your `dl_load()' routine. 
- - You can always specify the library using an absolute pathname, in -which case `gawk' will not use `AWKLIBPATH' to search for it. - - -File: gawk.info, Node: Sample Library, Prev: Loading Extensions, Up: Dynamic Extensions - -C.3.4 Example: Directory and File Operation Built-ins ------------------------------------------------------ - -Two useful functions that are not in `awk' are `chdir()' (so that an -`awk' program can change its directory) and `stat()' (so that an `awk' -program can gather information about a file). This minor node -implements these functions for `gawk' in an external extension library. - -* Menu: - -* Internal File Description:: What the new functions will do. -* Internal File Ops:: The code for internal file operations. -* Using Internal File Ops:: How to use an external extension. - - -File: gawk.info, Node: Internal File Description, Next: Internal File Ops, Up: Sample Library - -C.3.4.1 Using `chdir()' and `stat()' -.................................... - -This minor node shows how to use the new functions at the `awk' level -once they've been integrated into the running `gawk' interpreter. -Using `chdir()' is very straightforward. It takes one argument, the new -directory to change to: - - ... - newdir = "/home/arnold/funstuff" - ret = chdir(newdir) - if (ret < 0) { - printf("could not change to %s: %s\n", - newdir, ERRNO) > "/dev/stderr" - exit 1 - } - ... - - The return value is negative if the `chdir' failed, and `ERRNO' -(*note Built-in Variables::) is set to a string indicating the error. - - Using `stat()' is a bit more complicated. The C `stat()' function -fills in a structure that has a fair amount of information. 
The right -way to model this in `awk' is to fill in an associative array with the -appropriate information: - - file = "/home/arnold/.profile" - fdata[1] = "x" # force `fdata' to be an array - ret = stat(file, fdata) - if (ret < 0) { - printf("could not stat %s: %s\n", - file, ERRNO) > "/dev/stderr" - exit 1 - } - printf("size of %s is %d bytes\n", file, fdata["size"]) - - The `stat()' function always clears the data array, even if the -`stat()' fails. It fills in the following elements: - -`"name"' - The name of the file that was `stat()''ed. - -`"dev"' -`"ino"' - The file's device and inode numbers, respectively. - -`"mode"' - The file's mode, as a numeric value. This includes both the file's - type and its permissions. - -`"nlink"' - The number of hard links (directory entries) the file has. - -`"uid"' -`"gid"' - The numeric user and group ID numbers of the file's owner. - -`"size"' - The size in bytes of the file. - -`"blocks"' - The number of disk blocks the file actually occupies. This may not - be a function of the file's size if the file has holes. - -`"atime"' -`"mtime"' -`"ctime"' - The file's last access, modification, and inode update times, - respectively. These are numeric timestamps, suitable for - formatting with `strftime()' (*note Built-in::). - -`"pmode"' - The file's "printable mode." This is a string representation of - the file's type and permissions, such as what is produced by `ls - -l'--for example, `"drwxr-xr-x"'. - -`"type"' - A printable string representation of the file's type. The value - is one of the following: - - `"blockdev"' - `"chardev"' - The file is a block or character device ("special file"). - - `"directory"' - The file is a directory. - - `"fifo"' - The file is a named-pipe (also known as a FIFO). - - `"file"' - The file is just a regular file. - - `"socket"' - The file is an `AF_UNIX' ("Unix domain") socket in the - filesystem. - - `"symlink"' - The file is a symbolic link. 
- - Several additional elements may be present depending upon the -operating system and the type of the file. You can test for them in -your `awk' program by using the `in' operator (*note Reference to -Elements::): - -`"blksize"' - The preferred block size for I/O to the file. This field is not - present on all POSIX-like systems in the C `stat' structure. - -`"linkval"' - If the file is a symbolic link, this element is the name of the - file the link points to (i.e., the value of the link). - -`"rdev"' -`"major"' -`"minor"' - If the file is a block or character device file, then these values - represent the numeric device number and the major and minor - components of that number, respectively. - - -File: gawk.info, Node: Internal File Ops, Next: Using Internal File Ops, Prev: Internal File Description, Up: Sample Library - -C.3.4.2 C Code for `chdir()' and `stat()' -......................................... - -Here is the C code for these extensions. They were written for -GNU/Linux. The code needs some more work for complete portability to -other POSIX-compliant systems:(1) - - #include "awk.h" - - #include <sys/sysmacros.h> - - int plugin_is_GPL_compatible; - - /* do_chdir --- provide dynamically loaded chdir() builtin for gawk */ - - static NODE * - do_chdir(int nargs) - { - NODE *newdir; - int ret = -1; - - if (do_lint && nargs != 1) - lintwarn("chdir: called with incorrect number of arguments"); - - newdir = get_scalar_argument(0, FALSE); - - The file includes the `"awk.h"' header file for definitions for the -`gawk' internals. It includes `<sys/sysmacros.h>' for access to the -`major()' and `minor'() macros. - - By convention, for an `awk' function `foo', the function that -implements it is called `do_foo'. The function should take a `int' -argument, usually called `nargs', that represents the number of defined -arguments for the function. The `newdir' variable represents the new -directory to change to, retrieved with `get_scalar_argument()'. 
Note -that the first argument is numbered zero. - - This code actually accomplishes the `chdir()'. It first forces the -argument to be a string and passes the string value to the `chdir()' -system call. If the `chdir()' fails, `ERRNO' is updated. - - (void) force_string(newdir); - ret = chdir(newdir->stptr); - if (ret < 0) - update_ERRNO_int(errno); - - Finally, the function returns the return value to the `awk' level: - - return make_number((AWKNUM) ret); - } - - The `stat()' built-in is more involved. First comes a function that -turns a numeric mode into a printable representation (e.g., 644 becomes -`-rw-r--r--'). This is omitted here for brevity: - - /* format_mode --- turn a stat mode field into something readable */ - - static char * - format_mode(unsigned long fmode) - { - ... - } - - Next comes the `do_stat()' function. It starts with variable -declarations and argument checking: - - /* do_stat --- provide a stat() function for gawk */ - - static NODE * - do_stat(int nargs) - { - NODE *file, *array, *tmp; - struct stat sbuf; - int ret; - NODE **aptr; - char *pmode; /* printable mode */ - char *type = "unknown"; - - if (do_lint && nargs > 2) - lintwarn("stat: called with too many arguments"); - - Then comes the actual work. First, the function gets the arguments. -Then, it always clears the array. The code use `lstat()' (instead of -`stat()') to get the file information, in case the file is a symbolic -link. If there's an error, it sets `ERRNO' and returns: - - /* file is first arg, array to hold results is second */ - file = get_scalar_argument(0, FALSE); - array = get_array_argument(1, FALSE); - - /* empty out the array */ - assoc_clear(array); - - /* lstat the file, if error, set ERRNO and return */ - (void) force_string(file); - ret = lstat(file->stptr, & sbuf); - if (ret < 0) { - update_ERRNO_int(errno); - return make_number((AWKNUM) ret); - } - - Now comes the tedious part: filling in the array. 
Only a few of the -calls are shown here, since they all follow the same pattern: - - /* fill in the array */ - aptr = assoc_lookup(array, tmp = make_string("name", 4)); - *aptr = dupnode(file); - unref(tmp); - - aptr = assoc_lookup(array, tmp = make_string("mode", 4)); - *aptr = make_number((AWKNUM) sbuf.st_mode); - unref(tmp); - - aptr = assoc_lookup(array, tmp = make_string("pmode", 5)); - pmode = format_mode(sbuf.st_mode); - *aptr = make_string(pmode, strlen(pmode)); - unref(tmp); - - When done, return the `lstat()' return value: - - - return make_number((AWKNUM) ret); - } - - Finally, it's necessary to provide the "glue" that loads the new -function(s) into `gawk'. By convention, each library has a routine -named `dl_load()' that does the job. The simplest way is to use the -`dl_load_func' macro in `gawkapi.h'. - - And that's it! As an exercise, consider adding functions to -implement system calls such as `chown()', `chmod()', and `umask()'. - - ---------- Footnotes ---------- - - (1) This version is edited slightly for presentation. See -`extension/filefuncs.c' in the `gawk' distribution for the complete -version. - - -File: gawk.info, Node: Using Internal File Ops, Prev: Internal File Ops, Up: Sample Library +File: gawk.info, Node: Future Extensions, Prev: Additions, Up: Notes -C.3.4.3 Integrating the Extensions -.................................. - -Now that the code is written, it must be possible to add it at runtime -to the running `gawk' interpreter. First, the code must be compiled. -Assuming that the functions are in a file named `filefuncs.c', and IDIR -is the location of the `gawk' include files, the following steps create -a GNU/Linux shared library: - - $ gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -IIDIR filefuncs.c - $ ld -o filefuncs.so -shared filefuncs.o - - Once the library exists, it is loaded by calling the `extension()' -built-in function. 
This function takes two arguments: the name of the -library to load and the name of a function to call when the library is -first loaded. This function adds the new functions to `gawk'. It -returns the value returned by the initialization function within the -shared library: - - # file testff.awk - BEGIN { - extension("./filefuncs.so", "dl_load") - - chdir(".") # no-op - - data[1] = 1 # force `data' to be an array - print "Info for testff.awk" - ret = stat("testff.awk", data) - print "ret =", ret - for (i in data) - printf "data[\"%s\"] = %s\n", i, data[i] - print "testff.awk modified:", - strftime("%m %d %y %H:%M:%S", data["mtime"]) - - print "\nInfo for JUNK" - ret = stat("JUNK", data) - print "ret =", ret - for (i in data) - printf "data[\"%s\"] = %s\n", i, data[i] - print "JUNK modified:", strftime("%m %d %y %H:%M:%S", data["mtime"]) - } - - Here are the results of running the program: - - $ gawk -f testff.awk - -| Info for testff.awk - -| ret = 0 - -| data["size"] = 607 - -| data["ino"] = 14945891 - -| data["name"] = testff.awk - -| data["pmode"] = -rw-rw-r-- - -| data["nlink"] = 1 - -| data["atime"] = 1293993369 - -| data["mtime"] = 1288520752 - -| data["mode"] = 33204 - -| data["blksize"] = 4096 - -| data["dev"] = 2054 - -| data["type"] = file - -| data["gid"] = 500 - -| data["uid"] = 500 - -| data["blocks"] = 8 - -| data["ctime"] = 1290113572 - -| testff.awk modified: 10 31 10 12:25:52 - -| - -| Info for JUNK - -| ret = -1 - -| JUNK modified: 01 01 70 02:00:00 - - -File: gawk.info, Node: Future Extensions, Prev: Dynamic Extensions, Up: Notes - -C.4 Probable Future Extensions +C.3 Probable Future Extensions ============================== AWK is a language similar to PERL, only considerably more elegant. @@ -23369,12 +23106,9 @@ well. Following is a list of probable future changes visible at the `awk' language level: -Loadable module interface - It is not clear that the `awk'-level interface to the modules - facility is as good as it should be. 
The interface needs to be - redesigned, particularly taking namespace issues into account, as - well as possibly including issues such as library search path order - and versioning. +Databases + It may be possible to map a GDBM/NDBM/SDBM file into an `awk' + array. `RECLEN' variable for fixed-length records Along with `FIELDWIDTHS', this would speed up the processing of @@ -23382,30 +23116,12 @@ Loadable module interface `"RECLEN"', depending upon which kind of record processing is in effect. -Databases - It may be possible to map a GDBM/NDBM/SDBM file into an `awk' - array. - More `lint' warnings There are more things that could be checked for portability. Following is a list of probable improvements that will make `gawk''s source code easier to work with: -Loadable module mechanics - The current extension mechanism works (*note Dynamic Extensions::), - but is rather primitive. It requires a fair amount of manual work - to create and integrate a loadable module. Nor is the current - mechanism as portable as might be desired. The GNU `libtool' - package provides a number of features that would make using - loadable modules much easier. `gawk' should be changed to use - `libtool'. - -Loadable module internals - The API to its internals that `gawk' "exports" should be revised. - Too many things are needlessly exposed. A new API should be - designed and implemented to make module writing easier. - Better array subscript management `gawk''s management of array subscript storage could use revamping, so that using the same value to index multiple arrays only stores @@ -25955,7 +25671,7 @@ Index * Ada programming language: Glossary. (line 20) * adding, features to gawk: Adding Code. (line 6) * adding, fields: Changing Fields. (line 53) -* adding, functions to gawk: Dynamic Extensions. (line 10) +* adding, functions to gawk: Dynamic Extensions. (line 9) * advanced features, buffering: I/O Functions. (line 98) * advanced features, close() function: Close Files And Pipes. 
(line 131) @@ -26014,18 +25730,15 @@ Index * arguments, command-line, invoking awk: Command Line. (line 6) * arguments, in function calls: Function Calls. (line 16) * arguments, processing: Getopt Function. (line 6) -* arguments, retrieving: Internals. (line 111) * arithmetic operators: Arithmetic Ops. (line 6) * arrays: Arrays. (line 6) * arrays, as parameters to functions: Pass By Value/Reference. (line 47) * arrays, associative: Array Intro. (line 50) -* arrays, associative, clearing: Internals. (line 68) * arrays, associative, library functions and: Library Names. (line 57) * arrays, deleting entire contents: Delete. (line 39) * arrays, elements, assigning: Assigning Elements. (line 6) * arrays, elements, deleting: Delete. (line 6) -* arrays, elements, installing: Internals. (line 72) * arrays, elements, order of: Scanning an Array. (line 48) * arrays, elements, referencing: Reference to Elements. (line 6) @@ -26064,8 +25777,6 @@ Index * assignment operators, evaluation order: Assignment Ops. (line 111) * assignment operators, lvalues/rvalues: Assignment Ops. (line 32) * assignments as filenames: Ignoring Assigns. (line 6) -* assoc_clear() internal function: Internals. (line 68) -* assoc_lookup() internal function: Internals. (line 72) * associative arrays: Array Intro. (line 50) * asterisk (*), * operator, as multiplication operator: Precedence. (line 55) @@ -26134,10 +25845,8 @@ Index * awk, versions of, See Also Brian Kernighan's awk <1>: Other Versions. (line 13) * awk, versions of, See Also Brian Kernighan's awk: BTL. (line 6) -* awk.h file (internal): Internals. (line 15) * awka compiler for awk: Other Versions. (line 55) * AWKLIBPATH environment variable: AWKLIBPATH Variable. (line 6) -* AWKNUM internal type: Internals. (line 19) * AWKPATH environment variable <1>: PC Using. (line 11) * AWKPATH environment variable: AWKPATH Variable. (line 6) * awkprof.out file: Profiling. 
(line 6) @@ -26339,7 +26048,6 @@ Index * close() function, two-way pipes and: Two-way I/O. (line 77) * Close, Diane <1>: Contributors. (line 21) * Close, Diane: Manual History. (line 41) -* close_func() input method: Internals. (line 157) * collating elements: Bracket Expressions. (line 69) * collating symbols: Bracket Expressions. (line 76) * Colombo, Antonio: Acknowledgments. (line 60) @@ -26719,7 +26427,6 @@ Index * DuBois, John: Acknowledgments. (line 60) * dump debugger command: Miscellaneous Debugger Commands. (line 9) -* dupnode() internal function: Internals. (line 87) * dupword.awk program: Dupword Program. (line 31) * e debugger command (alias for enable): Breakpoint Control. (line 73) * EBCDIC: Ordinal Functions. (line 45) @@ -26760,7 +26467,6 @@ Index * endgrent() user-defined function: Group Functions. (line 218) * endpwent() function (C library): Passwd Functions. (line 210) * endpwent() user-defined function: Passwd Functions. (line 213) -* ENVIRON array <1>: Internals. (line 146) * ENVIRON array: Auto-set. (line 60) * environment variables: Auto-set. (line 60) * epoch, definition of: Glossary. (line 239) @@ -26769,11 +26475,10 @@ Index * equals sign (=), == operator: Comparison Operators. (line 11) * EREs (Extended Regular Expressions): Bracket Expressions. (line 24) -* ERRNO variable <1>: Internals. (line 130) -* ERRNO variable <2>: TCP/IP Networking. (line 54) -* ERRNO variable <3>: Auto-set. (line 73) -* ERRNO variable <4>: BEGINFILE/ENDFILE. (line 26) -* ERRNO variable <5>: Close Files And Pipes. +* ERRNO variable <1>: TCP/IP Networking. (line 54) +* ERRNO variable <2>: Auto-set. (line 73) +* ERRNO variable <3>: BEGINFILE/ENDFILE. (line 26) +* ERRNO variable <4>: Close Files And Pipes. (line 139) * ERRNO variable: Getline. (line 19) * error handling: Special FD. (line 16) @@ -26818,7 +26523,6 @@ Index (line 9) * expressions, selecting: Conditional Exp. (line 6) * Extended Regular Expressions (EREs): Bracket Expressions. 
(line 24) -* eXtensible Markup Language (XML): Internals. (line 157) * extension() function (gawk): Using Internal File Ops. (line 15) * extensions, Brian Kernighan's awk <1>: Other Versions. (line 13) @@ -26958,15 +26662,11 @@ Index (line 6) * floating-point, numbers <1>: Unexpected Results. (line 6) * floating-point, numbers: Basic Data Typing. (line 21) -* floating-point, numbers, AWKNUM internal type: Internals. (line 19) * FNR variable <1>: Auto-set. (line 103) * FNR variable: Records. (line 6) * FNR variable, changing: Auto-set. (line 225) * for statement: For Statement. (line 6) * for statement, in arrays: Scanning an Array. (line 20) -* force_number() internal function: Internals. (line 27) -* force_string() internal function: Internals. (line 32) -* force_wstring() internal function: Internals. (line 37) * format specifiers, mixing regular with positional specifiers: Printf Ordering. (line 57) * format specifiers, printf statement: Control Letters. (line 6) @@ -27014,7 +26714,7 @@ Index (line 47) * functions, built-in <1>: Functions. (line 6) * functions, built-in: Function Calls. (line 10) -* functions, built-in, adding to gawk: Dynamic Extensions. (line 10) +* functions, built-in, adding to gawk: Dynamic Extensions. (line 9) * functions, built-in, evaluation order: Calling Built-in. (line 30) * functions, defining: Definition Syntax. (line 6) * functions, library: Library Functions. (line 6) @@ -27042,7 +26742,6 @@ Index * functions, names of <1>: Definition Syntax. (line 20) * functions, names of: Arrays. (line 18) * functions, recursive: Definition Syntax. (line 73) -* functions, return values, setting: Internals. (line 130) * functions, string-translation: I18N Functions. (line 6) * functions, undefined: Pass By Value/Reference. (line 71) @@ -27096,8 +26795,7 @@ Index * gawk, FPAT variable in: Splitting By Content. (line 26) * gawk, function arguments and: Calling Built-in. (line 16) -* gawk, functions, adding: Dynamic Extensions. 
(line 10) -* gawk, functions, loading: Loading Extensions. (line 6) +* gawk, functions, adding: Dynamic Extensions. (line 9) * gawk, hexadecimal numbers and: Nondecimal-numbers. (line 42) * gawk, IGNORECASE variable in <1>: Array Sorting Functions. (line 81) @@ -27112,7 +26810,6 @@ Index * gawk, implementation issues, limits: Getline Notes. (line 14) * gawk, implementation issues, pipes: Redirection. (line 135) * gawk, installing: Installation. (line 6) -* gawk, internals: Internals. (line 6) * gawk, internationalization and, See internationalization: Internationalization. (line 13) * gawk, interpreter, adding code to: Using Internal File Ops. @@ -27158,11 +26855,6 @@ Index * gensub() function (gawk): Using Constant Regexps. (line 43) * gensub() function (gawk), escape processing: Gory Details. (line 6) -* get_actual_argument() internal function: Internals. (line 116) -* get_argument() internal function: Internals. (line 111) -* get_array_argument() internal macro: Internals. (line 127) -* get_record() input method: Internals. (line 157) -* get_scalar_argument() internal macro: Internals. (line 124) * getaddrinfo() function (C library): TCP/IP Networking. (line 38) * getgrent() function (C library): Group Functions. (line 6) * getgrent() user-defined function: Group Functions. (line 6) @@ -27323,37 +27015,6 @@ Index * integers: Basic Data Typing. (line 21) * integers, unsigned: Basic Data Typing. (line 30) * interacting with other programs: I/O Functions. (line 63) -* internal constant, INVALID_HANDLE: Internals. (line 157) -* internal function, assoc_clear(): Internals. (line 68) -* internal function, assoc_lookup(): Internals. (line 72) -* internal function, dupnode(): Internals. (line 87) -* internal function, force_number(): Internals. (line 27) -* internal function, force_string(): Internals. (line 32) -* internal function, force_wstring(): Internals. (line 37) -* internal function, get_actual_argument(): Internals. 
(line 116) -* internal function, get_argument(): Internals. (line 111) -* internal function, iop_alloc(): Internals. (line 157) -* internal function, make_builtin(): Internals. (line 97) -* internal function, make_number(): Internals. (line 82) -* internal function, make_string(): Internals. (line 77) -* internal function, register_deferred_variable(): Internals. (line 146) -* internal function, register_open_hook(): Internals. (line 157) -* internal function, unref(): Internals. (line 92) -* internal function, unset_ERRNO(): Internals. (line 141) -* internal function, update_ERRNO_int(): Internals. (line 130) -* internal function, update_ERRNO_string(): Internals. (line 135) -* internal macro, get_array_argument(): Internals. (line 127) -* internal macro, get_scalar_argument(): Internals. (line 124) -* internal structure, IOBUF: Internals. (line 157) -* internal type, AWKNUM: Internals. (line 19) -* internal type, NODE: Internals. (line 23) -* internal variable, nargs: Internals. (line 42) -* internal variable, stlen: Internals. (line 46) -* internal variable, stptr: Internals. (line 46) -* internal variable, type: Internals. (line 59) -* internal variable, vname: Internals. (line 64) -* internal variable, wstlen: Internals. (line 54) -* internal variable, wstptr: Internals. (line 54) * internationalization <1>: I18N and L10N. (line 6) * internationalization: I18N Functions. (line 6) * internationalization, localization <1>: Internationalization. @@ -27373,10 +27034,7 @@ Index * interpreted programs <1>: Glossary. (line 361) * interpreted programs: Basic High Level. (line 14) * interval expressions: Regexp Operators. (line 116) -* INVALID_HANDLE internal constant: Internals. (line 157) * inventory-shipped file: Sample Data Files. (line 32) -* IOBUF internal structure: Internals. (line 157) -* iop_alloc() internal function: Internals. (line 157) * isarray() function (gawk): Type Functions. (line 11) * ISO: Glossary. (line 372) * ISO 8859-1: Glossary. 
(line 141) @@ -27482,7 +27140,6 @@ Index * Linux: Manual History. (line 28) * list debugger command: Miscellaneous Debugger Commands. (line 74) -* loading extension: Loading Extensions. (line 6) * loading, library: Options. (line 173) * local variables: Variable Scope. (line 6) * locale categories: Explaining gettext. (line 80) @@ -27502,15 +27159,11 @@ Index * loops, count for header: Profiling. (line 123) * loops, exiting: Break Statement. (line 6) * loops, See Also while statement: While Statement. (line 6) -* Lost In Space: Dynamic Extensions. (line 6) * ls utility: More Complex. (line 15) * lshift() function (gawk): Bitwise Functions. (line 46) * lvalues/rvalues: Assignment Ops. (line 32) * mailing labels, printing: Labels Program. (line 6) * mailing list, GNITS: Acknowledgments. (line 52) -* make_builtin() internal function: Internals. (line 97) -* make_number() internal function: Internals. (line 82) -* make_string() internal function: Internals. (line 77) * mark parity: Ordinal Functions. (line 45) * marked string extraction (internationalization): String Extraction. (line 6) @@ -27525,7 +27178,6 @@ Index * matching, null strings: Gory Details. (line 164) * mawk program: Other Versions. (line 35) * McPhee, Patrick: Contributors. (line 100) -* memory, releasing: Internals. (line 92) * message object files: Explaining gettext. (line 41) * message object files, converting from portable object files: I18N Example. (line 62) @@ -27551,7 +27203,6 @@ Index * namespace issues <1>: Library Names. (line 6) * namespace issues: Arrays. (line 18) * namespace issues, functions: Definition Syntax. (line 20) -* nargs internal variable: Internals. (line 42) * nawk utility: Names. (line 17) * negative zero: Unexpected Results. (line 28) * NetBSD: Glossary. (line 611) @@ -27592,8 +27243,6 @@ Index * ni debugger command (alias for nexti): Debugger Execution Control. (line 49) * noassign.awk program: Ignoring Assigns. (line 15) -* NODE internal type: Internals. 
(line 23) -* nodes, duplicating: Internals. (line 87) * not Boolean-logic operator: Boolean Ops. (line 6) * NR variable <1>: Auto-set. (line 119) * NR variable: Records. (line 6) @@ -27614,7 +27263,6 @@ Index * number sign (#), #! (executable scripts), portability issues with: Executable Scripts. (line 6) * number sign (#), commenting: Comments. (line 6) -* numbers: Internals. (line 82) * numbers, as array subscripts: Numeric Array Subscripts. (line 6) * numbers, as values of characters: Ordinal Functions. (line 6) @@ -27624,16 +27272,13 @@ Index * numbers, converting: Conversion. (line 6) * numbers, converting, to strings: User-modified. (line 28) * numbers, floating-point: Basic Data Typing. (line 21) -* numbers, floating-point, AWKNUM internal type: Internals. (line 19) * numbers, hexadecimal: Nondecimal-numbers. (line 6) -* numbers, NODE internal type: Internals. (line 23) * numbers, octal: Nondecimal-numbers. (line 6) * numbers, random: Numeric Functions. (line 64) * numbers, rounding: Round Function. (line 6) * numeric, constants: Scalar Constants. (line 6) * numeric, output format: OFMT. (line 6) * numeric, strings: Variable Typing. (line 6) -* numeric, values: Internals. (line 27) * o debugger command (alias for option): Debugger Info. (line 57) * oawk utility: Names. (line 17) * obsolete features: Obsolete. (line 6) @@ -27717,7 +27362,6 @@ Index (line 36) * P1003.1 POSIX standard: Glossary. (line 454) * P1003.2 POSIX standard: Glossary. (line 454) -* parameters, number of: Internals. (line 42) * parentheses () <1>: Profiling. (line 138) * parentheses (): Regexp Operators. (line 79) * password file: Passwd Functions. (line 16) @@ -27880,13 +27524,12 @@ Index * private variables: Library Names. (line 11) * processes, two-way communications with: Two-way I/O. (line 23) * processing data: Basic High Level. (line 6) -* PROCINFO array <1>: Internals. (line 146) -* PROCINFO array <2>: Id Program. (line 15) -* PROCINFO array <3>: Group Functions. 
(line 6) -* PROCINFO array <4>: Passwd Functions. (line 6) -* PROCINFO array <5>: Two-way I/O. (line 116) -* PROCINFO array <6>: Time Functions. (line 46) -* PROCINFO array <7>: Auto-set. (line 124) +* PROCINFO array <1>: Id Program. (line 15) +* PROCINFO array <2>: Group Functions. (line 6) +* PROCINFO array <3>: Passwd Functions. (line 6) +* PROCINFO array <4>: Two-way I/O. (line 116) +* PROCINFO array <5>: Time Functions. (line 46) +* PROCINFO array <6>: Auto-set. (line 124) * PROCINFO array: Obsolete. (line 11) * profiling awk programs: Profiling. (line 6) * profiling awk programs, dynamically: Profiling. (line 171) @@ -27976,8 +27619,6 @@ Index * regexp constants, slashes vs. quotes: Computed Regexps. (line 28) * regexp constants, vs. string constants: Computed Regexps. (line 38) * regexp, See regular expressions: Regexp. (line 6) -* register_deferred_variable() internal function: Internals. (line 146) -* register_open_hook() internal function: Internals. (line 157) * regular expressions: Regexp. (line 6) * regular expressions as field separators: Field Separators. (line 50) * regular expressions, anchors in: Regexp Operators. (line 22) @@ -28046,8 +27687,6 @@ Index * Robbins, Miriam <1>: Passwd Functions. (line 90) * Robbins, Miriam <2>: Getline/Pipe. (line 36) * Robbins, Miriam: Acknowledgments. (line 83) -* Robinson, Will: Dynamic Extensions. (line 6) -* robot, the: Dynamic Extensions. (line 6) * Rommel, Kai Uwe: Contributors. (line 43) * round() user-defined function: Round Function. (line 16) * rounding mode, floating-point: Rounding Mode. (line 6) @@ -28209,8 +27848,6 @@ Index (line 68) * stepi debugger command: Debugger Execution Control. (line 76) -* stlen internal variable: Internals. (line 46) -* stptr internal variable: Internals. (line 46) * stream editors <1>: Simple Sed. (line 6) * stream editors: Field Splitting Summary. (line 47) @@ -28221,7 +27858,6 @@ Index (line 6) * string operators: Concatenation. 
(line 9) * string-matching operators: Regexp Usage. (line 19) -* strings: Internals. (line 77) * strings, converting <1>: Bitwise Functions. (line 109) * strings, converting: Conversion. (line 6) * strings, converting, numbers to: User-modified. (line 28) @@ -28230,7 +27866,6 @@ Index * strings, for localization: Programmer i18n. (line 14) * strings, length of: Scalar Constants. (line 20) * strings, merging arrays into: Join Function. (line 6) -* strings, NODE internal type: Internals. (line 23) * strings, null: Regexp Field Splitting. (line 43) * strings, numeric: Variable Typing. (line 6) @@ -28352,7 +27987,6 @@ Index * trunc-mod operation: Arithmetic Ops. (line 66) * truth values: Truth Values. (line 6) * type conversion: Conversion. (line 21) -* type internal variable: Internals. (line 59) * u debugger command (alias for until): Debugger Execution Control. (line 83) * undefined functions: Pass By Value/Reference. @@ -28378,16 +28012,12 @@ Index (line 72) * Unix, awk scripts and: Executable Scripts. (line 6) * UNIXROOT variable, on OS/2 systems: PC Using. (line 17) -* unref() internal function: Internals. (line 92) -* unset_ERRNO() internal function: Internals. (line 141) * unsigned integers: Basic Data Typing. (line 30) * until debugger command: Debugger Execution Control. (line 83) * unwatch debugger command: Viewing And Changing Data. (line 84) * up debugger command: Execution Stack. (line 33) -* update_ERRNO_int() internal function: Internals. (line 130) -* update_ERRNO_string() internal function: Internals. (line 135) * user database, reading: Passwd Functions. (line 6) * user-defined, functions: User-defined. (line 6) * user-defined, functions, counts: Profiling. (line 129) @@ -28438,7 +28068,6 @@ Index * vertical bar (|), || operator <1>: Precedence. (line 89) * vertical bar (|), || operator: Boolean Ops. (line 57) * Vinschen, Corinna: Acknowledgments. (line 60) -* vname internal variable: Internals. 
(line 64) * w debugger command (alias for watch): Viewing And Changing Data. (line 67) * w utility: Constant Size. (line 22) @@ -28472,11 +28101,8 @@ Index * words, counting: Wc Program. (line 6) * words, duplicate, searching for: Dupword Program. (line 6) * words, usage counts, generating: Word Sorting. (line 6) -* wstlen internal variable: Internals. (line 54) -* wstptr internal variable: Internals. (line 54) * xgawk: Other Versions. (line 120) * xgettext utility: String Extraction. (line 13) -* XML (eXtensible Markup Language): Internals. (line 157) * XOR bitwise operation: Bitwise Functions. (line 6) * xor() function (gawk): Bitwise Functions. (line 55) * Yawitz, Efraim: Contributors. (line 106) @@ -28514,442 +28140,440 @@ Index Tag Table: Node: Top1352 -Node: Foreword31758 -Node: Preface36103 -Ref: Preface-Footnote-139156 -Ref: Preface-Footnote-239262 -Node: History39494 -Node: Names41885 -Ref: Names-Footnote-143362 -Node: This Manual43434 -Ref: This Manual-Footnote-148372 -Node: Conventions48472 -Node: Manual History50606 -Ref: Manual History-Footnote-153876 -Ref: Manual History-Footnote-253917 -Node: How To Contribute53991 -Node: Acknowledgments55135 -Node: Getting Started59631 -Node: Running gawk62010 -Node: One-shot63196 -Node: Read Terminal64421 -Ref: Read Terminal-Footnote-166071 -Ref: Read Terminal-Footnote-266347 -Node: Long66518 -Node: Executable Scripts67894 -Ref: Executable Scripts-Footnote-169763 -Ref: Executable Scripts-Footnote-269865 -Node: Comments70412 -Node: Quoting72879 -Node: DOS Quoting77502 -Node: Sample Data Files78177 -Node: Very Simple81209 -Node: Two Rules85808 -Node: More Complex87955 -Ref: More Complex-Footnote-190885 -Node: Statements/Lines90970 -Ref: Statements/Lines-Footnote-195432 -Node: Other Features95697 -Node: When96625 -Node: Invoking Gawk98772 -Node: Command Line100233 -Node: Options101016 -Ref: Options-Footnote-1116414 -Node: Other Arguments116439 -Node: Naming Standard Input119097 -Node: Environment Variables120191 
-Node: AWKPATH Variable120749 -Ref: AWKPATH Variable-Footnote-1123507 -Node: AWKLIBPATH Variable123767 -Node: Other Environment Variables124364 -Node: Exit Status126859 -Node: Include Files127534 -Node: Loading Shared Libraries131103 -Node: Obsolete132328 -Node: Undocumented133025 -Node: Regexp133268 -Node: Regexp Usage134657 -Node: Escape Sequences136683 -Node: Regexp Operators142446 -Ref: Regexp Operators-Footnote-1149826 -Ref: Regexp Operators-Footnote-2149973 -Node: Bracket Expressions150071 -Ref: table-char-classes151961 -Node: GNU Regexp Operators154484 -Node: Case-sensitivity158207 -Ref: Case-sensitivity-Footnote-1161175 -Ref: Case-sensitivity-Footnote-2161410 -Node: Leftmost Longest161518 -Node: Computed Regexps162719 -Node: Reading Files166129 -Node: Records168133 -Ref: Records-Footnote-1176807 -Node: Fields176844 -Ref: Fields-Footnote-1179877 -Node: Nonconstant Fields179963 -Node: Changing Fields182165 -Node: Field Separators188146 -Node: Default Field Splitting190775 -Node: Regexp Field Splitting191892 -Node: Single Character Fields195234 -Node: Command Line Field Separator196293 -Node: Field Splitting Summary199734 -Ref: Field Splitting Summary-Footnote-1202926 -Node: Constant Size203027 -Node: Splitting By Content207611 -Ref: Splitting By Content-Footnote-1211337 -Node: Multiple Line211377 -Ref: Multiple Line-Footnote-1217224 -Node: Getline217403 -Node: Plain Getline219619 -Node: Getline/Variable221708 -Node: Getline/File222849 -Node: Getline/Variable/File224171 -Ref: Getline/Variable/File-Footnote-1225770 -Node: Getline/Pipe225857 -Node: Getline/Variable/Pipe228417 -Node: Getline/Coprocess229524 -Node: Getline/Variable/Coprocess230767 -Node: Getline Notes231481 -Node: Getline Summary233423 -Ref: table-getline-variants233831 -Node: Read Timeout234687 -Ref: Read Timeout-Footnote-1238432 -Node: Command line directories238489 -Node: Printing239119 -Node: Print240750 -Node: Print Examples242087 -Node: Output Separators244871 -Node: OFMT246631 -Node: 
Printf247989 -Node: Basic Printf248895 -Node: Control Letters250434 -Node: Format Modifiers254246 -Node: Printf Examples260255 -Node: Redirection262970 -Node: Special Files269954 -Node: Special FD270487 -Ref: Special FD-Footnote-1274112 -Node: Special Network274186 -Node: Special Caveats275036 -Node: Close Files And Pipes275832 -Ref: Close Files And Pipes-Footnote-1282855 -Ref: Close Files And Pipes-Footnote-2283003 -Node: Expressions283153 -Node: Values284285 -Node: Constants284961 -Node: Scalar Constants285641 -Ref: Scalar Constants-Footnote-1286500 -Node: Nondecimal-numbers286682 -Node: Regexp Constants289741 -Node: Using Constant Regexps290216 -Node: Variables293271 -Node: Using Variables293926 -Node: Assignment Options295650 -Node: Conversion297522 -Ref: table-locale-affects302898 -Ref: Conversion-Footnote-1303522 -Node: All Operators303631 -Node: Arithmetic Ops304261 -Node: Concatenation306766 -Ref: Concatenation-Footnote-1309559 -Node: Assignment Ops309679 -Ref: table-assign-ops314667 -Node: Increment Ops316075 -Node: Truth Values and Conditions319545 -Node: Truth Values320628 -Node: Typing and Comparison321677 -Node: Variable Typing322466 -Ref: Variable Typing-Footnote-1326363 -Node: Comparison Operators326485 -Ref: table-relational-ops326895 -Node: POSIX String Comparison330444 -Ref: POSIX String Comparison-Footnote-1331400 -Node: Boolean Ops331538 -Ref: Boolean Ops-Footnote-1335616 -Node: Conditional Exp335707 -Node: Function Calls337439 -Node: Precedence341033 -Node: Locales344702 -Node: Patterns and Actions345791 -Node: Pattern Overview346845 -Node: Regexp Patterns348514 -Node: Expression Patterns349057 -Node: Ranges352742 -Node: BEGIN/END355708 -Node: Using BEGIN/END356470 -Ref: Using BEGIN/END-Footnote-1359201 -Node: I/O And BEGIN/END359307 -Node: BEGINFILE/ENDFILE361589 -Node: Empty364482 -Node: Using Shell Variables364798 -Node: Action Overview367083 -Node: Statements369440 -Node: If Statement371294 -Node: While Statement372793 -Node: Do 
Statement374837 -Node: For Statement375993 -Node: Switch Statement379145 -Node: Break Statement381242 -Node: Continue Statement383232 -Node: Next Statement385025 -Node: Nextfile Statement387415 -Node: Exit Statement389960 -Node: Built-in Variables392376 -Node: User-modified393471 -Ref: User-modified-Footnote-1401826 -Node: Auto-set401888 -Ref: Auto-set-Footnote-1411796 -Node: ARGC and ARGV412001 -Node: Arrays415852 -Node: Array Basics417357 -Node: Array Intro418183 -Node: Reference to Elements422501 -Node: Assigning Elements424771 -Node: Array Example425262 -Node: Scanning an Array426994 -Node: Controlling Scanning429308 -Ref: Controlling Scanning-Footnote-1434241 -Node: Delete434557 -Ref: Delete-Footnote-1436992 -Node: Numeric Array Subscripts437049 -Node: Uninitialized Subscripts439232 -Node: Multi-dimensional440860 -Node: Multi-scanning443954 -Node: Arrays of Arrays445545 -Node: Functions450190 -Node: Built-in451012 -Node: Calling Built-in452090 -Node: Numeric Functions454078 -Ref: Numeric Functions-Footnote-1457910 -Ref: Numeric Functions-Footnote-2458267 -Ref: Numeric Functions-Footnote-3458315 -Node: String Functions458584 -Ref: String Functions-Footnote-1482081 -Ref: String Functions-Footnote-2482210 -Ref: String Functions-Footnote-3482458 -Node: Gory Details482545 -Ref: table-sub-escapes484224 -Ref: table-sub-posix-92485578 -Ref: table-sub-proposed486921 -Ref: table-posix-sub488271 -Ref: table-gensub-escapes489817 -Ref: Gory Details-Footnote-1491024 -Ref: Gory Details-Footnote-2491075 -Node: I/O Functions491226 -Ref: I/O Functions-Footnote-1497881 -Node: Time Functions498028 -Ref: Time Functions-Footnote-1508920 -Ref: Time Functions-Footnote-2508988 -Ref: Time Functions-Footnote-3509146 -Ref: Time Functions-Footnote-4509257 -Ref: Time Functions-Footnote-5509369 -Ref: Time Functions-Footnote-6509596 -Node: Bitwise Functions509862 -Ref: table-bitwise-ops510420 -Ref: Bitwise Functions-Footnote-1514641 -Node: Type Functions514825 -Node: I18N Functions515295 
-Node: User-defined516922 -Node: Definition Syntax517726 -Ref: Definition Syntax-Footnote-1522636 -Node: Function Example522705 -Node: Function Caveats525299 -Node: Calling A Function525720 -Node: Variable Scope526835 -Node: Pass By Value/Reference528810 -Node: Return Statement532250 -Node: Dynamic Typing535231 -Node: Indirect Calls535966 -Node: Internationalization545651 -Node: I18N and L10N547090 -Node: Explaining gettext547776 -Ref: Explaining gettext-Footnote-1552842 -Ref: Explaining gettext-Footnote-2553026 -Node: Programmer i18n553191 -Node: Translator i18n557391 -Node: String Extraction558184 -Ref: String Extraction-Footnote-1559145 -Node: Printf Ordering559231 -Ref: Printf Ordering-Footnote-1562015 -Node: I18N Portability562079 -Ref: I18N Portability-Footnote-1564528 -Node: I18N Example564591 -Ref: I18N Example-Footnote-1567226 -Node: Gawk I18N567298 -Node: Arbitrary Precision Arithmetic567915 -Ref: Arbitrary Precision Arithmetic-Footnote-1570790 -Node: Floating-point Programming570938 -Node: Floating-point Representation576208 -Node: Floating-point Context577312 -Ref: table-ieee-formats578147 -Node: Rounding Mode579517 -Ref: table-rounding-modes580144 -Ref: Rounding Mode-Footnote-1583267 -Node: Arbitrary Precision Floats583448 -Ref: Arbitrary Precision Floats-Footnote-1585489 -Node: Setting Precision585800 -Node: Setting Rounding Mode588558 -Node: Floating-point Constants589475 -Node: Changing Precision590894 -Ref: Changing Precision-Footnote-1592294 -Node: Exact Arithmetic592467 -Node: Integer Programming595480 -Node: Arbitrary Precision Integers597260 -Ref: Arbitrary Precision Integers-Footnote-1600284 -Node: MPFR and GMP Libraries600430 -Node: Advanced Features600815 -Node: Nondecimal Data602338 -Node: Array Sorting603921 -Node: Controlling Array Traversal604618 -Node: Array Sorting Functions612855 -Ref: Array Sorting Functions-Footnote-1616529 -Ref: Array Sorting Functions-Footnote-2616622 -Node: Two-way I/O616816 -Ref: Two-way I/O-Footnote-1622248 
-Node: TCP/IP Networking622318 -Node: Profiling625162 -Node: Library Functions632616 -Ref: Library Functions-Footnote-1635623 -Node: Library Names635794 -Ref: Library Names-Footnote-1639265 -Ref: Library Names-Footnote-2639485 -Node: General Functions639571 -Node: Strtonum Function640524 -Node: Assert Function643454 -Node: Round Function646780 -Node: Cliff Random Function648323 -Node: Ordinal Functions649339 -Ref: Ordinal Functions-Footnote-1652409 -Ref: Ordinal Functions-Footnote-2652661 -Node: Join Function652870 -Ref: Join Function-Footnote-1654641 -Node: Getlocaltime Function654841 -Node: Data File Management658556 -Node: Filetrans Function659188 -Node: Rewind Function663327 -Node: File Checking664714 -Node: Empty Files665808 -Node: Ignoring Assigns668038 -Node: Getopt Function669591 -Ref: Getopt Function-Footnote-1680895 -Node: Passwd Functions681098 -Ref: Passwd Functions-Footnote-1690073 -Node: Group Functions690161 -Node: Walking Arrays698245 -Node: Sample Programs699814 -Node: Running Examples700479 -Node: Clones701207 -Node: Cut Program702431 -Node: Egrep Program712276 -Ref: Egrep Program-Footnote-1720049 -Node: Id Program720159 -Node: Split Program723775 -Ref: Split Program-Footnote-1727294 -Node: Tee Program727422 -Node: Uniq Program730225 -Node: Wc Program737654 -Ref: Wc Program-Footnote-1741920 -Ref: Wc Program-Footnote-2742120 -Node: Miscellaneous Programs742212 -Node: Dupword Program743400 -Node: Alarm Program745431 -Node: Translate Program750180 -Ref: Translate Program-Footnote-1754567 -Ref: Translate Program-Footnote-2754795 -Node: Labels Program754929 -Ref: Labels Program-Footnote-1758300 -Node: Word Sorting758384 -Node: History Sorting762268 -Node: Extract Program764107 -Ref: Extract Program-Footnote-1771590 -Node: Simple Sed771718 -Node: Igawk Program774780 -Ref: Igawk Program-Footnote-1789937 -Ref: Igawk Program-Footnote-2790138 -Node: Anagram Program790276 -Node: Signature Program793344 -Node: Debugger794444 -Node: Debugging795396 -Node: 
Debugging Concepts795829 -Node: Debugging Terms797685 -Node: Awk Debugging800282 -Node: Sample Debugging Session801174 -Node: Debugger Invocation801694 -Node: Finding The Bug803023 -Node: List of Debugger Commands809511 -Node: Breakpoint Control810845 -Node: Debugger Execution Control814509 -Node: Viewing And Changing Data817869 -Node: Execution Stack821225 -Node: Debugger Info822692 -Node: Miscellaneous Debugger Commands826673 -Node: Readline Support832118 -Node: Limitations832949 -Node: Language History835201 -Node: V7/SVR3.1836713 -Node: SVR4839034 -Node: POSIX840476 -Node: BTL841484 -Node: POSIX/GNU842218 -Node: Common Extensions847509 -Node: Ranges and Locales848616 -Ref: Ranges and Locales-Footnote-1853220 -Node: Contributors853441 -Node: Installation857702 -Node: Gawk Distribution858596 -Node: Getting859080 -Node: Extracting859906 -Node: Distribution contents861598 -Node: Unix Installation866820 -Node: Quick Installation867437 -Node: Additional Configuration Options869399 -Node: Configuration Philosophy870876 -Node: Non-Unix Installation873218 -Node: PC Installation873676 -Node: PC Binary Installation874975 -Node: PC Compiling876823 -Node: PC Testing879767 -Node: PC Using880943 -Node: Cygwin885128 -Node: MSYS886128 -Node: VMS Installation886642 -Node: VMS Compilation887245 -Ref: VMS Compilation-Footnote-1888252 -Node: VMS Installation Details888310 -Node: VMS Running889945 -Node: VMS Old Gawk891552 -Node: Bugs892026 -Node: Other Versions895878 -Node: Notes901193 -Node: Compatibility Mode901885 -Node: Additions902668 -Node: Accessing The Source903480 -Node: Adding Code904905 -Node: New Ports910872 -Node: Dynamic Extensions914985 -Node: Internals916425 -Node: Plugin License925247 -Node: Loading Extensions925885 -Node: Sample Library927726 -Node: Internal File Description928416 -Node: Internal File Ops932131 -Ref: Internal File Ops-Footnote-1936696 -Node: Using Internal File Ops936836 -Node: Future Extensions939214 -Node: Basic Concepts941718 -Node: Basic High 
Level942475 -Ref: Basic High Level-Footnote-1946510 -Node: Basic Data Typing946695 -Node: Floating Point Issues951220 -Node: String Conversion Precision952303 -Ref: String Conversion Precision-Footnote-1954003 -Node: Unexpected Results954112 -Node: POSIX Floating Point Problems955938 -Ref: POSIX Floating Point Problems-Footnote-1959643 -Node: Glossary959681 -Node: Copying984657 -Node: GNU Free Documentation License1022214 -Node: Index1047351 +Node: Foreword31579 +Node: Preface35924 +Ref: Preface-Footnote-138977 +Ref: Preface-Footnote-239083 +Node: History39315 +Node: Names41706 +Ref: Names-Footnote-143183 +Node: This Manual43255 +Ref: This Manual-Footnote-148159 +Node: Conventions48259 +Node: Manual History50393 +Ref: Manual History-Footnote-153663 +Ref: Manual History-Footnote-253704 +Node: How To Contribute53778 +Node: Acknowledgments54922 +Node: Getting Started59418 +Node: Running gawk61797 +Node: One-shot62983 +Node: Read Terminal64208 +Ref: Read Terminal-Footnote-165858 +Ref: Read Terminal-Footnote-266134 +Node: Long66305 +Node: Executable Scripts67681 +Ref: Executable Scripts-Footnote-169550 +Ref: Executable Scripts-Footnote-269652 +Node: Comments70199 +Node: Quoting72666 +Node: DOS Quoting77289 +Node: Sample Data Files77964 +Node: Very Simple80996 +Node: Two Rules85595 +Node: More Complex87742 +Ref: More Complex-Footnote-190672 +Node: Statements/Lines90757 +Ref: Statements/Lines-Footnote-195219 +Node: Other Features95484 +Node: When96412 +Node: Invoking Gawk98559 +Node: Command Line100020 +Node: Options100803 +Ref: Options-Footnote-1116201 +Node: Other Arguments116226 +Node: Naming Standard Input118884 +Node: Environment Variables119978 +Node: AWKPATH Variable120536 +Ref: AWKPATH Variable-Footnote-1123294 +Node: AWKLIBPATH Variable123554 +Node: Other Environment Variables124151 +Node: Exit Status126646 +Node: Include Files127321 +Node: Loading Shared Libraries130890 +Node: Obsolete132115 +Node: Undocumented132812 +Node: Regexp133055 +Node: Regexp Usage134444 
+Node: Escape Sequences136470 +Node: Regexp Operators142233 +Ref: Regexp Operators-Footnote-1149613 +Ref: Regexp Operators-Footnote-2149760 +Node: Bracket Expressions149858 +Ref: table-char-classes151748 +Node: GNU Regexp Operators154271 +Node: Case-sensitivity157994 +Ref: Case-sensitivity-Footnote-1160962 +Ref: Case-sensitivity-Footnote-2161197 +Node: Leftmost Longest161305 +Node: Computed Regexps162506 +Node: Reading Files165916 +Node: Records167919 +Ref: Records-Footnote-1176593 +Node: Fields176630 +Ref: Fields-Footnote-1179663 +Node: Nonconstant Fields179749 +Node: Changing Fields181951 +Node: Field Separators187932 +Node: Default Field Splitting190561 +Node: Regexp Field Splitting191678 +Node: Single Character Fields195020 +Node: Command Line Field Separator196079 +Node: Field Splitting Summary199520 +Ref: Field Splitting Summary-Footnote-1202712 +Node: Constant Size202813 +Node: Splitting By Content207397 +Ref: Splitting By Content-Footnote-1211123 +Node: Multiple Line211163 +Ref: Multiple Line-Footnote-1217010 +Node: Getline217189 +Node: Plain Getline219405 +Node: Getline/Variable221494 +Node: Getline/File222635 +Node: Getline/Variable/File223957 +Ref: Getline/Variable/File-Footnote-1225556 +Node: Getline/Pipe225643 +Node: Getline/Variable/Pipe228203 +Node: Getline/Coprocess229310 +Node: Getline/Variable/Coprocess230553 +Node: Getline Notes231267 +Node: Getline Summary233209 +Ref: table-getline-variants233617 +Node: Read Timeout234473 +Ref: Read Timeout-Footnote-1238218 +Node: Command line directories238275 +Node: Printing238905 +Node: Print240536 +Node: Print Examples241873 +Node: Output Separators244657 +Node: OFMT246417 +Node: Printf247775 +Node: Basic Printf248681 +Node: Control Letters250220 +Node: Format Modifiers254032 +Node: Printf Examples260041 +Node: Redirection262756 +Node: Special Files269740 +Node: Special FD270273 +Ref: Special FD-Footnote-1273898 +Node: Special Network273972 +Node: Special Caveats274822 +Node: Close Files And Pipes275618 
+Ref: Close Files And Pipes-Footnote-1282641 +Ref: Close Files And Pipes-Footnote-2282789 +Node: Expressions282939 +Node: Values284071 +Node: Constants284747 +Node: Scalar Constants285427 +Ref: Scalar Constants-Footnote-1286286 +Node: Nondecimal-numbers286468 +Node: Regexp Constants289527 +Node: Using Constant Regexps290002 +Node: Variables293057 +Node: Using Variables293712 +Node: Assignment Options295436 +Node: Conversion297308 +Ref: table-locale-affects302684 +Ref: Conversion-Footnote-1303308 +Node: All Operators303417 +Node: Arithmetic Ops304047 +Node: Concatenation306552 +Ref: Concatenation-Footnote-1309345 +Node: Assignment Ops309465 +Ref: table-assign-ops314453 +Node: Increment Ops315861 +Node: Truth Values and Conditions319331 +Node: Truth Values320414 +Node: Typing and Comparison321463 +Node: Variable Typing322252 +Ref: Variable Typing-Footnote-1326149 +Node: Comparison Operators326271 +Ref: table-relational-ops326681 +Node: POSIX String Comparison330230 +Ref: POSIX String Comparison-Footnote-1331186 +Node: Boolean Ops331324 +Ref: Boolean Ops-Footnote-1335402 +Node: Conditional Exp335493 +Node: Function Calls337225 +Node: Precedence340819 +Node: Locales344488 +Node: Patterns and Actions345577 +Node: Pattern Overview346631 +Node: Regexp Patterns348300 +Node: Expression Patterns348843 +Node: Ranges352528 +Node: BEGIN/END355494 +Node: Using BEGIN/END356256 +Ref: Using BEGIN/END-Footnote-1358987 +Node: I/O And BEGIN/END359093 +Node: BEGINFILE/ENDFILE361375 +Node: Empty364279 +Node: Using Shell Variables364595 +Node: Action Overview366880 +Node: Statements369237 +Node: If Statement371091 +Node: While Statement372590 +Node: Do Statement374634 +Node: For Statement375790 +Node: Switch Statement378942 +Node: Break Statement381039 +Node: Continue Statement383029 +Node: Next Statement384822 +Node: Nextfile Statement387212 +Node: Exit Statement389757 +Node: Built-in Variables392173 +Node: User-modified393268 +Ref: User-modified-Footnote-1401623 +Node: Auto-set401685 
+Ref: Auto-set-Footnote-1411593 +Node: ARGC and ARGV411798 +Node: Arrays415649 +Node: Array Basics417154 +Node: Array Intro417980 +Node: Reference to Elements422298 +Node: Assigning Elements424568 +Node: Array Example425059 +Node: Scanning an Array426791 +Node: Controlling Scanning429105 +Ref: Controlling Scanning-Footnote-1434038 +Node: Delete434354 +Ref: Delete-Footnote-1436789 +Node: Numeric Array Subscripts436846 +Node: Uninitialized Subscripts439029 +Node: Multi-dimensional440657 +Node: Multi-scanning443751 +Node: Arrays of Arrays445342 +Node: Functions449987 +Node: Built-in450809 +Node: Calling Built-in451887 +Node: Numeric Functions453875 +Ref: Numeric Functions-Footnote-1457707 +Ref: Numeric Functions-Footnote-2458064 +Ref: Numeric Functions-Footnote-3458112 +Node: String Functions458381 +Ref: String Functions-Footnote-1481878 +Ref: String Functions-Footnote-2482007 +Ref: String Functions-Footnote-3482255 +Node: Gory Details482342 +Ref: table-sub-escapes484021 +Ref: table-sub-posix-92485375 +Ref: table-sub-proposed486718 +Ref: table-posix-sub488068 +Ref: table-gensub-escapes489614 +Ref: Gory Details-Footnote-1490821 +Ref: Gory Details-Footnote-2490872 +Node: I/O Functions491023 +Ref: I/O Functions-Footnote-1497678 +Node: Time Functions497825 +Ref: Time Functions-Footnote-1508717 +Ref: Time Functions-Footnote-2508785 +Ref: Time Functions-Footnote-3508943 +Ref: Time Functions-Footnote-4509054 +Ref: Time Functions-Footnote-5509166 +Ref: Time Functions-Footnote-6509393 +Node: Bitwise Functions509659 +Ref: table-bitwise-ops510217 +Ref: Bitwise Functions-Footnote-1514438 +Node: Type Functions514622 +Node: I18N Functions515092 +Node: User-defined516719 +Node: Definition Syntax517523 +Ref: Definition Syntax-Footnote-1522433 +Node: Function Example522502 +Node: Function Caveats525096 +Node: Calling A Function525517 +Node: Variable Scope526632 +Node: Pass By Value/Reference528607 +Node: Return Statement532047 +Node: Dynamic Typing535028 +Node: Indirect Calls535763 
+Node: Internationalization545448 +Node: I18N and L10N546887 +Node: Explaining gettext547573 +Ref: Explaining gettext-Footnote-1552639 +Ref: Explaining gettext-Footnote-2552823 +Node: Programmer i18n552988 +Node: Translator i18n557188 +Node: String Extraction557981 +Ref: String Extraction-Footnote-1558942 +Node: Printf Ordering559028 +Ref: Printf Ordering-Footnote-1561812 +Node: I18N Portability561876 +Ref: I18N Portability-Footnote-1564325 +Node: I18N Example564388 +Ref: I18N Example-Footnote-1567023 +Node: Gawk I18N567095 +Node: Arbitrary Precision Arithmetic567712 +Ref: Arbitrary Precision Arithmetic-Footnote-1570464 +Node: Floating-point Programming570612 +Node: Floating-point Representation575882 +Node: Floating-point Context576986 +Ref: table-ieee-formats577821 +Node: Rounding Mode579191 +Ref: table-rounding-modes579818 +Ref: Rounding Mode-Footnote-1582941 +Node: Arbitrary Precision Floats583122 +Ref: Arbitrary Precision Floats-Footnote-1585163 +Node: Setting Precision585474 +Node: Setting Rounding Mode588232 +Node: Floating-point Constants589149 +Node: Changing Precision590568 +Ref: Changing Precision-Footnote-1591968 +Node: Exact Arithmetic592141 +Node: Integer Programming595154 +Node: Arbitrary Precision Integers596934 +Ref: Arbitrary Precision Integers-Footnote-1599958 +Node: MPFR and GMP Libraries600104 +Node: Advanced Features600489 +Node: Nondecimal Data602012 +Node: Array Sorting603595 +Node: Controlling Array Traversal604292 +Node: Array Sorting Functions612529 +Ref: Array Sorting Functions-Footnote-1616203 +Ref: Array Sorting Functions-Footnote-2616296 +Node: Two-way I/O616490 +Ref: Two-way I/O-Footnote-1621922 +Node: TCP/IP Networking621992 +Node: Profiling624836 +Node: Library Functions632290 +Ref: Library Functions-Footnote-1635297 +Node: Library Names635468 +Ref: Library Names-Footnote-1638939 +Ref: Library Names-Footnote-2639159 +Node: General Functions639245 +Node: Strtonum Function640198 +Node: Assert Function643128 +Node: Round 
Function646454 +Node: Cliff Random Function647997 +Node: Ordinal Functions649013 +Ref: Ordinal Functions-Footnote-1652083 +Ref: Ordinal Functions-Footnote-2652335 +Node: Join Function652544 +Ref: Join Function-Footnote-1654315 +Node: Getlocaltime Function654515 +Node: Data File Management658230 +Node: Filetrans Function658862 +Node: Rewind Function663001 +Node: File Checking664388 +Node: Empty Files665482 +Node: Ignoring Assigns667712 +Node: Getopt Function669265 +Ref: Getopt Function-Footnote-1680569 +Node: Passwd Functions680772 +Ref: Passwd Functions-Footnote-1689747 +Node: Group Functions689835 +Node: Walking Arrays697919 +Node: Sample Programs699488 +Node: Running Examples700153 +Node: Clones700881 +Node: Cut Program702105 +Node: Egrep Program711950 +Ref: Egrep Program-Footnote-1719723 +Node: Id Program719833 +Node: Split Program723449 +Ref: Split Program-Footnote-1726968 +Node: Tee Program727096 +Node: Uniq Program729899 +Node: Wc Program737328 +Ref: Wc Program-Footnote-1741594 +Ref: Wc Program-Footnote-2741794 +Node: Miscellaneous Programs741886 +Node: Dupword Program743074 +Node: Alarm Program745105 +Node: Translate Program749854 +Ref: Translate Program-Footnote-1754241 +Ref: Translate Program-Footnote-2754469 +Node: Labels Program754603 +Ref: Labels Program-Footnote-1757974 +Node: Word Sorting758058 +Node: History Sorting761942 +Node: Extract Program763781 +Ref: Extract Program-Footnote-1771264 +Node: Simple Sed771392 +Node: Igawk Program774454 +Ref: Igawk Program-Footnote-1789611 +Ref: Igawk Program-Footnote-2789812 +Node: Anagram Program789950 +Node: Signature Program793018 +Node: Debugger794118 +Node: Debugging795072 +Node: Debugging Concepts795505 +Node: Debugging Terms797361 +Node: Awk Debugging799958 +Node: Sample Debugging Session800850 +Node: Debugger Invocation801370 +Node: Finding The Bug802699 +Node: List of Debugger Commands809187 +Node: Breakpoint Control810521 +Node: Debugger Execution Control814185 +Node: Viewing And Changing Data817545 
+Node: Execution Stack820901 +Node: Debugger Info822368 +Node: Miscellaneous Debugger Commands826349 +Node: Readline Support831794 +Node: Limitations832625 +Node: Dynamic Extensions834877 +Node: Plugin License835773 +Node: Sample Library836387 +Node: Internal File Description837071 +Node: Internal File Ops840784 +Ref: Internal File Ops-Footnote-1845347 +Node: Using Internal File Ops845487 +Node: Language History847863 +Node: V7/SVR3.1849385 +Node: SVR4851706 +Node: POSIX853148 +Node: BTL854156 +Node: POSIX/GNU854890 +Node: Common Extensions860146 +Node: Ranges and Locales861253 +Ref: Ranges and Locales-Footnote-1865857 +Node: Contributors866078 +Node: Installation870374 +Node: Gawk Distribution871268 +Node: Getting871752 +Node: Extracting872578 +Node: Distribution contents874270 +Node: Unix Installation879492 +Node: Quick Installation880109 +Node: Additional Configuration Options882071 +Node: Configuration Philosophy883548 +Node: Non-Unix Installation885890 +Node: PC Installation886348 +Node: PC Binary Installation887647 +Node: PC Compiling889495 +Node: PC Testing892439 +Node: PC Using893615 +Node: Cygwin897800 +Node: MSYS898800 +Node: VMS Installation899314 +Node: VMS Compilation899917 +Ref: VMS Compilation-Footnote-1900924 +Node: VMS Installation Details900982 +Node: VMS Running902617 +Node: VMS Old Gawk904224 +Node: Bugs904698 +Node: Other Versions908550 +Node: Notes913865 +Node: Compatibility Mode914452 +Node: Additions915235 +Node: Accessing The Source916046 +Node: Adding Code917471 +Node: New Ports923479 +Node: Future Extensions927592 +Node: Basic Concepts929079 +Node: Basic High Level929836 +Ref: Basic High Level-Footnote-1933871 +Node: Basic Data Typing934056 +Node: Floating Point Issues938581 +Node: String Conversion Precision939664 +Ref: String Conversion Precision-Footnote-1941364 +Node: Unexpected Results941473 +Node: POSIX Floating Point Problems943299 +Ref: POSIX Floating Point Problems-Footnote-1947004 +Node: Glossary947042 +Node: Copying972018 
+Node: GNU Free Documentation License1009575 +Node: Index1034712 End Tag Table diff --git a/doc/gawk.texi b/doc/gawk.texi index 12b77556..ceea9a92 100644 --- a/doc/gawk.texi +++ b/doc/gawk.texi @@ -295,12 +295,14 @@ particular records in a file and perform operations upon them. * Sample Programs:: Many @command{awk} programs with complete explanations. * Debugger:: The @code{gawk} debugger. +* Dynamic Extensions:: Adding new built-in functions to + @command{gawk}. * Language History:: The evolution of the @command{awk} language. * Installation:: Installing @command{gawk} under various operating systems. -* Notes:: Notes about @command{gawk} extensions and - possible future work. +* Notes:: Notes about adding things to @command{gawk} + and possible future work. * Basic Concepts:: A very quick introduction to programming concepts. * Glossary:: An explanation of some unfamiliar terms. @@ -558,21 +560,22 @@ particular records in a file and perform operations upon them. * I18N Portability:: @command{awk}-level portability issues. * I18N Example:: A simple i18n example. * Gawk I18N:: @command{gawk} is also internationalized. -* Floating-point Programming:: Effective floating-point programming. -* Floating-point Representation:: Binary floating-point representation. -* Floating-point Context:: Floating-point context. -* Rounding Mode:: Floating-point rounding mode. -* Arbitrary Precision Floats:: Arbitrary precision floating-point - arithmetic with @command{gawk}. -* Setting Precision:: Setting the working precision. -* Setting Rounding Mode:: Setting the rounding mode. -* Floating-point Constants:: Representing floating-point constants. -* Changing Precision:: Changing the precision of a number. -* Exact Arithmetic:: Exact arithmetic with floating-point numbers. -* Integer Programming:: Effective integer programming. -* Arbitrary Precision Integers:: Arbitrary precision integer - arithmetic with @command{gawk}. 
-* MPFR and GMP Libraries:: Information about the MPFR and GMP libraries. +* Floating-point Programming:: Effective Floating-point Programming. +* Floating-point Representation:: Binary Floating-point Representation. +* Floating-point Context:: Floating-point Context. +* Rounding Mode:: Floating-point Rounding Mode. +* Arbitrary Precision Floats:: Arbitrary Precision Floating-point + Arithmetic with @command{gawk}. +* Setting Precision:: Setting the Working Precision. +* Setting Rounding Mode:: Setting the Rounding Mode. +* Floating-point Constants:: Representing Floating-point Constants. +* Changing Precision:: Changing the Precision of a Number. +* Exact Arithmetic:: Exact Arithmetic with Floating-point + Numbers. +* Integer Programming:: Effective Integer Programming. +* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with + @command{gawk}. +* MPFR and GMP Libraries :: * Nondecimal Data:: Allowing nondecimal input data. * Array Sorting:: Facilities for controlling array traversal and sorting arrays. @@ -637,14 +640,14 @@ particular records in a file and perform operations upon them. * Anagram Program:: Finding anagrams from a dictionary. * Signature Program:: People do amazing things with too much time on their hands. -* Debugging:: Introduction to @command{gawk} Debugger. +* Debugging:: Introduction to @command{gawk} debugger. * Debugging Concepts:: Debugging in General. * Debugging Terms:: Additional Debugging Concepts. * Awk Debugging:: Awk Debugging. -* Sample Debugging Session:: Sample Debugging Session. +* Sample Debugging Session:: Sample debugging session. * Debugger Invocation:: How to Start the Debugger. * Finding The Bug:: Finding the Bug. -* List of Debugger Commands:: Main Commands. +* List of Debugger Commands:: Main debugger commands. * Breakpoint Control:: Control of Breakpoints. * Debugger Execution Control:: Control of Execution. * Viewing And Changing Data:: Viewing and Changing Data. 
@@ -652,8 +655,13 @@ particular records in a file and perform operations upon them. * Debugger Info:: Obtaining Information about the Program and the Debugger State. * Miscellaneous Debugger Commands:: Miscellaneous Commands. -* Readline Support:: Readline Support. -* Limitations:: Limitations and Future Plans. +* Readline Support:: Readline support. +* Limitations:: Limitations and future plans. +* Plugin License:: A note about licensing. +* Sample Library:: A example of new functions. +* Internal File Description:: What the new functions will do. +* Internal File Ops:: The code for internal file operations. +* Using Internal File Ops:: How to use an external extension. * V7/SVR3.1:: The major changes between V7 and System V Release 3.1. * SVR4:: Minor changes between System V Releases 3.1 @@ -704,16 +712,6 @@ particular records in a file and perform operations upon them. @command{gawk}. * New Ports:: Porting @command{gawk} to a new operating system. -* Dynamic Extensions:: Adding new built-in functions to - @command{gawk}. -* Internals:: A brief look at some @command{gawk} - internals. -* Plugin License:: A note about licensing. -* Loading Extensions:: How to load dynamic extensions. -* Sample Library:: A example of new functions. -* Internal File Description:: What the new functions will do. -* Internal File Ops:: The code for internal file operations. -* Using Internal File Ops:: How to use an external extension. * Future Extensions:: New features that may be implemented one day. * Basic High Level:: The high level view. @@ -1206,8 +1204,7 @@ available @command{awk} implementations. @ref{Notes}, describes how to disable @command{gawk}'s extensions, as well as how to contribute new code to @command{gawk}, -how to write extension libraries, and some possible -future directions for @command{gawk} development. +and some possible future directions for @command{gawk} development. 
@ref{Basic Concepts}, provides some very cursory background material for those who @@ -3616,8 +3613,8 @@ behaves. @menu * AWKPATH Variable:: Searching directories for @command{awk} programs. -* AWKLIBPATH Variable:: Searching directories for @command{awk} - shared libraries. +* AWKLIBPATH Variable:: Searching directories for @command{awk} shared + libraries. * Other Environment Variables:: The environment variables. @end menu @@ -5263,7 +5260,6 @@ used with it do not have to be named on the @command{awk} command line * Getline:: Reading files under explicit program control using the @code{getline} function. * Read Timeout:: Reading input with a timeout. - * Command line directories:: What happens if you put a directory on the command line. @end menu @@ -11565,9 +11561,9 @@ fatal error. @item If you have written extensions that modify the record handling (by inserting -an ``open hook''), you can invoke them at this point, before @command{gawk} +an ``input parser''), you can invoke them at this point, before @command{gawk} has started processing the file. (This is a @emph{very} advanced feature, -currently used only by the @uref{http://xmlgawk.sourceforge.net, XMLgawk project}.) +currently used only by the @uref{http://gawkextlib.sourceforge.net, @code{gawkextlib} project}.) @end itemize The @code{ENDFILE} rule is called when @command{gawk} has finished processing @@ -18508,21 +18504,22 @@ in general, and the limitations of doing arithmetic with ordinary @command{gawk} numbers. @menu -* Floating-point Programming:: Effective Floating-point Programming. -* Floating-point Representation:: Binary Floating-point Representation. -* Floating-point Context:: Floating-point Context. -* Rounding Mode:: Floating-point Rounding Mode. -* Arbitrary Precision Floats:: Arbitrary Precision Floating-point - Arithmetic with @command{gawk}. -* Setting Precision:: Setting the Working Precision. -* Setting Rounding Mode:: Setting the Rounding Mode. 
-* Floating-point Constants:: Representing Floating-point Constants. -* Changing Precision:: Changing the Precision of a Number. -* Exact Arithmetic:: Exact Arithmetic with Floating-point Numbers. -* Integer Programming:: Effective Integer Programming. -* Arbitrary Precision Integers:: Arbitrary Precision Integer - Arithmetic with @command{gawk}. -* MPFR and GMP Libraries:: Information About the MPFR and GMP Libraries. +* Floating-point Programming:: Effective Floating-point Programming. +* Floating-point Representation:: Binary Floating-point Representation. +* Floating-point Context:: Floating-point Context. +* Rounding Mode:: Floating-point Rounding Mode. +* Arbitrary Precision Floats:: Arbitrary Precision Floating-point + Arithmetic with @command{gawk}. +* Setting Precision:: Setting the Working Precision. +* Setting Rounding Mode:: Setting the Rounding Mode. +* Floating-point Constants:: Representing Floating-point Constants. +* Changing Precision:: Changing the Precision of a Number. +* Exact Arithmetic:: Exact Arithmetic with Floating-point + Numbers. +* Integer Programming:: Effective Integer Programming. +* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with + @command{gawk}. +* MPFR and GMP Libraries :: @end menu @node Floating-point Programming @@ -27530,6 +27527,471 @@ The @command{gawk} debugger only accepts source supplied with the @option{-f} op Look forward to a future release when these and other missing features may be added, and of course feel free to try to add them yourself! +@node Dynamic Extensions +@chapter Writing Extensions for @command{gawk} + +This chapter is a placeholder, pending a rewrite for the new API. +Some of the old bits remain, since they can be partially reused. 
+ + +@c STARTOFRANGE gladfgaw +@cindex @command{gawk}, functions, adding +@c STARTOFRANGE adfugaw +@cindex adding, functions to @command{gawk} +@c STARTOFRANGE fubadgaw +@cindex functions, built-in, adding to @command{gawk} +It is possible to add new built-in +functions to @command{gawk} using dynamically loaded libraries. This +facility is available on systems (such as GNU/Linux) that support +the C @code{dlopen()} and @code{dlsym()} functions. +This @value{CHAPTER} describes how to write and use dynamically +loaded extensions for @command{gawk}. +Experience with programming in +C or C++ is necessary when reading this @value{SECTION}. + +@quotation NOTE +When @option{--sandbox} is specified, extensions are disabled +(@pxref{Options}. +@end quotation + +@menu +* Plugin License:: A note about licensing. +* Sample Library:: A example of new functions. +@end menu + +@node Plugin License +@section Extension Licensing + +Every dynamic extension should define the global symbol +@code{plugin_is_GPL_compatible} to assert that it has been licensed under +a GPL-compatible license. If this symbol does not exist, @command{gawk} +will emit a fatal error and exit. + +The declared type of the symbol should be @code{int}. It does not need +to be in any allocated section, though. The code merely asserts that +the symbol exists in the global scope. 
Something like this is enough: + +@example +int plugin_is_GPL_compatible; +@end example + +@node Sample Library +@section Example: Directory and File Operation Built-ins +@c STARTOFRANGE chdirg +@cindex @code{chdir()} function@comma{} implementing in @command{gawk} +@c STARTOFRANGE statg +@cindex @code{stat()} function@comma{} implementing in @command{gawk} +@c STARTOFRANGE filre +@cindex files, information about@comma{} retrieving +@c STARTOFRANGE dirch +@cindex directories, changing + +Two useful functions that are not in @command{awk} are @code{chdir()} +(so that an @command{awk} program can change its directory) and +@code{stat()} (so that an @command{awk} program can gather information about +a file). +This @value{SECTION} implements these functions for @command{gawk} in an +external extension library. + +@menu +* Internal File Description:: What the new functions will do. +* Internal File Ops:: The code for internal file operations. +* Using Internal File Ops:: How to use an external extension. +@end menu + +@node Internal File Description +@subsection Using @code{chdir()} and @code{stat()} + +This @value{SECTION} shows how to use the new functions at the @command{awk} +level once they've been integrated into the running @command{gawk} +interpreter. +Using @code{chdir()} is very straightforward. It takes one argument, +the new directory to change to: + +@example +@dots{} +newdir = "/home/arnold/funstuff" +ret = chdir(newdir) +if (ret < 0) @{ + printf("could not change to %s: %s\n", + newdir, ERRNO) > "/dev/stderr" + exit 1 +@} +@dots{} +@end example + +The return value is negative if the @code{chdir} failed, +and @code{ERRNO} +(@pxref{Built-in Variables}) +is set to a string indicating the error. + +Using @code{stat()} is a bit more complicated. +The C @code{stat()} function fills in a structure that has a fair +amount of information. 
+The right way to model this in @command{awk} is to fill in an associative +array with the appropriate information: + +@c broke printf for page breaking +@example +file = "/home/arnold/.profile" +fdata[1] = "x" # force `fdata' to be an array +ret = stat(file, fdata) +if (ret < 0) @{ + printf("could not stat %s: %s\n", + file, ERRNO) > "/dev/stderr" + exit 1 +@} +printf("size of %s is %d bytes\n", file, fdata["size"]) +@end example + +The @code{stat()} function always clears the data array, even if +the @code{stat()} fails. It fills in the following elements: + +@table @code +@item "name" +The name of the file that was @code{stat()}'ed. + +@item "dev" +@itemx "ino" +The file's device and inode numbers, respectively. + +@item "mode" +The file's mode, as a numeric value. This includes both the file's +type and its permissions. + +@item "nlink" +The number of hard links (directory entries) the file has. + +@item "uid" +@itemx "gid" +The numeric user and group ID numbers of the file's owner. + +@item "size" +The size in bytes of the file. + +@item "blocks" +The number of disk blocks the file actually occupies. This may not +be a function of the file's size if the file has holes. + +@item "atime" +@itemx "mtime" +@itemx "ctime" +The file's last access, modification, and inode update times, +respectively. These are numeric timestamps, suitable for formatting +with @code{strftime()} +(@pxref{Built-in}). + +@item "pmode" +The file's ``printable mode.'' This is a string representation of +the file's type and permissions, such as what is produced by +@samp{ls -l}---for example, @code{"drwxr-xr-x"}. + +@item "type" +A printable string representation of the file's type. The value +is one of the following: + +@table @code +@item "blockdev" +@itemx "chardev" +The file is a block or character device (``special file''). + +@ignore +@item "door" +The file is a Solaris ``door'' (special file used for +interprocess communications). 
+@end ignore + +@item "directory" +The file is a directory. + +@item "fifo" +The file is a named-pipe (also known as a FIFO). + +@item "file" +The file is just a regular file. + +@item "socket" +The file is an @code{AF_UNIX} (``Unix domain'') socket in the +filesystem. + +@item "symlink" +The file is a symbolic link. +@end table +@end table + +Several additional elements may be present depending upon the operating +system and the type of the file. You can test for them in your @command{awk} +program by using the @code{in} operator +(@pxref{Reference to Elements}): + +@table @code +@item "blksize" +The preferred block size for I/O to the file. This field is not +present on all POSIX-like systems in the C @code{stat} structure. + +@item "linkval" +If the file is a symbolic link, this element is the name of the +file the link points to (i.e., the value of the link). + +@item "rdev" +@itemx "major" +@itemx "minor" +If the file is a block or character device file, then these values +represent the numeric device number and the major and minor components +of that number, respectively. +@end table + +@node Internal File Ops +@subsection C Code for @code{chdir()} and @code{stat()} + +Here is the C code for these extensions. They were written for +GNU/Linux. The code needs some more work for complete portability +to other POSIX-compliant systems:@footnote{This version is edited +slightly for presentation. 
See +@file{extension/filefuncs.c} in the @command{gawk} distribution +for the complete version.} + +@c break line for page breaking +@example +#include "awk.h" + +#include <sys/sysmacros.h> + +int plugin_is_GPL_compatible; + +/* do_chdir --- provide dynamically loaded chdir() builtin for gawk */ + +static NODE * +do_chdir(int nargs) +@{ + NODE *newdir; + int ret = -1; + + if (do_lint && nargs != 1) + lintwarn("chdir: called with incorrect number of arguments"); + + newdir = get_scalar_argument(0, FALSE); +@end example + +The file includes the @code{"awk.h"} header file for definitions +for the @command{gawk} internals. It includes @code{<sys/sysmacros.h>} +for access to the @code{major()} and @code{minor}() macros. + +@cindex programming conventions, @command{gawk} internals +By convention, for an @command{awk} function @code{foo}, the function that +implements it is called @samp{do_foo}. The function should take +a @samp{int} argument, usually called @code{nargs}, that +represents the number of defined arguments for the function. The @code{newdir} +variable represents the new directory to change to, retrieved +with @code{get_scalar_argument()}. Note that the first argument is +numbered zero. + +This code actually accomplishes the @code{chdir()}. It first forces +the argument to be a string and passes the string value to the +@code{chdir()} system call. If the @code{chdir()} fails, @code{ERRNO} +is updated. + +@example + (void) force_string(newdir); + ret = chdir(newdir->stptr); + if (ret < 0) + update_ERRNO_int(errno); +@end example + +Finally, the function returns the return value to the @command{awk} level: + +@example + return make_number((AWKNUM) ret); +@} +@end example + +The @code{stat()} built-in is more involved. First comes a function +that turns a numeric mode into a printable representation +(e.g., 644 becomes @samp{-rw-r--r--}). 
This is omitted here for brevity: + +@c break line for page breaking +@example +/* format_mode --- turn a stat mode field into something readable */ + +static char * +format_mode(unsigned long fmode) +@{ + @dots{} +@} +@end example + +Next comes the @code{do_stat()} function. It starts with +variable declarations and argument checking: + +@ignore +Changed message for page breaking. Used to be: + "stat: called with incorrect number of arguments (%d), should be 2", +@end ignore +@example +/* do_stat --- provide a stat() function for gawk */ + +static NODE * +do_stat(int nargs) +@{ + NODE *file, *array, *tmp; + struct stat sbuf; + int ret; + NODE **aptr; + char *pmode; /* printable mode */ + char *type = "unknown"; + + if (do_lint && nargs > 2) + lintwarn("stat: called with too many arguments"); +@end example + +Then comes the actual work. First, the function gets the arguments. +Then, it always clears the array. +The code use @code{lstat()} (instead of @code{stat()}) +to get the file information, +in case the file is a symbolic link. +If there's an error, it sets @code{ERRNO} and returns: + +@c comment made multiline for page breaking +@example + /* file is first arg, array to hold results is second */ + file = get_scalar_argument(0, FALSE); + array = get_array_argument(1, FALSE); + + /* empty out the array */ + assoc_clear(array); + + /* lstat the file, if error, set ERRNO and return */ + (void) force_string(file); + ret = lstat(file->stptr, & sbuf); + if (ret < 0) @{ + update_ERRNO_int(errno); + return make_number((AWKNUM) ret); + @} +@end example + +Now comes the tedious part: filling in the array. 
Only a few of the +calls are shown here, since they all follow the same pattern: + +@example + /* fill in the array */ + aptr = assoc_lookup(array, tmp = make_string("name", 4)); + *aptr = dupnode(file); + unref(tmp); + + aptr = assoc_lookup(array, tmp = make_string("mode", 4)); + *aptr = make_number((AWKNUM) sbuf.st_mode); + unref(tmp); + + aptr = assoc_lookup(array, tmp = make_string("pmode", 5)); + pmode = format_mode(sbuf.st_mode); + *aptr = make_string(pmode, strlen(pmode)); + unref(tmp); +@end example + +When done, return the @code{lstat()} return value: + +@example + + return make_number((AWKNUM) ret); +@} +@end example + +@cindex programming conventions, @command{gawk} internals +Finally, it's necessary to provide the ``glue'' that loads the +new function(s) into @command{gawk}. By convention, each library has +a routine named @code{dl_load()} that does the job. The simplest way +is to use the @code{dl_load_func} macro in @code{gawkapi.h}. + +And that's it! As an exercise, consider adding functions to +implement system calls such as @code{chown()}, @code{chmod()}, +and @code{umask()}. + +@node Using Internal File Ops +@subsection Integrating the Extensions + +@cindex @command{gawk}, interpreter@comma{} adding code to +Now that the code is written, it must be possible to add it at +runtime to the running @command{gawk} interpreter. First, the +code must be compiled. Assuming that the functions are in +a file named @file{filefuncs.c}, and @var{idir} is the location +of the @command{gawk} include files, +the following steps create +a GNU/Linux shared library: + +@example +$ @kbd{gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -I@var{idir} filefuncs.c} +$ @kbd{ld -o filefuncs.so -shared filefuncs.o} +@end example + +@cindex @code{extension()} function (@command{gawk}) +Once the library exists, it is loaded by calling the @code{extension()} +built-in function. 
+This function takes two arguments: the name of the +library to load and the name of a function to call when the library +is first loaded. This function adds the new functions to @command{gawk}. +It returns the value returned by the initialization function +within the shared library: + +@example +# file testff.awk +BEGIN @{ + extension("./filefuncs.so", "dl_load") + + chdir(".") # no-op + + data[1] = 1 # force `data' to be an array + print "Info for testff.awk" + ret = stat("testff.awk", data) + print "ret =", ret + for (i in data) + printf "data[\"%s\"] = %s\n", i, data[i] + print "testff.awk modified:", + strftime("%m %d %y %H:%M:%S", data["mtime"]) + + print "\nInfo for JUNK" + ret = stat("JUNK", data) + print "ret =", ret + for (i in data) + printf "data[\"%s\"] = %s\n", i, data[i] + print "JUNK modified:", strftime("%m %d %y %H:%M:%S", data["mtime"]) +@} +@end example + +Here are the results of running the program: + +@example +$ @kbd{gawk -f testff.awk} +@print{} Info for testff.awk +@print{} ret = 0 +@print{} data["size"] = 607 +@print{} data["ino"] = 14945891 +@print{} data["name"] = testff.awk +@print{} data["pmode"] = -rw-rw-r-- +@print{} data["nlink"] = 1 +@print{} data["atime"] = 1293993369 +@print{} data["mtime"] = 1288520752 +@print{} data["mode"] = 33204 +@print{} data["blksize"] = 4096 +@print{} data["dev"] = 2054 +@print{} data["type"] = file +@print{} data["gid"] = 500 +@print{} data["uid"] = 500 +@print{} data["blocks"] = 8 +@print{} data["ctime"] = 1290113572 +@print{} testff.awk modified: 10 31 10 12:25:52 +@print{} +@print{} Info for JUNK +@print{} ret = -1 +@print{} JUNK modified: 01 01 70 02:00:00 +@end example +@c ENDOFRANGE filre +@c ENDOFRANGE dirch +@c ENDOFRANGE statg +@c ENDOFRANGE chdirg +@c ENDOFRANGE gladfgaw +@c ENDOFRANGE adfugaw +@c ENDOFRANGE fubadgaw + @ignore @c Try this @iftex @@ -28010,11 +28472,6 @@ functions for internationalization (@pxref{Programmer i18n}). 
@item -The @code{extension()} built-in function and the ability to add -new functions dynamically -(@pxref{Dynamic Extensions}). - -@item The @code{fflush()} function from Brian Kernighan's version of @command{awk} (@pxref{I/O Functions}). @@ -28048,15 +28505,21 @@ the @option{-l} command-line option @item The ability to use GNU-style long-named options that start with @option{--} and the +@option{--bignum}, @option{--characters-as-bytes}, -@option{--compat}, +@option{--copyright}, +@option{--debug}, @option{--dump-variables}, @option{--exec}, @option{--gen-pot}, +@option{--include}, @option{--lint}, @option{--lint-old}, +@option{--load}, @option{--non-decimal-data}, +@option{--optimize}, @option{--posix}, +@option{--pretty-print}, @option{--profile}, @option{--re-interval}, @option{--sandbox}, @@ -28374,6 +28837,7 @@ the various PC platforms. Christos Zoulas provided the @code{extension()} built-in function for dynamically adding new modules. +(This was removed at @command{gawk} 4.1.) @item @cindex Kahrs, J@"urgen @@ -29802,8 +30266,6 @@ maintainers of @command{gawk}. Everything in it applies specifically to * Compatibility Mode:: How to disable certain @command{gawk} extensions. * Additions:: Making Additions To @command{gawk}. -* Dynamic Extensions:: Adding new built-in functions to - @command{gawk}. * Future Extensions:: New features that may be implemented one day. @end menu @@ -30039,8 +30501,9 @@ You will also have to sign paperwork for your documentation changes. Submit changes as unified diffs. Use @samp{diff -u -r -N} to compare the original @command{gawk} source tree with your version. -I recommend using the GNU version of @command{diff}. -Send the output produced by either run of @command{diff} to me when you +I recommend using the GNU version of @command{diff}, or best of all, +@samp{git diff} or @samp{git format-patch}. +Send the output produced by @command{diff} to me when you submit your changes. (@xref{Bugs}, for the electronic mail information.) 
@@ -30166,838 +30629,6 @@ operating systems' code that is already there. In the code that you supply and maintain, feel free to use a coding style and brace layout that suits your taste. -@node Dynamic Extensions -@appendixsec Adding New Built-in Functions to @command{gawk} -@cindex Robinson, Will -@cindex robot, the -@cindex Lost In Space -@quotation -@i{Danger Will Robinson! Danger!!@* -Warning! Warning!}@* -The Robot -@end quotation - -@c STARTOFRANGE gladfgaw -@cindex @command{gawk}, functions, adding -@c STARTOFRANGE adfugaw -@cindex adding, functions to @command{gawk} -@c STARTOFRANGE fubadgaw -@cindex functions, built-in, adding to @command{gawk} -It is possible to add new built-in -functions to @command{gawk} using dynamically loaded libraries. This -facility is available on systems (such as GNU/Linux) that support -the C @code{dlopen()} and @code{dlsym()} functions. -This @value{SECTION} describes how to write and use dynamically -loaded extensions for @command{gawk}. -Experience with programming in -C or C++ is necessary when reading this @value{SECTION}. - -@quotation CAUTION -The facilities described in this @value{SECTION} -are very much subject to change in a future @command{gawk} release. -Be aware that you may have to re-do everything, -at some future time. - -If you have written your own dynamic extensions, -be sure to recompile them for each new @command{gawk} release. -There is no guarantee of binary compatibility between different -releases, nor will there ever be such a guarantee. -@end quotation - -@quotation NOTE -When @option{--sandbox} is specified, extensions are disabled -(@pxref{Options}. -@end quotation - -@menu -* Internals:: A brief look at some @command{gawk} internals. -* Plugin License:: A note about licensing. -* Loading Extensions:: How to load dynamic extensions. -* Sample Library:: A example of new functions. 
-@end menu - -@node Internals -@appendixsubsec A Minimal Introduction to @command{gawk} Internals -@c STARTOFRANGE gawint -@cindex @command{gawk}, internals - -The truth is that @command{gawk} was not designed for simple extensibility. -The facilities for adding functions using shared libraries work, but -are something of a ``bag on the side.'' Thus, this tour is -brief and simplistic; would-be @command{gawk} hackers are encouraged to -spend some time reading the source code before trying to write -extensions based on the material presented here. Of particular note -are the files @file{awk.h}, @file{builtin.c}, and @file{eval.c}. -Reading @file{awkgram.y} in order to see how the parse tree is built -would also be of use. - -@cindex @code{awk.h} file (internal) -With the disclaimers out of the way, the following types, structure -members, functions, and macros are declared in @file{awk.h} and are of -use when writing extensions. The next @value{SECTION} -shows how they are used: - -@table @code -@cindex floating-point, numbers, @code{AWKNUM} internal type -@cindex numbers, floating-point, @code{AWKNUM} internal type -@cindex @code{AWKNUM} internal type -@cindex internal type, @code{AWKNUM} -@item AWKNUM -An @code{AWKNUM} is the internal type of @command{awk} -floating-point numbers. Typically, it is a C @code{double}. - -@cindex @code{NODE} internal type -@cindex internal type, @code{NODE} -@cindex strings, @code{NODE} internal type -@cindex numbers, @code{NODE} internal type -@item NODE -Just about everything is done using objects of type @code{NODE}. -These contain both strings and numbers, as well as variables and arrays. - -@cindex @code{force_number()} internal function -@cindex internal function, @code{force_number()} -@cindex numeric, values -@item AWKNUM force_number(NODE *n) -This macro forces a value to be numeric. It returns the actual -numeric value contained in the node. -It may end up calling an internal @command{gawk} function. 
- -@cindex @code{force_string()} internal function -@cindex internal function, @code{force_string()} -@item void force_string(NODE *n) -This macro guarantees that a @code{NODE}'s string value is current. -It may end up calling an internal @command{gawk} function. -It also guarantees that the string is zero-terminated. - -@cindex @code{force_wstring()} internal function -@cindex internal function, @code{force_wstring()} -@item void force_wstring(NODE *n) -Similarly, this -macro guarantees that a @code{NODE}'s wide-string value is current. -It may end up calling an internal @command{gawk} function. -It also guarantees that the wide string is zero-terminated. - -@cindex parameters@comma{} number of -@cindex @code{nargs} internal variable -@cindex internal variable, @code{nargs} -@item nargs -Inside an extension function, this is the actual number of -parameters passed to the current function. - -@cindex @code{stptr} internal variable -@cindex internal variable, @code{stptr} -@cindex @code{stlen} internal variable -@cindex internal variable, @code{stlen} -@item n->stptr -@itemx n->stlen -The data and length of a @code{NODE}'s string value, respectively. -The string is @emph{not} guaranteed to be zero-terminated. -If you need to pass the string value to a C library function, save -the value in @code{n->stptr[n->stlen]}, assign @code{'\0'} to it, -call the routine, and then restore the value. - -@cindex @code{wstptr} internal variable -@cindex internal variable, @code{wstptr} -@cindex @code{wstlen} internal variable -@cindex internal variable, @code{wstlen} -@item n->wstptr -@itemx n->wstlen -The data and length of a @code{NODE}'s wide-string value, respectively. -Use @code{force_wstring()} to make sure these values are current. - -@cindex @code{type} internal variable -@cindex internal variable, @code{type} -@item n->type -The type of the @code{NODE}. This is a C @code{enum}. 
Values should -be one of @code{Node_var}, @code{Node_var_new}, or @code{Node_var_array} -for function parameters. - -@cindex @code{vname} internal variable -@cindex internal variable, @code{vname} -@item n->vname -The ``variable name'' of a node. This is not of much use inside -externally written extensions. - -@cindex arrays, associative, clearing -@cindex @code{assoc_clear()} internal function -@cindex internal function, @code{assoc_clear()} -@item void assoc_clear(NODE *n) -Clears the associative array pointed to by @code{n}. -Make sure that @samp{n->type == Node_var_array} first. - -@cindex arrays, elements, installing -@cindex @code{assoc_lookup()} internal function -@cindex internal function, @code{assoc_lookup()} -@item NODE **assoc_lookup(NODE *symbol, NODE *subs) -Finds, and installs if necessary, array elements. -@code{symbol} is the array, @code{subs} is the subscript. -This is usually a value created with @code{make_string()} (see below). - -@cindex strings -@cindex @code{make_string()} internal function -@cindex internal function, @code{make_string()} -@item NODE *make_string(char *s, size_t len) -Take a C string and turn it into a pointer to a @code{NODE} that -can be stored appropriately. This is permanent storage; understanding -of @command{gawk} memory management is helpful. - -@cindex numbers -@cindex @code{make_number()} internal function -@cindex internal function, @code{make_number()} -@item NODE *make_number(AWKNUM val) -Take an @code{AWKNUM} and turn it into a pointer to a @code{NODE} that -can be stored appropriately. This is permanent storage; understanding -of @command{gawk} memory management is helpful. - - -@cindex nodes@comma{} duplicating -@cindex @code{dupnode()} internal function -@cindex internal function, @code{dupnode()} -@item NODE *dupnode(NODE *n) -Duplicate a node. 
In most cases, this increments an internal -reference count instead of actually duplicating the entire @code{NODE}; -understanding of @command{gawk} memory management is helpful. - -@cindex memory, releasing -@cindex @code{unref()} internal function -@cindex internal function, @code{unref()} -@item void unref(NODE *n) -This macro releases the memory associated with a @code{NODE} -allocated with @code{make_string()} or @code{make_number()}. -Understanding of @command{gawk} memory management is helpful. - -@cindex @code{make_builtin()} internal function -@cindex internal function, @code{make_builtin()} -@item void make_builtin(const char *name, NODE *(*func)(NODE *), int count) -Register a C function pointed to by @code{func} as new built-in -function @code{name}. @code{name} is a regular C string. @code{count} -is the maximum number of arguments that the function takes. -The function should be written in the following manner: - -@example -/* do_xxx --- do xxx function for gawk */ - -NODE * -do_xxx(int nargs) -@{ - @dots{} -@} -@end example - -@cindex arguments, retrieving -@cindex @code{get_argument()} internal function -@cindex internal function, @code{get_argument()} -@item NODE *get_argument(int i) -This function is called from within a C extension function to get -the @code{i}-th argument from the function call. -The first argument is argument zero. - -@cindex @code{get_actual_argument()} internal function -@cindex internal function, @code{get_actual_argument()} -@item NODE *get_actual_argument(int i, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ int@ optional,@ int@ wantarray); -This function retrieves a particular argument @code{i}. @code{wantarray} is @code{TRUE} -if the argument should be an array, @code{FALSE} otherwise. If @code{optional} is -@code{TRUE}, the argument need not have been supplied. If it wasn't, the return -value is @code{NULL}. It is a fatal error if @code{optional} is @code{TRUE} but -the argument was not provided. 
- -@cindex @code{get_scalar_argument()} internal macro -@cindex internal macro, @code{get_scalar_argument()} -@item get_scalar_argument(i, opt) -This is a convenience macro that calls @code{get_actual_argument()}. - -@cindex @code{get_array_argument()} internal macro -@cindex internal macro, @code{get_array_argument()} -@item get_array_argument(i, opt) -This is a convenience macro that calls @code{get_actual_argument()}. - -@cindex functions, return values@comma{} setting - -@cindex @code{ERRNO} variable -@cindex @code{update_ERRNO_int()} internal function -@cindex internal function, @code{update_ERRNO_int()} -@item void update_ERRNO_int(int errno_saved) -This function is called from within a C extension function to set -the value of @command{gawk}'s @code{ERRNO} variable, based on the error -value provided as the argument. -It is provided as a convenience. - -@cindex @code{ERRNO} variable -@cindex @code{update_ERRNO_string()} internal function -@cindex internal function, @code{update_ERRNO_string()} -@item void update_ERRNO_string(const char *string, enum errno_translate) -This function is called from within a C extension function to set -the value of @command{gawk}'s @code{ERRNO} variable to a given string. -The second argument determines whether the string is translated before being -installed into @code{ERRNO}. It is provided as a convenience. - -@cindex @code{ERRNO} variable -@cindex @code{unset_ERRNO()} internal function -@cindex internal function, @code{unset_ERRNO()} -@item void unset_ERRNO(void) -This function is called from within a C extension function to set -the value of @command{gawk}'s @code{ERRNO} variable to a null string. -It is provided as a convenience. 
- -@cindex @code{ENVIRON} array -@cindex @code{PROCINFO} array -@cindex @code{register_deferred_variable()} internal function -@cindex internal function, @code{register_deferred_variable()} -@item void register_deferred_variable(const char *name, NODE *(*load_func)(void)) -This function is called to register a function to be called when a -reference to an undefined variable with the given name is encountered. -The callback function will never be called if the variable exists already, -so, unless the calling code is running at program startup, it should first -check whether a variable of the given name already exists. -The argument function must return a pointer to a @code{NODE} containing the -newly created variable. This function is used to implement the builtin -@code{ENVIRON} and @code{PROCINFO} arrays, so you can refer to them -for examples. - -@cindex @code{IOBUF} internal structure -@cindex internal structure, @code{IOBUF} -@cindex @code{iop_alloc()} internal function -@cindex internal function, @code{iop_alloc()} -@cindex @code{get_record()} input method -@cindex @code{close_func}() input method -@cindex @code{INVALID_HANDLE} internal constant -@cindex internal constant, @code{INVALID_HANDLE} -@cindex XML (eXtensible Markup Language) -@cindex eXtensible Markup Language (XML) -@cindex @code{register_open_hook()} internal function -@cindex internal function, @code{register_open_hook()} -@item void register_open_hook(void *(*open_func)(IOBUF *)) -This function is called to register a function to be called whenever -a new data file is opened, leading to the creation of an @code{IOBUF} -structure in @code{iop_alloc()}. After creating the new @code{IOBUF}, -@code{iop_alloc()} will call (in reverse order of registration, so the last -function registered is called first) each open hook until one returns -non-@code{NULL}. 
If any hook returns a non-@code{NULL} value, that value is assigned -to the @code{IOBUF}'s @code{opaque} field (which will presumably point -to a structure containing additional state associated with the input -processing), and no further open hooks are called. - -The function called will most likely want to set the @code{IOBUF}'s -@code{get_record} method to indicate that future input records should -be retrieved by calling that method instead of using the standard -@command{gawk} input processing. - -And the function will also probably want to set the @code{IOBUF}'s -@code{close_func} method to be called when the file is closed to clean -up any state associated with the input. - -Finally, hook functions should be prepared to receive an @code{IOBUF} -structure where the @code{fd} field is set to @code{INVALID_HANDLE}, -meaning that @command{gawk} was not able to open the file itself. In -this case, the hook function must be able to successfully open the file -and place a valid file descriptor there. - -Currently, for example, the hook function facility is used to implement -the XML parser shared library extension. For more info, please look in -@file{awk.h} and in @file{io.c}. -@end table - -An argument that is supposed to be an array needs to be handled with -some extra code, in case the array being passed in is actually -from a function parameter. - -The following boilerplate code shows how to do this: - -@example -NODE *the_arg; - -/* assume need 3rd arg, 0-based */ -the_arg = get_array_argument(2, FALSE); -@end example - -Again, you should spend time studying the @command{gawk} internals; -don't just blindly copy this code. -@c ENDOFRANGE gawint - -@node Plugin License -@appendixsubsec Extension Licensing - -Every dynamic extension should define the global symbol -@code{plugin_is_GPL_compatible} to assert that it has been licensed under -a GPL-compatible license. If this symbol does not exist, @command{gawk} -will emit a fatal error and exit. 
- -The declared type of the symbol should be @code{int}. It does not need -to be in any allocated section, though. The code merely asserts that -the symbol exists in the global scope. Something like this is enough: - -@example -int plugin_is_GPL_compatible; -@end example - -@node Loading Extensions -@appendixsubsec Loading a Dynamic Extension -@cindex loading extension -@cindex @command{gawk}, functions, loading -There are two ways to load a dynamically linked library. The first is to use the -builtin @code{extension()}: - -@example -extension(libname, init_func) -@end example - -where @file{libname} is the library to load, and @samp{init_func} is the -name of the initialization or bootstrap routine to run once loaded. - -The second method for dynamic loading of a library is to use the -command line option @option{-l}: - -@example -$ @kbd{gawk -l libname -f myprog} -@end example - -This will work only if the initialization routine is named @code{dl_load()}. - -If you use @code{extension()}, the library will be loaded -at run time. This means that the functions are available only to the rest of -your script. If you use the command line option @option{-l} instead, -the library will be loaded before @command{gawk} starts compiling the -actual program. The net effect is that you can use those functions -anywhere in the program. - -@command{gawk} has a list of directories where it searches for libraries. -By default, the list includes directories that depend upon how gawk was built -and installed (@pxref{AWKLIBPATH Variable}). If you want @command{gawk} -to look for libraries in your private directory, you have to tell it. -The way to do it is to set the @env{AWKLIBPATH} environment variable -(@pxref{AWKLIBPATH Variable}). -@command{gawk} supplies the default shared library platform suffix if it is not -present in the name of the library. 
-If the name of your library is @file{mylib.so}, you can simply type - -@example -$ @kbd{gawk -l mylib -f myprog} -@end example - -and @command{gawk} will do everything necessary to load in your library, -and then call your @code{dl_load()} routine. - -You can always specify the library using an absolute pathname, in which -case @command{gawk} will not use @env{AWKLIBPATH} to search for it. - -@node Sample Library -@appendixsubsec Example: Directory and File Operation Built-ins -@c STARTOFRANGE chdirg -@cindex @code{chdir()} function@comma{} implementing in @command{gawk} -@c STARTOFRANGE statg -@cindex @code{stat()} function@comma{} implementing in @command{gawk} -@c STARTOFRANGE filre -@cindex files, information about@comma{} retrieving -@c STARTOFRANGE dirch -@cindex directories, changing - -Two useful functions that are not in @command{awk} are @code{chdir()} -(so that an @command{awk} program can change its directory) and -@code{stat()} (so that an @command{awk} program can gather information about -a file). -This @value{SECTION} implements these functions for @command{gawk} in an -external extension library. - -@menu -* Internal File Description:: What the new functions will do. -* Internal File Ops:: The code for internal file operations. -* Using Internal File Ops:: How to use an external extension. -@end menu - -@node Internal File Description -@appendixsubsubsec Using @code{chdir()} and @code{stat()} - -This @value{SECTION} shows how to use the new functions at the @command{awk} -level once they've been integrated into the running @command{gawk} -interpreter. -Using @code{chdir()} is very straightforward. 
It takes one argument, -the new directory to change to: - -@example -@dots{} -newdir = "/home/arnold/funstuff" -ret = chdir(newdir) -if (ret < 0) @{ - printf("could not change to %s: %s\n", - newdir, ERRNO) > "/dev/stderr" - exit 1 -@} -@dots{} -@end example - -The return value is negative if the @code{chdir} failed, -and @code{ERRNO} -(@pxref{Built-in Variables}) -is set to a string indicating the error. - -Using @code{stat()} is a bit more complicated. -The C @code{stat()} function fills in a structure that has a fair -amount of information. -The right way to model this in @command{awk} is to fill in an associative -array with the appropriate information: - -@c broke printf for page breaking -@example -file = "/home/arnold/.profile" -fdata[1] = "x" # force `fdata' to be an array -ret = stat(file, fdata) -if (ret < 0) @{ - printf("could not stat %s: %s\n", - file, ERRNO) > "/dev/stderr" - exit 1 -@} -printf("size of %s is %d bytes\n", file, fdata["size"]) -@end example - -The @code{stat()} function always clears the data array, even if -the @code{stat()} fails. It fills in the following elements: - -@table @code -@item "name" -The name of the file that was @code{stat()}'ed. - -@item "dev" -@itemx "ino" -The file's device and inode numbers, respectively. - -@item "mode" -The file's mode, as a numeric value. This includes both the file's -type and its permissions. - -@item "nlink" -The number of hard links (directory entries) the file has. - -@item "uid" -@itemx "gid" -The numeric user and group ID numbers of the file's owner. - -@item "size" -The size in bytes of the file. - -@item "blocks" -The number of disk blocks the file actually occupies. This may not -be a function of the file's size if the file has holes. - -@item "atime" -@itemx "mtime" -@itemx "ctime" -The file's last access, modification, and inode update times, -respectively. These are numeric timestamps, suitable for formatting -with @code{strftime()} -(@pxref{Built-in}). 
- -@item "pmode" -The file's ``printable mode.'' This is a string representation of -the file's type and permissions, such as what is produced by -@samp{ls -l}---for example, @code{"drwxr-xr-x"}. - -@item "type" -A printable string representation of the file's type. The value -is one of the following: - -@table @code -@item "blockdev" -@itemx "chardev" -The file is a block or character device (``special file''). - -@ignore -@item "door" -The file is a Solaris ``door'' (special file used for -interprocess communications). -@end ignore - -@item "directory" -The file is a directory. - -@item "fifo" -The file is a named-pipe (also known as a FIFO). - -@item "file" -The file is just a regular file. - -@item "socket" -The file is an @code{AF_UNIX} (``Unix domain'') socket in the -filesystem. - -@item "symlink" -The file is a symbolic link. -@end table -@end table - -Several additional elements may be present depending upon the operating -system and the type of the file. You can test for them in your @command{awk} -program by using the @code{in} operator -(@pxref{Reference to Elements}): - -@table @code -@item "blksize" -The preferred block size for I/O to the file. This field is not -present on all POSIX-like systems in the C @code{stat} structure. - -@item "linkval" -If the file is a symbolic link, this element is the name of the -file the link points to (i.e., the value of the link). - -@item "rdev" -@itemx "major" -@itemx "minor" -If the file is a block or character device file, then these values -represent the numeric device number and the major and minor components -of that number, respectively. -@end table - -@node Internal File Ops -@appendixsubsubsec C Code for @code{chdir()} and @code{stat()} - -Here is the C code for these extensions. They were written for -GNU/Linux. The code needs some more work for complete portability -to other POSIX-compliant systems:@footnote{This version is edited -slightly for presentation. 
See -@file{extension/filefuncs.c} in the @command{gawk} distribution -for the complete version.} - -@c break line for page breaking -@example -#include "awk.h" - -#include <sys/sysmacros.h> - -int plugin_is_GPL_compatible; - -/* do_chdir --- provide dynamically loaded chdir() builtin for gawk */ - -static NODE * -do_chdir(int nargs) -@{ - NODE *newdir; - int ret = -1; - - if (do_lint && nargs != 1) - lintwarn("chdir: called with incorrect number of arguments"); - - newdir = get_scalar_argument(0, FALSE); -@end example - -The file includes the @code{"awk.h"} header file for definitions -for the @command{gawk} internals. It includes @code{<sys/sysmacros.h>} -for access to the @code{major()} and @code{minor}() macros. - -@cindex programming conventions, @command{gawk} internals -By convention, for an @command{awk} function @code{foo}, the function that -implements it is called @samp{do_foo}. The function should take -a @samp{int} argument, usually called @code{nargs}, that -represents the number of defined arguments for the function. The @code{newdir} -variable represents the new directory to change to, retrieved -with @code{get_scalar_argument()}. Note that the first argument is -numbered zero. - -This code actually accomplishes the @code{chdir()}. It first forces -the argument to be a string and passes the string value to the -@code{chdir()} system call. If the @code{chdir()} fails, @code{ERRNO} -is updated. - -@example - (void) force_string(newdir); - ret = chdir(newdir->stptr); - if (ret < 0) - update_ERRNO_int(errno); -@end example - -Finally, the function returns the return value to the @command{awk} level: - -@example - return make_number((AWKNUM) ret); -@} -@end example - -The @code{stat()} built-in is more involved. First comes a function -that turns a numeric mode into a printable representation -(e.g., 644 becomes @samp{-rw-r--r--}). 
This is omitted here for brevity: - -@c break line for page breaking -@example -/* format_mode --- turn a stat mode field into something readable */ - -static char * -format_mode(unsigned long fmode) -@{ - @dots{} -@} -@end example - -Next comes the @code{do_stat()} function. It starts with -variable declarations and argument checking: - -@ignore -Changed message for page breaking. Used to be: - "stat: called with incorrect number of arguments (%d), should be 2", -@end ignore -@example -/* do_stat --- provide a stat() function for gawk */ - -static NODE * -do_stat(int nargs) -@{ - NODE *file, *array, *tmp; - struct stat sbuf; - int ret; - NODE **aptr; - char *pmode; /* printable mode */ - char *type = "unknown"; - - if (do_lint && nargs > 2) - lintwarn("stat: called with too many arguments"); -@end example - -Then comes the actual work. First, the function gets the arguments. -Then, it always clears the array. -The code use @code{lstat()} (instead of @code{stat()}) -to get the file information, -in case the file is a symbolic link. -If there's an error, it sets @code{ERRNO} and returns: - -@c comment made multiline for page breaking -@example - /* file is first arg, array to hold results is second */ - file = get_scalar_argument(0, FALSE); - array = get_array_argument(1, FALSE); - - /* empty out the array */ - assoc_clear(array); - - /* lstat the file, if error, set ERRNO and return */ - (void) force_string(file); - ret = lstat(file->stptr, & sbuf); - if (ret < 0) @{ - update_ERRNO_int(errno); - return make_number((AWKNUM) ret); - @} -@end example - -Now comes the tedious part: filling in the array. 
Only a few of the -calls are shown here, since they all follow the same pattern: - -@example - /* fill in the array */ - aptr = assoc_lookup(array, tmp = make_string("name", 4)); - *aptr = dupnode(file); - unref(tmp); - - aptr = assoc_lookup(array, tmp = make_string("mode", 4)); - *aptr = make_number((AWKNUM) sbuf.st_mode); - unref(tmp); - - aptr = assoc_lookup(array, tmp = make_string("pmode", 5)); - pmode = format_mode(sbuf.st_mode); - *aptr = make_string(pmode, strlen(pmode)); - unref(tmp); -@end example - -When done, return the @code{lstat()} return value: - -@example - - return make_number((AWKNUM) ret); -@} -@end example - -@cindex programming conventions, @command{gawk} internals -Finally, it's necessary to provide the ``glue'' that loads the -new function(s) into @command{gawk}. By convention, each library has -a routine named @code{dl_load()} that does the job. The simplest way -is to use the @code{dl_load_func} macro in @code{gawkapi.h}. - -And that's it! As an exercise, consider adding functions to -implement system calls such as @code{chown()}, @code{chmod()}, -and @code{umask()}. - -@node Using Internal File Ops -@appendixsubsubsec Integrating the Extensions - -@cindex @command{gawk}, interpreter@comma{} adding code to -Now that the code is written, it must be possible to add it at -runtime to the running @command{gawk} interpreter. First, the -code must be compiled. Assuming that the functions are in -a file named @file{filefuncs.c}, and @var{idir} is the location -of the @command{gawk} include files, -the following steps create -a GNU/Linux shared library: - -@example -$ @kbd{gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -I@var{idir} filefuncs.c} -$ @kbd{ld -o filefuncs.so -shared filefuncs.o} -@end example - -@cindex @code{extension()} function (@command{gawk}) -Once the library exists, it is loaded by calling the @code{extension()} -built-in function. 
-This function takes two arguments: the name of the -library to load and the name of a function to call when the library -is first loaded. This function adds the new functions to @command{gawk}. -It returns the value returned by the initialization function -within the shared library: - -@example -# file testff.awk -BEGIN @{ - extension("./filefuncs.so", "dl_load") - - chdir(".") # no-op - - data[1] = 1 # force `data' to be an array - print "Info for testff.awk" - ret = stat("testff.awk", data) - print "ret =", ret - for (i in data) - printf "data[\"%s\"] = %s\n", i, data[i] - print "testff.awk modified:", - strftime("%m %d %y %H:%M:%S", data["mtime"]) - - print "\nInfo for JUNK" - ret = stat("JUNK", data) - print "ret =", ret - for (i in data) - printf "data[\"%s\"] = %s\n", i, data[i] - print "JUNK modified:", strftime("%m %d %y %H:%M:%S", data["mtime"]) -@} -@end example - -Here are the results of running the program: - -@example -$ @kbd{gawk -f testff.awk} -@print{} Info for testff.awk -@print{} ret = 0 -@print{} data["size"] = 607 -@print{} data["ino"] = 14945891 -@print{} data["name"] = testff.awk -@print{} data["pmode"] = -rw-rw-r-- -@print{} data["nlink"] = 1 -@print{} data["atime"] = 1293993369 -@print{} data["mtime"] = 1288520752 -@print{} data["mode"] = 33204 -@print{} data["blksize"] = 4096 -@print{} data["dev"] = 2054 -@print{} data["type"] = file -@print{} data["gid"] = 500 -@print{} data["uid"] = 500 -@print{} data["blocks"] = 8 -@print{} data["ctime"] = 1290113572 -@print{} testff.awk modified: 10 31 10 12:25:52 -@print{} -@print{} Info for JUNK -@print{} ret = -1 -@print{} JUNK modified: 01 01 70 02:00:00 -@end example -@c ENDOFRANGE filre -@c ENDOFRANGE dirch -@c ENDOFRANGE statg -@c ENDOFRANGE chdirg -@c ENDOFRANGE gladfgaw -@c ENDOFRANGE adfugaw -@c ENDOFRANGE fubadgaw - @node Future Extensions @appendixsec Probable Future Extensions @ignore @@ -31055,12 +30686,8 @@ Following is a list of probable future changes visible at the @c these are 
ordered by likelihood @table @asis -@item Loadable module interface -It is not clear that the @command{awk}-level interface to the -modules facility is as good as it should be. The interface needs to be -redesigned, particularly taking namespace issues into account, as -well as possibly including issues such as library search path order -and versioning. +@item Databases +It may be possible to map a GDBM/NDBM/SDBM file into an @command{awk} array. @item @code{RECLEN} variable for fixed-length records Along with @code{FIELDWIDTHS}, this would speed up the processing of @@ -31068,9 +30695,6 @@ fixed-length records. @code{PROCINFO["RS"]} would be @code{"RS"} or @code{"RECLEN"}, depending upon which kind of record processing is in effect. -@item Databases -It may be possible to map a GDBM/NDBM/SDBM file into an @command{awk} array. - @item More @code{lint} warnings There are more things that could be checked for portability. @end table @@ -31079,21 +30703,6 @@ Following is a list of probable improvements that will make @command{gawk}'s source code easier to work with: @table @asis -@item Loadable module mechanics -The current extension mechanism works -(@pxref{Dynamic Extensions}), -but is rather primitive. It requires a fair amount of manual work -to create and integrate a loadable module. -Nor is the current mechanism as portable as might be desired. -The GNU @command{libtool} package provides a number of features that -would make using loadable modules much easier. -@command{gawk} should be changed to use @command{libtool}. - -@item Loadable module internals -The API to its internals that @command{gawk} ``exports'' should be revised. -Too many things are needlessly exposed. A new API should be designed -and implemented to make module writing easier. - @item Better array subscript management @command{gawk}'s management of array subscript storage could use revamping, so that using the same value to index multiple arrays only |