.\" cppawk: C preprocessor wrapper around awk
.\" Copyright 2022 Kaz Kylheku
.\"
.\" BSD-2 License
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright notice,
.\"    this list of conditions and the following disclaimer.
.\"
.\" 2. Redistributions in binary form must reproduce the above copyright notice,
.\"    this list of conditions and the following disclaimer in the documentation
.\"    and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
.\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
.\" LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.TH CPPAWK 1 "19 April 2022" "Utility Commands" "Awk With C Preprocessing"
.SH NAME
cppawk \- wrapper for awk, with C preprocessing
.SH SYNOPSIS
cppawk [cpp, awk and cppawk options] [awk arguments]
.br
cppawk --prepro-only [cpp, awk and cppawk options]
.SH DESCRIPTION
.I cppawk
is a shell script which passes awk code through the standalone C preprocessor,
and then invokes
.I awk
on the preprocessed code.
This allows Awk code to be written which uses C preprocessor
.B #define
macros,
.BR #include ,
C comments, trigraphs (though perish the thought) and backslash continuation.
.I cppawk
deliberately has an invocation syntax similar to that of Awk.
It understands certain Awk options, such as
.BR -f ,
and also understands
.I cpp
options, such as
.BI -Dfoo= bar
for predefining a macro.
Just like with
.IR awk ,
code is specified either directly as the first non-option argument,
or via the
.B -f
option, which indicates a file.
In either situation,
.I cppawk
preprocesses the code and places the result in a temporary file,
which is then executed as
.I awk
code.
.SH OPTIONS
Any option not described here is assumed to be an Awk option which takes
no argument, and is consequently passed through to the
.I awk
program.
.IP "\fB--\fR"
End of options: any subsequent argument is the first non-option argument,
even if it looks like an option.
.IP "\fB--prepro-only\fR"
Do not run the preprocessed Awk program; dump the preprocessed code to
standard output.
.IP "\fB--awk=\fR\fIpath\fR"
Specify an alternative Awk implementation.
If
.I path
contains no slashes, then
.B PATH
is searched to find the program.
If the base name of the program is
.I gawk
or
.IR mawk ,
then, respectively, one of the preprocessor symbols
.B __gawk__
or
.B __mawk__
is predefined, with a value of 1.
This happens immediately when this option is processed, so it can be
counteracted by a subsequent
.B -U
option.
.IP "\fB\-\-prepro=\fR\fIpath\fR"
Specify an alternative preprocessor.
If
.I path
contains no slashes, then
.B PATH
is searched to find the program.
.IP "\fB\-f\fR \fIfilename\fR"
Read the awk program from
.I filename
rather than processing awk code from the first non-option command-line
argument.
The program is preprocessed to a temporary file, and
.I awk
is then invoked on this file.
The file is deleted when
.I awk
terminates.
.IP "\fB\-E\fR \fIfilename\fR"
The
.B -E
option is inspired by that of GNU Awk;
.I cppawk
implements a form of this option itself, for all Awk back-ends,
and does not pass it through to GNU Awk.
This option combines the semantics of the
.B -f
and
.B --
options.
Arrangements are made for the awk program to be read from a file exactly
as described above for the
.B -f
option.
Then, no more options are processed: any remaining option-like arguments
are ordinary arguments.
Note that unlike GNU Awk's
.B -E
option,
.BR cppawk 's
.B -E
option doesn't suppress the processing of arguments which look like
variable assignments.
Instead, the program may specify the following preprocessing directive,
outside of any Awk block or function:

.ft B
  #include
.ft R

This directive produces a
.B BEGIN
clause which prepares an associative array named
.B argv
that contains the same key/value pairs as the standard
.BR ARGV .
The
.B ARGV
array is then deleted.
Consequently, Awk will not perform the command-line variable assignments,
which normally occur after the
.B BEGIN
clauses are processed.
The effects of
.B ""
are not visible to
.B BEGIN
clauses which are placed earlier than the inclusion of
.BR "" .
Those earlier clauses have access to the original
.B ARGV
array.
However, the combination of the
.B -E
option and
.B ""
is still not equivalent to GNU Awk's
.B -E
option, because no filename arguments are available for implicit use
in the Awk pattern processing loop.
.IP "\fB--nobash\fR"
Pretend that the shell which executes
.I cppawk
isn't GNU Bash, even if it is.
This has the effect of disabling the use of process substitution in favor
of the use of a temporary file.
.IP "\fB--dump-macros\fR"
Instruct the preprocessor to dump all of the
.B #define
directives instead of the preprocessed output.
Since this is only useful with
.BR --prepro-only ,
that option is implied.
.IP "\fB\-M\fR, \fB\--bignum\fR"
These two equivalent GNU Awk options are passed through to
.IR awk ,
which will understand them if it is GNU Awk.
Using either of them causes the preprocessor symbol
.B __bignum__
to be defined with the value 1.
.IP "\fB\-P\fR, \fB\--posix\fR"
These two equivalent GNU Awk options are passed through to
.IR awk ,
which will understand them if it is GNU Awk.
Using either of them causes the preprocessor symbol
.B __posix__
to be defined with the value 1.
.IP "\fB\-M...\fR"
Any option beginning with
.B -M
and followed by one or more characters results in a diagnostic message
and unsuccessful termination.
The intent is that the
.B -M
family of options supported by GNU
.I cpp
is not supported by
.IR cppawk .
.IP "\fB-F\fR, \fB-v\fR, \fB-i\fR, \fB-l\fR, \fB-L\fR"
These standard and GNU Awk options are recognized by
.I cppawk
as requiring an argument.
They are validated for the presence of the required argument, and passed to
.IR awk .
.IP "\fB-U...\fR, \fB-D...\fR, \fB-I...\fR, \fB-iquote...\fR"
Options which match these patterns are passed to the
.I cpp
program instead of
.IR awk .
.SH PREDEFINED SYMBOLS
.IP \fB__gawk__\fR
When the
.I cppawk
installation is configured to use GNU Awk, which is the default,
the preprocessor symbol
.B __gawk__
is predefined with a value of 1.
See the
.B --awk
option.
.IP \fB__cppawk_ver\fR
This preprocessor symbol gives the version of
.IR cppawk .
Its value is an eight-digit decimal integer of the form
.IR YYYYMMDD ,
such as 20220321.
.SH CONFIGURATION SYMBOLS
.IP \fB__gawk_ver\fR
Certain
.I cppawk
header files may have functionality that depends on GNU Awk.
The
.B __gawk_ver
variable may be set by the application to indicate which version of
GNU Awk should be assumed by those library headers.
The headers will then avoid generating code that requires a newer
version of GNU Awk than the one indicated.
This variable should be defined before any header files are included,
or else with the
.B -D
option on the command line.
Its value should be a decimal integer whose last four digits encode the
minor and build numbers, two digits each.
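Put arithmetically, the encoding amounts to major * 10000 + minor * 100 + build.
The following C sketch illustrates that arithmetic under this assumption.
The names GAWK_VER, gawk_ver, gawk_ver_split, ASSUMED_GAWK_VER and
HAVE_GAWK_4_1 are hypothetical, invented for illustration; they are not
part of the cppawk headers.

```c
/* Hypothetical sketch, not part of cppawk: the __gawk_ver encoding,
   assuming value = major * 10000 + minor * 100 + build. */

/* Compile-time form: an integer constant expression, usable in #if. */
#define GAWK_VER(vmajor, vminor, vbuild) ((vmajor) * 10000L + (vminor) * 100L + (vbuild))

/* Run-time form of the same arithmetic. */
long gawk_ver(int vmajor, int vminor, int vbuild)
{
    return GAWK_VER(vmajor, vminor, vbuild);
}

/* Decode an encoded value back into its three components. */
void gawk_ver_split(long ver, int *vmajor, int *vminor, int *vbuild)
{
    *vmajor = (int) (ver / 10000);      /* leading digits */
    *vminor = (int) (ver / 100 % 100);  /* middle two digits */
    *vbuild = (int) (ver % 100);        /* last two digits */
}

/* ASSUMED_GAWK_VER stands in for __gawk_ver, whose double underscore
   makes it a reserved name in C. Like the cppawk headers, this sketch
   falls back to 40000 (GNU Awk 4.0) when nothing has been set. */
#ifndef ASSUMED_GAWK_VER
#define ASSUMED_GAWK_VER 40000
#endif

/* A header might gate a newer feature set like this. */
#if ASSUMED_GAWK_VER >= GAWK_VER(4, 1, 0)
#define HAVE_GAWK_4_1 1
#else
#define HAVE_GAWK_4_1 0
#endif
```

Because GAWK_VER expands to an integer constant expression, it can be compared
inside an #if directive, as in the last lines of the sketch; a header could
plausibly guard features of newer GNU Awk versions in a similar way.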
For instance, 4.1.3 is encoded as 40103:

.ft B
  #define __gawk_ver 40103 // Inform library GNU Awk 4.1.3 is used
  #include <...>           // inclusion of headers follows
.ft R

If the variable is not set, then the library headers which make use of it
will define it themselves to a default value of 40000, thereby assuming
GNU Awk 4.0 or later.
Values lower than 40000 are not supported; code that requires GNU Awk
assumes at least version 4.0.
.SH STANDARD HEADERS
.I cppawk
points the preprocessor to look for
.B "#include <...>"
files in its own directory, which contains a library of header files that
accompany
.IR cppawk .
.IP \fB\fR
This header provides macros which make it easy to write variable-argument
macros with complex expansions.
It is documented in the
.I cppawk-narg
manual page.
.IP \fB\fR
This header provides macros for writing a
.B case
statement.
The case statement syntax is designed so that a GNU Awk
.B switch
statement is easily converted to it.
The preprocessor translates it back to a clean GNU Awk
.B switch
statement, or to portable Awk code that runs on other Awks.
The contents of this header are documented in the
.I cppawk-case
manual page.
.SH EXAMPLES
Print the larger of field 1 or 2:

.ft B
  cppawk '\fI// C comment\fP
  #define max(\fIa\fP, \fIb\fP) ((\fIa\fP) > (\fIb\fP) ? (\fIa\fP) : (\fIb\fP))
  { print max($1, $2) /* C comment */ } \fI#awk comment\fP'
.ft R

Implement an Awk-like processing loop within a function, to process
.BR /proc/mounts :

.ft B
  #include "awkloop.h"

  function main()
  {
    awkloop ("/proc/mounts") {
      rule ($3 != "ext4") { nextrec }
      rule ($2 == "/") { print $1 }
    }
  }

  BEGIN { main() }
.ft R

Where
.B awkloop.h
contains:

.ft B
  #define awkloop(\fIfile\fP) for (; (getline < (\fIfile\fP)) > 0 || (close(\fIfile\fP) && 0); )
  #define nextrec continue
  #define rule(\fIcond\fP) if (\fIcond\fP)
.ft R

Testing the result of
.B getline
with
.B "> 0"
ends the loop, and closes the file, on end of input as well as on a read
error (a return value of \-1), which would otherwise be treated as true.

Produce an informative banner in generated output, as an Awk comment block.
This is very useful when the output is generated and retained instead of
being immediately executed, for instance for installation on a target
system which has no preprocessor:

.ft B
  #define HASH #
  HASH###################################################
  HASH DO NOT EDIT!
  HASH This file was generated from __FILE__ on __DATE__
  HASH###################################################
.ft R

Note: this was tested to work with the GNU preprocessor.
A spurious blank line may appear in the output.
The material in the Awk comments isn't a comment to the C preprocessor;
it must consist of valid C preprocessor tokens, so the text must be chosen
accordingly.
.SH "SEE ALSO"
awk(1), cpp(1), cppawk-narg(1), cppawk-case(1), cppawk-cons(1)
.SH BUGS
The
.B -f
option can be given only once, whereas
.I awk
accepts multiple
.B -f
options and uses the concatenation of the indicated files as the program.
.PP
Awk error messages are reported against the preprocessed text.
.PP
Awk
.B #
comments cannot be used at the start of a line, because
.B #
begins a preprocessing directive.
They also cannot be used inside a preprocessing directive, such as a macro
definition, because
.B #
is an operator in the preprocessor language.
It may be a good idea to avoid
.B #
comments entirely in
.I cppawk
source, and use only C comments.
.PP
The
.I cpp
program tokenizes text using C preprocessor rules.
Because Awk is "C-like", its syntax is largely compatible with these rules,
which is why
.I cppawk
works at all; however, there may be corner cases in which the two disagree.
One example is that a double quote character may be used in an Awk
regular expression such as /abc\(dq/, but the preprocessor rejects this
as a literal with a missing closing quote.
The workaround for that situation is to use an escape sequence to encode
the quote:

  /abc\e042/

.PP
Another area of incompatibility is that newlines are significant in the Awk
grammar, and some Awk programs use backslash-newline sequences in order to
turn significant newlines into insignificant ones.
Although the C preprocessor recognizes and consumes backslash-newline
sequences, it may unfortunately replace them with unescaped newlines.
So the backslash line continuation technique is not reliably available to
.I cppawk
programs.
A clumsy workaround which works with GNU
.I cpp
is this:

.ft B
  #define BS \e\e

  /pattern/ BS
  { action }
.ft R

The blank line after the definition is required: the preprocessor joins the
final backslash and the newline after it, so the directive absorbs the
following line.
When that line is empty, the replacement of
.B BS
is a single backslash.
.PP
Awk implementations report errors against the lines of an anonymous file
name associated with the preprocessed stream, rather than against the
original lines of the original file.
Although the preprocessed output contains source file and line number
information, Awk implementations do not understand it.
.PP
The default choices of
.I gawk
and
.I cpp
are fixed in the source code; users must edit
.I cppawk
to select alternative implementations or locations of these tools,
if they don't wish to use the
.B --awk
and
.B --prepro
command-line options.
.PP
The C preprocessor doesn't permit macro recursion, which limits the
ability to compose invocations of
.I cppawk
macros, thus curtailing their power.
If in the expansion of some macro
.B M
a call of macro
.B M
appears, that call is not expanded.
This is relied upon by C programs which use macros to inline same-named
functions.
For instance, if it were acceptable for the argument of
.B strlen
to be evaluated twice, then this macro version would be permissible:

.ft B
  #define strlen(x) (*(x) == 0 ? 0 : strlen(x))
.ft R

Here, the
.B strlen
call in the macro expansion is relied upon not to be expanded as a macro;
if it were, runaway expansion would occur.
.SH AUTHOR
Kaz Kylheku
.SH COPYRIGHT
Copyright 2022, BSD2 License.