Clean up sprintf return type. Add tests for pgawk.

author: Arnold D. Robbins <arnold@skeeve.com> 2010-12-07 22:38:40 +0200
committer: Arnold D. Robbins <arnold@skeeve.com> 2010-12-07 22:38:40 +0200
commit: 2ea264adbbefc2337b766f06d1ceb38cd798915b (patch)
tree: 3a898fb49982ca6edc6a315362c1aa7f5ff4fc8a /test/xref.original
parent: 86643fa35036db1f40ce61da6358da455abe3c3b (diff)
download: egawk-2ea264adbbefc2337b766f06d1ceb38cd798915b.tar.gz
egawk-2ea264adbbefc2337b766f06d1ceb38cd798915b.tar.bz2
egawk-2ea264adbbefc2337b766f06d1ceb38cd798915b.zip
1 files changed, 313 insertions, 0 deletions
diff --git a/test/xref.original b/test/xref.original
new file mode 100644
index 00000000..a94de211
--- /dev/null
+++ b/test/xref.original
@@ -0,0 +1,313 @@
+XREF(AWK)                   Philip L. Bewig                   XREF(AWK)
+
+NAME
+
+        xref(awk) - produce a cross reference listing of an awk program
+
+SYNOPSIS
+
+        awk -f xref.awk [ file ... ]
+
+DESCRIPTION
+
+        XREF(AWK) takes as input a valid awk program and produces as out-
+        put a cross-reference listing of all variables and function calls
+        which appear in the program.
+
+        For ordinary variables and array variables, a line of the form
+
+                count var(func) lines ...
+
+        is produced, where "count" is the number of times the variable is
+        used, "var" is the name of the variable, "func" is the function
+        name to which the variable is local (a null "func" indicates that
+        the variable is global), and "lines" is the number of each line
+        where the variable appears.  Appearances of the variable in a
+        function's parameter list are ignored.  The number of lines shown
+        may differ from "count" if the variable appears more than once on
+        the same line.
+
+        For functions, a line of the form
+
+                count func(define) lines ...
+
+        is produced, where "count" is the number of times the function is
+        called, "func" is the name of the function, "define" is the lime
+        number where the function is defined, and "lines" is the number of
+        each line where the function is called.  As for variables, the
+        number of lines shown may differ from "count."
+
+        Output lines for variables and functions are intermixed and are
+        sorted by name.  Though terse, the output is informative, easy to
+        read, and amenable to further processing.
+
+EXAMPLE
+
+        The cross-reference listing produced by running xref.awk against
+        itself is shown below:
+
+                5 NR() 39 45 50 53 68
+                8 RLENGTH() 119 120 123 124 127 128 132 133
+                10 act() 31 34 36 40 51 54 56 63 65 67
+                1 arr(asplit) 90
+                2 arr(inarray) 93 94
+                1 asplit(89) 6
+                3 braces() 55 57 58
+                2 flines() 53 79
+                1 fs(asplit) 89
+                3 funcname() 42 52 61
+                16 i() 76 77 78 79 81 82 83 84 90
+                3 inarray(92) 37 43 48
+                6 j() 82 83 84
+                3 j(inarray) 93 94
+                3 keywords() 10 125 129
+                1 lex(97) 29
+                31 line() 103 104 107 108 109 110 112 113 114 115 116 117 118
+                        119 120 122 123 124 126 127 128 131 132 133
+                6 lines() 39 45 50 83 84
+                4 local() 41 59 60 64
+                3 machine() 17 30 31
+                2 n(asplit) 89 90
+                15 names() 37 38 43 44 48 49 77 78 79 81 82 83 84
+                3 nextstate() 30 62 73
+                4 nnames() 38 44 49 76
+                4 state() 23 30 31 73
+                1 str(asplit) 89
+                3 symb() 29 30 31
+                2 temp() 59 60
+                2 temp_asplit() 89 90
+                31 tok() 37 38 39 41 42 43 44 45 47 48 49 50 52 53 64 101 105
+                        113 115 117 119 123 125 127 129 132
+                1 val(inarray) 94
+                5 xnames() 39 45 50 77 82
+
+        For readability, some lines have been folded.
+
+SOURCE CODE
+
+        # xref.awk - cross reference an awk program
+
+        BEGIN {
+
+                # create array of keywords to be ignored by lexer
+                asplit("BEGIN:END:atan2:break:close:continue:cos:delete:" \
+                        "do:else:exit:exp:for:getline:gsub:if:in:index:int:"  \
+                        "length:log:match:next:print:printf:rand:return:sin:" \
+                        "split:sprintf:sqrt:srand:sub:substr:system:while",
+                        keywords,":")
+
+                # build the symbol-state table
+                split("00:00:00:00:00:00:00:00:00:00:" \
+                          "20:10:10:12:12:11:07:00:00:00:" \
+                          "08:08:08:08:08:33:08:00:00:00:" \
+                          "08:44:08:36:08:08:08:00:00:00:" \
+                          "08:44:45:42:42:41:08",machine,":")
+
+                # parse the input and store an intermediate representation
+                # of the cross-reference information
+
+                # set up the machine
+                state = 1
+
+                # run the machine
+                for (;;) {
+
+                        # get next symbol
+                        symb = lex()
+                        nextstate = substr(machine[state symb],1,1)
+                        act = substr(machine[state symb],2,1)
+
+                        # perform required action
+                        if ( act == "0" )
+                                ; # do nothing
+                        else if ( act == "1" ) {
+                                if ( ! inarray(tok,names) )
+                                        names[++nnames] = tok
+                                lines[tok,++xnames[tok]] = NR }
+                        else if ( act == "2" ) {
+                                if ( tok in local ) {
+                                        tok = tok "(" funcname ")"
+                                        if ( ! inarray(tok,names) )
+                                                names[++nnames] = tok
+                                        lines[tok,++xnames[tok]] = NR }
+                                else {
+                                        tok = tok "()"
+                                        if ( ! inarray(tok,names) )
+                                                names[++nnames] = tok
+                                        lines[tok,++xnames[tok]] = NR } }
+                        else if ( act == "3" ) {
+                                funcname = tok
+                                flines[tok] = NR }
+                        else if ( act == "4" )
+                                braces++
+                        else if ( act == "5" ) {
+                                braces--
+                                if ( braces == 0 ) {
+                                        for ( temp in local )
+                                                delete local[temp]
+                                        funcname = ""
+                                        nextstate = 1 } }
+                        else if ( act == "6" ) {
+                                local[tok] = 1 }
+                        else if ( act == "7" )
+                                break
+                        else if ( act == "8" ) {
+                                print "error: xref.awk: line " NR ": aborting" \
+                                        > "/dev/con"
+                                exit 1 }
+
+                        # finished with current token
+                        state = nextstate }
+
+                # finished parsing, now ready to print output
+                for ( i = 1; i <= nnames; i++ ) {
+                        printf "%d ", xnames[names[i]] |"sort +1"
+                        if ( index(names[i],"(") == 0 )
+                                printf "%s(%d)", names[i], flines[names[i]] |"sort +1"
+                        else
+                                printf "%s", names[i] |"sort +1"
+                        for ( j = 1; j <= xnames[names[i]]; j++ )
+                                if ( lines[names[i],j] != lines[names[i],j-1] )
+                                        printf " %d", lines[names[i],j] |"sort +1"
+                        printf "\n" |"sort +1" }
+
+                } # END OF PROGRAM
+
+        function asplit(str,arr,fs,  n) { n = split(str,temp_asplit,fs)
+                for ( i = 1; i <= n; i++ ) arr[temp_asplit[i]]++ }
+
+        function inarray(val,arr,  j) {
+            for ( j in arr )
+                if ( arr[j] == val ) return j
+            return "" }
+
+        function lex() {
+
+                for (;;) {
+
+                        if ( tok == "(eof)" ) return 7
+
+                        while ( length(line) == 0 )
+                                if ( getline line == 0 ) {
+                                        tok = "(eof)"; return 7 }
+
+                        sub(/^[ \t]+/,"",line)                                # remove white space,
+                        sub(/^"([^"]|\\")*"/,"",line)             # quoted strings,
+                        sub(/^\/([^\/]|\\\/)+\//,"",line)     # regular expressions,
+                        sub(/^#.*/,"",line)                                   # and comments
+
+                        if ( line ~ /^function/ ) {
+                                tok = "function"; line = substr(line,9); return 1 }
+                        else if ( line ~ /^{/ ) {
+                                tok = "{"; line = substr(line,2); return 2 }
+                        else if ( line ~ /^}/ ) {
+                                tok = "}"; line = substr(line,2); return 3 }
+                        else if ( match(line,/^[A-Za-z_][A-Za-z_0-9]*\[/) ) {
+                                tok = substr(line,1,RLENGTH-1)
+                                line = substr(line,RLENGTH+1)
+                                return 5 }
+                        else if ( match(line,/^[A-Za-z_][A-Za-z_0-9]*\(/) ) {
+                                tok = substr(line,1,RLENGTH-1)
+                                line = substr(line,RLENGTH+1)
+                                if ( ! ( tok in keywords ) ) return 6 }
+                        else if ( match(line,/^[A-Za-z_][A-Za-z_0-9]*/) ) {
+                                tok = substr(line,1,RLENGTH)
+                                line = substr(line,RLENGTH+1)
+                                if ( ! ( tok in keywords ) ) return 4 }
+                        else {
+                                match(line,/^[^A-Za-z_{}]/)
+                                tok = substr(line,1,RLENGTH)
+                                line = substr(line,RLENGTH+1) } } }
+
+TECHNICAL DISCUSSION
+
+        Broadly, XREF(AWK) parses an awk program using a symbol-state
+        table, in much the same way as a yacc-generated parser.  The
+        lexical analyzer recognizes seven distinct symbols:  the word
+        "function", the left brace, the right brace, identifiers used
+        as variables, identifiers used as arrays, identifiers used as
+        functions, and end of file.  The type of symbol is returned to
+        the parser as the value of the "lex" function, and the global
+        variable "tok" is set to the text of the current token.
+
+        The symbol-state table is stored in the "machine" array.  The
+        table can be represented as follows:
+
+                       symbol |     1       2  3   4     5     6     7
+                              |
+        state                 | "function"  {  }  var  array  func  eof
+        -- -- -- -- -- -- -- -+- -- -- -- -- -- -- -- -- -- -- -- -- --
+        1 any                 |     20     10  10  12    12    11   07
+        2 "function"          |     08     08  08  08    08    33   08
+        3 "function" name     |     08     44  08  36    08    08   08
+        4 "function" name "{" |     08     44  45  42    42    41   08
+
+        where the first digit is the state to be entered after process-
+        ing the current token and the second digit is an action to be
+        performed.  The actions are listed below:
+
+                1       found a function call
+                2       found a variable or array
+                3       found a function definition
+                4       found a left brace
+                5       found a right brace
+                6       found a local variable declaration
+                7       found end of file
+                8       found an error
+
+        Each of the first six actions causes some information about the
+        target program to be stored for later processing; the structures
+        used will be discussed below.  The seventh action causes the
+        parser to exit.  The eighth action causes errors to be reported
+        to standard error and the program to abort.
+
+        Before describing the intermediate data structures, we will
+        discuss some of the more interesting points in the action calls.
+        The "braces" variable keeps track of whether we are currently
+        within a functions; it is positive within a function and zero
+        without.  When the right brace which causes the value of "braces"
+        to go from one to zero is found, the value of "nextstate" is
+        changed from four (scanning a function) to one (any) and the
+        names of local variables are forgotten.  The "local" array is
+        accumulated from the variables found after the function name but
+        before the opening left brace of the function; action two care-
+        fully checks whether a variable is global or local before writing
+        to the intermediate data structure.  The variable "funcname" is
+        the name of the current function when within a function and null
+        without.
+
+        The following arrays store an intermediate representation of the
+        variable and function identifiers of the target program:
+
+                names[1..nnames] = list of all identifiers, both variable and
+                        function names; for variables, the name has the form
+                        var(func), but for functions, there are no parentheses
+
+                xnames[names[i]] = number of times names[i] is used
+
+                lines[names[i],1..xnames[names[i]]] = list of line numbers
+                        where names[i] is used
+
+                flines[names[i]] = line number where function names[i] is
+                        defined
+
+        These arrays are created as the parser reads the input; when the
+        parser is finished, the arrays are output in user-readable form.
+
+PORTABILITY
+
+        XREF(AWK) will work with any implementation of nawk.  The MKS
+        ToolKit implementation requires the large-model version of awk.
+
+HISTORY
+
+        Written by Phil Bewig on February 10, 1990.  Inspired by
+        Exercise 3-16 of the book "The Awk Programming Language" by
+        Alfred V. Aho, Brian W. Kernighan and Peter J. Weinberger
+        (Addison-Wesley:  1988).
+
+COPYRIGHT
+
+        This program is placed in the public domain.  However, the
+        author requests credit when distributed. 
+
author	Arnold D. Robbins <arnold@skeeve.com>	2010-12-07 22:38:40 +0200
committer	Arnold D. Robbins <arnold@skeeve.com>	2010-12-07 22:38:40 +0200
commit	2ea264adbbefc2337b766f06d1ceb38cd798915b (patch)
tree	3a898fb49982ca6edc6a315362c1aa7f5ff4fc8a /test/xref.original
parent	86643fa35036db1f40ce61da6358da455abe3c3b (diff)
download	egawk-2ea264adbbefc2337b766f06d1ceb38cd798915b.tar.gz egawk-2ea264adbbefc2337b766f06d1ceb38cd798915b.tar.bz2 egawk-2ea264adbbefc2337b766f06d1ceb38cd798915b.zip