Diffstat (limited to 'doc')
-rw-r--r-- | doc/ChangeLog | 4
-rw-r--r-- | doc/gawk.info | 618
-rw-r--r-- | doc/gawk.texi | 114
-rw-r--r-- | doc/gawktexi.in | 114
4 files changed, 438 insertions, 412 deletions
diff --git a/doc/ChangeLog b/doc/ChangeLog index dd400936..c70ca85c 100644 --- a/doc/ChangeLog +++ b/doc/ChangeLog @@ -1,3 +1,7 @@ +2020-08-31 Arnold D. Robbins <arnold@skeeve.com> + + * gawktexi.in (Uniq Program): Updated uniq.awk to follow 2020 POSIX. + 2020-08-26 Arnold D. Robbins <arnold@skeeve.com> * gawktexi.in: Fix some small mistakes / typos. diff --git a/doc/gawk.info b/doc/gawk.info index ad52db04..9d5d4ef6 100644 --- a/doc/gawk.info +++ b/doc/gawk.info @@ -18399,7 +18399,7 @@ by default removes duplicate lines. In other words, it only prints unique lines--hence the name. 'uniq' has a number of options. The usage is as follows: - 'uniq' ['-udc' ['-N']] ['+N'] [INPUTFILE [OUTPUTFILE]] + 'uniq' ['-udc' ['-f N'] ['-s N']] [INPUTFILE [OUTPUTFILE]] The options for 'uniq' are: @@ -18413,14 +18413,14 @@ usage is as follows: Count lines. This option overrides '-d' and '-u'. Both repeated and nonrepeated lines are counted. -'-N' +'-f N' Skip N fields before comparing lines. The definition of fields is similar to 'awk''s default: nonwhitespace characters separated by runs of spaces and/or TABs. -'+N' +'-s N' Skip N characters before comparing lines. Any fields specified - with '-N' are skipped first. + with '-f' are skipped first. 'INPUTFILE' Data is read from the input file named on the command line, instead @@ -18437,21 +18437,7 @@ provided. and the 'join()' library function (*note Join Function::). The program begins with a 'usage()' function and then a brief outline -of the options and their meanings in comments. The 'BEGIN' rule deals -with the command-line arguments and options. It uses a trick to get -'getopt()' to handle options of the form '-25', treating such an option -as the option letter '2' with an argument of '5'. If indeed two or more -digits are supplied ('Optarg' looks like a number), 'Optarg' is -concatenated with the option digit and then the result is added to zero -to make it into a number. 
If there is only one digit in the option, -then 'Optarg' is not needed. In this case, 'Optind' must be decremented -so that 'getopt()' processes it next time. This code is admittedly a -bit tricky. - - If no options are supplied, then the default is taken, to print both -repeated and nonrepeated lines. The output file, if provided, is -assigned to 'outputfile'. Early on, 'outputfile' is initialized to the -standard output, '/dev/stdout': +of the options and their meanings in comments: # uniq.awk --- do uniq in awk # @@ -18459,20 +18445,47 @@ standard output, '/dev/stdout': function usage() { - print("Usage: uniq [-udc [-n]] [+n] [ in [ out ]]") > "/dev/stderr" + print("Usage: uniq [-udc [-f fields] [-s chars]] [ in [ out ]]") > "/dev/stderr" exit 1 } # -c count lines. overrides -d and -u # -d only repeated lines # -u only nonrepeated lines - # -n skip n fields - # +n skip n characters, skip fields first + # -f n skip n fields + # -s n skip n characters, skip fields first + + The POSIX standard for 'uniq' allows options to start with '+' as +well as with '-'. An initial 'BEGIN' rule traverses the arguments +changing any leading '+' to '-' so that the 'getopt()' function can +parse the options: + + # As of 2020, '+' can be used as option character in addition to '-' + # Previously allowed use of -N to skip fields and +N to skip + # characters is no longer allowed, and not supported by this version. + + BEGIN { + # Convert + to - so getopt can handle things + for (i = 1; i < ARGC; i++) { + first = substr(ARGV[i], 1, 1) + if (ARGV[i] == "--" || (first != "-" && first != "+")) + break + else if (first == "+") + # Replace "+" with "-" + ARGV[i] = "-" substr(ARGV[i], 2) + } + } + + The next 'BEGIN' rule deals with the command-line arguments and +options. If no options are supplied, then the default is taken, to +print both repeated and nonrepeated lines. The output file, if +provided, is assigned to 'outputfile'. 
Early on, 'outputfile' is +initialized to the standard output, '/dev/stdout': BEGIN { count = 1 outputfile = "/dev/stdout" - opts = "udc0:1:2:3:4:5:6:7:8:9:" + opts = "udcf:s:" while ((c = getopt(ARGC, ARGV, opts)) != -1) { if (c == "u") non_repeated_only++ @@ -18480,24 +18493,14 @@ standard output, '/dev/stdout': repeated_only++ else if (c == "c") do_count++ - else if (index("0123456789", c) != 0) { - # getopt() requires args to options - # this messes us up for things like -5 - if (Optarg ~ /^[[:digit:]]+$/) - fcount = (c Optarg) + 0 - else { - fcount = c + 0 - Optind-- - } - } else + else if (c == "f") + fcount = Optarg + 0 + else if (c == "s") + charcount = Optarg + 0 + else usage() } - if (ARGV[Optind] ~ /^\+[[:digit:]]+$/) { - charcount = substr(ARGV[Optind], 2) + 0 - Optind++ - } - for (i = 1; i < Optind; i++) ARGV[i] = "" @@ -18610,6 +18613,20 @@ line of input data: convention of naming global variables with a leading capital letter. Doing that would make the program a little easier to follow. + The logic for choosing which lines to print represents a "state +machine", which is "a device which can be in one of a set number of +stable conditions depending on its previous condition and on the present +values of its inputs."(1) Brian Kernighan suggests that "an alternative +approach to state machines is to just read the input into an array, then +use indexing. It's almost always easier code, and for most inputs where +you would use this, just as fast." Consider how to rewrite the logic to +follow this suggestion. + + ---------- Footnotes ---------- + + (1) This definition is from +<https://www.lexico.com/en/definition/state_machine>. + File: gawk.info, Node: Wc Program, Prev: Uniq Program, Up: Clones @@ -37247,7 +37264,7 @@ Index * uninitialized variables, as array subscripts: Uninitialized Subscripts. (line 6) * uniq utility: Uniq Program. (line 6) -* uniq.awk program: Uniq Program. (line 65) +* uniq.awk program: Uniq Program. 
(line 51) * Unix, awk scripts and: Executable Scripts. (line 6) * Unix: Glossary. (line 747) * Unix awk, backslashes in escape sequences: Escape Sequences. @@ -37744,267 +37761,268 @@ Node: Split Program749505 Ref: Split Program-Footnote-1752963 Node: Tee Program753092 Node: Uniq Program755882 -Node: Wc Program763503 -Ref: Wc Program-Footnote-1767758 -Node: Miscellaneous Programs767852 -Node: Dupword Program769065 -Node: Alarm Program771095 -Node: Translate Program775950 -Ref: Translate Program-Footnote-1780515 -Node: Labels Program780785 -Ref: Labels Program-Footnote-1784136 -Node: Word Sorting784220 -Node: History Sorting788292 -Node: Extract Program790517 -Node: Simple Sed798571 -Node: Igawk Program801645 -Ref: Igawk Program-Footnote-1815976 -Ref: Igawk Program-Footnote-2816178 -Ref: Igawk Program-Footnote-3816300 -Node: Anagram Program816415 -Node: Signature Program819477 -Node: Programs Summary820724 -Node: Programs Exercises821938 -Ref: Programs Exercises-Footnote-1826067 -Node: Advanced Features826158 -Node: Nondecimal Data828148 -Node: Array Sorting829739 -Node: Controlling Array Traversal830439 -Ref: Controlling Array Traversal-Footnote-1838807 -Node: Array Sorting Functions838925 -Ref: Array Sorting Functions-Footnote-1844016 -Node: Two-way I/O844212 -Ref: Two-way I/O-Footnote-1851933 -Ref: Two-way I/O-Footnote-2852120 -Node: TCP/IP Networking852202 -Node: Profiling855320 -Node: Advanced Features Summary864634 -Node: Internationalization866478 -Node: I18N and L10N867958 -Node: Explaining gettext868645 -Ref: Explaining gettext-Footnote-1874537 -Ref: Explaining gettext-Footnote-2874722 -Node: Programmer i18n874887 -Ref: Programmer i18n-Footnote-1879836 -Node: Translator i18n879885 -Node: String Extraction880679 -Ref: String Extraction-Footnote-1881811 -Node: Printf Ordering881897 -Ref: Printf Ordering-Footnote-1884683 -Node: I18N Portability884747 -Ref: I18N Portability-Footnote-1887203 -Node: I18N Example887266 -Ref: I18N Example-Footnote-1890541 -Ref: 
I18N Example-Footnote-2890614 -Node: Gawk I18N890723 -Node: I18N Summary891372 -Node: Debugger892713 -Node: Debugging893713 -Node: Debugging Concepts894154 -Node: Debugging Terms895963 -Node: Awk Debugging898538 -Ref: Awk Debugging-Footnote-1899483 -Node: Sample Debugging Session899615 -Node: Debugger Invocation900149 -Node: Finding The Bug901535 -Node: List of Debugger Commands908009 -Node: Breakpoint Control909342 -Node: Debugger Execution Control913036 -Node: Viewing And Changing Data916398 -Node: Execution Stack919939 -Node: Debugger Info921576 -Node: Miscellaneous Debugger Commands925647 -Node: Readline Support930709 -Node: Limitations931605 -Node: Debugging Summary934159 -Node: Namespaces935438 -Node: Global Namespace936549 -Node: Qualified Names937947 -Node: Default Namespace938946 -Node: Changing The Namespace939687 -Node: Naming Rules941301 -Node: Internal Name Management943149 -Node: Namespace Example944191 -Node: Namespace And Features946753 -Node: Namespace Summary948188 -Node: Arbitrary Precision Arithmetic949665 -Node: Computer Arithmetic951152 -Ref: table-numeric-ranges954918 -Ref: table-floating-point-ranges955411 -Ref: Computer Arithmetic-Footnote-1956069 -Node: Math Definitions956126 -Ref: table-ieee-formats959442 -Ref: Math Definitions-Footnote-1960045 -Node: MPFR features960150 -Node: FP Math Caution961868 -Ref: FP Math Caution-Footnote-1962940 -Node: Inexactness of computations963309 -Node: Inexact representation964269 -Node: Comparing FP Values965629 -Node: Errors accumulate966870 -Node: Getting Accuracy968303 -Node: Try To Round971013 -Node: Setting precision971912 -Ref: table-predefined-precision-strings972609 -Node: Setting the rounding mode974439 -Ref: table-gawk-rounding-modes974813 -Ref: Setting the rounding mode-Footnote-1978744 -Node: Arbitrary Precision Integers978923 -Ref: Arbitrary Precision Integers-Footnote-1982098 -Node: Checking for MPFR982247 -Node: POSIX Floating Point Problems983721 -Ref: POSIX Floating Point 
Problems-Footnote-1988006 -Node: Floating point summary988044 -Node: Dynamic Extensions990234 -Node: Extension Intro991787 -Node: Plugin License993053 -Node: Extension Mechanism Outline993850 -Ref: figure-load-extension994289 -Ref: figure-register-new-function995854 -Ref: figure-call-new-function996946 -Node: Extension API Description999008 -Node: Extension API Functions Introduction1000721 -Ref: table-api-std-headers1002557 -Node: General Data Types1006806 -Ref: General Data Types-Footnote-11015436 -Node: Memory Allocation Functions1015735 -Ref: Memory Allocation Functions-Footnote-11020236 -Node: Constructor Functions1020335 -Node: API Ownership of MPFR and GMP Values1023801 -Node: Registration Functions1025114 -Node: Extension Functions1025814 -Node: Exit Callback Functions1031136 -Node: Extension Version String1032386 -Node: Input Parsers1033049 -Node: Output Wrappers1045770 -Node: Two-way processors1050282 -Node: Printing Messages1052547 -Ref: Printing Messages-Footnote-11053718 -Node: Updating ERRNO1053871 -Node: Requesting Values1054610 -Ref: table-value-types-returned1055347 -Node: Accessing Parameters1056283 -Node: Symbol Table Access1057520 -Node: Symbol table by name1058032 -Ref: Symbol table by name-Footnote-11061056 -Node: Symbol table by cookie1061184 -Ref: Symbol table by cookie-Footnote-11065369 -Node: Cached values1065433 -Ref: Cached values-Footnote-11068969 -Node: Array Manipulation1069122 -Ref: Array Manipulation-Footnote-11070213 -Node: Array Data Types1070250 -Ref: Array Data Types-Footnote-11072908 -Node: Array Functions1073000 -Node: Flattening Arrays1077498 -Node: Creating Arrays1084474 -Node: Redirection API1089241 -Node: Extension API Variables1092074 -Node: Extension Versioning1092785 -Ref: gawk-api-version1093214 -Node: Extension GMP/MPFR Versioning1094945 -Node: Extension API Informational Variables1096573 -Node: Extension API Boilerplate1097646 -Node: Changes from API V11101620 -Node: Finding Extensions1103192 -Node: Extension 
Example1103751 -Node: Internal File Description1104549 -Node: Internal File Ops1108629 -Ref: Internal File Ops-Footnote-11119979 -Node: Using Internal File Ops1120119 -Ref: Using Internal File Ops-Footnote-11122502 -Node: Extension Samples1122776 -Node: Extension Sample File Functions1124305 -Node: Extension Sample Fnmatch1131954 -Node: Extension Sample Fork1133441 -Node: Extension Sample Inplace1134659 -Node: Extension Sample Ord1138284 -Node: Extension Sample Readdir1139120 -Ref: table-readdir-file-types1140009 -Node: Extension Sample Revout1141076 -Node: Extension Sample Rev2way1141665 -Node: Extension Sample Read write array1142405 -Node: Extension Sample Readfile1144347 -Node: Extension Sample Time1145442 -Node: Extension Sample API Tests1147194 -Node: gawkextlib1147686 -Node: Extension summary1150604 -Node: Extension Exercises1154306 -Node: Language History1155548 -Node: V7/SVR3.11157204 -Node: SVR41159356 -Node: POSIX1160790 -Node: BTL1162171 -Node: POSIX/GNU1162900 -Node: Feature History1168678 -Node: Common Extensions1184997 -Node: Ranges and Locales1186280 -Ref: Ranges and Locales-Footnote-11190896 -Ref: Ranges and Locales-Footnote-21190923 -Ref: Ranges and Locales-Footnote-31191158 -Node: Contributors1191381 -Node: History summary1197378 -Node: Installation1198758 -Node: Gawk Distribution1199702 -Node: Getting1200186 -Node: Extracting1201149 -Node: Distribution contents1202787 -Node: Unix Installation1209267 -Node: Quick Installation1209949 -Node: Shell Startup Files1212363 -Node: Additional Configuration Options1213452 -Node: Configuration Philosophy1215767 -Node: Non-Unix Installation1218136 -Node: PC Installation1218596 -Node: PC Binary Installation1219434 -Node: PC Compiling1219869 -Node: PC Using1220986 -Node: Cygwin1224539 -Node: MSYS1225763 -Node: VMS Installation1226365 -Node: VMS Compilation1227156 -Ref: VMS Compilation-Footnote-11228385 -Node: VMS Dynamic Extensions1228443 -Node: VMS Installation Details1230128 -Node: VMS Running1232381 -Node: 
VMS GNV1236660 -Node: VMS Old Gawk1237395 -Node: Bugs1237866 -Node: Bug address1238529 -Node: Usenet1241511 -Node: Maintainers1242515 -Node: Other Versions1243700 -Node: Installation summary1250788 -Node: Notes1251997 -Node: Compatibility Mode1252791 -Node: Additions1253573 -Node: Accessing The Source1254498 -Node: Adding Code1255935 -Node: New Ports1262154 -Node: Derived Files1266529 -Ref: Derived Files-Footnote-11272189 -Ref: Derived Files-Footnote-21272224 -Ref: Derived Files-Footnote-31272822 -Node: Future Extensions1272936 -Node: Implementation Limitations1273594 -Node: Extension Design1274804 -Node: Old Extension Problems1275948 -Ref: Old Extension Problems-Footnote-11277466 -Node: Extension New Mechanism Goals1277523 -Ref: Extension New Mechanism Goals-Footnote-11280887 -Node: Extension Other Design Decisions1281076 -Node: Extension Future Growth1283189 -Node: Notes summary1283795 -Node: Basic Concepts1284953 -Node: Basic High Level1285634 -Ref: figure-general-flow1285916 -Ref: figure-process-flow1286601 -Ref: Basic High Level-Footnote-11289902 -Node: Basic Data Typing1290087 -Node: Glossary1293415 -Node: Copying1325300 -Node: GNU Free Documentation License1362843 -Node: Index1387963 +Ref: Uniq Program-Footnote-1764007 +Node: Wc Program764093 +Ref: Wc Program-Footnote-1768348 +Node: Miscellaneous Programs768442 +Node: Dupword Program769655 +Node: Alarm Program771685 +Node: Translate Program776540 +Ref: Translate Program-Footnote-1781105 +Node: Labels Program781375 +Ref: Labels Program-Footnote-1784726 +Node: Word Sorting784810 +Node: History Sorting788882 +Node: Extract Program791107 +Node: Simple Sed799161 +Node: Igawk Program802235 +Ref: Igawk Program-Footnote-1816566 +Ref: Igawk Program-Footnote-2816768 +Ref: Igawk Program-Footnote-3816890 +Node: Anagram Program817005 +Node: Signature Program820067 +Node: Programs Summary821314 +Node: Programs Exercises822528 +Ref: Programs Exercises-Footnote-1826657 +Node: Advanced Features826748 +Node: Nondecimal 
Data828738 +Node: Array Sorting830329 +Node: Controlling Array Traversal831029 +Ref: Controlling Array Traversal-Footnote-1839397 +Node: Array Sorting Functions839515 +Ref: Array Sorting Functions-Footnote-1844606 +Node: Two-way I/O844802 +Ref: Two-way I/O-Footnote-1852523 +Ref: Two-way I/O-Footnote-2852710 +Node: TCP/IP Networking852792 +Node: Profiling855910 +Node: Advanced Features Summary865224 +Node: Internationalization867068 +Node: I18N and L10N868548 +Node: Explaining gettext869235 +Ref: Explaining gettext-Footnote-1875127 +Ref: Explaining gettext-Footnote-2875312 +Node: Programmer i18n875477 +Ref: Programmer i18n-Footnote-1880426 +Node: Translator i18n880475 +Node: String Extraction881269 +Ref: String Extraction-Footnote-1882401 +Node: Printf Ordering882487 +Ref: Printf Ordering-Footnote-1885273 +Node: I18N Portability885337 +Ref: I18N Portability-Footnote-1887793 +Node: I18N Example887856 +Ref: I18N Example-Footnote-1891131 +Ref: I18N Example-Footnote-2891204 +Node: Gawk I18N891313 +Node: I18N Summary891962 +Node: Debugger893303 +Node: Debugging894303 +Node: Debugging Concepts894744 +Node: Debugging Terms896553 +Node: Awk Debugging899128 +Ref: Awk Debugging-Footnote-1900073 +Node: Sample Debugging Session900205 +Node: Debugger Invocation900739 +Node: Finding The Bug902125 +Node: List of Debugger Commands908599 +Node: Breakpoint Control909932 +Node: Debugger Execution Control913626 +Node: Viewing And Changing Data916988 +Node: Execution Stack920529 +Node: Debugger Info922166 +Node: Miscellaneous Debugger Commands926237 +Node: Readline Support931299 +Node: Limitations932195 +Node: Debugging Summary934749 +Node: Namespaces936028 +Node: Global Namespace937139 +Node: Qualified Names938537 +Node: Default Namespace939536 +Node: Changing The Namespace940277 +Node: Naming Rules941891 +Node: Internal Name Management943739 +Node: Namespace Example944781 +Node: Namespace And Features947343 +Node: Namespace Summary948778 +Node: Arbitrary Precision Arithmetic950255 
+Node: Computer Arithmetic951742 +Ref: table-numeric-ranges955508 +Ref: table-floating-point-ranges956001 +Ref: Computer Arithmetic-Footnote-1956659 +Node: Math Definitions956716 +Ref: table-ieee-formats960032 +Ref: Math Definitions-Footnote-1960635 +Node: MPFR features960740 +Node: FP Math Caution962458 +Ref: FP Math Caution-Footnote-1963530 +Node: Inexactness of computations963899 +Node: Inexact representation964859 +Node: Comparing FP Values966219 +Node: Errors accumulate967460 +Node: Getting Accuracy968893 +Node: Try To Round971603 +Node: Setting precision972502 +Ref: table-predefined-precision-strings973199 +Node: Setting the rounding mode975029 +Ref: table-gawk-rounding-modes975403 +Ref: Setting the rounding mode-Footnote-1979334 +Node: Arbitrary Precision Integers979513 +Ref: Arbitrary Precision Integers-Footnote-1982688 +Node: Checking for MPFR982837 +Node: POSIX Floating Point Problems984311 +Ref: POSIX Floating Point Problems-Footnote-1988596 +Node: Floating point summary988634 +Node: Dynamic Extensions990824 +Node: Extension Intro992377 +Node: Plugin License993643 +Node: Extension Mechanism Outline994440 +Ref: figure-load-extension994879 +Ref: figure-register-new-function996444 +Ref: figure-call-new-function997536 +Node: Extension API Description999598 +Node: Extension API Functions Introduction1001311 +Ref: table-api-std-headers1003147 +Node: General Data Types1007396 +Ref: General Data Types-Footnote-11016026 +Node: Memory Allocation Functions1016325 +Ref: Memory Allocation Functions-Footnote-11020826 +Node: Constructor Functions1020925 +Node: API Ownership of MPFR and GMP Values1024391 +Node: Registration Functions1025704 +Node: Extension Functions1026404 +Node: Exit Callback Functions1031726 +Node: Extension Version String1032976 +Node: Input Parsers1033639 +Node: Output Wrappers1046360 +Node: Two-way processors1050872 +Node: Printing Messages1053137 +Ref: Printing Messages-Footnote-11054308 +Node: Updating ERRNO1054461 +Node: Requesting 
Values1055200 +Ref: table-value-types-returned1055937 +Node: Accessing Parameters1056873 +Node: Symbol Table Access1058110 +Node: Symbol table by name1058622 +Ref: Symbol table by name-Footnote-11061646 +Node: Symbol table by cookie1061774 +Ref: Symbol table by cookie-Footnote-11065959 +Node: Cached values1066023 +Ref: Cached values-Footnote-11069559 +Node: Array Manipulation1069712 +Ref: Array Manipulation-Footnote-11070803 +Node: Array Data Types1070840 +Ref: Array Data Types-Footnote-11073498 +Node: Array Functions1073590 +Node: Flattening Arrays1078088 +Node: Creating Arrays1085064 +Node: Redirection API1089831 +Node: Extension API Variables1092664 +Node: Extension Versioning1093375 +Ref: gawk-api-version1093804 +Node: Extension GMP/MPFR Versioning1095535 +Node: Extension API Informational Variables1097163 +Node: Extension API Boilerplate1098236 +Node: Changes from API V11102210 +Node: Finding Extensions1103782 +Node: Extension Example1104341 +Node: Internal File Description1105139 +Node: Internal File Ops1109219 +Ref: Internal File Ops-Footnote-11120569 +Node: Using Internal File Ops1120709 +Ref: Using Internal File Ops-Footnote-11123092 +Node: Extension Samples1123366 +Node: Extension Sample File Functions1124895 +Node: Extension Sample Fnmatch1132544 +Node: Extension Sample Fork1134031 +Node: Extension Sample Inplace1135249 +Node: Extension Sample Ord1138874 +Node: Extension Sample Readdir1139710 +Ref: table-readdir-file-types1140599 +Node: Extension Sample Revout1141666 +Node: Extension Sample Rev2way1142255 +Node: Extension Sample Read write array1142995 +Node: Extension Sample Readfile1144937 +Node: Extension Sample Time1146032 +Node: Extension Sample API Tests1147784 +Node: gawkextlib1148276 +Node: Extension summary1151194 +Node: Extension Exercises1154896 +Node: Language History1156138 +Node: V7/SVR3.11157794 +Node: SVR41159946 +Node: POSIX1161380 +Node: BTL1162761 +Node: POSIX/GNU1163490 +Node: Feature History1169268 +Node: Common Extensions1185587 
+Node: Ranges and Locales1186870 +Ref: Ranges and Locales-Footnote-11191486 +Ref: Ranges and Locales-Footnote-21191513 +Ref: Ranges and Locales-Footnote-31191748 +Node: Contributors1191971 +Node: History summary1197968 +Node: Installation1199348 +Node: Gawk Distribution1200292 +Node: Getting1200776 +Node: Extracting1201739 +Node: Distribution contents1203377 +Node: Unix Installation1209857 +Node: Quick Installation1210539 +Node: Shell Startup Files1212953 +Node: Additional Configuration Options1214042 +Node: Configuration Philosophy1216357 +Node: Non-Unix Installation1218726 +Node: PC Installation1219186 +Node: PC Binary Installation1220024 +Node: PC Compiling1220459 +Node: PC Using1221576 +Node: Cygwin1225129 +Node: MSYS1226353 +Node: VMS Installation1226955 +Node: VMS Compilation1227746 +Ref: VMS Compilation-Footnote-11228975 +Node: VMS Dynamic Extensions1229033 +Node: VMS Installation Details1230718 +Node: VMS Running1232971 +Node: VMS GNV1237250 +Node: VMS Old Gawk1237985 +Node: Bugs1238456 +Node: Bug address1239119 +Node: Usenet1242101 +Node: Maintainers1243105 +Node: Other Versions1244290 +Node: Installation summary1251378 +Node: Notes1252587 +Node: Compatibility Mode1253381 +Node: Additions1254163 +Node: Accessing The Source1255088 +Node: Adding Code1256525 +Node: New Ports1262744 +Node: Derived Files1267119 +Ref: Derived Files-Footnote-11272779 +Ref: Derived Files-Footnote-21272814 +Ref: Derived Files-Footnote-31273412 +Node: Future Extensions1273526 +Node: Implementation Limitations1274184 +Node: Extension Design1275394 +Node: Old Extension Problems1276538 +Ref: Old Extension Problems-Footnote-11278056 +Node: Extension New Mechanism Goals1278113 +Ref: Extension New Mechanism Goals-Footnote-11281477 +Node: Extension Other Design Decisions1281666 +Node: Extension Future Growth1283779 +Node: Notes summary1284385 +Node: Basic Concepts1285543 +Node: Basic High Level1286224 +Ref: figure-general-flow1286506 +Ref: figure-process-flow1287191 +Ref: Basic High 
Level-Footnote-11290492 +Node: Basic Data Typing1290677 +Node: Glossary1294005 +Node: Copying1325890 +Node: GNU Free Documentation License1363433 +Node: Index1388553 End Tag Table diff --git a/doc/gawk.texi b/doc/gawk.texi index 6d10e4ee..60f129dc 100644 --- a/doc/gawk.texi +++ b/doc/gawk.texi @@ -25998,8 +25998,6 @@ END @{ @node Uniq Program @subsection Printing Nonduplicated Lines of Text -@c FIXME: One day, update to current POSIX version of uniq - @cindex printing @subentry unduplicated lines of text @cindex text, printing @subentry unduplicated lines of @cindex @command{uniq} utility @@ -26009,7 +26007,7 @@ prints unique lines---hence the name. @command{uniq} has a number of options. The usage is as follows: @display -@command{uniq} [@option{-udc} [@code{-@var{n}}]] [@code{+@var{n}}] [@var{inputfile} [@var{outputfile}]] +@command{uniq} [@option{-udc} [@code{-f @var{n}}] [@code{-s @var{n}}]] [@var{inputfile} [@var{outputfile}]] @end display The options for @command{uniq} are: @@ -26025,14 +26023,14 @@ Print only nonrepeated (unique) lines. Count lines. This option overrides @option{-d} and @option{-u}. Both repeated and nonrepeated lines are counted. -@item -@var{n} +@item -f @var{n} Skip @var{n} fields before comparing lines. The definition of fields is similar to @command{awk}'s default: nonwhitespace characters separated by runs of spaces and/or TABs. -@item +@var{n} +@item -s @var{n} Skip @var{n} characters before comparing lines. Any fields specified with -@samp{-@var{n}} are skipped first. +@option{-f} are skipped first. @item @var{inputfile} Data is read from the input file named on the command line, instead of from @@ -26053,22 +26051,7 @@ and the @code{join()} library function (@pxref{Join Function}). The program begins with a @code{usage()} function and then a brief outline of -the options and their meanings in comments. -The @code{BEGIN} rule deals with the command-line arguments and options. 
It -uses a trick to get @code{getopt()} to handle options of the form @samp{-25}, -treating such an option as the option letter @samp{2} with an argument of -@samp{5}. If indeed two or more digits are supplied (@code{Optarg} looks -like a number), @code{Optarg} is -concatenated with the option digit and then the result is added to zero to make -it into a number. If there is only one digit in the option, then -@code{Optarg} is not needed. In this case, @code{Optind} must be decremented so that -@code{getopt()} processes it next time. This code is admittedly a bit -tricky. - -If no options are supplied, then the default is taken, to print both -repeated and nonrepeated lines. The output file, if provided, is assigned -to @code{outputfile}. Early on, @code{outputfile} is initialized to the -standard output, @file{/dev/stdout}: +the options and their meanings in comments: @cindex @code{uniq.awk} program @example @@ -26084,26 +26067,62 @@ standard output, @file{/dev/stdout}: # # Arnold Robbins, arnold@@skeeve.com, Public Domain # May 1993 +# Updated August 2020 to current POSIX @c endfile @end ignore @c file eg/prog/uniq.awk function usage() @{ - print("Usage: uniq [-udc [-n]] [+n] [ in [ out ]]") > "/dev/stderr" + print("Usage: uniq [-udc [-f fields] [-s chars]] [ in [ out ]]") > "/dev/stderr" exit 1 @} # -c count lines. overrides -d and -u # -d only repeated lines # -u only nonrepeated lines -# -n skip n fields -# +n skip n characters, skip fields first +# -f n skip n fields +# -s n skip n characters, skip fields first +@c endfile +@end example + +The POSIX standard for @command{uniq} allows options to start with +@samp{+} as well as with @samp{-}. 
An initial @code{BEGIN} rule +traverses the arguments changing any leading @samp{+} to @samp{-} +so that the @code{getopt()} function can parse the options: + +@example +@c file eg/prog/uniq.awk +# As of 2020, '+' can be used as option character in addition to '-' +# Previously allowed use of -N to skip fields and +N to skip +# characters is no longer allowed, and not supported by this version. + +BEGIN @{ + # Convert + to - so getopt can handle things + for (i = 1; i < ARGC; i++) @{ + first = substr(ARGV[i], 1, 1) + if (ARGV[i] == "--" || (first != "-" && first != "+")) + break + else if (first == "+") + # Replace "+" with "-" + ARGV[i] = "-" substr(ARGV[i], 2) + @} +@} +@c endfile +@end example + +The next @code{BEGIN} rule deals with the command-line arguments and options. +If no options are supplied, then the default is taken, to print both +repeated and nonrepeated lines. The output file, if provided, is assigned +to @code{outputfile}. Early on, @code{outputfile} is initialized to the +standard output, @file{/dev/stdout}: +@example +@c file eg/prog/uniq.awk BEGIN @{ count = 1 outputfile = "/dev/stdout" - opts = "udc0:1:2:3:4:5:6:7:8:9:" + opts = "udcf:s:" while ((c = getopt(ARGC, ARGV, opts)) != -1) @{ if (c == "u") non_repeated_only++ @@ -26111,26 +26130,14 @@ BEGIN @{ repeated_only++ else if (c == "c") do_count++ - else if (index("0123456789", c) != 0) @{ - # getopt() requires args to options - # this messes us up for things like -5 - if (Optarg ~ /^[[:digit:]]+$/) - fcount = (c Optarg) + 0 - else @{ - fcount = c + 0 - Optind-- - @} - @} else + else if (c == "f") + fcount = Optarg + 0 + else if (c == "s") + charcount = Optarg + 0 + else usage() @} -@group - if (ARGV[Optind] ~ /^\+[[:digit:]]+$/) @{ - charcount = substr(ARGV[Optind], 2) + 0 - Optind++ - @} -@end group - for (i = 1; i < Optind; i++) ARGV[i] = "" @@ -26260,20 +26267,15 @@ As a side note, this program does not follow our recommended convention of namin global variables with a leading capital 
letter. Doing that would make the program a little easier to follow. -@ifset FOR_PRINT The logic for choosing which lines to print represents a @dfn{state -machine}, which is ``a device that can be in one of a set number of stable -conditions depending on its previous condition and on the present values -of its inputs.''@footnote{This is the definition returned from entering -@code{define: state machine} into Google.} -Brian Kernighan suggests that -``an alternative approach to state machines is to just read -the input into an array, then use indexing. It's almost always -easier code, and for most inputs where you would use this, just -as fast.'' Consider how to rewrite the logic to follow this -suggestion. -@end ifset - +machine}, which is ``a device which can be in one of a set number +of stable conditions depending on its previous condition and on the +present values of its inputs.''@footnote{This definition is from +@uref{https://www.lexico.com/en/definition/state_machine}.} Brian +Kernighan suggests that ``an alternative approach to state machines is +to just read the input into an array, then use indexing. It's almost +always easier code, and for most inputs where you would use this, just +as fast.'' Consider how to rewrite the logic to follow this suggestion. @node Wc Program diff --git a/doc/gawktexi.in b/doc/gawktexi.in index 2d13f518..6a9dfe0e 100644 --- a/doc/gawktexi.in +++ b/doc/gawktexi.in @@ -25008,8 +25008,6 @@ END @{ @node Uniq Program @subsection Printing Nonduplicated Lines of Text -@c FIXME: One day, update to current POSIX version of uniq - @cindex printing @subentry unduplicated lines of text @cindex text, printing @subentry unduplicated lines of @cindex @command{uniq} utility @@ -25019,7 +25017,7 @@ prints unique lines---hence the name. @command{uniq} has a number of options. 
The usage is as follows: @display -@command{uniq} [@option{-udc} [@code{-@var{n}}]] [@code{+@var{n}}] [@var{inputfile} [@var{outputfile}]] +@command{uniq} [@option{-udc} [@code{-f @var{n}}] [@code{-s @var{n}}]] [@var{inputfile} [@var{outputfile}]] @end display The options for @command{uniq} are: @@ -25035,14 +25033,14 @@ Print only nonrepeated (unique) lines. Count lines. This option overrides @option{-d} and @option{-u}. Both repeated and nonrepeated lines are counted. -@item -@var{n} +@item -f @var{n} Skip @var{n} fields before comparing lines. The definition of fields is similar to @command{awk}'s default: nonwhitespace characters separated by runs of spaces and/or TABs. -@item +@var{n} +@item -s @var{n} Skip @var{n} characters before comparing lines. Any fields specified with -@samp{-@var{n}} are skipped first. +@option{-f} are skipped first. @item @var{inputfile} Data is read from the input file named on the command line, instead of from @@ -25063,22 +25061,7 @@ and the @code{join()} library function (@pxref{Join Function}). The program begins with a @code{usage()} function and then a brief outline of -the options and their meanings in comments. -The @code{BEGIN} rule deals with the command-line arguments and options. It -uses a trick to get @code{getopt()} to handle options of the form @samp{-25}, -treating such an option as the option letter @samp{2} with an argument of -@samp{5}. If indeed two or more digits are supplied (@code{Optarg} looks -like a number), @code{Optarg} is -concatenated with the option digit and then the result is added to zero to make -it into a number. If there is only one digit in the option, then -@code{Optarg} is not needed. In this case, @code{Optind} must be decremented so that -@code{getopt()} processes it next time. This code is admittedly a bit -tricky. - -If no options are supplied, then the default is taken, to print both -repeated and nonrepeated lines. The output file, if provided, is assigned -to @code{outputfile}. 
Early on, @code{outputfile} is initialized to the -standard output, @file{/dev/stdout}: +the options and their meanings in comments: @cindex @code{uniq.awk} program @example @@ -25094,26 +25077,62 @@ standard output, @file{/dev/stdout}: # # Arnold Robbins, arnold@@skeeve.com, Public Domain # May 1993 +# Updated August 2020 to current POSIX @c endfile @end ignore @c file eg/prog/uniq.awk function usage() @{ - print("Usage: uniq [-udc [-n]] [+n] [ in [ out ]]") > "/dev/stderr" + print("Usage: uniq [-udc [-f fields] [-s chars]] [ in [ out ]]") > "/dev/stderr" exit 1 @} # -c count lines. overrides -d and -u # -d only repeated lines # -u only nonrepeated lines -# -n skip n fields -# +n skip n characters, skip fields first +# -f n skip n fields +# -s n skip n characters, skip fields first +@c endfile +@end example + +The POSIX standard for @command{uniq} allows options to start with +@samp{+} as well as with @samp{-}. An initial @code{BEGIN} rule +traverses the arguments changing any leading @samp{+} to @samp{-} +so that the @code{getopt()} function can parse the options: + +@example +@c file eg/prog/uniq.awk +# As of 2020, '+' can be used as option character in addition to '-' +# Previously allowed use of -N to skip fields and +N to skip +# characters is no longer allowed, and not supported by this version. + +BEGIN @{ + # Convert + to - so getopt can handle things + for (i = 1; i < ARGC; i++) @{ + first = substr(ARGV[i], 1, 1) + if (ARGV[i] == "--" || (first != "-" && first != "+")) + break + else if (first == "+") + # Replace "+" with "-" + ARGV[i] = "-" substr(ARGV[i], 2) + @} +@} +@c endfile +@end example + +The next @code{BEGIN} rule deals with the command-line arguments and options. +If no options are supplied, then the default is taken, to print both +repeated and nonrepeated lines. The output file, if provided, is assigned +to @code{outputfile}. 
Early on, @code{outputfile} is initialized to the +standard output, @file{/dev/stdout}: +@example +@c file eg/prog/uniq.awk BEGIN @{ count = 1 outputfile = "/dev/stdout" - opts = "udc0:1:2:3:4:5:6:7:8:9:" + opts = "udcf:s:" while ((c = getopt(ARGC, ARGV, opts)) != -1) @{ if (c == "u") non_repeated_only++ @@ -25121,26 +25140,14 @@ BEGIN @{ repeated_only++ else if (c == "c") do_count++ - else if (index("0123456789", c) != 0) @{ - # getopt() requires args to options - # this messes us up for things like -5 - if (Optarg ~ /^[[:digit:]]+$/) - fcount = (c Optarg) + 0 - else @{ - fcount = c + 0 - Optind-- - @} - @} else + else if (c == "f") + fcount = Optarg + 0 + else if (c == "s") + charcount = Optarg + 0 + else usage() @} -@group - if (ARGV[Optind] ~ /^\+[[:digit:]]+$/) @{ - charcount = substr(ARGV[Optind], 2) + 0 - Optind++ - @} -@end group - for (i = 1; i < Optind; i++) ARGV[i] = "" @@ -25270,20 +25277,15 @@ As a side note, this program does not follow our recommended convention of namin global variables with a leading capital letter. Doing that would make the program a little easier to follow. -@ifset FOR_PRINT The logic for choosing which lines to print represents a @dfn{state -machine}, which is ``a device that can be in one of a set number of stable -conditions depending on its previous condition and on the present values -of its inputs.''@footnote{This is the definition returned from entering -@code{define: state machine} into Google.} -Brian Kernighan suggests that -``an alternative approach to state machines is to just read -the input into an array, then use indexing. It's almost always -easier code, and for most inputs where you would use this, just -as fast.'' Consider how to rewrite the logic to follow this -suggestion. 
-@end ifset - +machine}, which is ``a device which can be in one of a set number +of stable conditions depending on its previous condition and on the +present values of its inputs.''@footnote{This definition is from +@uref{https://www.lexico.com/en/definition/state_machine}.} Brian +Kernighan suggests that ``an alternative approach to state machines is +to just read the input into an array, then use indexing. It's almost +always easier code, and for most inputs where you would use this, just +as fast.'' Consider how to rewrite the logic to follow this suggestion. @node Wc Program