aboutsummaryrefslogtreecommitdiffstats
path: root/gawk-info
diff options
context:
space:
mode:
Diffstat (limited to 'gawk-info')
-rw-r--r--gawk-info6151
1 files changed, 6151 insertions, 0 deletions
diff --git a/gawk-info b/gawk-info
new file mode 100644
index 00000000..361bd0c5
--- /dev/null
+++ b/gawk-info
@@ -0,0 +1,6151 @@
+Info file gawk-info, produced by Makeinfo, -*- Text -*- from input
+file gawk.texinfo.
+
+This file documents `awk', a program that you can use to select
+particular records in a file and perform operations upon them.
+
+Copyright (C) 1989 Free Software Foundation, Inc.
+
+Permission is granted to make and distribute verbatim copies of this
+manual provided the copyright notice and this permission notice are
+preserved on all copies.
+
+Permission is granted to copy and distribute modified versions of
+this manual under the conditions for verbatim copying, provided that
+the entire resulting derived work is distributed under the terms of a
+permission notice identical to this one.
+
+Permission is granted to copy and distribute translations of this
+manual into another language, under the above conditions for modified
+versions, except that this permission notice may be stated in a
+translation approved by the Foundation.
+
+
+
+File: gawk-info, Node: Top, Next: Preface, Prev: (dir), Up: (dir)
+
+This file documents `awk', a program that you can use to select
+particular records in a file and perform operations upon them; it
+contains the following chapters:
+
+* Menu:
+
+* Preface:: What you can do with `awk'; brief history
+ and acknowledgements.
+
+* License:: Your right to copy and distribute `gawk'.
+
+* This Manual:: Using this manual.
+
+ Includes sample input files that you can use.
+
+* Getting Started:: A basic introduction to using `awk'.
+ How to run an `awk' program. Command line syntax.
+
+* Reading Files:: How to read files and manipulate fields.
+
+* Printing:: How to print using `awk'. Describes the
+ `print' and `printf' statements.
+ Also describes redirection of output.
+
+* One-liners:: Short, sample `awk' programs.
+
+* Patterns:: The various types of patterns explained in detail.
+
+* Actions:: The various types of actions are introduced here.
+ Describes expressions and the various operators in
+ detail. Also describes comparison expressions.
+
+* Statements:: The various control statements are described in
+ detail.
+
+* Arrays:: The description and use of arrays. Also includes
+ array--oriented control statements.
+
+* User-defined:: User--defined functions are described in detail.
+
+* Built-in:: The built--in functions are summarized here.
+
+* Special:: The special variables are summarized here.
+
+* Sample Program:: A sample `awk' program with a complete explanation.
+
+* Notes:: Something about the implementation of `gawk'.
+
+* Glossary:: An explanation of some unfamiliar terms.
+
+* Index::
+
+
+
+File: gawk-info, Node: Preface, Next: License, Prev: Top, Up: Top
+
+Preface
+*******
+
+If you are like many computer users, you frequently would like to
+make changes in various text files wherever certain patterns appear,
+or extract data from parts of certain lines while discarding the
+rest. To write a program to do this in a language such as C or
+Pascal is a time--consuming inconvenience that may take many lines of
+code. The job may be easier with `awk'.
+
+The `awk' utility interprets a special--purpose programming language
+that makes it possible to handle simple data--reformatting jobs
+easily with just a few lines of code.
+
+The GNU implementation of `awk' is called `gawk'; it is fully upward
+compatible with the System V Release 3.1 and later version of `awk'.
+All properly written `awk' programs should work with `gawk'. So we
+usually don't distinguish between `gawk' and other `awk'
+implementations in this manual.
+
+This manual teaches you what `awk' does and how you can use `awk'
+effectively. You should already be familiar with basic,
+general--purpose, operating system commands such as `ls'. Using
+`awk' you can:
+
+ * manage small, personal databases,
+
+ * generate reports,
+
+ * validate data,
+
+ * produce indexes, and perform other document preparation tasks,
+
+ * even experiment with algorithms that can be adapted later to
+ other computer languages!
+
+* Menu:
+
+* History:: The history of gawk and awk. Acknowledgements.
+
+
+
+File: gawk-info, Node: History, Up: Preface
+
+History of `awk' and `gawk'
+===========================
+
+The name `awk' comes from the initials of its designers: Alfred V.
+Aho, Peter J. Weinberger, and Brian W. Kernighan. The original
+version of `awk' was written in 1977. In 1985 a new version made the
+programming language more powerful, introducing user--defined
+functions, multiple input streams, and computed regular expressions.
+
+The GNU implementation, `gawk', was written in 1986 by Paul Rubin and
+Jay Fenlason, with advice from Richard Stallman. John Woods
+contributed parts of the code as well. In 1988, David Trueman, with
+help from Arnold Robbins, reworked `gawk' for compatibility with the
+newer `awk'.
+
+Many people need to be thanked for their assistance in producing this
+manual. Jay Fenlason contributed many ideas and sample programs.
+Richard Mlynarik and Robert Chassell gave helpful comments on drafts
+of this manual. The paper ``A Supplemental Document for `awk''' by
+John W. Pierce of the Chemistry Department at UC San Diego,
+pinpointed several issues relevant both to `awk' implementation and
+to this manual, that would otherwise have escaped us.
+
+Finally, we would like to thank Brian Kernighan of Bell Labs for
+invaluable assistance during the testing and debugging of `gawk', and
+for help in clarifying several points about the language.
+
+
+
+File: gawk-info, Node: License, Next: This Manual, Prev: Preface, Up: Top
+
+GNU GENERAL PUBLIC LICENSE
+**************************
+
+ Version 1, February 1989
+
+ Copyright (C) 1989 Free Software Foundation, Inc.
+ 675 Mass Ave, Cambridge, MA 02139, USA
+
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+=========
+
+ The license agreements of most software companies try to keep users
+at the mercy of those companies. By contrast, our General Public
+License is intended to guarantee your freedom to share and change
+free software--to make sure the software is free for all its users.
+The General Public License applies to the Free Software Foundation's
+software and to any other program whose authors commit to using it.
+You can use it for your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Specifically, the General Public License is designed to make
+sure that you have the freedom to give away or sell copies of free
+software, that you receive source code or can get it if you want it,
+that you can change the software or use pieces of it in new free
+programs; and that you know you can do these things.
+
+ To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if
+you distribute copies of the software, or if you modify it.
+
+ For example, if you distribute copies of a such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have. You must make sure that they, too, receive or can get the
+source code. And you must tell them their rights.
+
+ We protect your rights with two steps: (1) copyright the software,
+and (2) offer you this license which gives you legal permission to
+copy, distribute and/or modify the software.
+
+ Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software. If the software is modified by someone else and passed on,
+we want its recipients to know that what they have is not the
+original, so that any problems introduced by others will not reflect
+on the original authors' reputations.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ TERMS AND CONDITIONS
+
+ 1. This License Agreement applies to any program or other work
+ which contains a notice placed by the copyright holder saying it
+ may be distributed under the terms of this General Public
+ License. The ``Program'', below, refers to any such program or
+ work, and a ``work based on the Program'' means either the
+ Program or any work containing the Program or a portion of it,
+ either verbatim or with modifications. Each licensee is
+ addressed as ``you''.
+
+ 2. You may copy and distribute verbatim copies of the Program's
+ source code as you receive it, in any medium, provided that you
+ conspicuously and appropriately publish on each copy an
+ appropriate copyright notice and disclaimer of warranty; keep
+ intact all the notices that refer to this General Public License
+ and to the absence of any warranty; and give any other
+ recipients of the Program a copy of this General Public License
+ along with the Program. You may charge a fee for the physical
+ act of transferring a copy.
+
+ 3. You may modify your copy or copies of the Program or any portion
+ of it, and copy and distribute such modifications under the
+ terms of Paragraph 1 above, provided that you also do the
+ following:
+
+ * cause the modified files to carry prominent notices stating
+ that you changed the files and the date of any change; and
+
+ * cause the whole of any work that you distribute or publish,
+ that in whole or in part contains the Program or any part
+ thereof, either with or without modifications, to be
+ licensed at no charge to all third parties under the terms
+ of this General Public License (except that you may choose
+ to grant warranty protection to some or all third parties,
+ at your option).
+
+ * If the modified program normally reads commands
+ interactively when run, you must cause it, when started
+ running for such interactive use in the simplest and most
+ usual way, to print or display an announcement including an
+ appropriate copyright notice and a notice that there is no
+ warranty (or else, saying that you provide a warranty) and
+ that users may redistribute the program under these
+ conditions, and telling the user how to view a copy of this
+ General Public License.
+
+ * You may charge a fee for the physical act of transferring a
+ copy, and you may at your option offer warranty protection
+ in exchange for a fee.
+
+ Mere aggregation of another independent work with the Program
+ (or its derivative) on a volume of a storage or distribution
+ medium does not bring the other work under the scope of these
+ terms.
+
+ 4. You may copy and distribute the Program (or a portion or
+ derivative of it, under Paragraph 2) in object code or
+ executable form under the terms of Paragraphs 1 and 2 above
+ provided that you also do one of the following:
+
+ * accompany it with the complete corresponding
+ machine-readable source code, which must be distributed
+ under the terms of Paragraphs 1 and 2 above; or,
+
+ * accompany it with a written offer, valid for at least three
+ years, to give any third party free (except for a nominal
+ charge for the cost of distribution) a complete
+ machine-readable copy of the corresponding source code, to
+ be distributed under the terms of Paragraphs 1 and 2 above;
+ or,
+
+ * accompany it with the information you received as to where
+ the corresponding source code may be obtained. (This
+ alternative is allowed only for noncommercial distribution
+ and only if you received the program in object code or
+ executable form alone.)
+
+ Source code for a work means the preferred form of the work for
+ making modifications to it. For an executable file, complete
+ source code means all the source code for all modules it
+ contains; but, as a special exception, it need not include
+ source code for modules which are standard libraries that
+ accompany the operating system on which the executable file
+ runs, or for standard header files or definitions files that
+ accompany that operating system.
+
+ 5. You may not copy, modify, sublicense, distribute or transfer the
+ Program except as expressly provided under this General Public
+ License. Any attempt otherwise to copy, modify, sublicense,
+ distribute or transfer the Program is void, and will
+ automatically terminate your rights to use the Program under
+ this License. However, parties who have received copies, or
+ rights to use copies, from you under this General Public License
+ will not have their licenses terminated so long as such parties
+ remain in full compliance.
+
+ 6. By copying, distributing or modifying the Program (or any work
+ based on the Program) you indicate your acceptance of this
+ license to do so, and all its terms and conditions.
+
+ 7. Each time you redistribute the Program (or any work based on the
+ Program), the recipient automatically receives a license from
+ the original licensor to copy, distribute or modify the Program
+ subject to these terms and conditions. You may not impose any
+ further restrictions on the recipients' exercise of the rights
+ granted herein.
+
+ 8. The Free Software Foundation may publish revised and/or new
+ versions of the General Public License from time to time. Such
+ new versions will be similar in spirit to the present version,
+ but may differ in detail to address new problems or concerns.
+
+ Each version is given a distinguishing version number. If the
+ Program specifies a version number of the license which applies
+ to it and ``any later version'', you have the option of
+ following the terms and conditions either of that version or of
+ any later version published by the Free Software Foundation. If
+ the Program does not specify a version number of the license,
+ you may choose any version ever published by the Free Software
+ Foundation.
+
+ 9. If you wish to incorporate parts of the Program into other free
+ programs whose distribution conditions are different, write to
+ the author to ask for permission. For software which is
+ copyrighted by the Free Software Foundation, write to the Free
+ Software Foundation; we sometimes make exceptions for this. Our
+ decision will be guided by the two goals of preserving the free
+ status of all derivatives of our free software and of promoting
+ the sharing and reuse of software generally.
+
+ NO WARRANTY
+
+ 10. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO
+ WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE
+ LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+ HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM ``AS IS''
+ WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED,
+ INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE
+ ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS
+ WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE
+ COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+ 11. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
+ WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY
+ MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE
+ LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL,
+ INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR
+ INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS
+ OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+ YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH
+ ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN
+ ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
+
+ END OF TERMS AND CONDITIONS
+
+Appendix: How to Apply These Terms to Your New Programs
+=======================================================
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to humanity, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these
+terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the ``copyright'' line and a pointer to where the full notice is found.
+
+ ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES.
+ Copyright (C) 19YY NAME OF AUTHOR
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 1, or (at your option)
+ any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+
+ Also add information on how to contact you by electronic and paper
+mail.
+
+If the program is interactive, make it output a short notice like
+this when it starts in an interactive mode:
+
+ Gnomovision version 69, Copyright (C) 19YY NAME OF AUTHOR
+ Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+ The hypothetical commands `show w' and `show c' should show the
+appropriate parts of the General Public License. Of course, the
+commands you use may be called something other than `show w' and
+`show c'; they could even be mouse-clicks or menu items--whatever
+suits your program.
+
+You should also get your employer (if you work as a programmer) or
+your school, if any, to sign a ``copyright disclaimer'' for the
+program, if necessary. Here a sample; alter the names:
+
+ Yoyodyne, Inc., hereby disclaims all copyright interest in the
+ program `Gnomovision' (a program to direct compilers to make passes
+ at assemblers) written by James Hacker.
+
+ SIGNATURE OF TY COON, 1 April 1989
+ Ty Coon, President of Vice
+
+That's all there is to it!
+
+
+
+File: gawk-info, Node: This Manual, Next: Getting Started, Prev: License, Up: Top
+
+Using This Manual
+*****************
+
+The term `gawk' refers to a program (a version of `awk') developed by
+the Free Software Foundation, and to the language you use to tell it
+what to do. When we need to be careful, we call the program ``the
+`awk' utility'' and the language ``the `awk' language''. The purpose
+of this manual is to explain the `awk' language and how to run the
+`awk' utility.
+
+The term "`awk' program" refers to a program written by you in the
+`awk' programming language.
+
+*Note Getting Started::, for the bare essentials you need to know to
+start using `awk'.
+
+Useful ``one--liners'' are included to give you a feel for the `awk'
+language (*note One-liners::.).
+
+A sizable sample `awk' program has been provided for you (*note
+Sample Program::.).
+
+If you find terms that you aren't familiar with, try looking them up
+in the glossary (*note Glossary::.).
+
+Most of the time complete `awk' programs are used as examples, but in
+some of the more advanced sections, only the part of the `awk'
+program that illustrates the concept being described is shown.
+
+* Menu:
+
+This chapter contains the following sections:
+
+* The Files:: Sample data files for use in the `awk' programs
+ illustrated in this manual.
+
+
+
+File: gawk-info, Node: The Files, Up: This Manual
+
+Input Files for the Examples
+============================
+
+This manual contains many sample programs. The data for many of
+those programs comes from two files. The first file, called
+`BBS-list', represents a list of computer bulletin board systems and
+information about those systems.
+
+Each line of this file is one "record". Each record contains the
+name of a computer bulletin board, its phone number, the board's baud
+rate, and a code for the number of hours it is operational. An `A'
+in the last column means the board operates 24 hours all week. A `B'
+in the last column means the board operates evening and weekend
+hours, only. A `C' means the board operates only on weekends.
+
+ aardvark 555-5553 1200/300 B
+ alpo-net 555-3412 2400/1200/300 A
+ barfly 555-7685 1200/300 A
+ bites 555-1675 2400/1200/300 A
+ camelot 555-0542 300 C
+ core 555-2912 1200/300 C
+ fooey 555-1234 2400/1200/300 B
+ foot 555-6699 1200/300 B
+ macfoo 555-6480 1200/300 A
+ sdace 555-3430 2400/1200/300 A
+ sabafoo 555-2127 1200/300 C
+
+The second data file, called `inventory-shipped', represents
+information about shipments during the year. Each line of this file
+is also one record. Each record contains the month of the year, the
+number of green crates shipped, the number of red boxes shipped, the
+number of orange bags shipped, and the number of blue packages
+shipped, respectively.
+
+ Jan 13 25 15 115
+ Feb 15 32 24 226
+ Mar 15 24 34 228
+ Apr 31 52 63 420
+ May 16 34 29 208
+ Jun 31 42 75 492
+ Jul 24 34 67 436
+ Aug 15 34 47 316
+ Sep 13 55 37 277
+ Oct 29 54 68 525
+ Nov 20 87 82 577
+ Dec 17 35 61 401
+
+ Jan 21 36 64 620
+ Feb 26 58 80 652
+ Mar 24 75 70 495
+ Apr 21 70 74 514
+
+If you are reading this in GNU Emacs using Info, you can copy the
+regions of text showing these sample files into your own test files.
+This way you can try out the examples shown in the remainder of this
+document. You do this by using the command `M-x write-region' to
+copy text from the Info file into a file for use with `awk' (see your
+``GNU Emacs Manual'' for more information). Using this information,
+create your own `BBS-list' and `inventory-shipped' files, and
+practice what you learn in this manual.
+
+
+
+File: gawk-info, Node: Getting Started, Next: Reading Files, Prev: This Manual, Up: Top
+
+Getting Started With `awk'
+**************************
+
+The basic function of `awk' is to search files for lines (or other
+units of text) that contain certain patterns. When a line matching
+any of those patterns is found, `awk' performs specified actions on
+that line. Then `awk' keeps processing input lines until the end of
+the file is reached.
+
+An `awk' "program" or "script" consists of a series of "rules".
+(They may also contain "function definitions", but that is an
+advanced feature, so let's ignore it for now. *Note User-defined::.)
+
+A rule contains a "pattern", an "action", or both. Actions are
+enclosed in curly braces to distinguish them from patterns.
+Therefore, an `awk' program is a sequence of rules in the form:
+
+ PATTERN { ACTION }
+ PATTERN { ACTION }
+ ...
+
+ * Menu:
+
+* Very Simple:: A very simple example.
+* Two Rules:: A less simple one--line example with two rules.
+* More Complex:: A more complex example.
+* Running gawk:: How to run gawk programs; includes command line syntax.
+* Comments:: Adding documentation to gawk programs.
+* Statements/Lines:: Subdividing or combining statements into lines.
+
+* When:: When to use gawk and when to use other things.
+
+
+
+File: gawk-info, Node: Very Simple, Next: Two Rules, Up: Getting Started
+
+A Very Simple Example
+=====================
+
+The following command runs a simple `awk' program that searches the
+input file `BBS-list' for the string of characters: `foo'. (A string
+of characters is usually called, quite simply, a "string".)
+
+ awk '/foo/ { print $0 }' BBS-list
+
+When lines containing `foo' are found, they are printed, because
+`print $0' means print the current line. (Just `print' by itself
+also means the same thing, so we could have written that instead.)
+
+You will notice that slashes, `/', surround the string `foo' in the
+actual `awk' program. The slashes indicate that `foo' is a pattern
+to search for. This type of pattern is called a "regular
+expression", and is covered in more detail later (*note Regexp::.).
+There are single quotes around the `awk' program so that the shell
+won't interpret any of it as special shell characters.
+
+Here is what this program prints:
+
+ fooey 555-1234 2400/1200/300 B
+ foot 555-6699 1200/300 B
+ macfoo 555-6480 1200/300 A
+ sabafoo 555-2127 1200/300 C
+
+In an `awk' rule, either the pattern or the action can be omitted,
+but not both.
+
+If the pattern is omitted, then the action is performed for *every*
+input line.
+
+If the action is omitted, the default action is to print all lines
+that match the pattern. We could leave out the action (the print
+statement and the curly braces) in the above example, and the result
+would be the same: all lines matching the pattern `foo' would be
+printed. (By comparison, omitting the print statement but retaining
+the curly braces makes an empty action that does nothing; then no
+lines would be printed.)
+
+
+
+File: gawk-info, Node: Two Rules, Next: More Complex, Prev: Very Simple, Up: Getting Started
+
+An Example with Two Rules
+=========================
+
+The `awk' utility reads the input files one line at a time. For each
+line, `awk' tries the patterns of all the rules. If several patterns
+match then several actions are run, in the order in which they appear
+in the `awk' program. If no patterns match, then no actions are run.
+
+After processing all the rules (perhaps none) that match the line,
+`awk' reads the next line (however, *note Next::.). This continues
+until the end of the file is reached.
+
+For example, the `awk' program:
+
+ /12/ { print $0 }
+ /21/ { print $0 }
+
+contains two rules. The first rule has the string `12' as the
+pattern and `print $0' as the action. The second rule has the string
+`21' as the pattern and also has `print $0' as the action. Each
+rule's action is enclosed in its own pair of braces.
+
+This `awk' program prints every line that contains the string `12'
+*or* the string `21'. If a line contains both strings, it is printed
+twice, once by each rule.
+
+If we run this program on our two sample data files, `BBS-list' and
+`inventory-shipped', as shown here:
+
+ awk '/12/ { print $0 }
+ /21/ { print $0 }' BBS-list inventory-shipped
+
+we get the following output:
+
+ aardvark 555-5553 1200/300 B
+ alpo-net 555-3412 2400/1200/300 A
+ barfly 555-7685 1200/300 A
+ bites 555-1675 2400/1200/300 A
+ core 555-2912 1200/300 C
+ fooey 555-1234 2400/1200/300 B
+ foot 555-6699 1200/300 B
+ macfoo 555-6480 1200/300 A
+ sdace 555-3430 2400/1200/300 A
+ sabafoo 555-2127 1200/300 C
+ sabafoo 555-2127 1200/300 C
+ Jan 21 36 64 620
+ Apr 21 70 74 514
+
+Note how the line in `BBS-list' beginning with `sabafoo' was printed
+twice, once for each rule.
+
+
+
+File: gawk-info, Node: More Complex, Next: Running gawk, Prev: Two Rules, Up: Getting Started
+
+A More Complex Example
+======================
+
+Here is an example to give you an idea of what typical `awk' programs
+do. This example shows how `awk' can be used to summarize, select,
+and rearrange the output of another utility. It uses features that
+haven't been covered yet, so don't worry if you don't understand all
+the details.
+
+ ls -l | awk '$5 == "Nov" { sum += $4 }
+ END { print sum }'
+
+This command prints the total number of bytes in all the files in the
+current directory that were last modified in November (of any year).
+(In the C shell you would need to type a semicolon and then a
+backslash at the end of the first line; in the Bourne shell you can
+type the example as shown.)
+
+The `ls -l' part of this example is a command that gives you a full
+listing of all the files in a directory, including file size and date.
+Its output looks like this:
+
+ -rw-r--r-- 1 close 1933 Nov 7 13:05 Makefile
+ -rw-r--r-- 1 close 10809 Nov 7 13:03 gawk.h
+ -rw-r--r-- 1 close 983 Apr 13 12:14 gawk.tab.h
+ -rw-r--r-- 1 close 31869 Jun 15 12:20 gawk.y
+ -rw-r--r-- 1 close 22414 Nov 7 13:03 gawk1.c
+ -rw-r--r-- 1 close 37455 Nov 7 13:03 gawk2.c
+ -rw-r--r-- 1 close 27511 Dec 9 13:07 gawk3.c
+ -rw-r--r-- 1 close 7989 Nov 7 13:03 gawk4.c
+
+The first field contains read--write permissions, the second field
+contains the number of links to the file, and the third field
+identifies the owner of the file. The fourth field contains the size
+of the file in bytes. The fifth, sixth, and seventh fields contain
+the month, day, and time, respectively, that the file was last
+modified. Finally, the eighth field contains the name of the file.
+
+The `$5 == "Nov"' in our `awk' program is an expression that tests
+whether the fifth field of the output from `ls -l' matches the string
+`Nov'. Each time a line has the string `Nov' in its fifth field, the
+action `{ sum += $4 }' is performed. This adds the fourth field (the
+file size) to the variable `sum'. As a result, when `awk' has
+finished reading all the input lines, `sum' will be the sum of the
+sizes of files whose lines matched the pattern.
+
+After the last line of output from `ls' has been processed, the `END'
+pattern is executed, and the value of `sum' is printed. In this
+example, the value of `sum' would be 80600.
+
+These more advanced `awk' techniques are covered in later sections
+(*note Actions::.). Before you can move on to more advanced `awk'
+programming, you have to know how `awk' interprets your input and
+displays your output. By manipulating "fields" and using special
+"print" statements, you can produce some very useful and spectacular
+looking reports.
+
+
+
+File: gawk-info, Node: Running gawk, Next: Comments, Prev: More Complex, Up: Getting Started
+
+How to Run `awk' Programs
+=========================
+
+There are several ways to run an `awk' program. If the program is
+short, it is easiest to include it in the command that runs `awk',
+like this:
+
+ awk 'PROGRAM' INPUT-FILE1 INPUT-FILE2 ...
+
+ where PROGRAM consists of a series of PATTERNS and ACTIONS, as
+described earlier.
+
+When the program is long, you would probably prefer to put it in a
+file and run it with a command like this:
+
+ awk -f PROGRAM-FILE INPUT-FILE1 INPUT-FILE2 ...
+
+ * Menu:
+
+* One-shot:: Running a short throw--away `awk' program.
+* Read Terminal:: Using no input files (input from terminal instead).
+* Long:: Putting permanent `awk' programs in files.
+* Executable Scripts:: Making self--contained `awk' programs.
+* Command Line:: How the `awk' command line is laid out.
+
+
+
+File: gawk-info, Node: One-shot, Next: Read Terminal, Up: Running gawk
+
+One--shot Throw--away `awk' Programs
+------------------------------------
+
+Once you are familiar with `awk', you will often type simple programs
+at the moment you want to use them. Then you can write the program
+as the first argument of the `awk' command, like this:
+
+ awk 'PROGRAM' INPUT-FILE1 INPUT-FILE2 ...
+
+ where PROGRAM consists of a series of PATTERNS and ACTIONS, as
+described earlier.
+
+This command format tells the shell to start `awk' and use the
+PROGRAM to process records in the input file(s). There are single
+quotes around the PROGRAM so that the shell doesn't interpret any
+`awk' characters as special shell characters. They cause the shell
+to treat all of PROGRAM as a single argument for `awk'. They also
+allow PROGRAM to be more than one line long.
+
+This format is also useful for running short or medium--sized `awk'
+programs from shell scripts, because it avoids the need for a
+separate file for the `awk' program. A self--contained shell script
+is more reliable since there are no other files to misplace.
+
+
+
+File: gawk-info, Node: Read Terminal, Next: Long, Prev: One-shot, Up: Running gawk
+
+Running `awk' without Input Files
+---------------------------------
+
+You can also use `awk' without any input files. If you type the
+command line:
+
+ awk 'PROGRAM'
+
+then `awk' applies the PROGRAM to the "standard input", which usually
+means whatever you type on the terminal. This continues until you
+indicate end--of--file by typing `Control-d'.
+
+For example, if you type:
+
+ awk '/th/'
+
+whatever you type next will be taken as data for that `awk' program.
+If you go on to type the following data,
+
+ Kathy
+ Ben
+ Tom
+ Beth
+ Seth
+ Karen
+ Thomas
+ `Control-d'
+
+then `awk' will print
+
+ Kathy
+ Beth
+ Seth
+
+as matching the pattern `th'. Notice that it did not recognize
+`Thomas' as matching the pattern. The `awk' language is "case
+sensitive", and matches patterns *exactly*.
+
+
+
+File: gawk-info, Node: Long, Next: Executable Scripts, Prev: Read Terminal, Up: Running gawk
+
+Running Long Programs
+---------------------
+
+Sometimes your `awk' programs can be very long. In this case it is
+more convenient to put the program into a separate file. To tell
+`awk' to use that file for its program, you type:
+
+ awk -f SOURCE-FILE INPUT-FILE1 INPUT-FILE2 ...
+
+ The `-f' tells the `awk' utility to get the `awk' program from the
+file SOURCE-FILE. Any file name can be used for SOURCE-FILE. For
+example, you could put the program:
+
+ /th/
+
+into the file `th-prog'. Then the command:
+
+ awk -f th-prog
+
+does the same thing as this one:
+
+ awk '/th/'
+
+which was explained earlier (*note Read Terminal::.). Note that you
+don't usually need single quotes around the file name that you
+specify with `-f', because most file names don't contain any of the
+shell's special characters.
+
+If you want to identify your `awk' program files clearly as such, you
+can add the extension `.awk' to the filename. This doesn't affect
+the execution of the `awk' program, but it does make ``housekeeping''
+easier.
+
+
+
+File: gawk-info, Node: Executable Scripts, Next: Command Line, Prev: Long, Up: Running gawk
+
+Executable `awk' Programs
+-------------------------
+
+(The following section assumes that you are already somewhat familiar
+with `awk'.)
+
+Once you have learned `awk', you may want to write self--contained
+`awk' scripts, using the `#!' script mechanism. You can do this on
+BSD Unix systems and GNU.
+
+For example, you could create a text file named `hello', containing
+the following (where `BEGIN' is a feature we have not yet discussed):
+
+ #! /bin/awk -f
+
+ # a sample awk program
+
+ BEGIN { print "hello, world" }
+
+After making this file executable (with the `chmod' command), you can
+simply type:
+
+ hello
+
+at the shell, and the system will arrange to run `awk' as if you had
+typed:
+
+ awk -f hello
+
+Self--contained `awk' scripts are particularly useful for putting
+`awk' programs into production on your system, without your users
+having to know that they are actually using an `awk' program.
+
+If your system does not support the `#!' mechanism, you can get a
+similar effect using a regular shell script. It would look something
+like this:
+
+ : a sample awk program
+
+ awk 'PROGRAM' "$@"
+
+Using this technique, it is *vital* to enclose the PROGRAM in single
+quotes to protect it from interpretation by the shell. If you omit
+the quotes, only a shell wizard can predict the result.
+
+The `"$@"' causes the shell to forward all the command line arguments
+to the `awk' program, without interpretation.
+
+
+
+File: gawk-info, Node: Command Line, Prev: Executable Scripts, Up: Running gawk
+
+Details of the `awk' Command Line
+---------------------------------
+
+(The following section assumes that you are already familiar with
+`awk'.)
+
+There are two ways to run `awk'. Here are templates for both of
+them; items enclosed in `[' and `]' in these templates are optional.
+
+ awk [ -FFS ] [ -- ] 'PROGRAM' FILE ...
+ awk [ -FFS ] -f SOURCE-FILE [ -f SOURCE-FILE ... ] [ -- ] FILE ...
+
+ Options begin with a minus sign, and consist of a single character.
+The options and their meanings are as follows:
+
+`-FFS'
+ This sets the `FS' variable to FS (*note Special::.). As a
+ special case, if FS is `t', then `FS' will be set to the tab
+ character (`"\t"').
+
+`-f SOURCE-FILE'
+ Indicates that the `awk' program is to be found in SOURCE-FILE
+ instead of in the first non--option argument.
+
+`--'
+ This signals the end of the command line options. If you wish
+ to specify an input file named `-f', you can precede it with the
+ `--' argument to prevent the `-f' from being interpreted as an
+ option. This handling of `--' follows the POSIX argument
+ parsing conventions.
+
+Any other options will be flagged as invalid with a warning message,
+but are otherwise ignored.
+
+If the `-f' option is *not* used, then the first non--option command
+line argument is expected to be the program text.
+
+The `-f' option may be used more than once on the command line.
+`awk' will read its program source from all of the named files, as if
+they had been concatenated together into one big file. This is
+useful for creating libraries of `awk' functions. Useful functions
+can be written once, and then retrieved from a standard place,
+instead of having to be included into each individual program. You
+can still type in a program at the terminal and use library
+functions, by specifying `/dev/tty' as one of the arguments to a
+`-f'. Type your program, and end it with the keyboard end--of--file
+character `Control-d'.
+
+Any additional arguments on the command line are made available to
+your `awk' program in the `ARGV' array (*note Special::.). These
+arguments are normally treated as input files to be processed in the
+order specified. However, an argument that has the form VAR`='VALUE,
+means to assign the value VALUE to the variable VAR--it does not
+specify a file at all.
+
+Command line options and the program text (if present) are omitted
+from the `ARGV' array. All other arguments, including variable
+assignments, are included (*note Special::.).
+
+The distinction between file name arguments and variable--assignment
+arguments is made when `awk' is about to open the next input file.
+At that point in execution, it checks the ``file name'' to see
+whether it is really a variable assignment; if so, instead of trying
+to read a file it will, *at that point in the execution*, assign the
+variable.
+
+Therefore, the variables actually receive the specified values after
+all previously specified files have been read. In particular, the
+values of variables assigned in this fashion are *not* available
+inside a `BEGIN' rule (*note BEGIN/END::.), since such rules are run
+before `awk' begins scanning the argument list.
+
+The variable assignment feature is most useful for assigning to
+variables such as `RS', `OFS', and `ORS', which control input and
+output formats, before listing the data files. It is also useful for
+controlling state if multiple passes are needed over a data file.
+For example:
+
+ awk 'pass == 1 { PASS 1 STUFF }
+ pass == 2 { PASS 2 STUFF }' pass=1 datafile pass=2 datafile
+
+
+
+File: gawk-info, Node: Comments, Next: Statements/Lines, Prev: Running gawk, Up: Getting Started
+
+Comments in `awk' Programs
+==========================
+
+When you write a complicated `awk' program, you can put "comments" in
+the program file to help you remember what the program does, and how
+it works.
+
+A comment starts with the the sharp sign character, `#', and
+continues to the end of the line. The `awk' language ignores the
+rest of a line following a sharp sign. For example, we could have
+put the following into `th-prog':
+
+ # This program finds records containing the pattern `th'. This is how
+ # you continue comments on additional lines.
+ /th/
+
+You can put comment lines into keyboard--composed throw--away `awk'
+programs also, but this usually isn't very useful; the purpose of a
+comment is to help yourself or another person understand the program
+at another time.
+
+
+
+File: gawk-info, Node: Statements/Lines, Next: When, Prev: Comments, Up: Getting Started
+
+`awk' Statements versus Lines
+=============================
+
+Most often, each line in an `awk' program is a separate statement or
+separate rule, like this:
+
+ awk '/12/ { print $0 }
+ /21/ { print $0 }' BBS-list inventory-shipped
+
+But sometimes statements can be more than one line, and lines can
+contain several statements.
+
+You can split a statement into multiple lines by inserting a newline
+after any of the following:
+
+ , { ? : || &&
+
+Lines ending in `do' or `else' automatically have their statements
+continued on the following line(s). A newline at any other point
+ends the statement.
+
+If you would like to split a single statement into two lines at a
+point where a newline would terminate it, you can "continue" it by
+ending the first line with a backslash character, `\'. This is
+allowed absolutely anywhere in the statement, even in the middle of a
+string or regular expression. For example:
+
+ awk '/This program is too long, so continue it\
+ on the next line/ { print $1 }'
+
+We have generally not used backslash continuation in the sample
+programs in this manual. Since there is no limit on the length of a
+line, it is never strictly necessary; it just makes programs
+prettier. We have preferred to make them even more pretty by keeping
+the statements short. Backslash continuation is most useful when
+your `awk' program is in a separate source file, instead of typed in
+on the command line.
+
+*Warning: this does not work if you are using the C shell.*
+Continuation with backslash works for `awk' programs in files, and
+also for one--shot programs *provided* you are using the Bourne
+shell, the Korn shell, or the Bourne--again shell. But the C shell
+used on Berkeley Unix behaves differently! There, you must use two
+backslashes in a row, followed by a newline.
+
+When `awk' statements within one rule are short, you might want to
+put more than one of them on a line. You do this by separating the
+statements with semicolons, `;'. This also applies to the rules
+themselves. Thus, the above example program could have been written:
+
+ /12/ { print $0 } ; /21/ { print $0 }
+
+*Note:* It is a new requirement that rules on the same line require
+semicolons as a separator in the `awk' language; it was done for
+consistency with the statements in the action part of rules.
+
+
+
+File: gawk-info, Node: When, Prev: Statements/Lines, Up: Getting Started
+
+When to Use `awk'
+=================
+
+What use is all of this to me, you might ask? Using additional
+operating system utilities, more advanced patterns, field separators,
+arithmetic statements, and other selection criteria, you can produce
+much more complex output. The `awk' language is very useful for
+producing reports from large amounts of raw data, like summarizing
+information from the output of standard operating system programs
+such as `ls'. (*Note A More Complex Example: More Complex.)
+
+Programs written with `awk' are usually much smaller than they would
+be in other languages. This makes `awk' programs easy to compose and
+use. Often `awk' programs can be quickly composed at your terminal,
+used once, and thrown away. Since `awk' programs are interpreted,
+you can avoid the usually lengthy edit--compile--test--debug cycle of
+software development.
+
+Complex programs have been written in `awk', including a complete
+retargetable assembler for 8--bit microprocessors (*note Glossary::.
+for more information) and a microcode assembler for a special purpose
+Prolog computer. However, `awk''s capabilities are strained by tasks
+of such complexity.
+
+If you find yourself writing `awk' scripts of more than, say, a few
+hundred lines, you might consider using a different programming
+language. Emacs Lisp is a good choice if you need sophisticated
+string or pattern matching capabilities. The shell is also good at
+string and pattern matching; in addition it allows powerful use of
+the standard utilities. More conventional languages like C, C++, or
+Lisp offer better facilities for system programming and for managing
+the complexity of large programs. Programs in these languages may
+require more lines of source code than the equivalent `awk' programs,
+but they will be easier to maintain and usually run more efficiently.
+
+
+
+File: gawk-info, Node: Reading Files, Next: Printing, Prev: Getting Started, Up: Top
+
+Reading Files (Input)
+*********************
+
+In the typical `awk' program, all input is read either from the
+standard input (usually the keyboard) or from files whose names you
+specify on the `awk' command line. If you specify input files, `awk'
+reads data from the first one until it reaches the end; then it reads
+the second file until it reaches the end, and so on. The name of the
+current input file can be found in the special variable `FILENAME'
+(*note Special::.).
+
+The input is split automatically into "records", and processed by the
+rules one record at a time. (Records are the units of text mentioned
+in the introduction; by default, a record is a line of text.) Each
+record read is split automatically into "fields", to make it more
+convenient for a rule to work on parts of the record under
+consideration.
+
+On rare occasions you will need to use the `getline' command, which
+can do explicit input from any number of files.
+
+* Menu:
+
+* Records:: Controlling how data is split into records.
+* Fields:: An introduction to fields.
+* Field Separators:: The field separator and how to change it.
+* Multiple:: Reading multi--line records.
+
+* Assignment Options:: Setting variables on the command line and a summary
+ of command line syntax. This is an advanced method
+ of input.
+
+* Getline:: Reading files under explicit program control
+ using the `getline' function.
+* Close Input:: Closing an input file (so you can read from
+ the beginning once more).
+
+
+
+File: gawk-info, Node: Records, Next: Fields, Up: Reading Files
+
+How Input is Split into Records
+===============================
+
+The `awk' language divides its input into records and fields.
+Records are separated from each other by the "record separator". By
+default, the record separator is the "newline" character. Therefore,
+normally, a record is a line of text.
+
+Sometimes you may want to use a different character to separate your
+records. You can use different characters by changing the special
+variable `RS'.
+
+The value of `RS' is a string that says how to separate records; the
+default value is `"\n"', the string of just a newline character.
+This is why lines of text are the default record. Although `RS' can
+have any string as its value, only the first character of the string
+will be used as the record separator. The other characters are
+ignored. `RS' is exceptional in this regard; `awk' uses the full
+value of all its other special variables.
+
+The value of `RS' is changed by "assigning" it a new value (*note
+Assignment Ops::.). One way to do this is at the beginning of your
+`awk' program, before any input has been processed, using the special
+`BEGIN' pattern (*note BEGIN/END::.). This way, `RS' is changed to
+its new value before any input is read. The new value of `RS' is
+enclosed in quotation marks. For example:
+
+ awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list
+
+changes the value of `RS' to `/', the slash character, before reading
+any input. Records are now separated by a slash. The second rule in
+the `awk' program (the action with no pattern) will proceed to print
+each record. Since each `print' statement adds a newline at the end
+of its output, the effect of this `awk' program is to copy the input
+with each slash changed to a newline.
+
+Another way to change the record separator is on the command line,
+using the variable--assignment feature (*note Command Line::.).
+
+ awk '...' RS="/" SOURCE-FILE
+
+`RS' will be set to `/' before processing SOURCE-FILE.
+
+The empty string (a string of no characters) has a special meaning as
+the value of `RS': it means that records are separated only by blank
+lines. *Note Multiple::, for more details.
+
+The `awk' utility keeps track of the number of records that have been
+read so far from the current input file. This value is stored in a
+special variable called `FNR'. It is reset to zero when a new file
+is started. Another variable, `NR', is the total number of input
+records read so far from all files. It starts at zero but is never
+automatically reset to zero.
+
+If you change the value of `RS' in the middle of an `awk' run, the
+new value is used to delimit subsequent records, but the record
+currently being processed (and records already finished) are not
+affected.
+
+
+
+File: gawk-info, Node: Fields, Next: Non-Constant Fields, Prev: Records, Up: Reading Files
+
+Examining Fields
+================
+
+When `awk' reads an input record, the record is automatically
+separated or "parsed" by the interpreter into pieces called "fields".
+By default, fields are separated by whitespace, like words in a line.
+Whitespace in `awk' means any string of one or more spaces and/or
+tabs; other characters such as newline, formfeed, and so on, that are
+considered whitespace by other languages are *not* considered
+whitespace by `awk'.
+
+The purpose of fields is to make it more convenient for you to refer
+to these pieces of the record. You don't have to use them--you can
+operate on the whole record if you wish--but fields are what make
+simple `awk' programs so powerful.
+
+To refer to a field in an `awk' program, you use a dollar--sign, `$',
+followed by the number of the field you want. Thus, `$1' refers to
+the first field, `$2' to the second, and so on. For example, suppose
+the following is a line of input:
+
+ This seems like a pretty nice example.
+
+ Here the first field, or `$1', is `This'; the second field, or `$2',
+is `seems'; and so on. Note that the last field, `$7', is
+`example.'. Because there is no space between the `e' and the `.',
+the period is considered part of the seventh field.
+
+No matter how many fields there are, the last field in a record can
+be represented by `$NF'. So, in the example above, `$NF' would be
+the same as `$7', which is `example.'. Why this works is explained
+below (*note Non-Constant Fields::.). If you try to refer to a field
+beyond the last one, such as `$8' when the record has only 7 fields,
+you get the empty string.
+
+Plain `NF', with no `$', is a special variable whose value is the
+number of fields in the current record.
+
+`$0', which looks like an attempt to refer to the zeroth field, is a
+special case: it represents the whole input record. This is what you
+would use when you aren't interested in fields.
+
+Here are some more examples:
+
+ awk '$1 ~ /foo/ { print $0 }' BBS-list
+
+This example contains the "matching" operator `~' (*note Comparison
+Ops::.). Using this operator, all records in the file `BBS-list'
+whose first field contains the string `foo' are printed.
+
+By contrast, the following example:
+
+ awk '/foo/ { print $1, $NF }' BBS-list
+
+looks for the string `foo' in *the entire record* and prints the
+first field and the last field for each input record containing the
+pattern.
+
+The following program will search the system password file, and print
+the entries for users who have no password.
+
+ awk -F: '$2 == ""' /etc/passwd
+
+This program uses the `-F' option on the command line to set the file
+separator. (Fields in `/etc/passwd' are separated by colons. The
+second field represents a user's encrypted password, but if the field
+is empty, that user has no password.)
+
+
+
+File: gawk-info, Node: Non-Constant Fields, Next: Changing Fields, Prev: Fields, Up: Reading Files
+
+Non-constant Field Numbers
+==========================
+
+The number of a field does not need to be a constant. Any expression
+in the `awk' language can be used after a `$' to refer to a field.
+The `awk' utility evaluates the expression and uses the "numeric
+value" as a field number. Consider this example:
+
+ awk '{ print $NR }'
+
+Recall that `NR' is the number of records read so far: 1 in the first
+record, 2 in the second, etc. So this example will print the first
+field of the first record, the second field of the second record, and
+so on. For the twentieth record, field number 20 will be printed;
+most likely this will make a blank line, because the record will not
+have 20 fields.
+
+Here is another example of using expressions as field numbers:
+
+ awk '{ print $(2*2) }' BBS-list
+
+The `awk' language must evaluate the expression `(2*2)' and use its
+value as the field number to print. The `*' sign represents
+multiplication, so the expression `2*2' evaluates to 4. This
+example, then, prints the hours of operation (the fourth field) for
+every line of the file `BBS-list'.
+
+When you use non--constant field numbers, you may ask for a field
+with a negative number. This always results in an empty string, just
+like a field whose number is too large for the input record. For
+example, `$(1-4)' would try to examine field number -3; it would
+result in an empty string.
+
+If the field number you compute is zero, you get the entire record.
+
+The number of fields in the current record is stored in the special
+variable `NF' (*note Special::.). The expression `$NF' is not a
+special feature: it is the direct consequence of evaluating `NF' and
+using its value as a field number.
+
+
+
+File: gawk-info, Node: Changing Fields, Next: Field Separators, Prev: Non-Constant Fields, Up: Reading Files
+
+Changing the Contents of a Field
+================================
+
+You can change the contents of a field as seen by `awk' within an
+`awk' program; this changes what `awk' perceives as the current input
+record. (The actual input is untouched: `awk' never modifies the
+input file.)
+
+Look at this example:
+
+ awk '{ $3 = $2 - 10; print $2, $3 }' inventory-shipped
+
+The `-' sign represents subtraction, so this program reassigns field
+three, `$3', to be the value of field two minus ten, ``$2' - 10'.
+(*Note Arithmetic Ops::.) Then field two, and the new value for
+field three, are printed.
+
+In order for this to work, the text in field `$2' must make sense as
+a number; the string of characters must be converted to a number in
+order for the computer to do arithmetic on it. The number resulting
+from the subtraction is converted back to a string of characters
+which then becomes field 3. *Note Conversion::.
+
+When you change the value of a field (as perceived by `awk'), the
+text of the input record is recalculated to contain the new field
+where the old one was. `$0' will from that time on reflect the
+altered field. Thus,
+
+ awk '{ $2 = $2 - 10; print $0 }' inventory-shipped
+
+will print a copy of the input file, with 10 subtracted from the
+second field of each line.
+
+You can also assign contents to fields that are out of range. For
+example:
+
+ awk '{ $6 = ($5 + $4 + $3 + $2)/4) ; print $6 }' inventory-shipped
+
+We've just created `$6', whose value is the average of fields `$2',
+`$3', `$4', and `$5'. The `+' sign represents addition, and the `/'
+sign represents division. For the file `inventory-shipped' `$6'
+represents the average number of parcels shipped for a particular
+month.
+
+Creating a new field changes what `awk' interprets as the current
+input record. The value of `$0' will be recomputed. This
+recomputation affects and is affected by features not yet discussed,
+in particular, the "Output Field Separator", `OFS', which is used to
+separate the fields (*note Output Separators::.), and `NF' (the
+number of fields; *note Fields::.). For example, the value of `NF'
+will be set to the number of the highest out--of--range field you
+create.
+
+Note, however, that merely *referencing* an out--of--range field will
+*not* change the value of either `$0' or `NF'. Referencing an
+out--of--range field merely produces a null string. For example:
+
+ if ($(NF+1) != "")
+ print "can't happen"
+ else
+ print "everything is normal"
+
+should print `everything is normal'. (*Note If::, for more
+information about `awk''s `if-else' statements.)
+
+
+
+File: gawk-info, Node: Field Separators, Next: Multiple, Prev: Changing Fields, Up: Reading Files
+
+Specifying How Fields Are Separated
+===================================
+
+You can change the way `awk' splits a record into fields by changing
+the value of the "field separator". The field separator is
+represented by the special variable `FS' in an `awk' program, and can
+be set by `-F' on the command line. The `awk' language scans each
+input line for the field separator character to determine the
+positions of fields within that line. Shell programmers take note!
+`awk' uses the variable `FS', not `IFS'.
+
+The default value of the field separator is a string containing a
+single space. This value is actually a special case; as you know, by
+default, fields are separated by whitespace sequences, not by single
+spaces: two spaces in a row do not delimit an empty field.
+``Whitespace'' is defined as sequences of one or more spaces or tab
+characters.
+
+You change the value of `FS' by "assigning" it a new value. You can
+do this using the special `BEGIN' pattern (*note BEGIN/END::.). This
+pattern allows you to change the value of `FS' before any input is
+read. The new value of `FS' is enclosed in quotations. For example,
+set the value of `FS' to the string `","':
+
+ awk 'BEGIN { FS = "," } ; { print $2 }'
+
+and use the input line:
+
+ John Q. Smith, 29 Oak St., Walamazoo, MI 42139
+
+This `awk' program will extract the string `29 Oak St.'.
+
+Sometimes your input data will contain separator characters that
+don't separate fields the way you thought they would. For instance,
+the person's name in the example we've been using might have a title
+or suffix attached, such as `John Q. Smith, LXIX'. If you assigned
+`FS' to be `,' then:
+
+ awk 'BEGIN { FS = "," } ; { print $2 }
+
+would extract `LXIX', instead of `29 Oak St.'. If you were expecting
+the program to print the address, you would be surprised. So, choose
+your data layout and separator characters carefully to prevent
+problems like this from happening.
+
+You can assign `FS' to be a series of characters. For example, the
+assignment:
+
+ FS = ", \t"
+
+makes every area of an input line that consists of a comma followed
+by a space and a tab, into a field separator. (`\t' stands for a tab.)
+
+If `FS' is any single character other than a blank, then that
+character is used as the field separator, and two successive
+occurrences of that character do delimit an empty field.
+
+If you assign `FS' to a string longer than one character, that string
+is evaluated as a "regular expression" (*note Regexp::.). The value
+of the regular expression is used as a field separator.
+
+`FS' can be set on the command line. You use the `-F' argument to do
+so. For example:
+
+ awk -F, 'PROGRAM' INPUT-FILES
+
+sets `FS' to be the `,' character. Notice that the argument uses a
+capital `F'. Contrast this with `-f', which specifies a file
+containing an `awk' program. Case is significant in command options:
+the `-F' and `-f' options have nothing to do with each other. You
+can use both options at the same time to set the `FS' argument *and*
+get an `awk' program from a file.
+
+As a special case, if the argument to `-F' is `t', then `FS' is set
+to the tab character. (This is because if you type `-F\t', without
+the quotes, at the shell, the `\' gets deleted, so `awk' figures that
+you really want your fields to be separated with tabs, and not `t's.
+Use `FS="t"' if you really do want to separate your fields with `t's.)
+
+For example, let's use an `awk' program file called `baud.awk' that
+contains the pattern `/300/', and the action `print $1'. We'll use
+the operating system utility `cat' to ``look'' at our program:
+
+ % cat baud.awk
+ /300/ { print $1 }
+
+Let's also set `FS' to be the `-' character. We will apply all this
+information to the file `BBS-list'. This `awk' program will now
+print a list of the names of the bulletin boards that operate at 300
+baud and the first three digits of their phone numbers.
+
+ awk -F- -f baud.awk BBS-list
+
+produces this output:
+
+ aardvark 555
+ alpo
+ barfly 555
+ bites 555
+ camelot 555
+ core 555
+ fooey 555
+ foot 555
+ macfoo 555
+ sdace 555
+ sabafoo 555
+
+Note the second line of output. If you check the original file, you
+will see that the second line looked like this:
+
+ alpo-net 555-3412 2400/1200/300 A
+
+The `-' as part of the system's name was used as the field separator,
+instead of the `-' in the phone number that was originally intended.
+This demonstrates why you have to be careful in choosing your field
+and record separators.
+
+
+
+File: gawk-info, Node: Multiple, Next: Assignment Options, Prev: Field Separators, Up: Reading Files
+
+Multiple--Line Records
+======================
+
+In some data bases, a single line cannot conveniently hold all the
+information in one entry. Then you will want to use multi--line
+records.
+
+The first step in doing this is to choose your data format: when
+records are not defined as single lines, how will you want to define
+them? What should separate records?
+
+One technique is to use an unusual character or string to separate
+records. For example, you could use the formfeed character (written
+`\f' in `awk', as in C) to separate them, making each record a page
+of the file. To do this, just set the variable `RS' to `"\f"' (a
+string containing the formfeed character), or whatever string you
+prefer to use.
+
+Another technique is to have blank lines separate records. By a
+special dispensation, a null string as the value of `RS' indicates
+that records are separated by one or more blank lines. If you set
+`RS' to the null string, a record will always end at the first blank
+line encountered. And the next record won't start until the first
+nonblank line that follows--no matter how many blank lines appear in
+a row, they will be considered one record--separator.
+
+The second step is to separate the fields in the record. One way to
+do this is to put each field on a separate line: to do this, just set
+the variable `FS' to the string `"\n"'. (This simple regular
+expression matches a single newline.) Another idea is to divide each
+of the lines into fields in the normal manner; the regular expression
+`"[ \t\n]+"' will do this nicely by treating the newlines inside the
+record just like spaces.
+
+When `RS' is set to the null string, the newline character *always*
+acts as a field separator. This is in addition to whatever value
+`FS' has. The probable reason for this rule is so that you get
+rational behavior in the default case (i.e. `FS == " "'). This can
+be a problem if you really don't want the newline character to
+separate fields, since there is no way to do that. However, you can
+work around this by using the `split' function to manually break up
+your data (*note String Functions::.).
+
+Here is how to use records separated by blank lines and break each
+line into fields normally:
+
+ awk 'BEGIN { RS = ""; FS = "[ \t\n]+" } ; { print $0 }' BBS-list
+
+
+
+File: gawk-info, Node: Assignment Options, Next: Getline, Prev: Multiple, Up: Reading Files
+
+Assigning Variables on the Command Line
+=======================================
+
+You can include variable "assignments" among the file names on the
+command line used to invoke `awk' (*note Command Line::.). Such
+assignments have the form:
+
+ VARIABLE=TEXT
+
+and allow you to change variables either at the beginning of the
+`awk' run or in between input files. The variable assignment is
+performed at a time determined by its position among the input file
+arguments: after the processing of the preceding input file argument.
+For example:
+
+ awk '{ print $n }' n=4 inventory-shipped n=2 BBS-list
+
+prints the value of field number `n' for all input records. Before
+the first file is read, the command line sets the variable `n' equal
+to 4. This causes the fourth field of the file `inventory-shipped'
+to be printed. After the first file has finished, but before the
+second file is started, `n' is set to 2, so that the second field of
+the file `BBS-list' will be printed.
+
+Command line arguments are made available for explicit examination by
+the `awk' program in an array named `ARGV' (*note Special::.).
+
+
+
+File: gawk-info, Node: Getline, Prev: Assignment Options, Up: Reading Files
+
+Explicit Input with `getline'
+=============================
+
+So far we have been getting our input files from `awk''s main input
+stream--either the standard input (usually your terminal) or the
+files specified on the command line. The `awk' language has a
+special built--in function called `getline' that can be used to read
+input under your explicit control.
+
+This command is quite complex and should *not* be used by beginners.
+The command (and its variations) is covered here because this is the
+section about input. The examples that follow the explanation of the
+`getline' command include material that has not been covered yet.
+Therefore, come back and attempt the `getline' command *after* you
+have reviewed the rest of this manual and have a good knowledge of
+how `awk' works.
+
+When retrieving input, `getline' returns a 1 if it found a record,
+and a 0 if the end of the file was encountered. If there was some
+error in getting a record, such as a file that could not be opened,
+then `getline' returns a -1.
+
+In the following examples, COMMAND stands for a string value that
+represents a shell command.
+
+`getline'
+ The `getline' function can be used by itself, in an `awk'
+ program, to read input from the current input. All it does in
+ this case is read the next input record and split it up into
+ fields. This is useful if you've finished processing the
+ current record, but you want to do some special processing
+ *right now* on the next record. Here's an example:
+
+ awk '{
+ if (t = index($0, "/*")) {
+ if(t > 1)
+ tmp = substr($0, 1, t - 1)
+ else
+ tmp = ""
+ u = index(substr($0, t + 2), "*/")
+ while (! u) {
+ getline
+ t = -1
+ u = index($0, "*/")
+ }
+ if(u <= length($0) - 2)
+ $0 = tmp substr($0, t + u + 3)
+ else
+ $0 = tmp
+ }
+ print $0
+ }'
+
+ This `awk' program deletes all comments, `/* ... */', from the
+ input. By replacing the `print $0' with other statements, you
+ could perform more complicated processing on the de--commented
+ input, such as search it for matches for a regular expression.
+
+ This form of the `getline' command sets `NF' (the number of
+ fields; *note Fields::.), `NR' (the number of records read so
+ far), the `FNR' variable (*note Records::.), and the value of
+ `$0'.
+
+ *Note:* The new value of `$0' will be used in testing the
+ patterns of any subsequent rules. The original value of `$0'
+ that triggered the rule which executed `getline' is lost. By
+ contrast, the `next' statement reads a new record but
+ immediately begins processing it normally, starting with the
+ first rule in the program. *Note Next::.
+
+`getline VAR'
+ This form of `getline' reads a record into the variable VAR.
+ This is useful when you want your program to read the next
+ record from the input file, but you don't want to subject the
+ record to the normal input processing.
+
+ For example, suppose the next line is a comment, or a special
+ string, and you want to read it, but you must make certain that
+ it won't accidentally trigger any rules. This version of
+ `getline' will allow you to read that line and store it in a
+ variable so that the main read--a--line--and--check--each--rule
+ loop of `awk' never sees it.
+
+ The following example swaps every two lines of input. For
+ example, given:
+
+ wan
+ tew
+ free
+ phore
+
+ it outputs:
+
+ tew
+ wan
+ phore
+ free
+
+ Here's the program:
+
+ awk '{
+ if ((getline tmp) > 0) {
+ print tmp
+ print $0
+ } else
+ print $0
+ }'
+
+ The `getline' function used in this way sets only `NR' and `FNR'
+ (and of course, VAR). The record is not split into fields, so
+ the values of the fields (including `$0') and the value of `NF'
+ do not change.
+
+`getline < FILE'
+ This form of the `getline' function takes its input from the
+ file FILE. Here FILE is a string--valued expression that
+ specifies the file name.
+
+ This form is useful if you want to read your input from a
+ particular file, instead of from the main input stream. For
+ example, the following program reads its input record from the
+ file `foo.input' when it encounters a first field with a value
+ equal to 10 in the current input file.
+
+ awk '{
+ if ($1 == 10) {
+ getline < "foo.input"
+ print
+ } else
+ print
+ }'
+
+ Since the main input stream is not used, the values of `NR' and
+ `FNR' are not changed. But the record read is split into fields
+ in the normal manner, so the values of `$0' and other fields are
+ changed. So is the value of `NF'.
+
+ This does not cause the record to be tested against all the
+ patterns in the `awk' program, in the way that would happen if
+ the record were read normally by the main processing loop of
+ `awk'. However the new record is tested against any subsequent
+ rules, just as when `getline' is used without a redirection.
+
+`getline VAR < FILE'
+ This form of the `getline' function takes its input from the
+ file FILE and puts it in the variable VAR. As above, FILE is a
+ string--valued expression that specifies the file to read from.
+
+ In this version of `getline', none of the built--in variables
+ are changed, and the record is not split into fields. The only
+ variable changed is VAR.
+
+ For example, the following program copies all the input files to
+ the output, except for records that say `@include FILENAME'.
+ Such a record is replaced by the contents of the file FILENAME.
+
+ awk '{
+ if (NF == 2 && $1 == "@include") {
+ while ((getline line < $2) > 0)
+ print line
+ close($2)
+ } else
+ print
+ }'
+
+ Note here how the name of the extra input file is not built into
+ the program; it is taken from the data, from the second field on
+ the `@include' line.
+
+ The `close' command is used to ensure that if two identical
+ `@include' lines appear in the input, the entire specified file
+ is included twice. *Note Close Input::.
+
+ One deficiency of this program is that it does not process
+ nested `@include' statements the way a true macro preprocessor
+ would.
+
+`COMMAND | getline'
+ You can "pipe" the output of a command into `getline'. A pipe
+ is simply a way to link the output of one program to the input
+ of another. In this case, the string COMMAND is run as a shell
+ command and its output is piped into `awk' to be used as input.
+ This form of `getline' reads one record from the pipe.
+
+ For example, the following program copies input to output,
+ except for lines that begin with `@execute', which are replaced
+ by the output produced by running the rest of the line as a
+ shell command:
+
+ awk '{
+ if ($1 == "@execute") {
+ tmp = substr($0, 10)
+ while ((tmp | getline) > 0)
+ print
+ close(tmp)
+ } else
+ print
+ }'
+
+ The `close' command is used to ensure that if two identical
+ `@execute' lines appear in the input, the command is run again
+ for each one. *Note Close Input::.
+
+ Given the input:
+
+ foo
+ bar
+ baz
+ @execute who
+ bletch
+
+ the program might produce:
+
+ foo
+ bar
+ baz
+ hack ttyv0 Jul 13 14:22
+ hack ttyp0 Jul 13 14:23 (gnu:0)
+ hack ttyp1 Jul 13 14:23 (gnu:0)
+ hack ttyp2 Jul 13 14:23 (gnu:0)
+ hack ttyp3 Jul 13 14:23 (gnu:0)
+ bletch
+
+ Notice that this program ran the command `who' and printed the
+ result. (If you try this program yourself, you will get
+ different results, showing you logged in.)
+
+ This variation of `getline' splits the record into fields, sets
+ the value of `NF' and recomputes the value of `$0'. The values
+ of `NR' and `FNR' are not changed.
+
+`COMMAND | getline VAR'
+ The output of the command COMMAND is sent through a pipe to
+ `getline' and into the variable VAR. For example, the following
+ program reads the current date and time into the variable
+ `current_time', using the utility called `date', and then prints
+ it.
+
+ awk 'BEGIN {
+ "date" | getline current_time
+ close("date")
+ print "Report printed on " current_time
+ }'
+
+ In this version of `getline', none of the built--in variables
+ are changed, and the record is not split into fields.
+
+
+
+File: gawk-info, Node: Close Input, Up: Getline
+
+Closing Input Files
+-------------------
+
+If the same file name or the same shell command is used with
+`getline' more than once during the execution of the `awk' program,
+the file is opened (or the command is executed) only the first time.
+At that time, the first record of input is read from that file or
+command. The next time the same file or command is used in
+`getline', another record is read from it, and so on.
+
+What this implies is that if you want to start reading the same file
+again from the beginning, or if you want to rerun a shell command
+(rather that reading more output from the command), you must take
+special steps. What you can do is use the `close' statement:
+
+ close (FILENAME)
+
+This statement closes a file or pipe, represented here by FILENAME.
+The string value of FILENAME must be the same value as the string
+used to open the file or pipe to begin with.
+
+Once this statement is executed, the next `getline' from that file or
+command will reopen the file or rerun the command.
+
+
+
+File: gawk-info, Node: Printing, Next: One-liners, Prev: Reading Files, Up: Top
+
+Printing Output
+***************
+
+One of the most common things that actions do is to output or "print"
+some or all of the input. For simple output, use the `print'
+statement. For fancier formatting use the `printf' statement. Both
+are described in this chapter.
+
+* Menu:
+
+* Print:: The `print' statement.
+* Print Examples:: Simple examples of `print' statements.
+* Output Separators:: The output separators and how to change them.
+
+* Redirection:: How to redirect output to multiple files and pipes.
+* Close Output:: How to close output files and pipes.
+
+* Printf:: The `printf' statement.
+
+
+
+File: gawk-info, Node: Print, Next: Print Examples, Up: Printing
+
+The `print' Statement
+=====================
+
+The `print' statement does output with simple, standardized
+formatting. You specify only the strings or numbers to be printed,
+in a list separated by commas. They are output, separated by single
+spaces, followed by a newline. The statement looks like this:
+
+ print ITEM1, ITEM2, ...
+
+ The entire list of items may optionally be enclosed in parentheses.
+The parentheses are necessary if any of the item expressions uses a
+relational operator; otherwise it could be confused with a
+redirection (*note Redirection::.). The relational operators are
+`==', `!=', `<', `>', `>=', `<=', `~' and `!~' (*note Comparison
+Ops::.).
+
+The items printed can be constant strings or numbers, fields of the
+current record (such as `$1'), variables, or any `awk' expressions.
+The `print' statement is completely general for computing *what*
+values to print. With one exception (*note Output Separators::.),
+what you can't do is specify *how* to print them--how many columns to
+use, whether to use exponential notation or not, and so on. For
+that, you need the `printf' statement (*note Printf::.).
+
+To print a fixed piece of text, write a string constant as one item,
+such as `"Hello there"'. If you forget to use the double--quote
+characters, your text will be taken as an `awk' expression, and you
+will probably get an error. Keep in mind that a space will be
+printed between any two items.
+
+The simple statement `print' with no items is equivalent to `print
+$0': it prints the entire current record. To print a blank line, use
+`print ""', where `""' is the null, or empty, string.
+
+Most often, each `print' statement makes one line of output. But it
+isn't limited to one line. If an item value is a string that
+contains a newline, the newline is output along with the rest of the
+string. A single `print' can make any number of lines this way.
+
+
+
+File: gawk-info, Node: Print Examples, Next: Output Separators, Prev: Print, Up: Printing
+
+Examples of `print' Statements
+==============================
+
+Here is an example that prints the first two fields of each input
+record, with a space between them:
+
+ awk '{ print $1, $2 }' inventory-shipped
+
+Its output looks like this:
+
+ Jan 13
+ Feb 15
+ Mar 15
+ ...
+
+ A common mistake in using the `print' statement is to omit the comma
+between two items. This often has the effect of making the items run
+together in the output, with no space. The reason for this is that
+juxtaposing two string expressions in `awk' means to concatenate
+them. For example, without the comma:
+
+ awk '{ print $1 $2 }' inventory-shipped
+
+prints:
+
+ Jan13
+ Feb15
+ Mar15
+ ...
+
+ Neither example's output makes much sense to someone unfamiliar with
+the file `inventory-shipped'. A heading line at the beginning would
+make it clearer. Let's add some headings to our table of months
+(`$1') and green crates shipped (`$2'). We do this using the BEGIN
+pattern (*note BEGIN/END::.) to cause the headings to be printed only
+once:
+
+ awk 'BEGIN { print "Month Crates"
+ print "---- -----" }
+ { print $1, $2 }' inventory-shipped
+
+Did you already guess what will happen? This program prints the
+following:
+
+ Month Crates
+ ---- -----
+ Jan 13
+ Feb 15
+ Mar 15
+ ...
+
+ The headings and the table data don't line up! We can fix this by
+printing some spaces between the two fields:
+
+ awk 'BEGIN { print "Month Crates"
+ print "---- -----" }
+ { print $1, " ", $2 }' inventory-shipped
+
+You can imagine that this way of lining up columns can get pretty
+complicated when you have many columns to fix. Counting spaces for
+two or three columns can be simple, but more than this and you can
+get ``lost'' quite easily. This is why the `printf' statement was
+created (*note Printf::.); one of its specialties is lining up
+columns of data.
+
+
+
+File: gawk-info, Node: Output Separators, Next: Redirection, Prev: Print Examples, Up: Printing
+
+Output Separators
+=================
+
+As mentioned previously, a `print' statement contains a list of
+items, separated by commas. In the output, the items are normally
+separated by single spaces. But they do not have to be spaces; a
+single space is only the default. You can specify any string of
+characters to use as the "output field separator", by setting the
+special variable `OFS'. The initial value of this variable is the
+string `" "'.
+
+The output from an entire `print' statement is called an "output
+record". Each `print' statement outputs one output record and then
+outputs a string called the "output record separator". The special
+variable `ORS' specifies this string. The initial value of the
+variable is the string `"\n"' containing a newline character; thus,
+normally each `print' statement makes a separate line.
+
+You can change how output fields and records are separated by
+assigning new values to the variables `OFS' and/or `ORS'. The usual
+place to do this is in the `BEGIN' rule (*note BEGIN/END::.), so that
+it happens before any input is processed. You may also do this with
+assignments on the command line, before the names of your input files.
+
+The following example prints the first and second fields of each
+input record separated by a semicolon, with a blank line added after
+each line:
+
+ awk 'BEGIN { OFS = ";"; ORS = "\n\n" }
+ { print $1, $2 }' BBS-list
+
+If the value of `ORS' does not contain a newline, all your output
+will be run together on a single line, unless you output newlines
+some other way.
+
+
+
+File: gawk-info, Node: Redirection, Next: Printf, Prev: Output Separators, Up: Printing
+
+Redirecting Output of `print' and `printf'
+==========================================
+
+So far we have been dealing only with output that prints to the
+standard output, usually your terminal. Both `print' and `printf'
+can be told to send their output to other places. This is called
+"redirection".
+
+A redirection appears after the `print' or `printf' statement.
+Redirections in `awk' are written just like redirections in shell
+commands, except that they are written inside the `awk' program.
+
+Here are the three forms of output redirection. They are all shown
+for the `print' statement, but they work for `printf' also.
+
+`print ITEMS > OUTPUT-FILE'
+ This type of redirection prints the items onto the output file
+ OUTPUT-FILE. The file name OUTPUT-FILE can be any expression.
+ Its value is changed to a string and then used as a filename
+ (*note Expressions::.).
+
+ When this type of redirection is used, the OUTPUT-FILE is erased
+ before the first output is written to it. Subsequent writes do
+ not erase OUTPUT-FILE, but append to it. If OUTPUT-FILE does
+ not exist, then it is created.
+
+ For example, here is how one `awk' program can write a list of
+ BBS names to a file `name-list' and a list of phone numbers to a
+ file `phone-list'. Each output file contains one name or number
+ per line.
+
+ awk '{ print $2 > "phone-list"
+ print $1 > "name-list" }' BBS-list
+
+`print ITEMS >> OUTPUT-FILE'
+ This type of redirection prints the items onto the output file
+ OUTPUT-FILE. The difference between this and the single--`>'
+ redirection is that the old contents (if any) of OUTPUT-FILE are
+ not erased. Instead, the `awk' output is appended to the file.
+
+`print ITEMS | COMMAND'
+ It is also possible to send output through a "pipe" instead of
+ into a file. This type of redirection opens a pipe to COMMAND
+ and writes the values of ITEMS through this pipe, to another
+ process created to execute COMMAND.
+
+ The redirection argument COMMAND is actually an `awk'
+ expression. Its value is converted to a string, whose contents
+ give the shell command to be run.
+
+ For example, this produces two files, one unsorted list of BBS
+ names and one list sorted in reverse alphabetical order:
+
+ awk '{ print $1 > "names.unsorted"
+ print $1 | "sort -r > names.sorted" }' BBS-list
+
+ Here the unsorted list is written with an ordinary redirection
+ while the sorted list is written by piping through the `sort'
+ utility.
+
+ Here is an example that uses redirection to mail a message to a
+ mailing list `bug-system'. This might be useful when trouble is
+ encountered in an `awk' script run periodically for system
+ maintenance.
+
+ print "Awk script failed:", $0 | "mail bug-system"
+ print "processing record number", FNR, "of", FILENAME | "mail bug-system"
+ close ("mail bug-system")
+
+ We use a `close' statement here because it's a good idea to
+ close the pipe as soon as all the intended output has been sent
+ to it. *Note Close Output::, for more information on this.
+
+Redirecting output using `>', `>>', or `|' asks the system to open a
+file or pipe only if the particular FILE or COMMAND you've specified
+has not already been written to by your program.
+
+
+
+File: gawk-info, Node: Close Output, Up: Redirection
+
+Closing Output Files and Pipes
+------------------------------
+
+When a file or pipe is opened, the filename or command associated
+with it is remembered by `awk' and subsequent writes to the same file
+or command are appended to the previous writes. The file or pipe
+stays open until `awk' exits. This is usually convenient.
+
+Sometimes there is a reason to close an output file or pipe earlier
+than that. To do this, use the `close' command, as follows:
+
+ close (FILENAME)
+
+or
+
+ close (COMMAND)
+
+The argument FILENAME or COMMAND can be any expression. Its value
+must exactly equal the string used to open the file or pipe to begin
+with--for example, if you open a pipe with this:
+
+ print $1 | "sort -r > names.sorted"
+
+then you must close it with this:
+
+ close ("sort -r > names.sorted")
+
+Here are some reasons why you might need to close an output file:
+
+ * To write a file and read it back later on in the same `awk'
+ program. Close the file when you are finished writing it; then
+ you can start reading it with `getline' (*note Getline::.).
+
+ * To write numerous files, successively, in the same `awk'
+ program. If you don't close the files, eventually you will
+ exceed the system limit on the number of open files in one
+ process. So close each one when you are finished writing it.
+
+ * To make a command finish. When you redirect output through a
+ pipe, the command reading the pipe normally continues to try to
+ read input as long as the pipe is open. Often this means the
+ command cannot really do its work until the pipe is closed. For
+ example, if you redirect output to the `mail' program, the
+ message will not actually be sent until the pipe is closed.
+
+ * To run the same subprogram a second time, with the same arguments.
+ This is not the same thing as giving more input to the first run!
+
+ For example, suppose you pipe output to the `mail' program. If
+ you output several lines redirected to this pipe without closing
+ it, they make a single message of several lines. By contrast,
+ if you close the pipe after each line of output, then each line
+ makes a separate message.
+
+
+
+File: gawk-info, Node: Printf, Prev: Redirection, Up: Printing
+
+Using `printf' Statements For Fancier Printing
+==============================================
+
+If you want more precise control over the output format than `print'
+gives you, use `printf'. With `printf' you can specify the width to
+use for each item, and you can specify various stylistic choices for
+numbers (such as what radix to use, whether to print an exponent,
+whether to print a sign, and how many digits to print after the
+decimal point). You do this by specifying a "format string".
+
+* Menu:
+
+* Basic Printf:: Syntax of the `printf' statement.
+* Format-Control:: Format-control letters.
+* Modifiers:: Format--specification modifiers.
+* Printf Examples:: Several examples.
+
+
+
+File: gawk-info, Node: Basic Printf, Next: Format-Control, Up: Printf
+
+Introduction to the `printf' Statement
+--------------------------------------
+
+The `printf' statement looks like this:
+
+ printf FORMAT, ITEM1, ITEM2, ...
+
+ The entire list of items may optionally be enclosed in parentheses.
+The parentheses are necessary if any of the item expressions uses a
+relational operator; otherwise it could be confused with a
+redirection (*note Redirection::.). The relational operators are
+`==', `!=', `<', `>', `>=', `<=', `~' and `!~' (*note Comparison
+Ops::.).
+
+The difference between `printf' and `print' is the argument FORMAT.
+This is an expression whose value is taken as a string; its job is to
+say how to output each of the other arguments. It is called the
+"format string".
+
+The format string is essentially the same as in the C library
+function `printf'. Most of FORMAT is text to be output verbatim.
+Scattered among this text are "format specifiers", one per item.
+Each format specifier says to output the next item at that place in
+the format.
+
+The `printf' statement does not automatically append a newline to its
+output. It outputs nothing but what the format specifies. So if you
+want a newline, you must include one in the format. The output
+separator variables `OFS' and `ORS' have no effect on `printf'
+statements.
+
+
+
+File: gawk-info, Node: Format-Control, Next: Modifiers, Prev: Basic Printf, Up: Printf
+
+Format--Control Characters
+--------------------------
+
+A format specifier starts with the character `%' and ends with a
+"format--control letter"; it tells the `printf' statement how to
+output one item. (If you actually want to output a `%', write `%%'.)
+The format--control letter specifies what kind of value to print.
+The rest of the format specifier is made up of optional "modifiers"
+which are parameters such as the field width to use.
+
+Here is a list of them:
+
+`c'
+ This prints a number as an ASCII character. Thus, `printf "%c",
+ 65' outputs the letter `A'. The output for a string value is
+ the first character of the string.
+
+`d'
+ This prints a decimal integer.
+
+`e'
+ This prints a number in scientific (exponential) notation. For
+ example,
+
+ printf "%4.3e", 1950
+
+ prints `1.950e+03', with a total of 4 significant figures of
+ which 3 follow the decimal point. The `4.3' are "modifiers",
+ discussed below.
+
+`f'
+ This prints a number in floating point notation.
+
+`g'
+ This prints either scientific notation or floating point
+ notation, whichever is shorter.
+
+`o'
+ This prints an unsigned octal integer.
+
+`s'
+ This prints a string.
+
+`x'
+ This prints an unsigned hexadecimal integer.
+
+`%'
+ This isn't really a format--control letter, but it does have a
+ meaning when used after a `%': the sequence `%%' outputs one
+ `%'. It does not consume an argument.
+
+
+
+File: gawk-info, Node: Modifiers, Next: Printf Examples, Prev: Format-Control, Up: Printf
+
+Modifiers for `printf' Formats
+------------------------------
+
+A format specification can also include "modifiers" that can control
+how much of the item's value is printed and how much space it gets.
+The modifiers come between the `%' and the format--control letter.
+Here are the possible modifiers, in the order in which they may appear:
+
+`-'
+ The minus sign, used before the width modifier, says to
+ left--justify the argument within its specified width. Normally
+ the argument is printed right--justified in the specified width.
+
+`WIDTH'
+ This is a number representing the desired width of a field.
+ Inserting any number between the `%' sign and the format control
+ character forces the field to be expanded to this width. The
+ default way to do this is to pad with spaces on the left.
+
+`.PREC'
+ This is a number that specifies the precision to use when
+ printing. This specifies the number of digits you want printed
+ to the right of the decimal place.
+
+The C library `printf''s dynamic WIDTH and PREC capability (for
+example, `"%*.*s"') is not supported. However, it can be easily
+simulated using concatenation to dynamically build the format string.
+
+
+
+File: gawk-info, Node: Printf Examples, Prev: Modifiers, Up: Printf
+
+Examples of Using `printf'
+--------------------------
+
+Here is how to use `printf' to make an aligned table:
+
+ awk '{ printf "%-10s %s\n", $1, $2 }' BBS-list
+
+prints the names of bulletin boards (`$1') of the file `BBS-list' as
+a string of 10 characters, left justified. It also prints the phone
+numbers (`$2') afterward on the line. This will produce an aligned
+two--column table of names and phone numbers, like so:
+
+ aardvark 555-5553
+ alpo-net 555-3412
+ barfly 555-7685
+ bites 555-1675
+ camelot 555-0542
+ core 555-2912
+ fooey 555-1234
+ foot 555-6699
+ macfoo 555-6480
+ sdace 555-3430
+ sabafoo 555-2127
+
+Did you notice that we did not specify that the phone numbers be
+printed as numbers? They had to be printed as strings because the
+numbers are separated by a dash. This dash would be interpreted as a
+"minus" sign if we had tried to print the phone numbers as numbers.
+This would have led to some pretty confusing results.
+
+We did not specify a width for the phone numbers because they are the
+last things on their lines. We don't need to put spaces after them.
+
+We could make our table look even nicer by adding headings to the
+tops of the columns. To do this, use the BEGIN pattern (*note
+BEGIN/END::.) to cause the header to be printed only once, at the
+beginning of the `awk' program:
+
+ awk 'BEGIN { print "Name Number"
+ print "--- -----" }
+ { printf "%-10s %s\n", $1, $2 }' BBS-list
+
+Did you notice that we mixed `print' and `printf' statements in the
+above example? We could have used just `printf' statements to get
+the same results:
+
+ awk 'BEGIN { printf "%-10s %s\n", "Name", "Number"
+ printf "%-10s %s\n", "---", "-----" }
+ { printf "%-10s %s\n", $1, $2 }' BBS-list
+
+By outputting each column heading with the same format specification
+used for the elements of the column, we have made sure that the
+headings will be aligned just like the columns.
+
+The fact that the same format specification is used can be emphasized
+by storing it in a variable, like so:
+
+ awk 'BEGIN { format = "%-10s %s\n"
+ printf format, "Name", "Number"
+ printf format, "---", "-----" }
+ { printf format, $1, $2 }' BBS-list
+
+See if you can use the `printf' statement to line up the headings and
+table data for our `inventory-shipped' example covered earlier in the
+section on the `print' statement (*note Print::.).
+
+
+
+File: gawk-info, Node: One-liners, Next: Patterns, Prev: Printing, Up: Top
+
+Useful ``One-liners''
+*********************
+
+Useful `awk' programs are often short, just a line or two. Here is a
+collection of useful, short programs to get you started. Some of
+these programs contain constructs that haven't been covered yet. The
+description of the program will give you a good idea of what is going
+on, but please read the rest of the manual to become an `awk' expert!
+
+`awk '{ num_fields = num_fields + NF }'
+`` END { print num_fields }'''
+ This program prints the total number of fields in all input lines.
+
+`awk 'length($0) > 80''
+ This program prints every line longer than 80 characters. The
+ sole rule has a relational expression as its pattern, and has no
+ action (so the default action, printing the record, is used).
+
+`awk 'NF > 0''
+ This program prints every line that has at least one field.
+ This is an easy way to delete blank lines from a file (or
+ rather, to create a new file similar to the old file but from
+ which the blank lines have been deleted).
+
+`awk '{ if (NF > 0) print }''
+ This program also prints every line that has at least one field.
+ Here we allow the rule to match every line, then decide in the
+ action whether to print.
+
+`awk 'BEGIN { for (i = 1; i <= 7; i++)'
+`` print int(101 * rand()) }'''
+ This program prints 7 random numbers from 0 to 100, inclusive.
+
+`ls -l FILES | awk '{ x += $4 } ; END { print "total bytes: " x }''
+ This program prints the total number of bytes used by FILES.
+
+`expand FILE | awk '{ if (x < length()) x = length() }'
+`` END { print "maximum line length is " x }'''
+ This program prints the maximum line length of FILE. The input
+ is piped through the `expand' program to change tabs into
+ spaces, so the widths compared are actually the right--margin
+ columns.
+
+
+
+File: gawk-info, Node: Patterns, Next: Actions, Prev: One-liners, Up: Top
+
+Patterns
+********
+
+Patterns control the execution of rules: a rule is executed when its
+pattern matches the input record. The `awk' language provides
+several special patterns that are described in the sections that
+follow. Patterns include:
+
+NULL
+ The empty pattern, which matches every input record. (*Note The
+ Empty Pattern: Empty.)
+
+/REGULAR EXPRESSION/
+ A regular expression as a pattern. It matches when the text of
+ the input record fits the regular expression. (*Note Regular
+ Expressions as Patterns: Regexp.)
+
+CONDEXP
+ A single comparison expression. It matches when it is true.
+ (*Note Comparison Expressions as Patterns: Comparison Patterns.)
+
+`BEGIN'
+`END'
+ Special patterns to supply start--up or clean--up information to
+ `awk'. (*Note Specifying Record Ranges With Patterns: BEGIN/END.)
+
+PAT1, PAT2
+ A pair of patterns separated by a comma, specifying a range of
+ records. (*Note Specifying Record Ranges With Patterns: Ranges.)
+
+CONDEXP1 BOOLEAN CONDEXP2
+ A "compound" pattern, which combines expressions with the
+ operators `and', `&&', and `or', `||'. (*Note Boolean
+ Operators and Patterns: Boolean.)
+
+! CONDEXP
+ The pattern CONDEXP is evaluated. Then the `!' performs a
+ boolean ``not'' or logical negation operation; if the input line
+ matches the pattern in CONDEXP then the associated action is
+ *not* executed. If the input line did not match that pattern,
+ then the action *is* executed. (*Note Boolean Operators and
+ Patterns: Boolean.)
+
+(EXPR)
+ Parentheses may be used to control how operators nest.
+
+PAT1 ? PAT2 : PAT3
+ The first pattern is evaluated. If it is true, the input line
+ is tested against the second pattern, otherwise it is tested
+ against the third. (*Note Conditional Patterns: Conditional
+ Patterns.)
+
+* Menu:
+
+The following subsections describe these forms in detail:
+
+* Empty:: The empty pattern, which matches every record.
+
+* Regexp:: Regular expressions such as `/foo/'.
+
+* Comparison Patterns:: Comparison expressions such as `$1 > 10'.
+
+* Boolean:: Combining comparison expressions.
+
+* Ranges:: Using pairs of patterns to specify record ranges.
+
+* BEGIN/END:: Specifying initialization and cleanup rules.
+
+* Conditional Patterns:: Patterns such as `pat1 ? pat2 : pat3'.
+
+
+
+File: gawk-info, Node: Empty, Next: Regexp, Up: Patterns
+
+The Empty Pattern
+=================
+
+An empty pattern is considered to match *every* input record. For
+example, the program:
+
+ awk '{ print $1 }' BBS-list
+
+prints just the first field of every record.
+
+
+
+File: gawk-info, Node: Regexp, Next: Comparison Patterns, Prev: Empty, Up: Patterns
+
+Regular Expressions as Patterns
+===============================
+
+A "regular expression", or "regexp", is a way of describing classes
+of strings. When enclosed in slashes (`/'), it makes an `awk'
+pattern that matches every input record that contains a match for the
+regexp.
+
+The simplest regular expression is a sequence of letters, numbers, or
+both. Such a regexp matches any string that contains that sequence.
+Thus, the regexp `foo' matches any string containing `foo'. (More
+complicated regexps let you specify classes of similar strings.)
+
+* Menu:
+
+* Usage: Regexp Usage. How regexps are used in patterns.
+* Operators: Regexp Operators. How to write a regexp.
+
+
+
+File: gawk-info, Node: Regexp Usage, Next: Regexp Operators, Up: Regexp
+
+How to use Regular Expressions
+------------------------------
+
+When you enclose `foo' in slashes, you get a pattern that matches a
+record that contains `foo'. For example, this prints the second
+field of each record that contains `foo' anywhere:
+
+ awk '/foo/ { print $2 }' BBS-list
+
+Regular expressions can also be used in comparison expressions. Then
+you can specify the string to match against; it need not be the
+entire current input record. These comparison expressions can be
+used as patterns or in `if' and `while' statements.
+
+`EXP ~ /REGEXP/'
+ This is true if the expression EXP (taken as a character string)
+ is matched by REGEXP. The following example matches, or
+ selects, all input records with the letter `J' in the first field:
+
+ awk '$1 ~ /J/' inventory-shipped
+
+ So does this:
+
+ awk '{ if ($1 ~ /J/) print }' inventory-shipped
+
+`EXP !~ /REGEXP/'
+ This is true if the expression EXP (taken as a character string)
+ is *not* matched by REGEXP. The following example matches, or
+ selects, all input records whose first field *does not* contain
+ the letter `J':
+
+ awk '$1 !~ /J/' inventory-shipped
+
+The right hand side of a `~' or `!~' operator need not be a constant
+regexp (i.e. a string of characters between `/'s). It can also be
+"computed", or "dynamic". For example:
+
+ identifier = "[A-Za-z_][A-Za-z_0-9]+"
+ $0 ~ identifier
+
+sets `identifier' to a regexp that describes `awk' variable names,
+and tests if the input record matches this regexp.
+
+A dynamic regexp may actually be any expression. The expression is
+evaluated, and the result is treated as a string that describes a
+regular expression.
+
+
+
+File: gawk-info, Node: Regexp Operators, Prev: Regexp Usage, Up: Regexp
+
+Regular Expression Operators
+----------------------------
+
+You can combine regular expressions with the following characters,
+called "regular expression operators", or "metacharacters", to
+increase the power and versatility of regular expressions. This is a
+table of metacharacters:
+
+`\'
+ This is used to suppress the special meaning of a character when
+ matching. For example:
+
+ \$
+
+ matches the character `$'.
+
+`^'
+ This matches the beginning of the string or the beginning of a
+ line within the string. For example:
+
+ ^@chapter
+
+ matches the `@chapter' at the beginning of a string, and can be
+ used to identify chapter beginnings in Texinfo source files.
+
+`$'
+ This is similar to `^', but it matches only at the end of a
+ string or the end of a line within the string. For example:
+
+ /p$/
+
+ as a pattern matches a record that ends with a `p'.
+
+`.'
+ This matches any single character except a newline. For example:
+
+ .P
+
+ matches any single character followed by a `P' in a string.
+ Using concatenation we can make regular expressions like `U.A',
+ which matches any three--character string that begins with `U'
+ and ends with `A'.
+
+`[...]'
+ This is called a "character set". It matches any one of a group
+ of characters that are enclosed in the square brackets. For
+ example:
+
+ [MVX]
+
+ matches any of the characters `M', `V', or `X' in a string.
+
+ Ranges of characters are indicated by using a hyphen between the
+ beginning and ending characters, and enclosing the whole thing
+ in brackets. For example:
+
+ [0-9]
+
+ matches any string that contains a digit.
+
+ Note that special patterns have to be followed to match the
+ characters, `]', `-', and `^' when they are enclosed in the
+ square brackets. To match a `]', make it the first character in
+ the set. For example:
+
+ []d]
+
+ matches either `]', or `d'.
+
+ To match `-', write it as `--', which is a range containing only
+ `-'. You may also make the `-' be the first or last character
+ in the set. To match `^', make it any character except the
+ first one of a set.
+
+`[^ ...]'
+ This is the "complemented character set". The first character
+ after the `[' *must* be a `^'. This matches any characters
+ *except* those in the square brackets. For example:
+
+ [^0-9]
+
+ matches any characters that are not digits.
+
+`|'
+ This is the "alternation operator" and it is used to specify
+ alternatives. For example:
+
+ ^P|[0-9]
+
+ matches any string that matches either `^P' or `[0-9]'. This
+ means it matches any string that contains a digit or starts with
+ `P'.
+
+`(...)'
+ Parentheses are used for grouping in regular expressions as in
+ arithmetic. They can be used to concatenate regular expressions
+ containing the alternation operator, `|'.
+
+`*'
+ This symbol means that the preceding regular expression is to be
+ repeated as many times as possible to find a match. For example:
+
+ ph*
+
+ applies the `*' symbol to the preceding `h' and looks for
+ matches to one `p' followed by any number of `h''s. This will
+ also match just `p' if no `h''s are present.
+
+ The `*' means repeat the *smallest* possible preceding
+ expression in order to find a match. The `awk' language
+ processes a `*' by matching as many repetitions as can be found.
+ For example:
+
+ awk '/\(c[ad][ad]*r x\)/ { print }' sample
+
+ matches every record in the input containing a string of the
+ form `(car x)', `(cdr x)', `(cadr x)', and so on.
+
+`+'
+ This symbol is similar to `*', but the preceding expression must
+ be matched at least once. This means that:
+
+ wh+y
+
+ would match `why' and `whhy' but not `wy', whereas `wh*y' would
+ match all three of these strings. And this is a simpler way of
+ writing the last `*' example:
+
+ awk '/\(c[ad]+r x\)/ { print }' sample
+
+`?'
+ This symbol is similar to `*', but the preceding expression can
+ be matched once or not at all. For example:
+
+ fe?d
+
+ will match `fed' or `fd', but nothing else.
+
+In regular expressions, the `*', `+', and `?' operators have the
+highest precedence, followed by concatenation, and finally by `|'.
+As in arithmetic, parentheses can change how operators are grouped.
+
+Any other character stands for itself. However, it is important to
+note that case in regular expressions *is* significant, both when
+matching ordinary (i.e. non--metacharacter) characters, and inside
+character sets. Thus a `w' in a regular expression matches only a
+lower case `w' and not either an uppercase or lowercase `w'. When
+you want to do a case--independent match, you have to use a character
+set: `[Ww]'.
+
+
+
+File: gawk-info, Node: Comparison Patterns, Next: Ranges, Prev: Regexp, Up: Patterns
+
+Comparison Expressions as Patterns
+==================================
+
+"Comparison patterns" use "relational operators" to compare strings
+or numbers. The relational operators are the same as in C. Here is
+a table of them:
+
+`X < Y'
+ True if X is less than Y.
+
+`X <= Y'
+ True if X is less than or equal to Y.
+
+`X > Y'
+ True if X is greater than Y.
+
+`X >= Y'
+ True if X is greater than or equal to Y.
+
+`X == Y'
+ True if X is equal to Y.
+
+`X != Y'
+ True if X is not equal to Y.
+
+Comparison expressions can be used as patterns to control whether a
+rule is executed. The expression is evaluated for each input record
+read, and the pattern is considered matched if the condition is "true".
+
+The operands of a relational operator are compared as numbers if they
+are both numbers. Otherwise they are converted to, and compared as,
+strings (*note Conversion::.). Strings are compared by comparing the
+first character of each, then the second character of each, and so on.
+Thus, `"10"' is less than `"9"'.
+
+The following example prints the second field of each input record
+whose first field is precisely `foo'.
+
+ awk '$1 == "foo" { print $2 }' BBS-list
+
+Contrast this with the following regular expression match, which
+would accept any record with a first field that contains `foo':
+
+ awk '$1 ~ "foo" { print $2 }' BBS-list
+
+
+
+File: gawk-info, Node: Ranges, Next: BEGIN/END, Prev: Comparison Patterns, Up: Patterns
+
+Specifying Record Ranges With Patterns
+======================================
+
+A "range pattern" is made of two patterns separated by a comma:
+`BEGPAT, ENDPAT'. It matches ranges of consecutive input records.
+The first pattern BEGPAT controls where the range begins, and the
+second one ENDPAT controls where it ends.
+
+They work as follows: BEGPAT is matched against every input record;
+when a record matches BEGPAT, the range pattern becomes "turned on".
+The range pattern matches this record. As long as it stays turned
+on, it automatically matches every input record read. But meanwhile,
+ENDPAT is matched against every input record, and when it matches,
+the range pattern is turned off again for the following record. Now
+we go back to checking BEGPAT against each record. For example:
+
+ awk '$1 == "on", $1 == "off"'
+
+prints every record between on/off pairs, inclusive.
+
+The record that turns on the range pattern and the one that turns it
+off both match the range pattern. If you don't want to operate on
+these records, you can write `if' statements in the rule's action to
+distinguish them.
+
+It is possible for a pattern to be turned both on and off by the same
+record, if both conditions are satisfied by that record. Then the
+action is executed for just that record.
+
+
+
+File: gawk-info, Node: BEGIN/END, Next: Boolean, Prev: Ranges, Up: Patterns
+
+`BEGIN' and `END' Special Patterns
+==================================
+
+`BEGIN' and `END' are special patterns. They are not used to match
+input records. Rather, they are used for supplying start--up or
+clean--up information to your `awk' script. A `BEGIN' rule is
+executed, once, before the first input record has been read. An
+`END' rule is executed, once, after all the input has been read. For
+example:
+
+ awk 'BEGIN { print "Analysis of ``foo'' program" }
+ /foo/ { ++foobar }
+ END { print "``foo'' appears " foobar " times." }' BBS-list
+
+This program finds out how many times the string `foo' appears in the
+input file `BBS-list'. The `BEGIN' pattern prints out a title for
+the report. There is no need to use the `BEGIN' pattern to
+initialize the counter `foobar' to zero, as `awk' does this for us
+automatically (*note Variables::.). The second rule increments the
+variable `foobar' every time a record containing the pattern `foo' is
+read. The last rule prints out the value of `foobar' at the end of
+the run.
+
+The special patterns `BEGIN' and `END' do not combine with other
+kinds of patterns.
+
+An `awk' program may have multiple `BEGIN' and/or `END' rules. The
+contents of multiple `BEGIN' or `END' rules are treated as if they
+had been enclosed in a single rule, in the order that the rules are
+encountered in the `awk' program. (This feature was introduced with
+the new version of `awk'.)
+
+Multiple `BEGIN' and `END' sections are also useful for writing
+library functions that need to do initialization and/or cleanup of
+their own. Note that the order in which library functions are named
+on the command line will affect the order in which their `BEGIN' and
+`END' rules will be executed. Therefore you have to be careful how
+you write your library functions. (*Note Command Line::, for more
+information on using library functions.)
+
+If an `awk' program only has a `BEGIN' rule, and no other rules, then
+the program will exit after the `BEGIN' rule has been run. Older
+versions of `awk' used to read their input until end of file was
+seen. However, if an `END' rule exists as well, then the input will
+be read, even if there are no other rules in the program.
+
+`BEGIN' and `END' rules must have actions; there is no default action
+for these rules since there is no current record when they run.
+
+
+
+File: gawk-info, Node: Boolean, Next: Conditional Patterns, Prev: BEGIN/END, Up: Patterns
+
+Boolean Operators and Patterns
+==============================
+
+A boolean pattern is a combination of other patterns using the
+boolean operators ``or'' (`||'), ``and'' (`&&'), and ``not'' (`!'),
+along with parentheses to control nesting. Whether the boolean
+pattern matches an input record is computed from whether its
+subpatterns match.
+
+The subpatterns of a boolean pattern can be regular expressions,
+matching expressions, comparisons, or other boolean combinations of
+such. Range patterns cannot appear inside boolean operators, since
+they don't make sense for classifying a single record, and neither
+can the special patterns `BEGIN' and `END', which never match any
+input record.
+
+Here are descriptions of the three boolean operators.
+
+`PAT1 && PAT2'
+ Matches if both PAT1 and PAT2 match by themselves. For example,
+ the following command prints all records in the input file
+ `BBS-list' that contain both `2400' and `foo'.
+
+ awk '/2400/ && /foo/' BBS-list
+
+ Whether PAT2 matches is tested only if PAT1 succeeds. This can
+ make a difference when PAT2 contains expressions that have side
+ effects: in the case of `/foo/ && ($2 == bar++)', the variable
+ `bar' is not incremented if there is no `foo' in the record.
+
+`PAT1 || PAT2'
+ Matches if at least one of PAT1 and PAT2 matches the current
+ input record. For example, the following command prints all
+ records in the input file `BBS-list' that contain *either*
+ `2400' or `foo', or both.
+
+ awk '/2400/ || /foo/' BBS-list
+
+ Whether PAT2 matches is tested only if PAT1 fails to match.
+ This can make a difference when PAT2 contains expressions that
+ have side effects.
+
+`!PAT'
+ Matches if PAT does not match. For example, the following
+ command prints all records in the input file `BBS-list' that do
+ *not* contain the string `foo'.
+
+ awk '! /foo/' BBS-list
+
+Note that boolean patterns are built from other patterns just as
+boolean expressions are built from other expressions (*note Boolean
+Ops::.). Any boolean expression is also a valid boolean pattern.
+But the converse is not true: simple regular expression patterns such
+as `/foo/' are not allowed in boolean expressions. Regular
+expressions can appear in boolean expressions only in conjunction
+with the matching operators, `~' and `!~'.
+
+
+
+File: gawk-info, Node: Conditional Patterns, Prev: Boolean, Up: Patterns
+
+Conditional Patterns
+====================
+
+Patterns may use a "conditional expression" much like the conditional
+expression of the C language. This takes the form:
+
+ PAT1 ? PAT2 : PAT3
+
+The first pattern is evaluated. If it evaluates to TRUE, then the
+input record is tested against PAT2. Otherwise it is tested against
+PAT3. The conditional pattern matches if PAT2 or PAT3 (whichever one
+is selected) matches.
+
+
+
+File: gawk-info, Node: Actions, Next: Expressions, Prev: Patterns, Up: Top
+
+Actions: The Basics
+*******************
+
+The "action" part of an `awk' rule tells `awk' what to do once a
+match for the pattern is found. An action consists of one or more
+`awk' "statements", enclosed in curly braces (`{' and `}'). The
+curly braces must be used even if the action contains only one
+statement, or even if it contains no statements at all. Action
+statements are separated by newlines or semicolons.
+
+Besides the print statements already covered (*note Printing::.),
+there are four kinds of action statements: expressions, control
+statements, compound statements, and function definitions.
+
+ * "Expressions" include assignments, arithmetic, function calls,
+ and more (*note Expressions::.).
+
+ * "Control statements" specify the control flow of `awk' programs.
+ The `awk' language gives you C--like constructs (`if', `for',
+ `while', and so on) as well as a few special ones (*note
+ Statements::.).
+
+ * A "compound statement" is just one or more `awk' statements
+ enclosed in curly braces. This way you can group several
+ statements to form the body of an `if' or similar statement.
+
+ * You can define "user--defined functions" for use elsewhere in
+ the `awk' program (*note User-defined::.).
+
+
+
+File: gawk-info, Node: Expressions, Next: Statements, Prev: Actions, Up: Top
+
+Actions: Expressions
+********************
+
+Expressions are the basic building block of `awk' actions. An
+expression evaluates to a value, which you can print, test, store in
+a variable or pass to a function.
+
+But, beyond that, an expression can assign a new value to a variable
+or a field, with an assignment operator.
+
+An expression can serve as a statement on its own. Most other action
+statements are made up of various combinations of expressions. As in
+other languages, expressions in `awk' include variables, array
+references, constants, and function calls, as well as combinations of
+these with various operators.
+
+* Menu:
+
+* Constants:: String and numeric constants.
+* Variables:: Variables give names to values for future use.
+* Fields:: Field references such as `$1' are also expressions.
+* Arrays:: Array element references are expressions.
+
+* Arithmetic Ops:: Arithmetic operations (`+', `-', etc.)
+* Concatenation:: Concatenating strings.
+* Comparison Ops:: Comparison of numbers and strings with `<', etc.
+* Boolean Ops:: Combining comparison expressions using boolean operators
+ `||' (``or''), `&&' (``and'') and `!' (``not'').
+
+* Assignment Ops:: Changing the value of a variable or a field.
+* Increment Ops:: Incrementing the numeric value of a variable.
+
+* Conversion:: The conversion of strings to numbers and vice versa.
+* Conditional Exp:: Conditional expressions select between two subexpressions
+ under control of a third subexpression.
+* Function Calls:: A function call is an expression.
+
+
+
+File: gawk-info, Node: Constants, Next: Variables, Up: Expressions
+
+Constant Expressions
+====================
+
+There are two types of constants: numeric constants and string
+constants.
+
+The "numeric constant" is a number. This number can be an integer, a
+decimal fraction, or a number in scientific (exponential) notation.
+Note that all numeric values are represented within `awk' in
+double--precision floating point. Here are some examples of numeric
+constants, which all have the same value:
+
+ 105
+ 1.05e+2
+ 1050e-1
+
+A string constant consists of a sequence of characters enclosed in
+double--quote marks. For example:
+
+ "parrot"
+
+represents the string constant `parrot'. Strings in `gawk' can be of
+any length and they can contain all the possible 8--bit ASCII
+characters including ASCII NUL. Other `awk' implementations may have
+difficulty with some character codes.
+
+Some characters cannot be included literally in a string. You
+represent them instead with "escape sequences", which are character
+sequences beginning with a backslash (`\').
+
+One use of the backslash is to include double--quote characters in a
+string. Since a plain double--quote would end the string, you must
+use `\"'. Backslash itself is another character that can't be
+included normally; you write `\\' to put one backslash in the string.
+
+Another use of backslash is to represent unprintable characters such
+as newline. While there is nothing to stop you from writing these
+characters directly in an `awk' program, they may look ugly.
+
+`\b'
+ Represents a backspaced, H'.
+
+`\f'
+ Represents a formfeed, L'.
+
+`\n'
+ Represents a newline, J'.
+
+`\r'
+ Represents a carriage return, M'.
+
+`\t'
+ Represents a horizontal tab, I'.
+
+`\v'
+ Represents a vertical tab, K'.
+
+`\NNN'
+ Represents the octal value NNN, where NNN is one to three digits
+ between 0 and 7. For example, the code for the ASCII ESC
+ (escape) character is `\033'.
+
+
+
+File: gawk-info, Node: Variables, Next: Arithmetic Ops, Prev: Constants, Up: Expressions
+
+Variables
+=========
+
+Variables let you give names to values and refer to them later. You
+have already seen variables in many of the examples. The name of a
+variable must be a sequence of letters, digits and underscores, but
+it may not begin with a digit. Case is significant in variable
+names; `a' and `A' are distinct variables.
+
+A variable name is a valid expression by itself; it represents the
+variable's current value. Variables are given new values with
+"assignment operators" and "increment operators". *Note Assignment
+Ops::.
+
+A few variables have special built--in meanings, such as `FS', the
+field separator, and `NF', the number of fields in the current input
+record. *Note Special::, for a list of them. Special variables can
+be used and assigned just like all other variables, but their values
+are also used or changed automatically by `awk'. Each special
+variable's name is made entirely of upper case letters.
+
+Variables in `awk' can be assigned either numeric values or string
+values. By default, variables are initialized to the null string,
+which has the numeric value zero. So there is no need to
+``initialize'' each variable explicitly in `awk', the way you would
+need to do in C or most other traditional programming languages.
+
+
+
+File: gawk-info, Node: Arithmetic Ops, Next: Concatenation, Prev: Variables, Up: Expressions
+
+Arithmetic Operators
+====================
+
+The `awk' language uses the common arithmetic operators when
+evaluating expressions. All of these arithmetic operators follow
+normal precedence rules, and work as you would expect them to. This
+example divides field 3 by field 4, adds field 2, stores the result
+into field 1, and prints the results:
+
+ awk '{ $1 = $2 + $3 / $4; print }' inventory-shipped
+
+The arithmetic operators in `awk' are:
+
+`X + Y'
+ Addition.
+
+`X - Y'
+ Subtraction.
+
+`- X'
+ Negation.
+
+`X / Y'
+ Division. Since all numbers in `awk' are double--precision
+ floating point, the result is not rounded to an integer: `3 / 4'
+ has the value 0.75.
+
+`X * Y'
+ Multiplication.
+
+`X % Y'
+ Remainder. The quotient is rounded toward zero to an integer,
+ multiplied by Y and this result is subtracted from X. This
+ operation is sometimes known as ``trunc--mod''. The following
+ relation always holds:
+
+ `b * int(a / b) + (a % b) == a'
+
+ One undesirable effect of this definition of remainder is that X
+ % Y is negative if X is negative. Thus,
+
+ -17 % 8 = -1
+
+`X ^ Y'
+`X ** Y'
+ Exponentiation: X raised to the Y power. `2 ^ 3' has the value
+ 8. The character sequence `**' is equivalent to `^'.
+
+
+
+File: gawk-info, Node: Concatenation, Next: Comparison Ops, Prev: Arithmetic Ops, Up: Expressions
+
+String Concatenation
+====================
+
+There is only one string operation: concatenation. It does not have
+a specific operator to represent it. Instead, concatenation is
+performed by writing expressions next to one another, with no
+operator. For example:
+
+ awk '{ print "Field number one: " $1 }' BBS-list
+
+produces, for the first record in `BBS-list':
+
+ Field number one: aardvark
+
+If you hadn't put the space after the `:', the line would have run
+together. For example:
+
+ awk '{ print "Field number one:" $1 }' BBS-list
+
+produces, for the first record in `BBS-list':
+
+ Field number one:aardvark
+
+
+
+File: gawk-info, Node: Comparison Ops, Next: Boolean Ops, Prev: Concatenation, Up: Expressions
+
+Comparison Expressions
+======================
+
+"Comparison expressions" use "relational operators" to compare
+strings or numbers. The relational operators are the same as in C.
+Here is a table of them:
+
+`X < Y'
+ True if X is less than Y.
+
+`X <= Y'
+ True if X is less than or equal to Y.
+
+`X > Y'
+ True if X is greater than Y.
+
+`X >= Y'
+ True if X is greater than or equal to Y.
+
+`X == Y'
+ True if X is equal to Y.
+
+`X != Y'
+ True if X is not equal to Y.
+
+`X ~ REGEXP'
+ True if regexp REGEXP matches the string X.
+
+`X !~ REGEXP'
+ True if regexp REGEXP does not match the string X.
+
+`SUBSCRIPT in ARRAY'
+ True if array ARRAY has an element with the subscript SUBSCRIPT.
+
+Comparison expressions have the value 1 if true and 0 if false.
+
+The operands of a relational operator are compared as numbers if they
+are both numbers. Otherwise they are converted to, and compared as,
+strings (*note Conversion::.). Strings are compared by comparing the
+first character of each, then the second character of each, and so on.
+Thus, `"10"' is less than `"9"'.
+
+For example,
+
+ $1 == "foo"
+
+has the value of 1, or is true, if the first field of the current
+input record is precisely `foo'. By contrast,
+
+ $1 ~ /foo/
+
+has the value 1 if the first field contains `foo'.
+
+
+
+File: gawk-info, Node: Boolean Ops, Next: Assignment Ops, Prev: Comparison Ops, Up: Expressions
+
+Boolean Operators
+=================
+
+A boolean expression is combination of comparison expressions or
+matching expressions, using the boolean operators ``or'' (`||'),
+``and'' (`&&'), and ``not'' (`!'), along with parentheses to control
+nesting. The truth of the boolean expression is computed by
+combining the truth values of the component expressions.
+
+Boolean expressions can be used wherever comparison and matching
+expressions can be used. They can be used in `if' and `while'
+statements. They have numeric values (1 if true, 0 if false).
+
+In addition, every boolean expression is also a valid boolean
+pattern, so you can use it as a pattern to control the execution of
+rules.
+
+Here are descriptions of the three boolean operators, with an example
+of each. It may be instructive to compare these examples with the
+analogous examples of boolean patterns (*note Boolean::.), which use
+the same boolean operators in patterns instead of expressions.
+
+`BOOLEAN1 && BOOLEAN2'
+ True if both BOOLEAN1 and BOOLEAN2 are true. For example, the
+ following statement prints the current input record if it
+ contains both `2400' and `foo'.
+
+ if ($0 ~ /2400/ && $0 ~ /foo/) print
+
+ The subexpression BOOLEAN2 is evaluated only if BOOLEAN1 is
+ true. This can make a difference when BOOLEAN2 contains
+ expressions that have side effects: in the case of `$0 ~ /foo/
+ && ($2 == bar++)', the variable `bar' is not incremented if
+ there is no `foo' in the record.
+
+`BOOLEAN1 || BOOLEAN2'
+ True if at least one of BOOLEAN1 and BOOLEAN2 is true. For
+ example, the following command prints all records in the input
+ file `BBS-list' that contain *either* `2400' or `foo', or both.
+
+ awk '{ if ($0 ~ /2400/ || $0 ~ /foo/) print }' BBS-list
+
+ The subexpression BOOLEAN2 is evaluated only if BOOLEAN1 is
+ true. This can make a difference when BOOLEAN2 contains
+ expressions that have side effects.
+
+`!BOOLEAN'
+ True if BOOLEAN is false. For example, the following program
+ prints all records in the input file `BBS-list' that do *not*
+ contain the string `foo'.
+
+ awk '{ if (! ($0 ~ /foo/)) print }' BBS-list
+
+
+
+File: gawk-info, Node: Assignment Ops, Next: Increment Ops, Prev: Boolean Ops, Up: Expressions
+
+Assignment Operators
+====================
+
+An "assignment" is an expression that stores a new value into a
+variable. For example, let's assign the value 1 to the variable `z':
+
+ z = 1
+
+After this expression is executed, the variable `z' has the value 1.
+Whatever old value `z' had before the assignment is forgotten.
+
+The `=' sign is called an "assignment operator". It is the simplest
+assignment operator because the value of the right--hand operand is
+stored unchanged.
+
+The left--hand operand of an assignment can be a variable (*note
+Variables::.), a field (*note Changing Fields::.) or an array element
+(*note Arrays::.). These are all called "lvalues", which means they
+can appear on the left side of an assignment operator. The
+right--hand operand may be any expression; it produces the new value
+which the assignment stores in the specified variable, field or array
+element.
+
+Assignments can store string values also. For example, this would
+store the value `"this food is good"' in the variable `message':
+
+ thing = "food"
+ predicate = "good"
+ message = "this " thing " is " predicate
+
+(This also illustrates concatenation of strings.)
+
+It is important to note that variables do *not* have permanent types.
+The type of a variable is simply the type of whatever value it
+happens to hold at the moment. In the following program fragment,
+the variable `foo' has a numeric value at first, and a string value
+later on:
+
+ foo = 1
+ print foo
+ foo = "bar"
+ print foo
+
+When the second assignment gives `foo' a string value, the fact that
+it previously had a numeric value is forgotten.
+
+An assignment is an expression, so it has a value: the same value
+that is assigned. Thus, `z = 1' as an expression has the value 1.
+One consequence of this is that you can write multiple assignments
+together:
+
+ x = y = z = 0
+
+stores the value 0 in all three variables. It does this because the
+value of `z = 0', which is 0, is stored into `y', and then the value
+of `y = z = 0', which is 0, is stored into `x'.
+
+You can use an assignment anywhere an expression is called for. For
+example, it is valid to write `x != (y = 1)' to set `y' to 1 and then
+test whether `x' equals 1. But this style tends to make programs
+hard to read; except in a one--shot program, you should rewrite it to
+get rid of such nesting of assignments. This is never very hard.
+
+Aside from `=', there are several other assignment operators that do
+arithmetic with the old value of the variable. For example, the
+operator `+=' computes a new value by adding the right--hand value to
+the old value of the variable. Thus, the following assignment adds 5
+to the value of `foo':
+
+ foo += 5
+
+This is precisely equivalent to the following:
+
+ foo = foo + 5
+
+Use whichever one makes the meaning of your program clearer.
+
+Here is a table of the arithmetic assignment operators. In each
+case, the right--hand operand is an expression whose value is
+converted to a number.
+
+`LVALUE += INCREMENT'
+ Adds INCREMENT to the value of LVALUE to make the new value of
+ LVALUE.
+
+`LVALUE -= DECREMENT'
+ Subtracts DECREMENT from the value of LVALUE.
+
+`LVALUE *= COEFFICIENT'
+ Multiplies the value of LVALUE by COEFFICIENT.
+
+`LVALUE /= QUOTIENT'
+ Divides the value of LVALUE by QUOTIENT.
+
+`LVALUE %= MODULUS'
+ Sets LVALUE to its remainder by MODULUS.
+
+`LVALUE ^= POWER'
+`LVALUE **= POWER'
+ Raises LVALUE to the power POWER.
+
+
+
+File: gawk-info, Node: Increment Ops, Next: Conversion, Prev: Assignment Ops, Up: Expressions
+
+Increment Operators
+===================
+
+"Increment operators" increase or decrease the value of a variable by
+1. You could do the same thing with an assignment operator, so the
+increment operators add no power to the `awk' language; but they are
+convenient abbreviations for something very common.
+
+The operator to add 1 is written `++'. There are two ways to use
+this operator: pre--incrementation and post--incrementation.
+
+To pre--increment a variable V, write `++V'. This adds 1 to the
+value of V and that new value is also the value of this expression.
+The assignment expression `V += 1' is completely equivalent.
+
+Writing the `++' after the variable specifies post--increment. This
+increments the variable value just the same; the difference is that
+the value of the increment expression itself is the variable's *old*
+value. Thus, if `foo' has value 4, then the expression `foo++' has
+the value 4, but it changes the value of `foo' to 5.
+
+The post--increment `foo++' is nearly equivalent to writing `(foo +=
+1) - 1'. It is not perfectly equivalent because all numbers in `awk'
+are floating point: in floating point, `foo + 1 - 1' does not
+necessarily equal `foo'. But the difference will be minute as long
+as you stick to numbers that are fairly small (less than a trillion).
+
+Any lvalue can be incremented. Fields and array elements are
+incremented just like variables.
+
+The decrement operator `--' works just like `++' except that it
+subtracts 1 instead of adding. Like `++', it can be used before the
+lvalue to pre--decrement or after it to post--decrement.
+
+Here is a summary of increment and decrement expressions.
+
+`++LVALUE'
+ This expression increments LVALUE and the new value becomes the
+ value of this expression.
+
+`LVALUE++'
+ This expression causes the contents of LVALUE to be incremented.
+ The value of the expression is the *old* value of LVALUE.
+
+`--LVALUE'
+ Like `++LVALUE', but instead of adding, it subtracts. It
+ decrements LVALUE and delivers the value that results.
+
+`LVALUE--'
+ Like `LVALUE++', but instead of adding, it subtracts. It
+ decrements LVALUE. The value of the expression is the *old*
+ value of LVALUE.
+
+
+
+File: gawk-info, Node: Conversion, Next: Conditional Exp, Prev: Increment Ops, Up: Expressions
+
+Conversion of Strings and Numbers
+=================================
+
+Strings are converted to numbers, and numbers to strings, if the
+context of your `awk' statement demands it. For example, if the
+values of `foo' or `bar' in the expression `foo + bar' happen to be
+strings, they are converted to numbers before the addition is
+performed. If numeric values appear in string concatenation, they
+are converted to strings. Consider this:
+
+ two = 2; three = 3
+ print (two three) + 4
+
+This eventually prints the (numeric) value `27'. The numeric
+variables `two' and `three' are converted to strings and concatenated
+together, and the resulting string is converted back to a number
+before adding `4'. The resulting numeric value `27' is printed.
+
+If, for some reason, you need to force a number to be converted to a
+string, concatenate the null string with that number. To force a
+string to be converted to a number, add zero to that string. Strings
+that can't be interpreted as valid numbers are given the numeric
+value zero.
+
+The exact manner in which numbers are converted into strings is
+controlled by the `awk' special variable `OFMT' (*note Special::.).
+Numbers are converted using a special version of the `sprintf'
+function (*note Built-in::.) with `OFMT' as the format specifier.
+
+`OFMT''s default value is `"%.6g"', which prints a value with at
+least six significant digits. You might want to change it to specify
+more precision, if your version of `awk' uses double precision
+arithmetic. Double precision on most modern machines gives you 16 or
+17 decimal digits of precision.
+
+Strange results can happen if you set `OFMT' to a string that doesn't
+tell `sprintf' how to format floating point numbers in a useful way.
+For example, if you forget the `%' in the format, all numbers will be
+converted to the same constant string.
+
+
+
+File: gawk-info, Node: Conditional Exp, Next: Function Calls, Prev: Conversion, Up: Expressions
+
+Conditional Expressions
+=======================
+
+A "conditional expression" is a special kind of expression with three
+operands. It allows you to use one expression's value to select one
+of two other expressions.
+
+The conditional expression looks the same as in the C language:
+
+ SELECTOR ? IF-TRUE-EXP : IF-FALSE-EXP
+
+There are three subexpressions. The first, SELECTOR, is always
+computed first. If it is ``true'' (not zero) then IF-TRUE-EXP is
+computed next and its value becomes the value of the whole expression.
+Otherwise, IF-FALSE-EXP is computed next and its value becomes the
+value of the whole expression.
+
+For example, this expression produces the absolute value of `x':
+
+ x > 0 ? x : -x
+
+Each time the conditional expression is computed, exactly one of
+IF-TRUE-EXP and IF-FALSE-EXP is computed; the other is ignored. This
+is important when the expressions contain side effects. For example,
+this conditional expression examines element `i' of either array `a'
+or array `b', and increments `i'.
+
+ x == y ? a[i++] : b[i++]
+
+This is guaranteed to increment `i' exactly once, because each time
+one or the other of the two increment expressions will be executed
+and the other will not be.
+
+
+
+File: gawk-info, Node: Function Calls, Prev: Conditional Exp, Up: Expressions
+
+Function Calls
+==============
+
+A "function" is a name for a particular calculation. Because it has
+a name, you can ask for it by name at any point in the program. For
+example, the function `sqrt' computes the square root of a number.
+
+A fixed set of functions are "built in", which means they are
+available in every `awk' program. The `sqrt' function is one of
+these. *Note Built-in::, for a list of built--in functions and their
+descriptions. In addition, you can define your own functions in the
+program for use elsewhere in the same program. *Note User-defined::,
+for how to do this.
+
+The way to use a function is with a "function call" expression, which
+consists of the function name followed by a list of "arguments" in
+parentheses. The arguments are expressions which give the raw
+materials for the calculation that the function will do. When there
+is more than one argument, they are separated by commas. If there
+are no arguments, write just `()' after the function name.
+
+*Do not put any space between the function name and the
+open--parenthesis!* A user--defined function name looks just like
+the name of a variable, and space would make the expression look like
+concatenation of a variable with an expression inside parentheses.
+Space before the parenthesis is harmless with built--in functions,
+but it is best not to get into the habit of using space, lest you do
+likewise for a user--defined function one day by mistake.
+
+Each function needs a particular number of arguments. For example,
+the `sqrt' function must be called with a single argument, like this:
+
+ sqrt(ARGUMENT)
+
+The argument is the number to take the square root of.
+
+Some of the built--in functions allow you to omit the final argument.
+If you do so, they will use a reasonable default. *Note Built-in::,
+for full details. If arguments are omitted in calls to user--defined
+functions, then those arguments are treated as local variables,
+initialized to the null string (*note User-defined::.).
+
+Like every other expression, the function call has a value, which is
+computed by the function based on the arguments you give it. In this
+example, the value of `sqrt(ARGUMENT)' is the square root of the
+argument. A function can also have side effects, such as assigning
+the values of certain variables or doing I/O.
+
+Here is a command to read numbers, one number per line, and print the
+square root of each one:
+
+ awk '{ print "The square root of", $1, "is", sqrt($1) }'
+
+
+
+File: gawk-info, Node: Statements, Next: Arrays, Prev: Expressions, Up: Top
+
+Actions: Statements
+*******************
+
+"Control statements" such as `if', `while', and so on control the
+flow of execution in `awk' programs. Most of the control statements
+in `awk' are patterned on similar statements in C.
+
+The simplest kind of statement is an expression. The other kinds of
+statements start with special keywords such as `if' and `while', to
+distinguish them from simple expressions.
+
+In all the examples in this chapter, BODY can be either a single
+statement or a group of statements. Groups of statements are
+enclosed in braces, and separated by newlines or semicolons.
+
+* Menu:
+
+* Expressions:: One kind of statement simply computes an expression.
+
+* If:: Conditionally execute some `awk' statements.
+
+* While:: Loop until some condition is satisfied.
+
+* Do:: Do specified action while looping until some
+ condition is satisfied.
+
+* For:: Another looping statement, that provides
+ initialization and increment clauses.
+
+* Break:: Immediately exit the innermost enclosing loop.
+
+* Continue:: Skip to the end of the innermost enclosing loop.
+
+* Next:: Stop processing the current input record.
+
+* Exit:: Stop execution of `awk'.
+
+
+
+File: gawk-info, Node: If, Next: While, Up: Statements
+
+The `if' Statement
+==================
+
+The `if'-`else' statement is `awk''s decision--making statement. The
+`else' part of the statement is optional.
+
+ `if (CONDITION) BODY1 else BODY2'
+
+Here CONDITION is an expression that controls what the rest of the
+statement will do. If CONDITION is true, BODY1 is executed;
+otherwise, BODY2 is executed (assuming that the `else' clause is
+present). The condition is considered true if it is nonzero or
+nonnull.
+
+Here is an example:
+
+ awk '{ if (x % 2 == 0)
+ print "x is even"
+ else
+ print "x is odd" }'
+
+In this example, if the statement containing `x' is found to be true
+(that is, x is divisible by 2), then the first `print' statement is
+executed, otherwise the second `print' statement is performed.
+
+If the `else' appears on the same line as BODY1, and BODY1 is a
+single statement, then a semicolon must separate BODY1 from `else'.
+To illustrate this, let's rewrite the previous example:
+
+ awk '{ if (x % 2 == 0) print "x is even"; else
+ print "x is odd" }'
+
+If you forget the `;', `awk' won't be able to parse it, and you will
+get a syntax error.
+
+We would not actually write this example this way, because a human
+reader might fail to see the `else' if it were not the first thing on
+its line.
+
+
+
+File: gawk-info, Node: While, Next: Do, Prev: If, Up: Statements
+
+The `while' Statement
+=====================
+
+In programming, a loop means a part of a program that is (or at least
+can be) executed two or more times in succession.
+
+The `while' statement is the simplest looping statement in `awk'. It
+repeatedly executes a statement as long as a condition is true. It
+looks like this:
+
+ while (CONDITION)
+ BODY
+
+Here BODY is a statement that we call the "body" of the loop, and
+CONDITION is an expression that controls how long the loop keeps
+running.
+
+The first thing the `while' statement does is test CONDITION. If
+CONDITION is true, it executes the statement BODY. After BODY has
+been executed, CONDITION is tested again and this process is repeated
+until CONDITION is no longer true. If CONDITION is initially false,
+the body of the loop is never executed.
+
+ awk '{ i = 1
+ while (i <= 3) {
+ print $i
+ i++
+ }
+ }'
+
+This example prints the first three input fields, one per line.
+
+The loop works like this: first, the value of `i' is set to 1. Then,
+the `while' tests whether `i' is less than or equal to three. This
+is the case when `i' equals one, so the `i'-th field is printed.
+Then the `i++' increments the value of `i' and the loop repeats.
+
+When `i' reaches 4, the loop exits. Here BODY is a compound
+statement enclosed in braces. As you can see, a newline is not
+required between the condition and the body; but using one makes the
+program clearer unless the body is a compound statement or is very
+simple.
+
+
+
+File: gawk-info, Node: Do, Next: For, Prev: While, Up: Statements
+
+The `do'--`while' Statement
+===========================
+
+The `do' loop is a variation of the `while' looping statement. The
+`do' loop executes the BODY once, then repeats BODY as long as
+CONDITION is true. It looks like this:
+
+ do
+ BODY
+ while (CONDITION)
+
+Even if CONDITION is false at the start, BODY is executed at least
+once (and only once, unless executing BODY makes CONDITION true).
+Contrast this with the corresponding `while' statement:
+
+ while (CONDITION)
+ BODY
+
+This statement will not execute BODY even once if CONDITION is false
+to begin with.
+
+Here is an example of a `do' statement:
+
+ awk '{ i = 1
+ do {
+ print $0
+ i++
+ } while (i <= 10)
+ }'
+
+prints each input record ten times. It isn't a very realistic
+example, since in this case an ordinary `while' would do just as
+well. But this is normal; there is only occasionally a real use for
+a `do' statement.
+
+
+
+File: gawk-info, Node: For, Next: Break, Prev: Do, Up: Statements
+
+The `for' Statement
+===================
+
+The `for' statement makes it more convenient to count iterations of a
+loop. The general form of the `for' statement looks like this:
+
+ for (INITIALIZATION; CONDITION; INCREMENT)
+ BODY
+
+This statement starts by executing INITIALIZATION. Then, as long as
+CONDITION is true, it repeatedly executes BODY and then INCREMENT.
+Typically INITIALIZATION sets a variable to either zero or one,
+INCREMENT adds 1 to it, and CONDITION compares it against the desired
+number of iterations.
+
+Here is an example of a `for' statement:
+
+ awk '{ for (i = 1; i <= 3; i++)
+ print $i
+ }'
+
+This prints the first three fields of each input record, one field
+per line.
+
+In the `for' statement, BODY stands for any statement, but
+INITIALIZATION, CONDITION and INCREMENT are just expressions. You
+cannot set more than one variable in the INITIALIZATION part unless
+you use a multiple assignment statement such as `x = y = 0', which is
+possible only if all the initial values are equal. (But you can
+initialize additional variables by writing their assignments as
+separate statements preceding the `for' loop.)
+
+The same is true of the INCREMENT part; to increment additional
+variables, you must write separate statements at the end of the loop.
+The C compound expression, using C's comma operator, would be useful
+in this context, but it is not supported in `awk'.
+
+Most often, INCREMENT is an increment expression, as in the example
+above. But this is not required; it can be any expression whatever.
+For example, this statement prints odd numbers from 1 to 100:
+
+ # print odd numbers from 1 to 100
+ for (i = 1; i <= 100; i += 2)
+ print i
+
+Any of the three expressions following `for' may be omitted if you
+don't want it to do anything. Thus, `for (;x > 0;)' is equivalent to
+`while (x > 0)'. If the CONDITION part is empty, it is treated as
+TRUE, effectively yielding an infinite loop.
+
+In most cases, a `for' loop is an abbreviation for a `while' loop, as
+shown here:
+
+ INITIALIZATION
+ while (CONDITION) {
+ BODY
+ INCREMENT
+ }
+
+(The only exception is when the `continue' statement (*note
+Continue::.) is used inside the loop; changing a `for' statement to a
+`while' statement in this way can change the effect of the `continue'
+statement inside the loop.)
+
+The `awk' language has a `for' statement in addition to a `while'
+statement because often a `for' loop is both less work to type and
+more natural to think of. Counting the number of iterations is very
+common in loops. It can be easier to think of this counting as part
+of looping rather than as something to do inside the loop.
+
+The next section has more complicated examples of `for' loops.
+
+There is an alternate version of the `for' loop, for iterating over
+all the indices of an array:
+
+ for (i in array)
+ PROCESS array[i]
+
+*Note Arrays::, for more information on this version of the `for' loop.
+
+
+
+File: gawk-info, Node: Break, Next: Continue, Prev: For, Up: Statements
+
+The `break' Statement
+=====================
+
+The `break' statement jumps out of the innermost `for', `while', or
+`do'--`while' loop that encloses it. The following example finds the
+smallest divisor of any number, and also identifies prime numbers:
+
+ awk '# find smallest divisor of num
+ { num = $1
+ for (div = 2; div*div <= num; div++)
+ if (num % div == 0)
+ break
+ if (num % div == 0)
+ printf "Smallest divisor of %d is %d\n", num, div
+ else
+ printf "%d is prime\n", num }'
+
+When the remainder is zero in the first `if' statement, `awk'
+immediately "breaks" out of the containing `for' loop. This means
+that `awk' proceeds immediately to the statement following the loop
+and continues processing. (This is very different from the `exit'
+statement (*note Exit::.) which stops the entire `awk' program.)
+
+Here is another program equivalent to the previous one. It
+illustrates how the CONDITION of a `for' or `while' could just as
+well be replaced with a `break' inside an `if':
+
+ awk '# find smallest divisor of num
+ { num = $1
+ for (div = 2; ; div++) {
+ if (num % div == 0) {
+ printf "Smallest divisor of %d is %d\n", num, div
+ break
+ }
+ if (div*div > num) {
+ printf "%d is prime\n", num
+ break
+ }
+ }
+ }'
+
+
+
+File: gawk-info, Node: Continue, Next: Next, Prev: Break, Up: Statements
+
+The `continue' Statement
+========================
+
+The `continue' statement, like `break', is used only inside `for',
+`while', and `do'--`while' loops. It skips over the rest of the loop
+body, causing the next cycle around the loop to begin immediately.
+Contrast this with `break', which jumps out of the loop altogether.
+Here is an example:
+
+ # print names that don't contain the string "ignore"
+
+ # first, save the text of each line
+ { names[NR] = $0 }
+
+ # print what we're interested in
+ END {
+ for (x in names) {
+ if (names[x] ~ /ignore/)
+ continue
+ print names[x]
+ }
+ }
+
+If any of the input records contain the string `ignore', this example
+skips the print statement and continues back to the first statement
+in the loop.
+
+This isn't a practical example of `continue', since it would be just
+as easy to write the loop like this:
+
+ for (x in names)
+ if (x !~ /ignore/)
+ print x
+
+The `continue' statement causes `awk' to skip the rest of what is
+inside a `for' loop, but it resumes execution with the increment part
+of the `for' loop. The following program illustrates this fact:
+
+ awk 'BEGIN {
+ for (x = 0; x <= 20; x++) {
+ if (x == 5)
+ continue
+ printf ("%d ", x)
+ }
+ print ""
+ }'
+
+This program prints all the numbers from 0 to 20, except for 5, for
+which the `printf' is skipped. Since the increment `x++' is not
+skipped, `x' does not remain stuck at 5.
+
+
+
+File: gawk-info, Node: Next, Next: Exit, Prev: Continue, Up: Statements
+
+The `next' Statement
+====================
+
+The `next' statement forces `awk' to immediately stop processing the
+current record and go on to the next record. This means that no
+further rules are executed for the current record. The rest of the
+current rule's action is not executed either.
+
+Contrast this with the effect of the `getline' function (*note
+Getline::.). That too causes `awk' to read the next record
+immediately, but it does not alter the flow of control in any way.
+So the rest of the current action executes with a new input record.
+
+At the grossest level, `awk' program execution is a loop that reads
+an input record and then tests each rule pattern against it. If you
+think of this loop as a `for' statement whose body contains the
+rules, then the `next' statement is analogous to a `continue'
+statement: it skips to the end of the body of the loop, and executes
+the increment (which reads another record).
+
+For example, if your `awk' program works only on records with four
+fields, and you don't want it to fail when given bad input, you might
+use the following rule near the beginning of the program:
+
+ NF != 4 {
+ printf ("line %d skipped: doesn't have 4 fields", FNR) > "/dev/tty"
+ next
+ }
+
+so that the following rules will not see the bad record. The error
+message is redirected to `/dev/tty' (the terminal), so that it won't
+get lost amid the rest of the program's regular output.
+
+
+
+File: gawk-info, Node: Exit, Prev: Next, Up: Statements
+
+The `exit' Statement
+====================
+
+The `exit' statement causes `awk' to immediately stop executing the
+current rule and to stop processing input; any remaining input is
+ignored.
+
+If an `exit' statement is executed from a `BEGIN' rule the program
+stops processing everything immediately. No input records will be
+read. However, if an `END' rule is present, it will be executed
+(*note BEGIN/END::.).
+
+If `exit' is used as part of an `END' rule, it causes the program to
+stop immediately.
+
+An `exit' statement that is part an ordinary rule (that is, not part
+of a `BEGIN' or `END' rule) stops the execution of any further
+automatic rules, but the `END' rule is executed if there is one. If
+you don't want the `END' rule to do its job in this case, you can set
+a variable to nonzero before the `exit' statement, and check that
+variable in the `END' rule.
+
+If an argument is supplied to `exit', its value is used as the exit
+status code for the `awk' process. If no argument is supplied,
+`exit' returns status zero (success).
+
+For example, let's say you've discovered an error condition you
+really don't know how to handle. Conventionally, programs report
+this by exiting with a nonzero status. Your `awk' program can do
+this using an `exit' statement with a nonzero argument. Here's an
+example of this:
+
+ BEGIN {
+ if (("date" | getline date_now) < 0) {
+ print "Can't get system date"
+ exit 4
+ }
+ }
+
+
+
+File: gawk-info, Node: Arrays, Next: Built-in, Prev: Statements, Up: Top
+
+Actions: Using Arrays in `awk'
+******************************
+
+An "array" is a table of various values, called "elements". The
+elements of an array are distinguished by their "indices". Names of
+arrays in `awk' are strings of alphanumeric characters and
+underscores, just like regular variables.
+
+You cannot use the same identifier as both a variable and as an array
+name in one `awk' program.
+
+* Menu:
+
+* Intro: Array Intro. Basic facts abou arrays in `awk'.
+* Reference to Elements:: How to examine one element of an array.
+* Assigning Elements:: How to change an element of an array.
+* Example: Array Example. Sample program explained.
+
+* Scanning an Array:: A variation of the `for' statement. It loops
+ through the indices of an array's existing elements.
+
+* Delete:: The `delete' statement removes an element from an array.
+
+* Multi-dimensional:: Emulating multi--dimensional arrays in `awk'.
+* Multi-scanning:: Scanning multi--dimensional arrays.
+
+
+
+File: gawk-info, Node: Array Intro, Next: Reference to Elements, Up: Arrays
+
+Introduction to Arrays
+======================
+
+The `awk' language has one--dimensional "arrays" for storing groups
+of related strings or numbers. Each array must have a name; valid
+array names are the same as valid variable names, and they do
+conflict with variable names: you can't have both an array and a
+variable with the same name at any point in an `awk' program.
+
+Arrays in `awk' superficially resemble arrays in other programming
+languages; but there are fundamental differences. In `awk', you
+don't need to declare the size of an array before you start to use it.
+What's more, in `awk' any number or even a string may be used as an
+array index.
+
+In most other languages, you have to "declare" an array and specify
+how many elements or components it has. In such languages, the
+declaration causes a contiguous block of memory to be allocated for
+that many elements. An index in the array must be a positive
+integer; for example, the index 0 specifies the first element in the
+array, which is actually stored at the beginning of the block of
+memory. Index 1 specifies the second element, which is stored in
+memory right after the first element, and so on. It is impossible to
+add more elements to the array, because it has room for only as many
+elements as you declared. (Some languages have arrays whose first
+index is 1, others require that you specify both the first and last
+index when you declare the array. In such a language, an array could
+be indexed, for example, from -3 to 17.) A contiguous array of four
+elements might look like this, conceptually, if the element values
+are 8, `"foo"', `""' and 30:
+
+ +--------+--------+-------+--------+
+ | 8 | "foo" | "" | 30 | value
+ +--------+--------+-------+--------+
+ 0 1 2 3 index
+
+Only the values are stored; the indices are implicit from the order
+of the values. 8 is the value at index 0, because 8 appears in the
+position with 0 elements before it.
+
+Arrays in `awk' are different: they are "associative". This means
+that each array is a collection of pairs: an index, and its
+corresponding array element value:
+
+ Element 4 Value 30
+ Element 2 Value "foo"
+ Element 1 Value 8
+ Element 3 Value ""
+
+We have shown the pairs in jumbled order because their order doesn't
+mean anything.
+
+One advantage of an associative array is that new pairs can be added
+at any time. For example, suppose we add to that array a tenth
+element whose value is `"number ten"'. The result is this:
+
+ Element 10 Value "number ten"
+ Element 4 Value 30
+ Element 2 Value "foo"
+ Element 1 Value 8
+ Element 3 Value ""
+
+Now the array is "sparse" (i.e. some indices are missing): it has
+elements number 4 and 10, but doesn't have an element 5, 6, 7, 8, or 9.
+
+Another consequence of associative arrays is that the indices don't
+have to be positive integers. Any number, or even a string, can be
+an index. For example, here is an array which translates words from
+English into French:
+
+ Element "dog" Value "chien"
+ Element "cat" Value "chat"
+ Element "one" Value "un"
+ Element 1 Value "un"
+
+Here we decided to translate the number 1 in both spelled--out and
+numeral form--thus illustrating that a single array can have both
+numbers and strings as indices.
+
+When `awk' creates an array for you, e.g. with the `split' built--in
+function (*note String Functions::.), that array's indices start at
+the number one.
+
+
+
+File: gawk-info, Node: Reference to Elements, Next: Assigning Elements, Prev: Array Intro, Up: Arrays
+
+Referring to an Array Element
+=============================
+
+The principal way of using an array is to refer to one of its elements.
+An array reference is an expression which looks like this:
+
+ ARRAY[INDEX]
+
+Here ARRAY is the name of an array. The expression INDEX is the
+index of the element of the array that you want. The value of the
+array reference is the current value of that array element.
+
+For example, `foo[4.3]' is an expression for the element of array
+`foo' at index 4.3.
+
+If you refer to an array element that has no recorded value, the
+value of the reference is `""', the null string. This includes
+elements to which you have not assigned any value, and elements that
+have been deleted (*note Delete::.). Such a reference automatically
+creates that array element, with the null string as its value. (In
+some cases, this is unfortunate, because it might waste memory inside
+`awk').
+
+You can find out if an element exists in an array at a certain index
+with the expression:
+
+ INDEX in ARRAY
+
+This expression tests whether or not the particular index exists,
+without the side effect of creating that element if it is not present.
+The expression has the value 1 (true) if `ARRAY[SUBSCRIPT]' exists,
+and 0 (false) if it does not exist.
+
+For example, to find out whether the array `frequencies' contains the
+subscript `"2"', you would ask:
+
+ if ("2" in frequencies) print "Subscript \"2\" is present."
+
+Note that this is *not* a test of whether or not the array
+`frequencies' contains an element whose *value* is `"2"'. (There is
+no way to that except to scan all the elements.) Also, this *does
+not* create `frequencies["2"]', while the following (incorrect)
+alternative would:
+
+ if (frequencies["2"] != "") print "Subscript \"2\" is present."
+
+
+
+File: gawk-info, Node: Assigning Elements, Next: Array Example, Prev: Reference to Elements, Up: Arrays
+
+Assigning Array Elements
+========================
+
+Array elements are lvalues: they can be assigned values just like
+`awk' variables:
+
+ ARRAY[SUBSCRIPT] = VALUE
+
+Here ARRAY is the name of your array. The expression SUBSCRIPT is
+the index of the element of the array that you want to assign a
+value. The expression VALUE is the value you are assigning to that
+element of the array.
+
+
+
+File: gawk-info, Node: Array Example, Next: Scanning an Array, Prev: Assigning Elements, Up: Arrays
+
+Basic Example of an Array
+=========================
+
+The following program takes a list of lines, each beginning with a
+line number, and prints them out in order of line number. The line
+numbers are not in order, however, when they are first read: they
+are scrambled. This program sorts the lines by making an array using
+the line numbers as subscripts. It then prints out the lines in
+sorted order of their numbers. It is a very simple program, and will
+get confused if it encounters repeated numbers, gaps, or lines that
+don't begin with a number.
+
+ BEGIN {
+ max=0
+ }
+
+ {
+ if ($1 > max)
+ max = $1
+ arr[$1] = $0
+ }
+
+ END {
+ for (x = 1; x <= max; x++)
+ print arr[x]
+ }
+
+The first rule just initializes the variable `max'. (This is not
+strictly necessary, since an uninitialized variable has the null
+string as its value, and the null string is effectively zero when
+used in a context where a number is required.)
+
+The second rule keeps track of the largest line number seen so far;
+it also stores each line into the array `arr', at an index that is
+the line's number.
+
+The third rule runs after all the input has been read, to print out
+all the lines.
+
+When this program is run with the following input:
+
+ 5 I am the Five man
+ 2 Who are you? The new number two!
+ 4 . . . And four on the floor
+ 1 Who is number one?
+ 3 I three you.
+
+ its output is this:
+
+ 1 Who is number one?
+ 2 Who are you? The new number two!
+ 3 I three you.
+ 4 . . . And four on the floor
+ 5 I am the Five man
+
+
+
+File: gawk-info, Node: Scanning an Array, Next: Delete, Prev: Array Example, Up: Arrays
+
+Scanning All Elements of an Array
+=================================
+
+In programs that use arrays, often you need a loop that will execute
+once for each element of an array. In other languages, where arrays
+are contiguous and indices are limited to positive integers, this is
+easy: the largest index is one less than the length of the array, and
+you can find all the valid indices by counting from zero up to that
+value. This technique won't do the job in `awk', since any number or
+string may be an array index. So `awk' has a special kind of `for'
+statement for scanning an array:
+
+ for (VAR in ARRAY)
+ BODY
+
+This loop executes BODY once for each different value that your
+program has previously used as an index in ARRAY, with the variable
+VAR set to that index.
+
+Here is a program that uses this form of the `for' statement. The
+first rule scans the input records and notes which words appear (at
+least once) in the input, by storing a 1 into the array `used' with
+the word as index. The second rule scans the elements of `used' to
+find all the distinct words that appear in the input. It prints each
+word that is more than 10 characters long, and also prints the number
+of such words. *Note Built-in::, for more information on the
+built--in function `length'.
+
+ # Record a 1 for each word that is used at least once.
+ {
+ for (i = 0; i < NF; i++)
+ used[$i] = 1
+ }
+
+ # Find number of distinct words more than 10 characters long.
+ END {
+ num_long_words = 0
+ for (x in used)
+ if (length(x) > 10) {
+ ++num_long_words
+ print x
+ }
+ print num_long_words, "words longer than 10 characters"
+ }
+
+*Note Sample Program::, for a more detailed example of this type.
+
+The order in which elements of the array are accessed by this
+statement is determined by the internal arrangement of the array
+elements within `awk' and cannot be controlled or changed. This can
+lead to problems if new elements are added to ARRAY by statements in
+BODY; you cannot predict whether or not the `for' loop will reach
+them. Similarly, changing VAR inside the loop can produce strange
+results. It is best to avoid such things.
+
+
+
+File: gawk-info, Node: Delete, Next: Multi-dimensional, Prev: Scanning an Array, Up: Arrays
+
+The `delete' Statement
+======================
+
+You can remove an individual element of an array using the `delete'
+statement:
+
+ delete ARRAY[INDEX]
+
+When an array element is deleted, it is as if you had never referred
+to it and had never given it any value. Any value the element
+formerly had can no longer be obtained.
+
+Here is an example of deleting elements in an array:
+
+ awk '{ for (i in frequencies)
+ delete frequencies[i]
+ }'
+
+This example removes all the elements from the array `frequencies'.
+
+If you delete an element, the `for' statement to scan the array will
+not report that element, and the `in' operator to check for the
+presence of that element will return 0:
+
+ delete foo[4]
+ if (4 in foo)
+ print "This will never be printed"
+
+
+
+File: gawk-info, Node: Multi-dimensional, Next: Multi-scanning, Prev: Delete, Up: Arrays
+
+Multi--dimensional arrays
+=========================
+
+A multi--dimensional array is an array in which an element is
+identified by a sequence of indices, not a single index. For
+example, a two--dimensional array requires two indices. The usual
+way (in most languages, including `awk') to refer to an element of a
+two--dimensional array named `grid' is with `grid[x,y]'.
+
+Multi--dimensional arrays are supported in `awk' through
+concatenation of indices into one string. What happens is that `awk'
+converts the indices into strings (*note Conversion::.) and
+concatenates them together, with a separator between them. This
+creates a single string that describes the values of the separate
+indices. The combined string is used as a single index into an
+ordinary, one--dimensional array. The separator used is the value of
+the special variable `SUBSEP'.
+
+For example, suppose the value of `SUBSEP' is `","' and the
+expression `foo[5,12]="value"' is executed. The numbers 5 and 12
+will be concatenated with a comma between them, yielding `"5,12"';
+thus, the array element `foo["5,12"]' will be set to `"value"'.
+
+Once the element's value is stored, `awk' has no record of whether it
+was stored with a single index or a sequence of indices. The two
+expressions `foo[5,12]' and `foo[5 SUBSEP 12]' always have the same
+value.
+
+The default value of `SUBSEP' is not a comma; it is the string
+`"\034"', which contains a nonprinting character that is unlikely to
+appear in an `awk' program or in the input data.
+
+The usefulness of choosing an unlikely character comes from the fact
+that index values that contain a string matching `SUBSEP' lead to
+combined strings that are ambiguous. Suppose that `SUBSEP' is a
+comma; then `foo["a,b", "c"]' and `foo["a", "b,c"]' will be
+indistinguishable because both are actually stored as `foo["a,b,c"]'.
+Because `SUBSEP' is `"\034"', such confusion can actually happen only
+when an index contains the character `"\034"', which is a rare event.
+
+You can test whether a particular index--sequence exists in a
+``multi--dimensional'' array with the same operator `in' used for
+single dimensional arrays. Instead of a single index as the
+left--hand operand, write the whole sequence of indices, separated by
+commas, in parentheses:
+
+ (SUBSCRIPT1, SUBSCRIPT2, ...) in ARRAY
+
+The following example treats its input as a two--dimensional array of
+fields; it rotates this array 90 degrees clockwise and prints the
+result. It assumes that all lines have the same number of elements.
+
+ awk 'BEGIN {
+ max_nf = max_nr = 0
+ }
+
+ {
+ if (max_nf < NF)
+ max_nf = NF
+ max_nr = NR
+ for (x = 1; x <= NF; x++)
+ vector[x, NR] = $x
+ }
+
+ END {
+ for (x = 1; x <= max_nf; x++) {
+ for (y = max_nr; y >= 1; --y)
+ printf("%s ", vector[x, y])
+ printf("\n")
+ }
+ }'
+
+When given the input:
+
+ 1 2 3 4 5 6
+ 2 3 4 5 6 1
+ 3 4 5 6 1 2
+ 4 5 6 1 2 3
+
+it produces:
+
+ 4 3 2 1
+ 5 4 3 2
+ 6 5 4 3
+ 1 6 5 4
+ 2 1 6 5
+ 3 2 1 6
+
+
+
+File: gawk-info, Node: Multi-scanning, Prev: Multi-dimensional, Up: Arrays
+
+Scanning Multi--dimensional Arrays
+==================================
+
+There is no special `for' statement for scanning a
+``multi--dimensional'' array; there cannot be one, because in truth
+there are no multi--dimensional arrays or elements; there is only a
+multi--dimensional *way of accessing* an array.
+
+However, if your program has an array that is always accessed as
+multi--dimensional, you can get the effect of scanning it by
+combining the scanning `for' statement (*note Scanning an Array::.)
+with the `split' built--in function (*note String Functions::.). It
+works like this:
+
+ for (combined in ARRAY) {
+ split (combined, separate, SUBSEP)
+ ...
+ }
+
+This finds each concatenated, combined index in the array, and splits
+it into the individual indices by breaking it apart where the value
+of `SUBSEP' appears. The split--out indices become the elements of
+the array `separate'.
+
+Thus, suppose you have previously stored in `ARRAY[1, "foo"]'; then
+an element with index `"1\034foo"' exists in ARRAY. (Recall that the
+default value of `SUBSEP' contains the character with code 034.)
+Sooner or later the `for' statement will find that index and do an
+iteration with `combined' set to `"1\034foo"'. Then the `split'
+function will be called as follows:
+
+ split ("1\034foo", separate, "\034")
+
+The result of this is to set `separate[1]' to 1 and `separate[2]' to
+`"foo"'. Presto, the original sequence of separate indices has been
+recovered.
+
+
+
+File: gawk-info, Node: Built-in, Next: User-defined, Prev: Arrays, Up: Top
+
+Built--in functions
+*******************
+
+"Built--in" functions are functions always available for your `awk'
+program to call. This chapter defines all the built--in functions
+that exist; some of them are mentioned in other sections, but they
+are summarized here for your convenience. (You can also define new
+functions yourself. *Note User-defined::.)
+
+In most cases, any extra arguments given to built--in functions are
+ignored. The defaults for omitted arguments vary from function to
+function and are described under the individual functions.
+
+The name of a built--in function need not be followed immediately by
+the opening left parenthesis of the arguments; whitespace is allowed.
+However, it is wise to write no space there, since user--defined
+functions do not allow space.
+
+When a function is called, expressions that create the function's
+actual parameters are evaluated completely before the function call
+is performed. For example, in the code fragment:
+
+ i = 4
+ j = myfunc(i++)
+
+the variable `i' will be set to 5 before `myfunc' is called with a
+value of 4 for its actual parameter.
+
+* Menu:
+
+* Numeric Functions:: Functions that work with numbers,
+ including `int', `sin' and `rand'.
+
+* String Functions:: Functions for string manipulation,
+ such as `split', `match', and `sprintf'.
+
+* I/O Functions:: Functions for files and shell commands
+
+
+
+File: gawk-info, Node: Numeric Functions, Next: String Functions, Up: Built-in
+
+Numeric Built--in Functions
+===========================
+
+The general syntax of the numeric built--in functions is the same for
+each. Here is an example of that syntax:
+
+ awk '# Read input records containing a pair of points: x0, y0, x1, y1.
+ # Print the points and the distance between them.
+ { printf "%f %f %f %f %f\n", $1, $2, $3, $4,
+ sqrt(($2-$1) * ($2-$1) + ($4-$3) * ($4-$3)) }'
+
+This calculates the square root of a calculation that uses the values
+of the fields. It then prints the first four fields of the input
+record and the result of the square root calculation.
+
+Here is the full list of numeric built--in functions:
+
+`int(X)'
+ This gives you the integer part of X, truncated toward 0. This
+ produces the nearest integer to X, located between X and 0.
+
+ For example, `int(3)' is 3, `int(3.9)' is 3, `int(-3.9)' is -3,
+ and `int(-3)' is -3 as well.
+
+`sqrt(X)'
+ This gives you the positive square root of X. It reports an
+ error if X is negative.
+
+`exp(X)'
+ This gives you the exponential of X, or reports an error if X is
+ out of range. The range of values X can have depends on your
+ machine's floating point representation.
+
+`log(X)'
+ This gives you the natural logarithm of X, if X is positive;
+ otherwise, it reports an error.
+
+`sin(X)'
+ This gives you the sine of X, with X in radians.
+
+`cos(X)'
+ This gives you the cosine of X, with X in radians.
+
+`atan2(Y, X)'
+ This gives you the arctangent of Y/X, with both in radians.
+
+`rand()'
+ This gives you a random number. The values of `rand()' are
+ uniformly--distributed between 0 and 1. The value is never 0
+ and never 1.
+
+ Often you want random integers instead. Here is a user--defined
+ function you can use to obtain a random nonnegative integer less
+ than N:
+
+ function randint(n) {
+ return int(n * rand())
+ }
+
+ The multiplication produces a random real number at least 0, and
+ less than N. We then make it an integer (using `int') between 0
+ and `N-1'.
+
+ Here is an example where a similar function is used to produce
+ random integers between 1 and N:
+
+ awk '
+ # Function to roll a simulated die.
+ function roll(n) { return 1 + int(rand() * n) }
+
+ # Roll 3 six--sided dice and print total number of points.
+ {
+ printf("%d points\n", roll(6)+roll(6)+roll(6))
+ }'
+
+ *Note* that `rand()' starts generating numbers from the same
+ point, or "seed", each time you run `awk'. This means that the
+ same program will produce the same results each time you run it.
+ The numbers are random within one `awk' run, but predictable
+ from run to run. This is convenient for debugging, but if you
+ want a program to do different things each time it is used, you
+ must change the seed to a value that will be different in each
+ run. To do this, use `srand'.
+
+`srand(X)'
+ The function `srand(X)' sets the starting point, or "seed", for
+ generating random numbers to the value X.
+
+ Each seed value leads to a particular sequence of ``random''
+ numbers. Thus, if you set the seed to the same value a second
+ time, you will get the same sequence of ``random'' numbers again.
+
+ If you omit the argument X, as in `srand()', then the current
+ date and time of day are used for a seed. This is the way to
+ get random numbers that are truly unpredictable.
+
+ The return value of `srand()' is the previous seed. This makes
+ it easy to keep track of the seeds for use in consistently
+ reproducing sequences of random numbers.
+
+
+
+File: gawk-info, Node: String Functions, Next: I/O Functions, Prev: Numeric Functions, Up: Built-in
+
+Built--in Functions for String Manipulation
+===========================================
+
+`index(IN, FIND)'
+ This searches the string IN for the first occurrence of the
+ string FIND, and returns the position where that occurrence
+ begins in the string IN. For example:
+
+ awk 'BEGIN { print index("peanut", "an") }'
+
+ prints `3'. If FIND is not found, `index' returns 0.
+
+`length(STRING)'
+ This gives you the number of characters in STRING. If STRING is
+ a number, the length of the digit string representing that
+ number is returned. For example, `length("abcde")' is 5.
+ Whereas, `length(15 * 35)' works out to 3. How? Well, 15 * 35
+ = 525, and 525 is then converted to the string `"525"', which
+ has three characters.
+
+`match(STRING, REGEXP)'
+ The `match' function searches the string, STRING, for the
+ longest, leftmost substring matched by the regular expression,
+ REGEXP. It returns the character position, or "index", of where
+ that substring begins (1, if it starts at the beginning of
+ STRING). If no match if found, it returns 0.
+
+ The `match' function sets the special variable `RSTART' to the
+ index. It also sets the special variable `RLENGTH' to the
+ length of the matched substring. If no match is found, `RSTART'
+ is set to 0, and `RLENGTH' to -1.
+
+ For example:
+
+ awk '{
+ if ($1 == "FIND")
+ regex = $2
+ else {
+ where = match($0, regex)
+ if (where)
+ print "Match of", regex, "found at", where, "in", $0
+ }
+ }'
+
+ This program looks for lines that match the regular expression
+ stored in the variable `regex'. This regular expression can be
+ changed. If the first word on a line is `FIND', `regex' is
+ changed to be the second word on that line. Therefore, given:
+
+ FIND fo*bar
+ My program was a foobar
+ But none of it would doobar
+ FIND Melvin
+ JF+KM
+ This line is property of The Reality Engineering Co.
+ This file was created by Melvin.
+
+ `awk' prints:
+
+ Match of fo*bar found at 18 in My program was a foobar
+ Match of Melvin found at 26 in This file was created by Melvin.
+
+`split(STRING, ARRAY, FIELD_SEPARATOR)'
+ This divides STRING up into pieces separated by FIELD_SEPARATOR,
+ and stores the pieces in ARRAY. The first piece is stored in
+ `ARRAY[1]', the second piece in `ARRAY[2]', and so forth. The
+ string value of the third argument, FIELD_SEPARATOR, is used as
+ a regexp to search for to find the places to split STRING. If
+ the FIELD_SEPARATOR is omitted, the value of `FS' is used.
+ `split' returns the number of elements created.
+
+ The `split' function, then, splits strings into pieces in a
+ manner similar to the way input lines are split into fields.
+ For example:
+
+ split("auto-da-fe", a, "-")
+
+ splits the string `auto-da-fe' into three fields using `-' as
+ the separator. It sets the contents of the array `a' as follows:
+
+ a[1] = "auto"
+ a[2] = "da"
+ a[3] = "fe"
+
+ The value returned by this call to `split' is 3.
+
+`sprintf(FORMAT, EXPRESSION1,...)'
+ This returns (without printing) the string that `printf' would
+ have printed out with the same arguments (*note Printf::.). For
+ example:
+
+ sprintf("pi = %.2f (approx.)", 22/7)
+
+ returns the string `"pi = 3.14 (approx.)"'.
+
+`sub(REGEXP, REPLACEMENT_STRING, TARGET_VARIABLE)'
+ The `sub' function alters the value of TARGET_VARIABLE. It
+ searches this value, which should be a string, for the leftmost
+ substring matched by the regular expression, REGEXP, extending
+ this match as far as possible. Then the entire string is
+ changed by replacing the matched text with REPLACEMENT_STRING.
+ The modified string becomes the new value of TARGET_VARIABLE.
+
+ This function is peculiar because TARGET_VARIABLE is not simply
+ used to compute a value, and not just any expression will do: it
+ must be a variable, field or array reference, so that `sub' can
+ store a modified value there. If this argument is omitted, then
+ the default is to use and alter `$0'.
+
+ For example:
+
+ str = "water, water, everywhere"
+ sub(/at/, "ith", str)
+
+ sets `str' to `"wither, water, everywhere"', by replacing the
+ leftmost, longest occurrence of `at' with `ith'.
+
+ The `sub' function returns the number of substitutions made
+ (either one or zero).
+
+ The special character, `&', in the replacement string,
+ REPLACEMENT_STRING, stands for the precise substring that was
+ matched by REGEXP. (If the regexp can match more than one
+ string, then this precise substring may vary.) For example:
+
+ awk '{ sub(/candidate/, "& and his wife"); print }'
+
+ will change the first occurrence of ``candidate'' to ``candidate
+ and his wife'' on each input line.
+
+ The effect of this special character can be turned off by
+ preceding it with a backslash (`\&'). To include a backslash in
+ the replacement string, it too must be preceded with a (second)
+ backslash.
+
+ Note: if you use `sub' with a third argument that is not a
+ variable, field or array element reference, then it will still
+ search for the pattern and return 0 or 1, but the modified
+ string is thrown away because there is no place to put it. For
+ example:
+
+ sub(/USA/, "United States", "the USA and Canada")
+
+ will indeed produce a string `"the United States and Canada"',
+ but there will be no way to use that string!
+
+`gsub(REGEXP, REPLACEMENT_STRING, TARGET_VARIABLE)'
+ This is similar to the `sub' function, except `gsub' replaces
+ *all* of the longest, leftmost, *non--overlapping* matching
+ substrings it can find. The ``g'' in `gsub' stands for
+ "global", which means replace *everywhere*. For example:
+
+ awk '{ gsub(/Britain/, "United Kingdom"); print }'
+
+ replaces all occurrences of the string `Britain' with `United
+ Kingdom' for all input records.
+
+ The `gsub' function returns the number of substitutions made.
+ If the variable to be searched and altered, TARGET_VARIABLE, is
+ omitted, then the entire input record, `$0', is used.
+
+ The characters `&' and `\' are special in `gsub' as they are in
+ `sub' (see immediately above).
+
+`substr(STRING, START, LENGTH)'
+ This returns a LENGTH--character--long substring of STRING,
+ starting at character number START. The first character of a
+ string is character number one. For example,
+ `substr("washington", 5, 3)' returns `"ing"'.
+
+ If LENGTH is not present, this function returns the whole suffix
+ of STRING that begins at character number START. For example,
+ `substr("washington", 5)' returns `"ington"'.
+
+
+
+File: gawk-info, Node: I/O Functions, Prev: String Functions, Up: Built-in
+
+Built--in Functions for I/O to Files and Commands
+=================================================
+
+`close(FILENAME)'
+ Close the file FILENAME. The argument may alternatively be a
+ shell command that was used for redirecting to or from a pipe;
+ then the pipe is closed.
+
+ *Note Close Input::, regarding closing input files and pipes.
+ *Note Close Output::, regarding closing output files and pipes.
+
+`system(COMMAND)'
+ The system function allows the user to execute operating system
+ commands and then return to the `awk' program. The `system'
+ function executes the command given by the string value of
+ COMMAND. It returns, as its value, the status returned by the
+ command that was executed. This is known as returning the "exit
+ status".
+
+ For example, if the following fragment of code is put in your
+ `awk' program:
+
+ END {
+ system("mail -s 'awk run done' operator < /dev/null")
+ }
+
+ the system operator will be sent mail when the `awk' program
+ finishes processing input and begins its end--of--input
+ processing.
+
+ Note that much the same result can be obtained by redirecting
+ `print' or `printf' into a pipe. However, if your `awk' program
+ is interactive, this function is useful for cranking up large
+ self--contained programs, such as a shell or an editor.
+
+
+
+File: gawk-info, Node: User-defined, Next: Special, Prev: Built-in, Up: Top
+
+User--defined Functions
+***********************
+
+Complicated `awk' programs can often be simplified by defining your
+own functions. User--defined functions can be called just like
+built--in ones (*note Function Calls::.), but it is up to you to
+define them--to tell `awk' what they should do.
+
+* Menu:
+
+* Definition Syntax:: How to write definitions and what they mean.
+* Function Example:: An example function definition and what it does.
+* Function Caveats:: Things to watch out for.
+* Return Statement:: Specifying the value a function returns.
+
+
+
+File: gawk-info, Node: Definition Syntax, Next: Function Example, Up: User-defined
+
+Syntax of Function Definitions
+==============================
+
+The definition of a function named NAME looks like this:
+
+ function NAME (PARAMETER-LIST) {
+ BODY-OF-FUNCTION
+ }
+
+A valid function name is like a valid variable name: a sequence of
+letters, digits and underscores, not starting with a digit.
+
+Such function definitions can appear anywhere between the rules of
+the `awk' program. The general format of an `awk' program, then, is
+now modified to include sequences of rules *and* user--defined
+function definitions.
+
+The function definition need not precede all the uses of the function.
+This is because `awk' reads the entire program before starting to
+execute any of it.
+
+The PARAMETER-LIST is a list of the function's "local" variable
+names, separated by commas. Within the body of the function, local
+variables refer to arguments with which the function is called. If
+the function is called with fewer arguments than it has local
+variables, this is not an error; the extra local variables are simply
+set as the null string.
+
+The local variable values hide or "shadow" any variables of the same
+names used in the rest of the program. The shadowed variables are
+not accessible in the function definition, because there is no way to
+name them while their names have been taken away for the local
+variables. All other variables used in the `awk' program can be
+referenced or set normally in the function definition.
+
+The local variables last only as long as the function is executing.
+Once the function finishes, the shadowed variables come back.
+
+The BODY-OF-FUNCTION part of the definition is the most important
+part, because this is what says what the function should actually *do*.
+The local variables exist to give the body a way to talk about the
+arguments.
+
+Functions may be "recursive", i.e., they can call themselves, either
+directly, or indirectly (via calling a second function that calls the
+first again).
+
+The keyword `function' may also be written `func'.
+
+
+
+File: gawk-info, Node: Function Example, Next: Function Caveats, Prev: Definition Syntax, Up: User-defined
+
+Function Definition Example
+===========================
+
+Here is an example of a user--defined function, called `myprint',
+that takes a number and prints it in a specific format.
+
+ function myprint(num)
+ {
+ printf "%6.3g\n", num
+ }
+
+To illustrate, let's use the following `awk' rule to use, or "call",
+our `myprint' function:
+
+ $3 > 0 { myprint($3) }'
+
+This program prints, in our special format, all the third fields that
+contain a positive number in our input. Therefore, when given:
+
+ 1.2 3.4 5.6 7.8
+ 9.10 11.12 13.14 15.16
+ 17.18 19.20 21.22 23.24
+
+this program, using our function to format the results, will print:
+
+ 5.6
+ 13.1
+ 21.2
+
+Here is a rather contrived example of a recursive function. It
+prints a string backwards:
+
+ function rev (str, len) {
+ if (len == 0) {
+ printf "\n"
+ return
+ }
+ printf "%c", substr(str, len, 1)
+ rev(str, len - 1)
+ }
+
+
+
+File: gawk-info, Node: Function Caveats, Next: Return Statement, Prev: Function Example, Up: User-defined
+
+Caveats of Function Calling
+===========================
+
+*Note* that there cannot be any blanks between the function name and
+the left parenthesis of the argument list, when calling a function.
+This is so `awk' can tell you are not trying to concatenate the value
+of a variable with the value of an expression inside the parentheses.
+
+When a function is called, it is given a *copy* of the values of its
+arguments. This is called "passing by value". The caller may use a
+variable as the expression for the argument, but the called function
+does not know this: all it knows is what value the argument had. For
+example, if you write this code:
+
+ foo = "bar"
+ z = myfunc(foo)
+
+then you should not think of the argument to `myfunc' as being ``the
+variable `foo'''. Instead, think of the argument as the string
+value, `"bar"'.
+
+If the function `myfunc' alters the values of its local variables,
+this has no effect on any other variables. In particular, if
+`myfunc' does this:
+
+ function myfunc (win) {
+ print win
+ win = "zzz"
+ print win
+ }
+
+to change its first argument variable `win', this *does not* change
+the value of `foo' in the caller. The role of `foo' in calling
+`myfunc' ended when its value, `"bar"', was computed. If `win' also
+exists outside of `myfunc', this definition will not change it--that
+value is shadowed during the execution of `myfunc' and cannot be seen
+or changed from there.
+
+However, when arrays are the parameters to functions, they are *not*
+copied. Instead, the array itself is made available for direct
+manipulation by the function. This is usually called "passing by
+reference". Changes made to an array parameter inside the body of a
+function *are* visible outside that function. *This can be very
+dangerous if you don't watch what you are doing.* For example:
+
+ function changeit (array, ind, nvalue) {
+ array[ind] = nvalue
+ }
+
+ BEGIN {
+ a[1] = 1 ; a[2] = 2 ; a[3] = 3
+ changeit(a, 2, "two")
+ printf "a[1] = %s, a[2] = %s, a[3] = %s\n", a[1], a[2], a[3]
+ }
+
+will print `a[1] = 1, a[2] = two, a[3] = 3', because the call to
+`changeit' stores `"two"' in the second element of `a'.
+
+
+
+File: gawk-info, Node: Return Statement, Prev: Function Caveats, Up: User-defined
+
+The `return' statement
+======================
+
+The body of a user--defined function can contain a `return' statement.
+This statement returns control to the rest of the `awk' program. It
+can also be used to return a value for use in the rest of the `awk'
+program. It looks like:
+
+ `return EXPRESSION'
+
+The EXPRESSION part is optional. If it is omitted, then the returned
+value is undefined and, therefore, unpredictable.
+
+A `return' statement with no value expression is assumed at the end
+of every function definition. So if control reaches the end of the
+function definition, then the function returns an unpredictable value.
+
+Here is an example of a user--defined function that returns a value
+for the largest number among the elements of an array:
+
+ function maxelt (vec, i, ret) {
+ for (i in vec) {
+ if (ret == "" || vec[i] > ret)
+ ret = vec[i]
+ }
+ return ret
+ }
+
+You call `maxelt' with one argument, an array name. The local
+variables `i' and `ret' are not intended to be arguments; while there
+is nothing to stop you from passing two or three arguments to
+`maxelt', the results would be strange.
+
+When writing a function definition, it is conventional to separate
+the parameters from the local variables with extra spaces, as shown
+above in the definition of `maxelt'.
+
+Here is a program that uses, or calls, our `maxelt' function. This
+program loads an array, calls `maxelt', and then reports the maximum
+number in that array:
+
+ awk '
+ function maxelt (vec, i, ret) {
+ for (i in vec) {
+ if (ret == "" || vec[i] > ret)
+ ret = vec[i]
+ }
+ return ret
+ }
+
+ # Load all fields of each record into nums.
+ {
+ for(i = 1; i <= NF; i++)
+ nums[NR, i] = $i
+ }
+
+ END {
+ print maxelt(nums)
+ }'
+
+Given the following input:
+
+ 1 5 23 8 16
+ 44 3 5 2 8 26
+ 256 291 1396 2962 100
+ -6 467 998 1101
+ 99385 11 0 225
+
+our program tells us (predictably) that:
+
+ 99385
+
+is the largest number in our array.
+
+
+
+File: gawk-info, Node: Special, Next: Sample Program, Prev: User-defined, Up: Top
+
+Special Variables
+*****************
+
+Most `awk' variables are available for you to use for your own
+purposes; they will never change except when your program assigns
+them, and will never affect anything except when your program
+examines them.
+
+A few variables have special meanings. Some of them `awk' examines
+automatically, so that they enable you to tell `awk' how to do
+certain things. Others are set automatically by `awk', so that they
+carry information from the internal workings of `awk' to your program.
+
+Most of these variables are also documented in the chapters where
+their areas of activity are described.
+
+* Menu:
+
+* User-modified:: Special variables that you change to control `awk'.
+
+* Auto-set:: Special variables where `awk' gives you information.
+
+
+
+File: gawk-info, Node: User-modified, Next: Auto-set, Up: Special
+
+Special Variables That Control `awk'
+====================================
+
+This is a list of the variables which you can change to control how
+`awk' does certain things.
+
+`FS'
+ `FS' is the input field separator (*note Field Separators::.).
+ The value is a regular expression that matches the separations
+ between fields in an input record.
+
+ The default value is `" "', a string consisting of a single
+ space. As a special exception, this value actually means that
+ any sequence of spaces and tabs is a single separator. It also
+ causes spaces and tabs at the beginning or end of a line to be
+ ignored.
+
+ You can set the value of `FS' on the command line using the `-F'
+ option:
+
+ awk -F, 'PROGRAM' INPUT-FILES
+
+`OFMT'
+ This string is used by `awk' to control conversion of numbers to
+ strings (*note Conversion::.). It works by being passed, in
+ effect, as the first argument to the `sprintf' function. Its
+ default value is `"%.6g"'.
+
+`OFS'
+ This is the output field separator (*note Output Separators::.).
+ It is output between the fields output by a `print' statement.
+ Its default value is `" "', a string consisting of a single space.
+
+`ORS'
+ This is the output record separator (*note Output
+ Separators::.). It is output at the end of every `print'
+ statement. Its default value is the newline character, often
+ represented in `awk' programs as `\n'.
+
+`RS'
+ This is `awk''s record separator (*note Records::.). Its
+ default value is a string containing a single newline character,
+ which means that an input record consists of a single line of
+ text.
+
+`SUBSEP'
+ `SUBSEP' is a subscript separator (*note Multi-dimensional::.).
+ It has the default value of `"\034"', and is used to separate
+ the parts of the name of a multi--dimensional array. Thus, if
+ you access `foo[12,3]', it really accesses `foo["12\0343"]'.
+
+
+
+File: gawk-info, Node: Auto-set, Prev: User-modified, Up: Special
+
+Special Variables That Convey Information to You
+================================================
+
+This is a list of the variables that are set automatically by `awk'
+on certain occasions so as to provide information for your program.
+
+`ARGC'
+`ARGV'
+ The command--line arguments available to `awk' are stored in an
+ array called `ARGV'. `ARGC' is the number of command--line
+ arguments present. `ARGV' is indexed from zero to `ARGC' - 1.
+ For example:
+
+ awk '{ print ARGV[$1] }' inventory-shipped BBS-list
+
+ In this example, `ARGV[0]' contains `"awk"', `ARGV[1]' contains
+ `"inventory-shipped"', and `ARGV[2]' contains `"BBS-list"'.
+ `ARGC' is 3, one more than the index of the last element in
+ `ARGV' since the elements are numbered from zero.
+
+ Notice that the `awk' program is not treated as an argument.
+ The `-f' `FILENAME' option, and the `-F' option, are also not
+ treated as arguments for this purpose.
+
+ Variable assignments on the command line *are* treated as
+ arguments, and do show up in the `ARGV' array.
+
+ Your program can alter `ARGC' the elements of `ARGV'. Each time
+ `awk' reaches the end of an input file, it uses the next element
+ of `ARGV' as the name of the next input file. By storing a
+ different string there, your program can change which files are
+ read. You can use `-' to represent the standard input. By
+ storing additional elements and incrementing `ARGC' you can
+ cause additional files to be read.
+
+ If you decrease the value of `ARGC', that eliminates input files
+ from the end of the list. By recording the old value of `ARGC'
+ elsewhere, your program can treat the eliminated arguments as
+ something other than file names.
+
+ To eliminate a file from the middle of the list, store the null
+ string (`""') into `ARGV' in place of the file's name. As a
+ special feature, `awk' ignores file names that have been
+ replaced with the null string.
+
+`ENVIRON'
+ This is an array that contains the values of the environment.
+ The array indices are the environment variable names; the values
+ are the values of the particular environment variables. For
+ example, `ENVIRON["HOME"]' might be `/u/close'. Changing this
+ array does not affect the environment passed on to any programs
+ that `awk' may spawn via redirection or the `system' function.
+ (This may not work under operating systems other than MS-DOS,
+ Unix, or GNU.)
+
+`FILENAME'
+ This is the name of the file that `awk' is currently reading.
+ If `awk' is reading from the standard input (in other words,
+ there are no files listed on the command line), `FILENAME' is
+ set to `"-"'. `FILENAME' is changed each time a new file is
+ read (*note Reading Files::.).
+
+`FNR'
+ `FNR' is the current record number in the current file. `FNR'
+ is incremented each time a new record is read (*note Getline::.).
+ It is reinitialized to 0 each time a new input file is started.
+
+`NF'
+ `NF' is the number of fields in the current input record. `NF'
+ is set each time a new record is read, when a new field is
+ created, or when $0 changes (*note Fields::.).
+
+`NR'
+ This is the number of input records `awk' has processed since
+ the beginning of the program's execution. (*note Records::.).
+ `NR' is set each time a new record is read.
+
+`RLENGTH'
+ `RLENGTH' is the length of the string matched by the `match'
+ function (*note String Functions::.). `RLENGTH' is set by
+ invoking the `match' function. Its value is the length of the
+ matched string, or -1 if no match was found.
+
+`RSTART'
+ `RSTART' is the start of the string matched by the `match'
+ function (*note String Functions::.). `RSTART' is set by
+ invoking the `match' function. Its value is the position of the
+ string where the matched string starts, or 0 if no match was
+ found.
+
+
+
+File: gawk-info, Node: Sample Program, Next: Notes, Prev: Special, Up: Top
+
+Sample Program
+**************
+
+The following example is a complete `awk' program, which prints the
+number of occurrences of each word in its input. It illustrates the
+associative nature of `awk' arrays by using strings as subscripts.
+It also demonstrates the `for X in ARRAY' construction. Finally, it
+shows how `awk' can be used in conjunction with other utility
+programs to do a useful task of some complexity with a minimum of
+effort. Some explanations follow the program listing.
+
+ awk '
+ # Print list of word frequencies
+ {
+ for (i = 1; i <= NF; i++)
+ freq[$i]++
+ }
+
+ END {
+ for (word in freq)
+ printf "%s\t%d\n", word, freq[word]
+ }'
+
+The first thing to notice about this program is that it has two
+rules. The first rule, because it has an empty pattern, is executed
+on every line of the input. It uses `awk''s field--accessing
+mechanism (*note Fields::.) to pick out the individual words from the
+line, and the special variable `NF' (*note Special::.) to know how
+many fields are available.
+
+For each input word, an element of the array `freq' is incremented to
+reflect that the word has been seen an additional time.
+
+The second rule, because it has the pattern `END', is not executed
+until the input has been exhausted. It prints out the contents of
+the `freq' table that has been built up inside the first action.
+
+Note that this program has several problems that would prevent it
+from being useful by itself on real text files:
+
+ * Words are detected using the `awk' convention that fields are
+ separated by whitespace and that other characters in the input
+ (except newlines) don't have any special meaning to `awk'. This
+ means that punctuation characters count as part of words.
+
+ * The `awk' language considers upper and lower case characters to
+ be distinct. Therefore, `foo' and `Foo' will not be treated by
+ this program as the same word. This is undesirable since in
+ normal text, words are capitalized if they begin sentences, and
+ a frequency analyzer should not be sensitive to that.
+
+ * The output does not come out in any useful order. You're more
+ likely to be interested in which words occur most frequently, or
+ having an alphabetized table of how frequently each word occurs.
+
+The way to solve these problems is to use other operating system
+utilities to process the input and output of the `awk' script.
+Suppose the script shown above is saved in the file `frequency.awk'.
+Then the shell command:
+
+ tr A-Z a-z < file1 | tr -cd 'a-z\012' \
+ | awk -f frequency.awk \
+ | sort +1 -nr
+
+produces a table of the words appearing in `file1' in order of
+decreasing frequency.
+
+The first `tr' command in this pipeline translates all the upper case
+characters in `file1' to lower case. The second `tr' command deletes
+all the characters in the input except lower case characters and
+newlines. The second argument to the second `tr' is quoted to
+protect the backslash in it from being interpreted by the shell. The
+`awk' program reads this suitably massaged data and produces a word
+frequency table, which is not ordered.
+
+The `awk' script's output is now sorted by the `sort' command and
+printed on the terminal. The options given to `sort' in this example
+specify to sort by the second field of each input line (skipping one
+field), that the sort keys should be treated as numeric quantities
+(otherwise `15' would come before `5'), and that the sorting should
+be done in descending (reverse) order.
+
+See the general operating system documentation for more information
+on how to use the `tr' and `sort' commands.
+
+
+
+File: gawk-info, Node: Notes, Next: Glossary, Prev: Sample Program, Up: Top
+
+Implementation Notes
+********************
+
+This appendix contains information mainly of interest to implementors
+and maintainers of `gawk'. Everything in it applies specifically to
+`gawk', and not to other implementations.
+
+* Menu:
+
+* Extensions:: Things`gawk' does that Unix `awk' does not.
+
+* Future Extensions:: Things likely to appear in a future release.
+
+* Improvements:: Suggestions for future improvements.
+
+* Manual Improvements:: Suggestions for improvements to this manual.
+
+
+
+File: gawk-info, Node: Extensions, Next: Future Extensions, Up: Notes
+
+GNU Extensions to the AWK Language
+==================================
+
+Several new features are in a state of flux. They are described here
+merely to document them somewhat, but they will probably change. We
+hope they will be incorporated into other versions of `awk', too.
+
+All of these features can be turned off either by compiling `gawk'
+with `-DSTRICT', or by invoking `gawk' as `awk'.
+
+The `AWKPATH' environment variable
+ When opening a file supplied via the `-f' option, if the
+ filename does not contain a `/', `gawk' will perform a "path
+ search" for the file, similar to that performed by the shell.
+ `gawk' gets its search path from the `AWKPATH' environment
+ variable. If that variable does not exist, it uses the default
+ path `".:/usr/lib/awk:/usr/local/lib/awk"'.
+
+Case Independent Matching
+ Two new operators have been introduced, `~~', and `!~~'. These
+ perform regular expression match and no-match operations that
+ are case independent. In other words, `A' and `a' would both
+ match `/a/'.
+
+The `-i' option
+ This option causes the `~' and `!~' operators to behave like the
+ `~~' and `!~~' operators described above.
+
+The `-v' option
+ This option prints version information for this particular copy
+ of `gawk'. This is so you can determine if your copy of `gawk'
+ is up to date with respect to whatever the Free Software
+ Foundation is currently distributing. It may disappear in a
+ future version of `gawk'.
+
+
+
+File: gawk-info, Node: Future Extensions, Next: Improvements, Prev: Extensions, Up: Notes
+
+Extensions Likely To Appear In A Future Release
+===============================================
+
+Here are some more extensions that indicate the directions we are
+currently considering for `gawk'. Like the previous section, this
+section is also subject to change. None of these are implemented yet.
+
+The `IGNORECASE' special variable
+ If `IGNORECASE' is non--zero, then *all* regular expression
+ matching will be done in a case--independent fashion. The `-i'
+ option and the `~~' and `!~~' operators will go away, as this
+ mechanism generalizes those facilities.
+
+More Escape Sequences
+ The ANSI C `\a', and `\x' escape sequences will be recognized.
+ Unix `awk' does not recognize `\v', although `gawk' does.
+
+`RS' as a regexp
+ The meaning of `RS' will be generalized along the lines of `FS'.
+
+Transliteration Functions
+ We are planning on adding `toupper' and `tolower' functions
+ which will take string arguments, and return strings where the
+ case of each letter has been transformed to upper-- or
+ lower--case respectively.
+
+Access To System File Descriptors
+ `gawk' will recognize the special file names `/dev/stdin',
+ `/dev/stdout', `/dev/stderr', and `/dev/fd/N' internally. These
+ will allow access to inherited file descriptors from within an
+ `awk' program.
+
+
+
+File: gawk-info, Node: Improvements, Next: Manual Improvements, Prev: Future Extensions, Up: Notes
+
+Suggestions for Future Improvements
+===================================
+
+Here are some projects that would--be `gawk' hackers might like to
+take on. They vary in size from a few days to a few weeks of
+programming, depending on which one you choose and how fast a
+programmer you are. Please send any improvements you write to the
+maintainers at the GNU project.
+
+ 1. State machine regexp matcher: At present, `gawk' uses the
+ backtracking regular expression matcher from the GNU subroutine
+ library. If a regexp is really going to be used a lot of times,
+ it is faster to convert it once to a description of a finite
+ state machine, then run a routine simulating that machine every
+ time you want to match the regexp. You could use the matching
+ routines used by GNU `egrep'.
+
+ 2. Compilation of `awk' programs: `gawk' uses a `Bison'
+ (YACC--like) parser to convert the script given it into a syntax
+ tree; the syntax tree is then executed by a simple recursive
+ evaluator. Both of these steps incur a lot of overhead, since
+ parsing can be slow (especially if you also do the previous
+ project and convert regular expressions to finite state machines
+ at compile time) and the recursive evaluator performs many
+ procedure calls to do even the simplest things.
+
+ It should be possible for `gawk' to convert the script's parse
+ tree into a C program which the user would then compile, using
+ the normal C compiler and a special `gawk' library to provide
+ all the needed functions (regexps, fields, associative arrays,
+ type coercion, and so on).
+
+ An easier possibility might be for an intermediate phase of
+ `awk' to convert the parse tree into a linear byte code form
+ like the one used in GNU Emacs Lisp. The recursive evaluator
+ would then be replaced by a straight line byte code interpreter
+ that would be intermediate in speed between running a compiled
+ program and doing what `gawk' does now.
+
+
+
+File: gawk-info, Node: Manual Improvements, Prev: Improvements, Up: Notes
+
+Suggestions For Future Improvements of This Manual
+==================================================
+
+ 1. An error message section has not been included in this version
+ of the manual. Perhaps some nice beta testers will document
+ some of the messages for the future.
+
+ 2. A summary page has not been included, as the ``man'', or help,
+ page that comes with the `gawk' code should suffice.
+
+ GNU only supports Info, so this manual itself should contain
+ whatever forms of information it would be useful to have on an
+ Info summary page.
+
+ 3. A function and variable index has not been included as we are
+ not sure what to put in it.
+
+ 4. A section summarizing the differences between V7 `awk' and
+ System V Release 4 `awk' would be useful for long--time `awk'
+ hackers.
+
+
+
+File: gawk-info, Node: Glossary, Next: Index, Prev: Notes, Up: Top
+
+Glossary
+********
+
+Action
+ A series of `awk' statements attached to a rule. If the rule's
+ pattern matches an input record, the `awk' language executes the
+ rule's action. Actions are always enclosed in curly braces.
+
+Amazing `awk' assembler
+ Henry Spencer at the University of Toronto wrote a retargetable
+ assembler completely as `awk' scripts. It is thousands of lines
+ long, including machine descriptions for several 8--bit
+ microcomputers. It is distributed with `gawk' and is a good
+ example of a program that would have been better written in
+ another language.
+
+Assignment
+ An `awk' expression that changes the value of some `awk'
+ variable or data object. An object that you can assign to is
+ called an "lvalue".
+
+Built-in function
+ The `awk' language provides built--in functions that perform
+ various numerical and string computations. Examples are `sqrt'
+ (for the square root of a number) and `substr' (for a substring
+ of a string).
+
+C
+ The system programming language that most of GNU is written in.
+ The `awk' programming language has C--like syntax, and this
+ manual points out similarities between `awk' and C when
+ appropriate.
+
+Compound statement
+ A series of `awk' statements, enclosed in curly braces.
+ Compound statements may be nested.
+
+Concatenation
+ Concatenating two strings means sticking them together, one
+ after another, giving a new string. For example, the string
+ `foo' concatenated with the string `bar' gives the string
+ `foobar'.
+
+Conditional expression
+ A relation that is either true or false, such as `(a < b)'.
+ Conditional expressions are used in `if' and `while' statements,
+ and in patterns to select which input records to process.
+
+Curly braces
+ The characters `{' and `}'. Curly braces are used in `awk' for
+ delimiting actions, compound statements, and function bodies.
+
+Data objects
+ These are numbers and strings of characters. Numbers are
+ converted into strings and vice versa, as needed.
+
+Escape Sequences
+ A special sequence of characters used for describing
+ non--printable characters, such as `\n' for newline, or `\033'
+ for the ASCII ESC (escape) character.
+
+Field
+ When `awk' reads an input record, it splits the record into
+ pieces separated by whitespace (or by a separator regexp which
+ you can change by setting the special variable `FS'). Such
+ pieces are called fields.
+
+Format
+ Format strings are used to control the appearance of output in
+ the `printf' statement. Also, data conversions from numbers to
+ strings are controlled by the format string contained in the
+ special variable `OFMT'.
+
+Function
+ A specialized group of statements often used to encapsulate
+ general or program--specific tasks. `awk' has a number of
+ built--in functions, and also allows you to define your own.
+
+`gawk'
+ The GNU implementation of `awk'.
+
+`awk' language
+ The language in which `awk' programs are written.
+
+`awk' program
+ An `awk' program consists of a series of "patterns" and
+ "actions", collectively known as "rules". For each input record
+ given to the program, the program's rules are all processed in
+ turn. `awk' programs may also contain function definitions.
+
+`awk' script
+ Another name for an `awk' program.
+
+Input record
+ A single chunk of data read in by `awk'. Usually, an `awk'
+ input record consists of one line of text.
+
+Keyword
+ In the `awk' language, a keyword is a word that has special
+ meaning. Keywords are reserved and may not be used as variable
+ names.
+
+ The keywords are: `if', `else', `while', `do...while', `for',
+ `for...in', `break', `continue', `delete', `next', `function',
+ `func', and `exit'.
+
+Lvalue
+ An expression that can appear on the left side of an assignment
+ operator. In most languages, lvalues can be variables or array
+ elements. In `awk', a field designator can also be used as an
+ lvalue.
+
+Number
+ A numeric valued data object. The `gawk' implementation uses
+ double precision floating point to represent numbers.
+
+Pattern
+ Patterns tell `awk' which input records are interesting to which
+ rules.
+
+ A pattern is an arbitrary conditional expression against which
+ input is tested. If the condition is satisfied, the pattern is
+ said to "match" the input record. A typical pattern might
+ compare the input record against a regular expression.
+
+Range (of input lines)
+ A sequence of consecutive lines from the input file. A pattern
+ can specify ranges of input lines for `awk' to process, or it
+ can specify single lines.
+
+Recursion
+ When a function calls itself, either directly or indirectly. If
+ this isn't clear, refer to the entry for ``recursion''.
+
+Redirection
+ Redirection means performing input from other than the standard
+ input stream, or output to other than the standard output stream.
+
+ You can redirect the output of the `print' and `printf'
+ statements to a file or a system command, using the `>', `>>',
+ and `|' operators. You can redirect input to the `getline'
+ statement using the `<' and `|' operators.
+
+Regular Expression
+ See ``regexp''.
+
+Regexp
+ Short for "regular expression". A regexp is a pattern that
+ denotes a set of strings, possibly an infinite set. For
+ example, the regexp `R.*xp' matches any string starting with the
+ letter `R' and ending with the letters `xp'. In `awk', regexps
+ are used in patterns and in conditional expressions.
+
+Rule
+ A segment of an `awk' program, that specifies how to process
+ single input records. A rule consists of a "pattern" and an
+ "action". `awk' reads an input record; then, for each rule, if
+ the input record satisfies the rule's pattern, `awk' executes
+ the rule's action. Otherwise, the rule does nothing for that
+ input record.
+
+Special Variable
+ The variables `ARGC', `ARGV', `ENVIRON', `FILENAME', `FNR',
+ `FS', `NF', `NR', `OFMT', `OFS', `ORS', `RLENGTH', `RSTART',
+ `RS', `SUBSEP', have special meaning to `awk'. Changing some of
+ them affects `awk''s running environment.
+
+Stream Editor
+ A program that reads records from an input stream and processes
+ them one or more at a time. This is in contrast with batch
+ programs, which may expect to read their input files in entirety
+ before starting to do anything, and with interactive programs,
+ which require input from the user.
+
+String
+ A datum consisting of a sequence of characters, such as `I am a
+ string'. Constant strings are written with double--quotes in
+ the `awk' language, and may contain "escape sequences".
+
+Whitespace
+ A sequence of blank or tab characters occurring inside an input
+ record or a string.
+
+
+
+File: gawk-info, Node: Index, Prev: Glossary, Up: Top
+
+Index
+*****
+
+* Menu:
+
+* #!: Executable Scripts.
+* -f option: Long.
+* `$NF', last field in record: Fields.
+* `$' (field operator): Fields.
+* `>>': Redirection.
+* `>': Redirection.
+* `BEGIN', special pattern: BEGIN/END.
+* `END', special pattern: BEGIN/END.
+* `awk' language: This Manual.
+* `awk' program: This Manual.
+* `break' statement: Break.
+* `close' statement for input: Close Input.
+* `close' statement for output: Close Output.
+* `continue' statement: Continue.
+* `delete' statement: Delete.
+* `exit' statement: Exit.
+* `for (x in ...)': Scanning an Array.
+* `for' statement: For.
+* `if' statement: If.
+* `next' statement: Next.
+* `print $0': Very Simple.
+* `printf' statement, format of: Basic Printf.
+* `printf', format-control characters: Format-Control.
+* `printf', modifiers: Modifiers.
+* `print' statement: Print.
+* `return' statement: Return Statement.
+* `while' statement: While.
+* `|': Redirection.
+* `BBS-list' file: The Files.
+* `inventory-shipped' file: The Files.
+* Accessing fields: Fields.
+* Acronym: History.
+* Action, curly braces: Actions.
+* Action, curly braces: Getting Started.
+* Action, default: Very Simple.
+* Action, definition of: Getting Started.
+* Action, general: Actions.
+* Action, separating statements: Actions.
+* Applications of `awk': When.
+* Arguments in function call: Function Calls.
+* Arguments, Command Line: Command Line.
+* Arithmetic operators: Arithmetic Ops.
+* Array assignment: Assigning Elements.
+* Array reference: Reference to Elements.
+* Arrays: Array Intro.
+* Arrays, definition of: Array Intro.
+* Arrays, deleting an element: Delete.
+* Arrays, determining presence of elements: Reference to Elements.
+* Arrays, multi-dimensional subscripts: Multi-dimensional.
+* Arrays, special `for' statement: Scanning an Array.
+* Assignment operators: Assignment Ops.
+* Associative arrays: Array Intro.
+* Backslash Continuation: Statements/Lines.
+* Basic function of `gawk': Getting Started.
+* Body of a loop: While.
+* Boolean expressions: Boolean Ops.
+* Boolean operators: Boolean Ops.
+* Boolean patterns: Boolean.
+* Built-in functions, list of: Built-in.
+* Built-in variables: Variables.
+* Calling a function: Function Calls.
+* Case sensitivity and gawk: Read Terminal.
+* Changing contents of a field: Changing Fields.
+* Changing the record separator: Records.
+* Closing files and pipes: Close Output.
+* Command Line: Command Line.
+* Command line formats: Running gawk.
+* Command line, setting `FS' on: Field Separators.
+* Comments: Comments.
+* Comparison expressions: Comparison Ops.
+* Comparison expressions as patterns: Comparison Patterns.
+* Compound statements: Actions.
+* Computed Regular Expressions: Regexp Usage.
+* Concatenation: Concatenation.
+* Conditional Patterns: Conditional Patterns.
+* Conditional expression: Conditional Exp.
+* Constants, types of: Constants.
+* Continuing statements on the next line: Statements/Lines.
+* Conversion of strings and numbers: Conversion.
+* Curly braces: Actions.
+* Curly braces: Getting Started.
+* Default action: Very Simple.
+* Default pattern: Very Simple.
+* Deleting elements of arrays: Delete.
+* Differences between `gawk' and `awk': Arithmetic Ops.
+* Differences between `gawk' and `awk': Constants.
+* Documenting `awk' programs: Comments.
+* Dynamic Regular Expressions: Regexp Usage.
+* Element assignment: Assigning Elements.
+* Element of array: Reference to Elements.
+* Emacs Lisp: When.
+* Empty pattern: Empty.
+* Escape sequence notation: Constants.
+* Examining fields: Fields.
+* Executable Scripts: Executable Scripts.
+* Expression, conditional: Conditional Exp.
+* Expressions: Actions.
+* Expressions, boolean: Boolean Ops.
+* Expressions, comparison: Comparison Ops.
+* Field separator, `FS': Field Separators.
+* Field separator, choice of: Field Separators.
+* Field separator, setting on command line: Field Separators.
+* Field, changing contents of: Changing Fields.
+* Fields: Fields.
+* Fields, negative-numbered: Non-Constant Fields.
+* Fields, semantics of: Field Separators.
+* Fields, separating: Field Separators.
+* Format specifier: Format-Control.
+* Format string: Basic Printf.
+* Formatted output: Printf.
+* Function call: Function Calls.
+* Function definitions: Actions.
+* Functions, user-defined: User-defined.
+* General input: Reading Files.
+* History of `awk': History.
+* How gawk works: Two Rules.
+* Increment operators: Increment Ops.
+* Input file, sample: The Files.
+* Input, `getline' function: Getline.
+* Input, general: Reading Files.
+* Input, multiple line records: Multiple.
+* Input, standard: Read Terminal.
+* Input, standard: Reading Files.
+* Interaction of `awk' with other programs: I/O Functions.
+* Invocation of `gawk': Command Line.
+* Language, `awk': This Manual.
+* Loop: While.
+* Loops, breaking out of: Break.
+* Lvalue: Assignment Ops.
+* Manual, using this: This Manual.
+* Metacharacters: Regexp Operators.
+* Mod function, semantics of: Arithmetic Ops.
+* Modifiers (in format specifiers): Modifiers.
+* Multiple line records: Multiple.
+* Multiple passes over data: Command Line.
+* Multiple statements on one line: Statements/Lines.
+* Negative-numbered fields: Non-Constant Fields.
+* Number of fields, `NF': Fields.
+* Number of records, `FNR': Records.
+* Number of records, `NR': Records.
+* Numerical constant: Constants.
+* Numerical value: Constants.
+* One-liners: One-liners.
+* Operator, Ternary: Conditional Patterns.
+* Operators, `$': Fields.
+* Operators, arithmetic: Arithmetic Ops.
+* Operators, assignment: Assignment Ops.
+* Operators, boolean: Boolean Ops.
+* Operators, increment: Increment Ops.
+* Operators, regular expression matching: Regexp Usage.
+* Operators, relational: Comparison Ops.
+* Operators, relational: Comparison Patterns.
+* Operators, string: Concatenation.
+* Operators, string-matching: Regexp Usage.
+* Options, Command Line: Command Line.
+* Output: Printing.
+* Output field separator, `OFS': Output Separators.
+* Output record separator, `ORS': Output Separators.
+* Output redirection: Redirection.
+* Output, formatted: Printf.
+* Output, piping: Redirection.
+* Passes, Multiple: Command Line.
+* Pattern, case sensitive: Read Terminal.
+* Pattern, comparison expressions: Comparison Patterns.
+* Pattern, default: Very Simple.
+* Pattern, definition of: Getting Started.
+* Pattern, empty: Empty.
+* Pattern, regular expressions: Regexp.
+* Patterns, `BEGIN': BEGIN/END.
+* Patterns, `END': BEGIN/END.
+* Patterns, Conditional: Conditional Patterns.
+* Patterns, boolean: Boolean.
+* Patterns, definition of: Patterns.
+* Patterns, types of: Patterns.
+* Pipes for output: Redirection.
+* Printing, general: Printing.
+* Program, `awk': This Manual.
+* Program, Self contained: Executable Scripts.
+* Program, definition of: Getting Started.
+* Programs, documenting: Comments.
+* Range pattern: Ranges.
+* Reading files, `getline' function: Getline.
+* Reading files, general: Reading Files.
+* Reading files, multiple line records: Multiple.
+* Record separator, `RS': Records.
+* Records, multiple line: Multiple.
+* Redirection of output: Redirection.
+* Reference to array: Reference to Elements.
+* Regexp: Regexp.
+* Regular Expressions, Computed: Regexp Usage.
+* Regular Expressions, Dynamic: Regexp Usage.
+* Regular expression matching operators: Regexp Usage.
+* Regular expression, metacharacters: Regexp Operators.
+* Regular expressions as patterns: Regexp.
+* Regular expressions, field separators and: Field Separators.
+* Relational operators: Comparison Patterns.
+* Relational operators: Comparison Ops.
+* Removing elements of arrays: Delete.
+* Rule, definition of: Getting Started.
+* Running gawk programs: Running gawk.
+* Sample input file: The Files.
+* Scanning an array: Scanning an Array.
+* Script, definition of: Getting Started.
+* Scripts, Executable: Executable Scripts.
+* Scripts, Shell: Executable Scripts.
+* Self contained Programs: Executable Scripts.
+* Separator character, choice of: Field Separators.
+* Shell Scripts: Executable Scripts.
+* Single quotes, why they are needed: One-shot.
+* Special variables, user modifiable: User-modified.
+* Standard input: Read Terminal.
+* Standard input: Reading Files.
+* Statements: Statements.
+* Statements: Actions.
+* String constants: Constants.
+* String operators: Concatenation.
+* String value: Constants.
+* String-matching operators: Regexp Usage.
+* Subscripts, multi-dimensional in arrays: Multi-dimensional.
+* Ternary Operator: Conditional Patterns.
+* Use of comments: Comments.
+* User-defined functions: User-defined.
+* User-defined variables: Variables.
+* Uses of `awk': Preface.
+* Using this manual: This Manual.
+* Variables, built-in: Variables.
+* Variables, user-defined: Variables.
+* What is `awk': Preface.
+* When to use `awk': When.
+* file, `awk' program: Long.
+* patterns, range: Ranges.
+* program file: Long.
+* regexp search operators: Regexp Usage.
+* running long programs: Long.
+
+
+ 
+Tag Table:
+Node: Top918
+Node: Preface2804
+Node: History4267
+Node: License5644
+Node: This Manual18989
+Node: The Files20330
+Node: Getting Started22914
+Node: Very Simple24249
+Node: Two Rules26030
+Node: More Complex28066
+Node: Running gawk30908
+Node: One-shot31827
+Node: Read Terminal32945
+Node: Long33862
+Node: Executable Scripts34991
+Node: Command Line36534
+Node: Comments40168
+Node: Statements/Lines41067
+Node: When43498
+Node: Reading Files45420
+Node: Records47119
+Node: Fields49902
+Node: Non-Constant Fields52789
+Node: Changing Fields54591
+Node: Field Separators57302
+Node: Multiple62004
+Node: Assignment Options64393
+Node: Getline65608
+Node: Close Input74958
+Node: Printing76023
+Node: Print76748
+Node: Print Examples78712
+Node: Output Separators80751
+Node: Redirection82417
+Node: Close Output85886
+Node: Printf88132
+Node: Basic Printf88908
+Node: Format-Control90261
+Node: Modifiers91806
+Node: Printf Examples93108
+Node: One-liners95707
+Node: Patterns97642
+Node: Empty100130
+Node: Regexp100402
+Node: Regexp Usage101173
+Node: Regexp Operators102947
+Node: Comparison Patterns107890
+Node: Ranges109336
+Node: BEGIN/END110722
+Node: Boolean113151
+Node: Conditional Patterns115605
+Node: Actions116105
+Node: Expressions117435
+Node: Constants119124
+Node: Variables121097
+Node: Arithmetic Ops122454
+Node: Concatenation123840
+Node: Comparison Ops124569
+Node: Boolean Ops125973
+Node: Assignment Ops128266
+Node: Increment Ops131817
+Node: Conversion134112
+Node: Conditional Exp136066
+Node: Function Calls137384
+Node: Statements139939
+Node: If141253
+Node: While142627
+Node: Do144232
+Node: For145265
+Node: Break148306
+Node: Continue149848
+Node: Next151476
+Node: Exit152985
+Node: Arrays154514
+Node: Array Intro155624
+Node: Reference to Elements159227
+Node: Assigning Elements161115
+Node: Array Example161615
+Node: Scanning an Array163336
+Node: Delete165642
+Node: Multi-dimensional166529
+Node: Multi-scanning169746
+Node: Built-in171303
+Node: Numeric Functions172806
+Node: String Functions176601
+Node: I/O Functions183717
+Node: User-defined185189
+Node: Definition Syntax185834
+Node: Function Example187928
+Node: Function Caveats189034
+Node: Return Statement191386
+Node: Special193612
+Node: User-modified194478
+Node: Auto-set196511
+Node: Sample Program200558
+Node: Notes204316
+Node: Extensions204909
+Node: Future Extensions206490
+Node: Improvements207922
+Node: Manual Improvements210034
+Node: Glossary210928
+Node: Index217934
+
+End Tag Table