Info file gawk-info, produced by Makeinfo, -*- Text -*- from input
file gawk.texinfo.

This file documents `awk', a program that you can use to select
particular records in a file and perform operations upon them.

Copyright (C) 1989 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.


File: gawk-info, Node: Top, Next: Preface, Prev: (dir), Up: (dir)

This file documents `awk', a program that you can use to select
particular records in a file and perform operations upon them; it
contains the following chapters:

* Menu:

* Preface::          What you can do with `awk'; brief history and
                     acknowledgements.
* License::          Your right to copy and distribute `gawk'.
* This Manual::      Using this manual. Includes sample input files
                     that you can use.
* Getting Started::  A basic introduction to using `awk'. How to run
                     an `awk' program. Command line syntax.
* Reading Files::    How to read files and manipulate fields.
* Printing::         How to print using `awk'. Describes the `print'
                     and `printf' statements. Also describes
                     redirection of output.
* One-liners::       Short, sample `awk' programs.
* Patterns::         The various types of patterns explained in detail.
* Actions::          The various types of actions are introduced here.
                     Describes expressions and the various operators
                     in detail. Also describes comparison expressions.
* Statements::       The various control statements are described in
                     detail.
* Arrays::           The description and use of arrays. Also includes
                     array-oriented control statements.
* User-defined::     User-defined functions are described in detail.
* Built-in::         The built-in functions are summarized here.
* Special::          The special variables are summarized here.
* Sample Program::   A sample `awk' program with a complete explanation.
* Notes::            Something about the implementation of `gawk'.
* Glossary::         An explanation of some unfamiliar terms.
* Index::


File: gawk-info, Node: Preface, Next: License, Prev: Top, Up: Top

Preface
*******

If you are like many computer users, you frequently would like to make
changes in various text files wherever certain patterns appear, or
extract data from parts of certain lines while discarding the rest. To
write a program to do this in a language such as C or Pascal is a
time-consuming inconvenience that may take many lines of code. The job
may be easier with `awk'.

The `awk' utility interprets a special-purpose programming language
that makes it possible to handle simple data-reformatting jobs easily
with just a few lines of code.

The GNU implementation of `awk' is called `gawk'; it is fully upward
compatible with the System V Release 3.1 and later versions of `awk'.
All properly written `awk' programs should work with `gawk'. So we
usually don't distinguish between `gawk' and other `awk'
implementations in this manual.

This manual teaches you what `awk' does and how you can use `awk'
effectively. You should already be familiar with basic,
general-purpose, operating system commands such as `ls'. Using `awk'
you can:

   * manage small, personal databases,

   * generate reports,

   * validate data,

   * produce indexes, and perform other document preparation tasks,

   * even experiment with algorithms that can be adapted later to
     other computer languages!

* Menu:

* History::          The history of gawk and awk. Acknowledgements.

File: gawk-info, Node: History, Up: Preface History of `awk' and `gawk' =========================== The name `awk' comes from the initials of its designers: Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan. The original version of `awk' was written in 1977. In 1985 a new version made the programming language more powerful, introducing user--defined functions, multiple input streams, and computed regular expressions. The GNU implementation, `gawk', was written in 1986 by Paul Rubin and Jay Fenlason, with advice from Richard Stallman. John Woods contributed parts of the code as well. In 1988, David Trueman, with help from Arnold Robbins, reworked `gawk' for compatibility with the newer `awk'. Many people need to be thanked for their assistance in producing this manual. Jay Fenlason contributed many ideas and sample programs. Richard Mlynarik and Robert Chassell gave helpful comments on drafts of this manual. The paper ``A Supplemental Document for `awk''' by John W. Pierce of the Chemistry Department at UC San Diego, pinpointed several issues relevant both to `awk' implementation and to this manual, that would otherwise have escaped us. Finally, we would like to thank Brian Kernighan of Bell Labs for invaluable assistance during the testing and debugging of `gawk', and for help in clarifying several points about the language.  File: gawk-info, Node: License, Next: This Manual, Prev: Preface, Up: Top GNU GENERAL PUBLIC LICENSE ************************** Version 1, February 1989 Copyright (C) 1989 Free Software Foundation, Inc. 675 Mass Ave, Cambridge, MA 02139, USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble ========= The license agreements of most software companies try to keep users at the mercy of those companies. 
By contrast, our General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. The General Public License applies to the Free Software Foundation's software and to any other program whose authors commit to using it. You can use it for your programs, too. When we speak of free software, we are referring to freedom, not price. Specifically, the General Public License is designed to make sure that you have the freedom to give away or sell copies of free software, that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of a such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must tell them their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. The precise terms and conditions for copying, distribution and modification follow. TERMS AND CONDITIONS 1. 
This License Agreement applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The ``Program'', below, refers to any such program or work, and a ``work based on the Program'' means either the Program or any work containing the Program or a portion of it, either verbatim or with modifications. Each licensee is addressed as ``you''. 2. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this General Public License and to the absence of any warranty; and give any other recipients of the Program a copy of this General Public License along with the Program. You may charge a fee for the physical act of transferring a copy. 3. You may modify your copy or copies of the Program or any portion of it, and copy and distribute such modifications under the terms of Paragraph 1 above, provided that you also do the following: * cause the modified files to carry prominent notices stating that you changed the files and the date of any change; and * cause the whole of any work that you distribute or publish, that in whole or in part contains the Program or any part thereof, either with or without modifications, to be licensed at no charge to all third parties under the terms of this General Public License (except that you may choose to grant warranty protection to some or all third parties, at your option). 
* If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the simplest and most usual way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this General Public License. * You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. Mere aggregation of another independent work with the Program (or its derivative) on a volume of a storage or distribution medium does not bring the other work under the scope of these terms. 4. You may copy and distribute the Program (or a portion or derivative of it, under Paragraph 2) in object code or executable form under the terms of Paragraphs 1 and 2 above provided that you also do one of the following: * accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Paragraphs 1 and 2 above; or, * accompany it with a written offer, valid for at least three years, to give any third party free (except for a nominal charge for the cost of distribution) a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Paragraphs 1 and 2 above; or, * accompany it with the information you received as to where the corresponding source code may be obtained. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form alone.) Source code for a work means the preferred form of the work for making modifications to it. 
For an executable file, complete source code means all the source code for all modules it contains; but, as a special exception, it need not include source code for modules which are standard libraries that accompany the operating system on which the executable file runs, or for standard header files or definitions files that accompany that operating system. 5. You may not copy, modify, sublicense, distribute or transfer the Program except as expressly provided under this General Public License. Any attempt otherwise to copy, modify, sublicense, distribute or transfer the Program is void, and will automatically terminate your rights to use the Program under this License. However, parties who have received copies, or rights to use copies, from you under this General Public License will not have their licenses terminated so long as such parties remain in full compliance. 6. By copying, distributing or modifying the Program (or any work based on the Program) you indicate your acceptance of this license to do so, and all its terms and conditions. 7. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. 8. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of the license which applies to it and ``any later version'', you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. 
If the Program does not specify a version number of the license, you may choose any version ever published by the Free Software Foundation. 9. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 10. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM ``AS IS'' WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 11. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 
END OF TERMS AND CONDITIONS Appendix: How to Apply These Terms to Your New Programs ======================================================= If you develop a new program, and you want it to be of the greatest possible use to humanity, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the ``copyright'' line and a pointer to where the full notice is found. ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES. Copyright (C) 19YY NAME OF AUTHOR This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 1, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) 19YY NAME OF AUTHOR Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. 
Of course, the commands you use may be called something other than
`show w' and `show c'; they could even be mouse-clicks or menu
items--whatever suits your program.

You should also get your employer (if you work as a programmer) or
your school, if any, to sign a ``copyright disclaimer'' for the
program, if necessary. Here a sample; alter the names:

     Yoyodyne, Inc., hereby disclaims all copyright interest in the
     program `Gnomovision' (a program to direct compilers to make
     passes at assemblers) written by James Hacker.

     SIGNATURE OF TY COON, 1 April 1989
     Ty Coon, President of Vice

That's all there is to it!


File: gawk-info, Node: This Manual, Next: Getting Started, Prev: License, Up: Top

Using This Manual
*****************

The term `gawk' refers to a program (a version of `awk') developed by
the Free Software Foundation, and to the language you use to tell it
what to do. When we need to be careful, we call the program ``the
`awk' utility'' and the language ``the `awk' language''. The purpose
of this manual is to explain the `awk' language and how to run the
`awk' utility.

The term "`awk' program" refers to a program written by you in the
`awk' programming language.

*Note Getting Started::, for the bare essentials you need to know to
start using `awk'.

Useful ``one-liners'' are included to give you a feel for the `awk'
language (*note One-liners::.).

A sizable sample `awk' program has been provided for you (*note Sample
Program::.).

If you find terms that you aren't familiar with, try looking them up
in the glossary (*note Glossary::.).

Most of the time complete `awk' programs are used as examples, but in
some of the more advanced sections, only the part of the `awk' program
that illustrates the concept being described is shown.

* Menu:

This chapter contains the following sections:

* The Files::        Sample data files for use in the `awk' programs
                     illustrated in this manual.


File: gawk-info, Node: The Files, Up: This Manual

Input Files for the Examples
============================

This manual contains many sample programs. The data for many of those
programs comes from two files. The first file, called `BBS-list',
represents a list of computer bulletin board systems and information
about those systems.

Each line of this file is one "record". Each record contains the name
of a computer bulletin board, its phone number, the board's baud rate,
and a code for the number of hours it is operational. An `A' in the
last column means the board operates 24 hours all week. A `B' in the
last column means the board operates evening and weekend hours, only.
A `C' means the board operates only on weekends.

     aardvark     555-5553     1200/300          B
     alpo-net     555-3412     2400/1200/300     A
     barfly       555-7685     1200/300          A
     bites        555-1675     2400/1200/300     A
     camelot      555-0542     300               C
     core         555-2912     1200/300          C
     fooey        555-1234     2400/1200/300     B
     foot         555-6699     1200/300          B
     macfoo       555-6480     1200/300          A
     sdace        555-3430     2400/1200/300     A
     sabafoo      555-2127     1200/300          C

The second data file, called `inventory-shipped', represents
information about shipments during the year. Each line of this file is
also one record. Each record contains the month of the year, the
number of green crates shipped, the number of red boxes shipped, the
number of orange bags shipped, and the number of blue packages
shipped, respectively.

     Jan  13  25  15 115
     Feb  15  32  24 226
     Mar  15  24  34 228
     Apr  31  52  63 420
     May  16  34  29 208
     Jun  31  42  75 492
     Jul  24  34  67 436
     Aug  15  34  47 316
     Sep  13  55  37 277
     Oct  29  54  68 525
     Nov  20  87  82 577
     Dec  17  35  61 401
     Jan  21  36  64 620
     Feb  26  58  80 652
     Mar  24  75  70 495
     Apr  21  70  74 514

If you are reading this in GNU Emacs using Info, you can copy the
regions of text showing these sample files into your own test files.
This way you can try out the examples shown in the remainder of this
document.
You do this by using the command `M-x write-region' to copy text from
the Info file into a file for use with `awk' (see your ``GNU Emacs
Manual'' for more information). Using this information, create your
own `BBS-list' and `inventory-shipped' files, and practice what you
learn in this manual.


File: gawk-info, Node: Getting Started, Next: Reading Files, Prev: This Manual, Up: Top

Getting Started With `awk'
**************************

The basic function of `awk' is to search files for lines (or other
units of text) that contain certain patterns. When a line matching any
of those patterns is found, `awk' performs specified actions on that
line. Then `awk' keeps processing input lines until the end of the
file is reached.

An `awk' "program" or "script" consists of a series of "rules". (It
may also contain "function definitions", but that is an advanced
feature, so let's ignore it for now. *Note User-defined::.)

A rule contains a "pattern", an "action", or both. Actions are
enclosed in curly braces to distinguish them from patterns. Therefore,
an `awk' program is a sequence of rules in the form:

     PATTERN { ACTION }
     PATTERN { ACTION }
     ...

* Menu:

* Very Simple::       A very simple example.
* Two Rules::         A less simple one-line example with two rules.
* More Complex::      A more complex example.
* Running gawk::      How to run gawk programs; includes command line
                      syntax.
* Comments::          Adding documentation to gawk programs.
* Statements/Lines::  Subdividing or combining statements into lines.
* When::              When to use gawk and when to use other things.


File: gawk-info, Node: Very Simple, Next: Two Rules, Up: Getting Started

A Very Simple Example
=====================

The following command runs a simple `awk' program that searches the
input file `BBS-list' for the string of characters: `foo'. (A string
of characters is usually called, quite simply, a "string".)

     awk '/foo/ { print $0 }' BBS-list

When lines containing `foo' are found, they are printed, because
`print $0' means print the current line. (Just `print' by itself also
means the same thing, so we could have written that instead.)

You will notice that slashes, `/', surround the string `foo' in the
actual `awk' program. The slashes indicate that `foo' is a pattern to
search for. This type of pattern is called a "regular expression", and
is covered in more detail later (*note Regexp::.). There are single
quotes around the `awk' program so that the shell won't interpret any
of it as special shell characters.

Here is what this program prints:

     fooey        555-1234     2400/1200/300     B
     foot         555-6699     1200/300          B
     macfoo       555-6480     1200/300          A
     sabafoo      555-2127     1200/300          C

In an `awk' rule, either the pattern or the action can be omitted, but
not both. If the pattern is omitted, then the action is performed for
*every* input line. If the action is omitted, the default action is to
print all lines that match the pattern. We could leave out the action
(the print statement and the curly braces) in the above example, and
the result would be the same: all lines matching the pattern `foo'
would be printed. (By comparison, omitting the print statement but
retaining the curly braces makes an empty action that does nothing;
then no lines would be printed.)


File: gawk-info, Node: Two Rules, Next: More Complex, Prev: Very Simple, Up: Getting Started

An Example with Two Rules
=========================

The `awk' utility reads the input files one line at a time. For each
line, `awk' tries the patterns of all the rules. If several patterns
match then several actions are run, in the order in which they appear
in the `awk' program. If no patterns match, then no actions are run.

After processing all the rules (perhaps none) that match the line,
`awk' reads the next line (however, *note Next::.). This continues
until the end of the file is reached.

For example, the `awk' program:

     /12/  { print $0 }
     /21/  { print $0 }

contains two rules. The first rule has the string `12' as the pattern
and `print $0' as the action. The second rule has the string `21' as
the pattern and also has `print $0' as the action. Each rule's action
is enclosed in its own pair of braces.

This `awk' program prints every line that contains the string `12'
*or* the string `21'. If a line contains both strings, it is printed
twice, once by each rule.

If we run this program on our two sample data files, `BBS-list' and
`inventory-shipped', as shown here:

     awk '/12/ { print $0 }
          /21/ { print $0 }' BBS-list inventory-shipped

we get the following output:

     aardvark     555-5553     1200/300          B
     alpo-net     555-3412     2400/1200/300     A
     barfly       555-7685     1200/300          A
     bites        555-1675     2400/1200/300     A
     core         555-2912     1200/300          C
     fooey        555-1234     2400/1200/300     B
     foot         555-6699     1200/300          B
     macfoo       555-6480     1200/300          A
     sdace        555-3430     2400/1200/300     A
     sabafoo      555-2127     1200/300          C
     sabafoo      555-2127     1200/300          C
     Jan  21  36  64 620
     Apr  21  70  74 514

Note how the line in `BBS-list' beginning with `sabafoo' was printed
twice, once for each rule.


File: gawk-info, Node: More Complex, Next: Running gawk, Prev: Two Rules, Up: Getting Started

A More Complex Example
======================

Here is an example to give you an idea of what typical `awk' programs
do. This example shows how `awk' can be used to summarize, select, and
rearrange the output of another utility. It uses features that haven't
been covered yet, so don't worry if you don't understand all the
details.

     ls -l | awk '$5 == "Nov" { sum += $4 }
                  END { print sum }'

This command prints the total number of bytes in all the files in the
current directory that were last modified in November (of any year).
(In the C shell you would need to type a semicolon and then a
backslash at the end of the first line; in the Bourne shell you can
type the example as shown.)

The `ls -l' part of this example is a command that gives you a full
listing of all the files in a directory, including file size and date.
Its output looks like this:

     -rw-r--r--  1 close        1933 Nov  7 13:05 Makefile
     -rw-r--r--  1 close       10809 Nov  7 13:03 gawk.h
     -rw-r--r--  1 close         983 Apr 13 12:14 gawk.tab.h
     -rw-r--r--  1 close       31869 Jun 15 12:20 gawk.y
     -rw-r--r--  1 close       22414 Nov  7 13:03 gawk1.c
     -rw-r--r--  1 close       37455 Nov  7 13:03 gawk2.c
     -rw-r--r--  1 close       27511 Dec  9 13:07 gawk3.c
     -rw-r--r--  1 close        7989 Nov  7 13:03 gawk4.c

The first field contains read-write permissions, the second field
contains the number of links to the file, and the third field
identifies the owner of the file. The fourth field contains the size
of the file in bytes. The fifth, sixth, and seventh fields contain the
month, day, and time, respectively, that the file was last modified.
Finally, the eighth field contains the name of the file.

The `$5 == "Nov"' in our `awk' program is an expression that tests
whether the fifth field of the output from `ls -l' matches the string
`Nov'. Each time a line has the string `Nov' in its fifth field, the
action `{ sum += $4 }' is performed. This adds the fourth field (the
file size) to the variable `sum'. As a result, when `awk' has finished
reading all the input lines, `sum' will be the sum of the sizes of
files whose lines matched the pattern.

After the last line of output from `ls' has been processed, the `END'
pattern is executed, and the value of `sum' is printed. In this
example, the value of `sum' would be 80600.

These more advanced `awk' techniques are covered in later sections
(*note Actions::.). Before you can move on to more advanced `awk'
programming, you have to know how `awk' interprets your input and
displays your output. By manipulating "fields" and using special
"print" statements, you can produce some very useful and spectacular
looking reports.

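If you want to check the total without depending on what is in your
own directory, the following is a minimal sketch that feeds the sample
listing from this section to the same `awk' program. The shell
function `listing' is a name made up for this sketch; it merely stands
in for `ls -l'. Keep in mind that many current systems print an extra
group column in `ls -l' output, so on such a system the size and month
would probably be fields 5 and 6 rather than 4 and 5.

```shell
# Stand-in for `ls -l': prints the sample listing from this section.
# (The function name is hypothetical; the data is the manual's own.)
listing() {
    cat <<'EOF'
-rw-r--r--  1 close        1933 Nov  7 13:05 Makefile
-rw-r--r--  1 close       10809 Nov  7 13:03 gawk.h
-rw-r--r--  1 close         983 Apr 13 12:14 gawk.tab.h
-rw-r--r--  1 close       31869 Jun 15 12:20 gawk.y
-rw-r--r--  1 close       22414 Nov  7 13:03 gawk1.c
-rw-r--r--  1 close       37455 Nov  7 13:03 gawk2.c
-rw-r--r--  1 close       27511 Dec  9 13:07 gawk3.c
-rw-r--r--  1 close        7989 Nov  7 13:03 gawk4.c
EOF
}

# Add up field 4 (the size) of every line whose field 5 is "Nov",
# then print the total once all input has been read.
total=$(listing | awk '$5 == "Nov" { sum += $4 }
                       END { print sum }')
echo "$total"    # prints 80600
```

The five November files account for 1933 + 10809 + 22414 + 37455 +
7989 bytes, which is the 80600 mentioned above.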

File: gawk-info, Node: Running gawk, Next: Comments, Prev: More Complex, Up: Getting Started

How to Run `awk' Programs
=========================

There are several ways to run an `awk' program. If the program is
short, it is easiest to include it in the command that runs `awk',
like this:

     awk 'PROGRAM' INPUT-FILE1 INPUT-FILE2 ...

where PROGRAM consists of a series of PATTERNS and ACTIONS, as
described earlier.

When the program is long, you would probably prefer to put it in a
file and run it with a command like this:

     awk -f PROGRAM-FILE INPUT-FILE1 INPUT-FILE2 ...

* Menu:

* One-shot::            Running a short throw-away `awk' program.
* Read Terminal::       Using no input files (input from terminal
                        instead).
* Long::                Putting permanent `awk' programs in files.
* Executable Scripts::  Making self-contained `awk' programs.
* Command Line::        How the `awk' command line is laid out.


File: gawk-info, Node: One-shot, Next: Read Terminal, Up: Running gawk

One-shot Throw-away `awk' Programs
----------------------------------

Once you are familiar with `awk', you will often type simple programs
at the moment you want to use them. Then you can write the program as
the first argument of the `awk' command, like this:

     awk 'PROGRAM' INPUT-FILE1 INPUT-FILE2 ...

where PROGRAM consists of a series of PATTERNS and ACTIONS, as
described earlier.

This command format tells the shell to start `awk' and use the PROGRAM
to process records in the input file(s). There are single quotes
around the PROGRAM so that the shell doesn't interpret any `awk'
characters as special shell characters. They cause the shell to treat
all of PROGRAM as a single argument for `awk'. They also allow PROGRAM
to be more than one line long.

This format is also useful for running short or medium-sized `awk'
programs from shell scripts, because it avoids the need for a separate
file for the `awk' program. A self-contained shell script is more
reliable since there are no other files to misplace.

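As an illustration of a multi-line PROGRAM inside single quotes, here
is a small sketch of the kind of fragment you might place in a shell
script. The three input records and the variable names `sample' and
`result' are made up for this sketch; they are not one of the manual's
data files.

```shell
# Hypothetical input, invented for this sketch
# (board name, phone number, baud rates).
sample='fooey 555-1234 2400/1200/300
camelot 555-0542 300
foot 555-6699 1200/300'

# Everything between the single quotes reaches awk as one argument,
# even though it spans several lines.  Note that the program text
# itself must not contain a single quote.
result=$(printf '%s\n' "$sample" | awk '
    /foo/ { n += 1; print $1 }     # count and name each matching record
    END   { print n " matched" }   # runs once, after the last record
')
echo "$result"
```

With the three records above this prints `fooey', `foot' and then
`2 matched', since `camelot' does not contain the string `foo'.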

File: gawk-info, Node: Read Terminal, Next: Long, Prev: One-shot, Up: Running gawk

Running `awk' without Input Files
---------------------------------

You can also use `awk' without any input files. If you type the
command line:

     awk 'PROGRAM'

then `awk' applies the PROGRAM to the "standard input", which usually
means whatever you type on the terminal. This continues until you
indicate end-of-file by typing `Control-d'.

For example, if you type:

     awk '/th/'

whatever you type next will be taken as data for that `awk' program.
If you go on to type the following data,

     Kathy
     Ben
     Tom
     Beth
     Seth
     Karen
     Thomas
     `Control-d'

then `awk' will print

     Kathy
     Beth
     Seth

as matching the pattern `th'. Notice that it did not recognize
`Thomas' as matching the pattern. The `awk' language is "case
sensitive", and matches patterns *exactly*.


File: gawk-info, Node: Long, Next: Executable Scripts, Prev: Read Terminal, Up: Running gawk

Running Long Programs
---------------------

Sometimes your `awk' programs can be very long. In this case it is
more convenient to put the program into a separate file. To tell `awk'
to use that file for its program, you type:

     awk -f SOURCE-FILE INPUT-FILE1 INPUT-FILE2 ...

The `-f' tells the `awk' utility to get the `awk' program from the
file SOURCE-FILE. Any file name can be used for SOURCE-FILE. For
example, you could put the program:

     /th/

into the file `th-prog'. Then the command:

     awk -f th-prog

does the same thing as this one:

     awk '/th/'

which was explained earlier (*note Read Terminal::.). Note that you
don't usually need single quotes around the file name that you specify
with `-f', because most file names don't contain any of the shell's
special characters.

If you want to identify your `awk' program files clearly as such, you
can add the extension `.awk' to the filename. This doesn't affect the
execution of the `awk' program, but it does make ``housekeeping''
easier.

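The equivalence between the two commands can be checked with a short
sketch such as the one below. It assumes that your system provides the
`mktemp' command for making a scratch directory; the file name
`th-prog.awk' is simply the `th-prog' example with the optional `.awk'
extension added, and the shell variable names are made up for the
sketch.

```shell
# Work in a scratch directory (assumes `mktemp -d' is available).
dir=$(mktemp -d)

# Put the one-rule program into its own file.
printf '%s\n' '/th/' > "$dir/th-prog.awk"

# The names typed at the terminal in the earlier example.
names='Kathy
Ben
Tom
Beth
Seth
Karen
Thomas'

# Run the program once from the file and once from the command line.
from_file=$(printf '%s\n' "$names" | awk -f "$dir/th-prog.awk")
inline=$(printf '%s\n' "$names" | awk '/th/')

echo "$from_file"
rm -r "$dir"
```

Both runs should select the same three names, `Kathy', `Beth' and
`Seth'.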
File: gawk-info, Node: Executable Scripts, Next: Command Line, Prev: Long, Up: Running gawk

Executable `awk' Programs
-------------------------

(The following section assumes that you are already somewhat familiar
with `awk'.)

Once you have learned `awk', you may want to write self--contained
`awk' scripts, using the `#!' script mechanism. You can do this on
BSD Unix systems and GNU.

For example, you could create a text file named `hello', containing
the following (where `BEGIN' is a feature we have not yet discussed):

     #! /bin/awk -f
     # a sample awk program
     BEGIN { print "hello, world" }

After making this file executable (with the `chmod' command), you can
simply type:

     hello

at the shell, and the system will arrange to run `awk' as if you had
typed:

     awk -f hello

Self--contained `awk' scripts are particularly useful for putting
`awk' programs into production on your system, without your users
having to know that they are actually using an `awk' program.

If your system does not support the `#!' mechanism, you can get a
similar effect using a regular shell script. It would look something
like this:

     : a sample awk program
     awk 'PROGRAM' "$@"

Using this technique, it is *vital* to enclose the PROGRAM in single
quotes to protect it from interpretation by the shell. If you omit
the quotes, only a shell wizard can predict the result.

The `"$@"' causes the shell to forward all the command line arguments
to the `awk' program, without interpretation.

File: gawk-info, Node: Command Line, Prev: Executable Scripts, Up: Running gawk

Details of the `awk' Command Line
---------------------------------

(The following section assumes that you are already familiar with
`awk'.)

There are two ways to run `awk'. Here are templates for both of them;
items enclosed in `[' and `]' in these templates are optional.

     awk [ -FFS ] [ -- ] 'PROGRAM' FILE ...
     awk [ -FFS ] -f SOURCE-FILE [ -f SOURCE-FILE ... ] [ -- ] FILE ...

Options begin with a minus sign, and consist of a single character.
The options and their meanings are as follows:

`-FFS'
     This sets the `FS' variable to FS (*note Special::.). As a
     special case, if FS is `t', then `FS' will be set to the tab
     character (`"\t"').

`-f SOURCE-FILE'
     Indicates that the `awk' program is to be found in SOURCE-FILE
     instead of in the first non--option argument.

`--'
     This signals the end of the command line options. If you wish to
     specify an input file named `-f', you can precede it with the
     `--' argument to prevent the `-f' from being interpreted as an
     option. This handling of `--' follows the POSIX argument parsing
     conventions.

Any other options will be flagged as invalid with a warning message,
but are otherwise ignored.

If the `-f' option is *not* used, then the first non--option command
line argument is expected to be the program text.

The `-f' option may be used more than once on the command line. `awk'
will read its program source from all of the named files, as if they
had been concatenated together into one big file. This is useful for
creating libraries of `awk' functions. Useful functions can be
written once, and then retrieved from a standard place, instead of
having to be included in each individual program. You can still type
in a program at the terminal and use library functions, by specifying
`/dev/tty' as one of the arguments to a `-f'. Type your program, and
end it with the keyboard end--of--file character `Control-d'.

Any additional arguments on the command line are made available to
your `awk' program in the `ARGV' array (*note Special::.). These
arguments are normally treated as input files to be processed in the
order specified. However, an argument that has the form VAR`='VALUE
means to assign the value VALUE to the variable VAR--it does not
specify a file at all.

Command line options and the program text (if present) are omitted
from the `ARGV' array. All other arguments, including variable
assignments, are included (*note Special::.).
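To see which arguments reach `ARGV', a small sketch like the one
below can help. It assumes the standard `ARGC' variable (the number
of elements in `ARGV'); `datafile' is only a placeholder name and need
not exist, because a program consisting solely of a `BEGIN' rule reads
no input. The shell variable `out' is likewise made up for
illustration.

```shell
# -F: is an option and the quoted text is the program, so neither
# should show up in ARGV; pass=1 and datafile should.
# The loop starts at 1 because element 0 holds the command name,
# which varies between implementations.
out=$(awk -F: 'BEGIN { for (i = 1; i < ARGC; i++) print i, ARGV[i] }' pass=1 datafile)

printf '%s\n' "$out"
```

This should print `1 pass=1' and `2 datafile', one per line.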
The distinction between file name arguments and variable--assignment
arguments is made when `awk' is about to open the next input file. At
that point in execution, it checks the ``file name'' to see whether it
is really a variable assignment; if so, instead of trying to read a
file it will, *at that point in the execution*, assign the variable.

Therefore, the variables actually receive the specified values after
all previously specified files have been read. In particular, the
values of variables assigned in this fashion are *not* available
inside a `BEGIN' rule (*note BEGIN/END::.), since such rules are run
before `awk' begins scanning the argument list.

The variable assignment feature is most useful for assigning to
variables such as `RS', `OFS', and `ORS', which control input and
output formats, before listing the data files. It is also useful for
controlling state if multiple passes are needed over a data file. For
example:

     awk 'pass == 1  { PASS 1 STUFF }
          pass == 2  { PASS 2 STUFF }' pass=1 datafile pass=2 datafile

File: gawk-info, Node: Comments, Next: Statements/Lines, Prev: Running gawk, Up: Getting Started

Comments in `awk' Programs
==========================

When you write a complicated `awk' program, you can put "comments" in
the program file to help you remember what the program does, and how
it works.

A comment starts with the sharp sign character, `#', and continues to
the end of the line. The `awk' language ignores the rest of a line
following a sharp sign. For example, we could have put the following
into `th-prog':

     # This program finds records containing the pattern `th'. This is how
     # you continue comments on additional lines.
     /th/

You can put comment lines into keyboard--composed throw--away `awk'
programs also, but this usually isn't very useful; the purpose of a
comment is to help yourself or another person understand the program
at another time.
File: gawk-info, Node: Statements/Lines, Next: When, Prev: Comments, Up: Getting Started

`awk' Statements versus Lines
=============================

Most often, each line in an `awk' program is a separate statement or
separate rule, like this:

     awk '/12/  { print $0 }
          /21/  { print $0 }' BBS-list inventory-shipped

But sometimes statements can be more than one line, and lines can
contain several statements.

You can split a statement into multiple lines by inserting a newline
after any of the following:

     ,    {    ?    :    ||    &&

Lines ending in `do' or `else' automatically have their statements
continued on the following line(s). A newline at any other point ends
the statement.

If you would like to split a single statement into two lines at a
point where a newline would terminate it, you can "continue" it by
ending the first line with a backslash character, `\'. This is
allowed absolutely anywhere in the statement, even in the middle of a
string or regular expression. For example:

     awk '/This program is too long, so continue it\
      on the next line/ { print $1 }'

We have generally not used backslash continuation in the sample
programs in this manual. Since there is no limit on the length of a
line, it is never strictly necessary; it just makes programs prettier.
We have preferred to make them even prettier by keeping the statements
short. Backslash continuation is most useful when your `awk' program
is in a separate source file, instead of typed in on the command line.

*Warning: this does not work if you are using the C shell.*
Continuation with backslash works for `awk' programs in files, and
also for one--shot programs *provided* you are using the Bourne shell,
the Korn shell, or the Bourne--again shell. But the C shell used on
Berkeley Unix behaves differently! There, you must use two
backslashes in a row, followed by a newline.

When `awk' statements within one rule are short, you might want to put
more than one of them on a line.
You do this by separating the statements with semicolons, `;'. This
also applies to the rules themselves. Thus, the above example program
could have been written:

     /12/ { print $0 } ; /21/ { print $0 }

*Note:* It is a new requirement of the `awk' language that rules on
the same line be separated by semicolons; this was done for
consistency with the statements in the action part of rules.

File: gawk-info, Node: When, Prev: Statements/Lines, Up: Getting Started

When to Use `awk'
=================

What use is all of this to me, you might ask? Using additional
operating system utilities, more advanced patterns, field separators,
arithmetic statements, and other selection criteria, you can produce
much more complex output. The `awk' language is very useful for
producing reports from large amounts of raw data, like summarizing
information from the output of standard operating system programs such
as `ls'. (*Note A More Complex Example: More Complex.)

Programs written with `awk' are usually much smaller than they would
be in other languages. This makes `awk' programs easy to compose and
use. Often `awk' programs can be quickly composed at your terminal,
used once, and thrown away. Since `awk' programs are interpreted, you
can avoid the usually lengthy edit--compile--test--debug cycle of
software development.

Complex programs have been written in `awk', including a complete
retargetable assembler for 8--bit microprocessors (*note Glossary::.
for more information) and a microcode assembler for a special purpose
Prolog computer. However, `awk''s capabilities are strained by tasks
of such complexity.

If you find yourself writing `awk' scripts of more than, say, a few
hundred lines, you might consider using a different programming
language. Emacs Lisp is a good choice if you need sophisticated
string or pattern matching capabilities. The shell is also good at
string and pattern matching; in addition it allows powerful use of the
standard utilities.
More conventional languages like C, C++, or Lisp offer better
facilities for system programming and for managing the complexity of
large programs. Programs in these languages may require more lines of
source code than the equivalent `awk' programs, but they will be
easier to maintain and usually run more efficiently.

File: gawk-info, Node: Reading Files, Next: Printing, Prev: Getting Started, Up: Top

Reading Files (Input)
*********************

In the typical `awk' program, all input is read either from the
standard input (usually the keyboard) or from files whose names you
specify on the `awk' command line. If you specify input files, `awk'
reads data from the first one until it reaches the end; then it reads
the second file until it reaches the end, and so on. The name of the
current input file can be found in the special variable `FILENAME'
(*note Special::.).

The input is split automatically into "records", and processed by the
rules one record at a time. (Records are the units of text mentioned
in the introduction; by default, a record is a line of text.) Each
record read is split automatically into "fields", to make it more
convenient for a rule to work on parts of the record under
consideration.

On rare occasions you will need to use the `getline' command, which
can do explicit input from any number of files.

* Menu:

* Records::             Controlling how data is split into records.
* Fields::              An introduction to fields.
* Field Separators::    The field separator and how to change it.
* Multiple::            Reading multi--line records.
* Assignment Options::  Setting variables on the command line and a
                        summary of command line syntax. This is an
                        advanced method of input.
* Getline::             Reading files under explicit program control
                        using the `getline' function.
* Close Input::         Closing an input file (so you can read from
                        the beginning once more).
File: gawk-info, Node: Records, Next: Fields, Up: Reading Files

How Input is Split into Records
===============================

The `awk' language divides its input into records and fields. Records
are separated from each other by the "record separator". By default,
the record separator is the "newline" character. Therefore, normally,
a record is a line of text.

Sometimes you may want to use a different character to separate your
records. You can use different characters by changing the special
variable `RS'.

The value of `RS' is a string that says how to separate records; the
default value is `"\n"', the string of just a newline character. This
is why lines of text are the default record. Although `RS' can have
any string as its value, only the first character of the string will
be used as the record separator. The other characters are ignored.
`RS' is exceptional in this regard; `awk' uses the full value of all
its other special variables.

The value of `RS' is changed by "assigning" it a new value (*note
Assignment Ops::.). One way to do this is at the beginning of your
`awk' program, before any input has been processed, using the special
`BEGIN' pattern (*note BEGIN/END::.). This way, `RS' is changed to
its new value before any input is read. The new value of `RS' is
enclosed in quotation marks. For example:

     awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list

changes the value of `RS' to `/', the slash character, before reading
any input. Records are now separated by a slash. The second rule in
the `awk' program (the action with no pattern) will proceed to print
each record. Since each `print' statement adds a newline at the end
of its output, the effect of this `awk' program is to copy the input
with each slash changed to a newline.

Another way to change the record separator is on the command line,
using the variable--assignment feature (*note Command Line::.).

     awk '...' RS="/" SOURCE-FILE

`RS' will be set to `/' before processing SOURCE-FILE.
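The two ways of setting `RS' can be compared with a sketch along
these lines. The three--record input, the temporary file, and the
shell variables `out_begin' and `out_cmdline' are assumptions made for
illustration.

```shell
# Hypothetical input: three records separated by slashes, with no
# trailing newline so that the last record is exactly "c".
tmp=$(mktemp)
printf 'a/b/c' > "$tmp"

# Method 1: assign RS in a BEGIN rule, before any input is read.
out_begin=$(awk 'BEGIN { RS = "/" } ; { print $0 }' "$tmp")

# Method 2: assign RS on the command line, ahead of the file name.
out_cmdline=$(awk '{ print $0 }' RS="/" "$tmp")

printf '%s\n' "$out_begin"
printf '%s\n' "$out_cmdline"
rm -f "$tmp"
```

Both commands should print `a', `b', and `c' on separate lines, since
each slash ends a record and `print' adds a newline.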
The empty string (a string of no characters) has a special meaning as
the value of `RS': it means that records are separated only by blank
lines. *Note Multiple::, for more details.

The `awk' utility keeps track of the number of records that have been
read so far from the current input file. This value is stored in a
special variable called `FNR'. It is reset to zero when a new file is
started. Another variable, `NR', is the total number of input records
read so far from all files. It starts at zero but is never
automatically reset to zero.

If you change the value of `RS' in the middle of an `awk' run, the new
value is used to delimit subsequent records, but the record currently
being processed (and records already finished) are not affected.
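The difference between `FNR' and `NR' shows up as soon as more than
one input file is named. The sketch below assumes two small files,
`first' and `second', created in a temporary directory purely for
illustration; the shell variable `out' is also made up.

```shell
# Two hypothetical input files: one with two records, one with one.
dir=$(mktemp -d)
printf 'x\ny\n' > "$dir/first"
printf 'z\n' > "$dir/second"

# Print the per-file count, the running total, and the record itself.
out=$(awk '{ print FNR, NR, $0 }' "$dir/first" "$dir/second")

printf '%s\n' "$out"
rm -rf "$dir"
```

This should print `1 1 x', `2 2 y', and `1 3 z': `FNR' starts over
when the second file begins, while `NR' keeps counting.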