author | Arnold D. Robbins <arnold@skeeve.com> | 2020-12-01 06:30:36 +0200 |
---|---|---|
committer | Arnold D. Robbins <arnold@skeeve.com> | 2020-12-01 06:30:36 +0200 |
commit | c432356f6e1a544f31f65b7fbbee9e2f061bdb08 (patch) | |
tree | 0826a19a0493e69a41359d82da92ea3b5fb6d83e /doc/gawkinet.texi | |
parent | 2811b2f83a6f230ade3d79978fcb469b3ce1a582 (diff) | |
Lots of small cleanups in gawkinet.texi.
Diffstat (limited to 'doc/gawkinet.texi')
-rw-r--r-- | doc/gawkinet.texi | 223 |
1 files changed, 121 insertions, 102 deletions
diff --git a/doc/gawkinet.texi b/doc/gawkinet.texi index 2bb22d93..f4dd2f6f 100644 --- a/doc/gawkinet.texi +++ b/doc/gawkinet.texi @@ -61,8 +61,8 @@ @c pages, I think this is the right decision. ADR. @set TITLE TCP/IP Internetworking with @command{gawk} -@set EDITION 1.5 -@set UPDATE-MONTH June, 2020 +@set EDITION 1.6 +@set UPDATE-MONTH November, 2020 @c gawk versions: @set VERSION 5.1 @set PATCHLEVEL 0 @@ -453,7 +453,7 @@ web server or email server. It is the @dfn{host} (system) which is @emph{connected to} in a transaction. For this to work though, the server must be expecting connections. Much as there has to be someone at the office building to answer -the phone@footnote{In the days before voice mail systems!}, the +the phone,@footnote{In the days before voice mail systems!} the server process (usually) has to be started first and be waiting for a connection. @@ -485,12 +485,12 @@ In the case of TCP, the synchronicity is enforced by the protocol when sending data. Data writes @dfn{block} until the data have been received on the other end. For both TCP and UDP, data reads block until there is incoming data waiting to be read. This is summarized in the following table, -where an ``X'' indicates that the given action blocks. +where an ``x'' indicates that the given action blocks. 
@ifnottex @multitable {Protocol} {Reads} {Writes} -@item TCP @tab X @tab X -@item UDP @tab X @tab +@item TCP @tab x @tab x +@item UDP @tab x @tab @end multitable @end ifnottex @tex @@ -513,9 +513,7 @@ UDP&&X&\cr @comment node-name, next, previous, up @chapter Networking With @command{gawk} -@c STARTOFRANGE netgawk @cindex networks @subentry @command{gawk} and -@c STARTOFRANGE gawknet @cindex @command{gawk} @subentry networking The @command{awk} programming language was originally developed as a pattern-matching language for writing short programs to perform @@ -606,11 +604,8 @@ The special files provided in @command{gawk} hide the details from the programmer, making things much simpler and easier to use. @c Who sez we can't toot our own horn occasionally? -@c STARTOFRANGE filenet @cindex filenames, for network access -@c STARTOFRANGE gawnetf @cindex @command{gawk} @subentry networking @subentry filenames -@c STARTOFRANGE netgawf @cindex networks @subentry @command{gawk} and @subentry filenames The special @value{FN} for network access is made up of several fields, all of which are mandatory: @@ -633,10 +628,10 @@ you allow the system to choose. @node Special File Fields, Comparing Protocols, Gawk Special Files, Gawk Special Files @subsection The Fields of the Special @value{FFN} -This @value{SECTION} explains the meaning of all the other fields, +This @value{SECTION} explains the meaning of all of the fields, as well as the range of values and the defaults. All of the fields are mandatory. To let the system pick a value, -or if the field doesn't apply to the protocol, specify it as @samp{0}: +or if the field doesn't apply to the protocol, specify it as @samp{0} (zero): @table @var @cindex network type field @@ -663,7 +658,9 @@ explained later in this @value{SECTION}. Determines which port on the local machine is used to communicate across the network. 
Application-level clients usually use @samp{0} to indicate they do not care which local port is -used---instead they specify a remote port to connect to. It is vital for +used---instead they specify a remote port to connect to. + +It is vital for application-level servers to use a number different from @samp{0} here because their service has to be available at a specific publicly known port number. It is possible to use a name from @file{/etc/services} here. @@ -672,14 +669,16 @@ port number. It is possible to use a name from @file{/etc/services} here. @cindex hostname field @cindex servers @subentry as hosts Determines which remote host is to -be at the other end of the connection. Application-level servers must fill +be at the other end of the connection. +Application-level clients must enter a name different from @samp{0}. +The name can be either symbolic +(e.g., @samp{jpl-devvax.jpl.nasa.gov}) or numeric (e.g., @samp{128.149.1.143}). + +Application-level servers must fill this field with a @samp{0} to indicate their being open for all other hosts to connect to them and enforce connection level server behavior this way. It is not possible for an application-level server to restrict its availability to one remote host by entering a host name here. -Application-level clients must enter a name different from @samp{0}. -The name can be either symbolic -(e.g., @samp{jpl-devvax.jpl.nasa.gov}) or numeric (e.g., @samp{128.149.1.143}). @item remoteport Determines which port on the remote @@ -687,7 +686,9 @@ machine is used to communicate across the network. For @file{/inet/tcp} and @file{/inet/udp}, application-level clients @emph{must} use a number other than @samp{0} to indicate to which port on the remote machine -they want to connect. Application-level servers must not fill this field with +they want to connect. + +Application-level servers must not fill this field with a @samp{0}. Instead they specify a local port to which clients connect. 
It is possible to use a name from @file{/etc/services} here. @end table @@ -849,7 +850,8 @@ network facilities to make them easier to understand and use.} UDP cannot guarantee that the datagrams at the receiving end will arrive in exactly the same order they were sent. Some datagrams could be -lost, some doubled, and some out of order. But no overhead is necessary to +lost, some doubled, and some could arrive out of order. +But no overhead is necessary to accomplish this. This unreliable behavior is good enough for tasks such as data acquisition, logging, and even stateless services like the original versions of NFS. @@ -857,11 +859,8 @@ the original versions of NFS. @node TCP Connecting, Troubleshooting, Gawk Special Files, Using Networking @section Establishing a TCP Connection -@c STARTOFRANGE tcpcon @cindex TCP (Transmission Control Protocol) @subentry connection, establishing -@c STARTOFRANGE netcon @cindex networks @subentry @command{gawk} and @subentry connections -@c STARTOFRANGE gawcon @cindex @command{gawk} @subentry networking @subentry connections Let's observe a network connection at work. Type in the following program and watch the output. Within a second, it connects via TCP (@file{/inet/tcp}) @@ -885,7 +884,7 @@ respects: A special file is used as a shell command that pipes its output into @code{getline}. One would rather expect to see the special file being read like any other file (@samp{getline < -"/inet/tcp/0/localhost/daytime")}. +"/inet/tcp/0/localhost/daytime"}). @item @cindex @code{|} (vertical bar), @code{|&} operator (I/O) @@ -931,20 +930,25 @@ we are pedantic and always explicitly close the connections.) @cindex troubleshooting @subentry networks @subentry connections It may well be that for some reason the program shown in the previous example does not run on your machine. When looking at possible reasons for this, you will learn much -about typical problems that arise in network programming. 
First of all, +about typical problems that arise in network programming. +@ignore +First of all, your implementation of @command{gawk} may not support network access because it is a pre-3.1 version or you do not have a network interface in your machine. Perhaps your machine uses some other protocol, such as -DECnet or Novell's IPX. For the rest of this @value{CHAPTER}, -we will assume -you work on a Unix machine that supports TCP/IP. If the previous example program does -not run on your machine, it may help to replace the name +DECnet or Novell's IPX. +@end ignore + +For the rest of this @value{CHAPTER}, we will assume you work on a POSIX-style +system that supports TCP/IP. If the previous example program does not +run on your machine, it may help to replace the name @samp{localhost} with the name of your machine or its IP address. If it does, you could replace @samp{localhost} with the name of another machine in your vicinity---this way, the program connects to another machine. Now you should see the date and time being printed by the program, otherwise your machine may not support the @samp{daytime} service. + Try changing the service to @samp{chargen} or @samp{ftp}. This way, the program connects to other services that should give you some response. If you are curious, you should have a look at your @file{/etc/services} file. It could @@ -991,6 +995,7 @@ flavor of Microsoft Windows usually do @emph{not} support these services. Nevertheless, it @emph{is} possible to do networking with @command{gawk} on Microsoft Windows.@footnote{Microsoft preferred to ignore the TCP/IP +@c FIXME: What about Windows 7, 8, 10? family of protocols until 1995. Then came the rise of the Netscape browser as a landmark ``killer application.'' Microsoft added TCP/IP support and their own browser to Microsoft Windows 95 at the last minute. They even back-ported @@ -1009,7 +1014,7 @@ well as UDP. 
@node Interacting, Setting Up, Troubleshooting, Using Networking @section Interacting with a Network Service -The next program makes use of the possibility to really interact with a +The next program begins really interacting with a network service by printing something into the special file. It asks the so-called @command{finger} service if a user of the machine is logged in. When testing this program, try to change @samp{localhost} to @@ -1031,7 +1036,7 @@ BEGIN @{ After telling the service on the machine which user to look for, the program repeatedly reads lines that come as a reply. When no more -lines are coming (because the service has closed the connection), the +lines are available (because the service has closed the connection), the program also closes the connection. Try replacing @code{"@var{name}"} with your login name (or the name of someone else logged in). For a list of all users currently logged in, replace @var{name} with an empty string @@ -1039,10 +1044,12 @@ of all users currently logged in, replace @var{name} with an empty string @cindex Linux @cindex GNU/Linux -The final @code{close()} command could be safely deleted from +The final @code{close()} call could be safely deleted from the above script, because the operating system closes any open connection -by default when a script reaches the end of execution. In order to avoid +by default when a script reaches the end of execution. But, in order to avoid portability problems, it is best to always close connections explicitly. +@c FIXME: This following statement isn't really true; gawk flushes +@c and closes all open files before exiting. With the Linux kernel, for example, proper closing results in flushing of buffers. Letting the close happen by default may result in discarding buffers. @@ -1052,12 +1059,12 @@ When looking at @file{/etc/services} you may have noticed that the example, change @samp{tcp} to @samp{udp}, and change @samp{finger} to @samp{daytime}. 
After starting the modified program, you see the expected day and time message. -The program then hangs, because it waits for more lines coming from the -service. However, they never come. This behavior is a consequence of the +The program then hangs, because it waits for more lines to come from the +service. However, they never do. This behavior is a consequence of the differences between TCP and UDP. When using UDP, neither party is automatically informed about the other closing the connection. Continuing to experiment this way reveals many other subtle -differences between TCP and UDP. To avoid such trouble, one should always +differences between TCP and UDP. To avoid such trouble, you should always remember the advice Douglas E.@: Comer and David Stevens give in Volume III of their series @cite{Internetworking With TCP} (page 14): @@ -1111,6 +1118,7 @@ to a new file and edit it, changing the name @samp{daytime} to @samp{8888}. Then start the modified client. You should get a reply like this: +@c FIXME: Let's put a newer date here... @example Sat Sep 27 19:08:16 CEST 1997 @end example @@ -1123,7 +1131,7 @@ Both programs explicitly close the connection. @cindex networks @subentry ports @subentry reserved @cindex Unix, network ports and Now we will intentionally make a mistake to see what happens when the name -@samp{8888} (the so-called port) is already used by another service. +@samp{8888} (the port) is already used by another service. Start the server program in both windows. The first one works, but the second one complains that it could not open the connection. Each port on a single @@ -1138,6 +1146,7 @@ than 1024 (@samp{echo} is at port 7) are reserved for @code{root}. On machines running some flavor of Microsoft Windows, there is no restriction that reserves ports 1 to 1024 for a privileged user; hence, you can start an @samp{echo} server there. +@c FIXME: Is this still true? Turning this short server program into something really useful is simple. 
Imagine a server that first reads a @value{FN} from the client through the @@ -1148,8 +1157,8 @@ could be: @example BEGIN @{ NetService = "/inet/tcp/8888/0/0" - NetService |& getline - CatPipe = ("cat " $1) # sets $0 and the fields + NetService |& getline # sets $0 and the fields + CatPipe = ("cat " $1) while ((CatPipe | getline) > 0) print $0 |& NetService close(NetService) @@ -1177,9 +1186,11 @@ execute arbitrary commands, anyone would be free to do @samp{rm -rf *}. @cindex Post Office Protocol (POP) @cindex Simple Mail Transfer Protocol (SMTP) The distribution of email is usually done by dedicated email servers that -communicate with your machine using special protocols. To receive email, we -will use the Post Office Protocol (POP). Sending can be done with the much -older Simple Mail Transfer Protocol (SMTP). +communicate with your machine using special protocols. +In this @value{SECTION} we show how simple the basic steps are. + +To receive email, we use the Post Office Protocol (POP). Sending can +be done with the much older Simple Mail Transfer Protocol (SMTP). @cindex email When you type in the following program, replace the @var{emailhost} by the @@ -1194,7 +1205,7 @@ shows you the first email the server has in store: BEGIN @{ POPService = "/inet/tcp/0/@var{emailhost}/pop3" RS = ORS = "\r\n" - print "user @var{name}" |& POPService + print "user @var{name}" |& POPService POPService |& getline print "pass @var{password}" |& POPService POPService |& getline @@ -1214,7 +1225,7 @@ BEGIN @{ @cindex @code{RS} variable @subentry POP and @cindex @code{ORS} variable @subentry POP and @cindex POP (Post Office Protocol) -The record separators @code{RS} and @code{ORS} are redefined because the +We redefine the record separators @code{RS} and @code{ORS} because the protocol (POP) requires CR-LF to separate lines. After identifying yourself to the email service, the command @samp{retr 1} instructs the service to send the first of all your email messages in line. 
If the service @@ -1274,6 +1285,7 @@ HTTP request that existed when the web was created in the early 1990s. HTTP calls this @code{GET} request a ``method,'' which tells the service to transmit a web page (here the home page of the Yahoo! search engine). Version 1.0 added the request methods @code{HEAD} and +@c FIXME: Update this footnote? @code{POST}. The current version of HTTP is 1.1,@footnote{Version 1.0 of HTTP was defined in RFC 1945. HTTP 1.1 was initially specified in RFC 2068. In June 1999, RFC 2068 was made obsolete by RFC 2616, an update @@ -1298,7 +1310,7 @@ but then you get binary data that should be redirected into a file. Another application is calling a CGI (Common Gateway Interface) script on some server. CGI scripts are used when the contents of a web page are not -constant, but generated instantly at the moment you send a request +constant, but generated on demand at the moment you send a request for the page. For example, to get a detailed report about the current quotes of Motorola stock shares, call a CGI script at Yahoo! with the following: @@ -1312,7 +1324,6 @@ You can also request weather reports this way. @node Primitive Service, Interacting Service, Web page, Using Networking @section A Primitive Web Service -@c STARTOFRANGE webser @cindex web service Now we know enough about HTTP to set up a primitive web service that just says @code{"Hello, world"} when someone connects to it with a browser. @@ -1338,8 +1349,8 @@ Send a line to tell the browser how many bytes follow in the body of the message. This was not necessary earlier because both parties knew that the document ended when the connection closed. Nowadays it is possible to stay connected after the transmission of one web page. -This is to avoid the network traffic necessary for repeatedly establishing -TCP connections for requesting several images. 
Thus, there is the need to tell +This avoids the network traffic necessary for repeatedly establishing +TCP connections for requesting several images. Thus, it is necessary to tell the receiving party how many bytes will be sent. The header is terminated as usual with an empty line. @@ -1403,8 +1414,7 @@ graphical user interface (GUI). Each HTTP event that the user triggers by some action within the browser is received in this central procedure. Parameters and menu choices are extracted from this request, and an appropriate measure is taken according to -the user's choice. -For example: +the user's choice: @cindex HTTP server, core logic @example @@ -1464,7 +1474,7 @@ applies to the port number. These values are inserted later into the HTML content of the web pages to refer to the home system. Each server that is built around this core has to initialize some -application-dependent variables (such as the default home page) in a procedure +application-dependent variables (such as the default home page) in a function @code{SetUpServer()}, which is called immediately before entering the infinite loop of the server. For now, we will write an instance that initiates a trivial interaction. With this home page, the client user @@ -1493,8 +1503,10 @@ initialized, the server can start communicating to a client browser. @cindex RFC 2068 It does so by printing the HTTP header into the network connection (@samp{print @dots{} |& HttpService}). This command blocks execution of -the server script until a client connects. If this server -script is compared with the primitive one we wrote before, you will notice +the server script until a client connects. + +If you compare this server +script with the primitive one we wrote before, you will notice two additional lines in the header. The first instructs the browser to close the connection after each request. 
The second tells the browser that it should never try to @emph{remember} earlier requests @@ -1604,11 +1616,9 @@ by calling the tool with the @code{system()} function or through a pipe. @quotation @i{HTTP is like being married: you have to be able to handle whatever you're given, while being very careful what you send back.}@* -Phil Smith III,@* -@uref{http://www.netfunny.com/rhf/jokes/99/Mar/http.html} +@author Phil Smith III,@* @uref{http://www.netfunny.com/rhf/jokes/99/Mar/http.html} @end quotation -@c STARTOFRANGE cgilib @cindex CGI (Common Gateway Interface) @subentry library In @ref{Interacting Service, ,A Web Service with Interaction}, we saw the function @code{CGI_setup()} as part of the web server @@ -1620,7 +1630,7 @@ the hexadecimal value: @samp{%26}. These encoded values should be decoded. Following is a simple library to perform these tasks. This code is used for all web server examples -used throughout the rest of this @value{DOCUMENT}. +throughout the rest of this @value{DOCUMENT}. If you want to use it for your own web server, store the source code into a file named @file{inetlib.awk}. 
Then you can include these functions into your code by placing the following statement @@ -1631,6 +1641,7 @@ into your program @@include inetlib.awk @end example +@c FIXME: Needs revising, now that gawk has @include @noindent But beware, this mechanism is only possible if you invoke your web server script with @command{igawk} @@ -1705,7 +1716,7 @@ BEGIN @{ @} @} -function CGI_setup( method, uri, version, i) +function CGI_setup(method, uri, version, i) @{ delete GETARG delete MENU @@ -1798,6 +1809,7 @@ BEGIN @{ @c endfile @end example +@c FIXME: Rerun to make sure still correct And this is the result when we run it: @c artificial line wrap in last output line @@ -1823,9 +1835,7 @@ p2=stuff%26junk&percent=a %25 sign @node Simple Server, Caveats, Interacting Service, Using Networking @section A Simple Web Server -@c STARTOFRANGE webserx @cindex web servers -@c STARTOFRANGE serweb @cindex servers @subentry web In the preceding @value{SECTION}, we built the core logic for event-driven GUIs. In this @value{SECTION}, we finally extend the core to a real application. @@ -1872,6 +1882,7 @@ This approach can be used to implement other kinds of servers. The only changes needed to do so are hidden in the functions @code{SetUpServer()} and @code{HandleGET()}. Perhaps it might be necessary to implement other HTTP methods. +@c FIXME: @include? The @command{igawk} program that comes with @command{gawk} may be useful for this process. @@ -1883,7 +1894,7 @@ items, etc.). The function @code{HandleGET()} is a nested case selection that decides which page the user wants to see next. Each nesting level refers to a menu -level of the GUI. Each case implements a certain action of the menu. On the +level of the GUI. Each case implements a certain action of the menu. 
At the deepest level of case selection, the handler essentially knows what the user wants and stores the answer into the variable that holds the HTML page contents: @@ -1923,7 +1934,7 @@ function HandleGET() @{ Now we are down to the heart of ELIZA, so you can see how it works. Initially the user does not say anything; then ELIZA resets its money -counter and asks the user to tell what comes to mind open heartedly. +counter and asks the user to tell what comes to mind open-heartedly. The subsequent answers are converted to uppercase characters and stored for later comparison. ELIZA presents the bill when being confronted with a sentence that contains the phrase ``shut up.'' Otherwise, it looks for @@ -2188,6 +2199,7 @@ function SetUpEliza() @{ @c endfile @end example +@c FIXME: Not sure what this home page is, or if available any more. Needs updating. @cindex Humphrys, Mark @cindex ELIZA program Some interesting remarks and details (including the original source code @@ -2228,7 +2240,7 @@ establishment of a connection that previously ended with a ``broken pipe.'' Those connections have to ``time out'' for a minute or so before they can reopen. Check this with the command @samp{netstat -a}, which -provides a list of still ``active'' connections. +provides a list of still-active connections. @node Challenges, , Caveats, Using Networking @section Where To Go From Here @@ -2387,7 +2399,7 @@ of all the newsgroups, mailing lists and FAQs on the Internet. @chapter Some Applications and Techniques In this @value{CHAPTER}, we look at a number of self-contained scripts, with an emphasis on concise networking. Along the way, we -work towards creating building blocks that encapsulate often needed +work towards creating building blocks that encapsulate often-needed functions of the networking world, show new techniques that broaden the scope of problems that can be solved with @command{gawk}, and explore leading edge technology that may shape the future of networking. 
@@ -2406,11 +2418,12 @@ accepted standard for GUIs: the web browser. Now, @command{gawk} can rival even Tcl/Tk. @cindex Tcl/Tk @subentry @command{gawk} and -Tcl and @command{gawk} have much in common. Both are simple scripting languages -that allow us to quickly solve problems with short programs. But Tcl has Tk -on top of it, and @command{gawk} had nothing comparable up to now. While Tcl -needs a large and ever-changing library (Tk, which was bound to the X Window -System until recently), @command{gawk} needs just the networking interface +Tcl and @command{gawk} have much in common. Both are simple scripting +languages that allow us to quickly solve problems with short programs. But +Tcl has Tk on top of it, and @command{gawk} had nothing comparable up +to now. While Tcl needs a large and ever-changing library (Tk, which was +originally bound to the X Window System), @command{gawk} needs just the +networking interface and some kind of browser on the client's side. Besides better portability, the most important advantage of this approach (embracing well-established standards such HTTP and HTML) is that @emph{we do not need to change the @@ -2444,11 +2457,11 @@ site is not working. When a web server breaks down, it makes a difference if customers get a strange ``network unreachable'' message, or a short message telling them that the server has a problem. In such an emergency, the hard disk and everything on it (including the regular web service) may -be unavailable. Rebooting the web server off a diskette makes sense in this +be unavailable. Rebooting the web server off a USB drive makes sense in this setting. To use the PANIC program as an emergency web server, all you need are the -@command{gawk} executable and the program below on a diskette. By default, +@command{gawk} executable and the program below on a USB drive. By default, it connects to port 8080. 
A different value may be supplied on the command line: @@ -2488,7 +2501,7 @@ could analyze the contents and extract the text or the links. An ASCII browser could be written around GETURL. But more interestingly, web robots are straightforward to write on top of GETURL. On the Internet, you can find several programs of the same name that do the same job. They are usually -much more complex internally and at least 10 times longer. +much more complex internally and at least 10 times as big. At first, GETURL checks if it was called with exactly one web address. Then, it checks if the user chose to use a special proxy server whose name @@ -2744,11 +2757,11 @@ BEGIN @{ Another thing that may look strange is the way GETURL is called. Before calling GETURL, we have to check if the proxy variables need to be passed on. If so, we prepare strings that will become part -of the command line later. In @code{GetHeader()}, we store these strings +of the command line later. In @code{GetHeader}, we store these strings together with the longest part of the command line. Later, in the loop -over the URLs, @code{GetHeader()} is appended with the URL and a redirection +over the URLs, @code{GetHeader} is appended with the URL and a redirection operator to form the command that reads the URL's header over the Internet. -GETURL always produces the headers over @file{/dev/stderr}. That is +GETURL always sends the headers to @file{/dev/stderr}. That is the reason why we need the redirection operator to have the header piped in. @@ -2788,8 +2801,8 @@ of links are missing in the regular expression. However, it is straightforward to add them, if doing so is necessary for other tasks. This program reads an HTML file and prints all the HTTP links that it finds. -It relies on @command{gawk}'s ability to use regular expressions as record -separators. 
With @code{RS} set to a regular expression that matches links, +It relies on @command{gawk}'s ability to use regular expressions as the record +separator. With @code{RS} set to a regular expression that matches links, the second action is executed each time a non-empty link is found. We can find the matching link itself in @code{RT}. @@ -2799,7 +2812,7 @@ This simple program prints shell commands that can be piped into @command{sh} for execution. This way it is possible to first extract the links, wrap shell commands around them, and pipe all the shell commands into a file. After editing the file, execution of the file retrieves -exactly those files that we really need. In case we do not want to edit, +only those files that we really need. In case we do not want to edit, we can retrieve all the pages like this: @smallexample @@ -2889,6 +2902,7 @@ files.@footnote{Due to licensing problems, the default installation of GNUPlot disables the generation of @file{.gif} files. If your installed version does not accept @samp{set term gif}, just download and install the most recent version of GNUPlot and the +@c FIXME: URL doesn't work @uref{http://www.boutell.com/gd/, GD library} by Thomas Boutell. Otherwise you still have the chance to generate some @@ -3057,7 +3071,7 @@ transmit, but rather raw image data to contain in the body. Most of the work is done in the second menu choice. It starts with a strange JavaScript code snippet. When first implementing this server, -we used a short @code{@w{"<IMG SRC="} MyPrefix "/Image>"} here. But then +we used a short @samp{@w{"<IMG SRC="} MyPrefix "/Image>"} here. But then browsers got smarter and tried to improve on speed by requesting the image and the HTML code at the same time. When doing this, the browser tries to build up a connection for the image request while the request for @@ -3122,7 +3136,7 @@ where it can be viewed by the user. It is probably better not to mix up so many different languages. 
 The result is not very readable. Furthermore, the statistical part of the server does not take care of invalid input.
-Among others, using negative variances will cause invalid results.
+Among others, using negative variances causes invalid results.
 @node MAZE, MOBAGWHO, STATIST, Some Applications and Techniques
 @section MAZE: Walking Through a Maze In Virtual Reality
@@ -3132,11 +3146,11 @@ Among others, using negative variances will cause invalid results.
 @quotation
 @cindex Perlis, Alan
 @i{In the long run, every program becomes rococo, and then rubble.}@*
-Alan Perlis
+@author Alan Perlis
 @end quotation
 By now, we know how to present arbitrary @samp{Content-type}s to a browser.
-In this @value{SECTION}, our server will present a 3D world to our browser.
+In this @value{SECTION}, our server presents a 3D world to our browser.
 The 3D world is described in a scene description language (VRML, Virtual Reality Modeling Language) that allows us to travel through a perspective view of a 2D maze with our browser. Browsers with a
@@ -3147,7 +3161,7 @@ VRML. If you have never written any VRML code, have a look at the VRML FAQ. Presenting a static VRML scene is a bit trivial; in order to expose
-@command{gawk}'s new capabilities, we will present a dynamically generated
+@command{gawk}'s capabilities, we will present a dynamically generated
 VRML scene. The function @code{SetUpServer()} is very simple because it only sets the default HTML page and initializes the random number generator. As usual, the surrounding server lets you browse the maze.
@@ -3282,8 +3296,8 @@ function MakeMaze(x, y) @{
 @i{There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious
-deficiencies.} @*
-C. A. R. Hoare
+deficiencies.}
+@author C.A.R.@: Hoare
 @end quotation
 A @dfn{mobile agent} is a program that can be dispatched from a computer and
@@ -3336,9 +3350,7 @@ process with a dedicated protocol specialized for receiving mobile agents. Our agent example abuses a common web server as a migration tool. So, it needs a universal CGI script on the receiving side (the web server). The receiving script is activated with a @code{POST} request when placed into a location like
-@file{/httpd/cgi-bin/PostAgent.sh}. Make sure that the server system uses a
-version of @command{gawk} that supports network access (Version 3.1 or later;
-verify with @samp{gawk --version}).
+@file{/httpd/cgi-bin/PostAgent.sh}.
 @example
 @c file eg/network/PostAgent.sh
@@ -3488,7 +3500,7 @@ arrival at its new home site. One of the serious obstacles in implementing a framework for mobile agents is that it does not suffice to migrate the code. It is also necessary to migrate the state of execution of the agent. In contrast to @cite{Agent Tcl}, this program does not try to migrate the complete set
-of variables. The following conventions are used:
+of variables. The following conventions apply:
 @itemize @bullet
 @item
@@ -3528,7 +3540,7 @@ standard output to avoid irritating the server.
 @end itemize
 The application-independent framework is now almost complete. What follows
-is the @code{END} pattern that is executed when the mobile agent has
+is the @code{END} pattern which executes when the mobile agent has
 finished reading its own code. First, it checks whether it is already running on a remote host or not. In case initialization has not yet taken place, it starts @code{MyInit()}. Otherwise (later, on a remote host), it
@@ -3600,9 +3612,10 @@ is time to start the real work by appending the host's name to the result string, and reading line by line who is logged in on this host. A very annoying circumstance is the fact that the elements of @code{MOBVAR} cannot hold the newline character (@code{"\n"}). If they
-did, migration of this string did not work because the string didn't
+did, migration of this string would not work because the string wouldn't
 obey the syntax rule for a string in @command{gawk}. @code{SUBSEP} is used as a temporary replacement.
+
 If the list of hosts to visit holds at least one more entry, the agent migrates to that place to go on working there. Otherwise, we replace the @code{SUBSEP}s
@@ -3628,7 +3641,7 @@ Many solutions were suggested for this problem, but most of these were largely concerned with the movements of small green pieces of paper, which is odd because it wasn't the small green pieces of paper that were unhappy.} @*
-Douglas Adams, @cite{The Hitch Hiker's Guide to the Galaxy}
+@author Douglas Adams, @cite{The Hitch Hiker's Guide to the Galaxy}
 @end quotation
 @cindex @command{cron} utility
@@ -3639,7 +3652,7 @@ Unix system users can write a list of tasks to be done each day, each week, twice a day, or just once. The list is entered into a file named @file{crontab}. For example, to distribute a newsletter on a daily basis this way, use @command{cron} for calling a script each day early
-in the morning.
+in the morning:
 @example
 # run at 8 am on weekdays, distribute the newsletter
@@ -3892,7 +3905,7 @@ function Prediction() @{
 At this point the hard work has been done: the array @code{predict} contains the predictions for all the ticker symbols. It is up to the
-function @code{Report()} to find some nice words to introduce the
+function @code{Report()} to find some nice words to present the
 desired information.
 @smallexample
@@ -3974,8 +3987,11 @@ us about it! It is only for the sake of curiosity, of course. @code{:-)}
 @cindex BLAST, Basic Local Alignment Search Tool
 @cindex Hoare, C.A.R.
 @quotation
-@i{Hoare's Law of Large Problems: Inside every large problem is a small
- problem struggling to get out.}
+@i{Inside every large problem is a small
+problem struggling to get out.}@footnote{What C.A.R.@: Hoare
+actually said was ``Inside every large program is a
+small program struggling to get out.''}
+@author With apologies to C.A.R.@: Hoare
 @end quotation
 Yahoo's database of stock market data is just one among the many large
@@ -3994,7 +4010,9 @@ is a very long chain of four base nucleotides. It is the order of appearance (the sequence) of nucleotides which contains the information about the substance to be produced. Scientists in biotechnology often find a specific fragment, determine the nucleotide sequence, and need
-to know where the sequence at hand comes from. This is where the large
+to know where the sequence at hand comes from.
+
+This is where the large
 databases enter the game. At NCBI, databases store the knowledge about which sequences have ever been found and where they have been found. When the scientist sends his sequence to the BLAST service, the server
@@ -4005,6 +4023,7 @@ the scientist. In order to make access simple, NCBI chose to offer their database service through popular Internet protocols. There are four basic ways to use the so-called BLAST services:
+@c FIXME: Is all of this still true?
 @itemize @bullet
 @item
 The easiest way to use BLAST is through the web. Users may simply point
@@ -4070,7 +4089,7 @@ K --> G T (keto) N --> A G C T (any)
 @end example
 Now you know the alphabet of nucleotide sequences. The last two lines
-of the following example query show you such a sequence, which is obviously
+of the following example query show such a sequence, which is obviously
 made up only of elements of the alphabet just described. Store this example query into a file named @file{protbase.request}. You are now ready to send it to the server with the demonstration client.
@@ -4254,7 +4273,7 @@ book review on the Internet.
 @item
-While Waterman's book can explain to you the algorithms employed internally
+While Waterman's book explains the algorithms employed internally
 in the database search engines, most practitioners prefer to approach the subject differently. The applied side of Computational Biology is called Bioinformatics, and emphasizes the tools available for day-to-day
@@ -4266,14 +4285,14 @@ books on Bioinformatics is The sequences @emph{gawk} and @emph{gnuawk} are in widespread use in the genetic material of virtually every earthly living being. Let us take this as a clear indication that the divine creator has intended
-@command{gawk} to prevail over other scripting languages such as @command{perl},
-@command{tcl}, or @command{python} which are not even proper sequences. (:-)
+@command{gawk} to prevail over other scripting languages such as @samp{perl},
+@samp{tcl}, or @samp{python} which are not even proper sequences. (:-)
 @end enumerate
 @node Links, GNU Free Documentation License, Some Applications and Techniques, Top
 @chapter Related Links
-This section lists the URLs for various items discussed in this @value{CHAPTER}.
+This section lists the URLs for various items discussed in this @value{DOCUMENT}.
 They are presented in the order in which they appear.
 @table @asis