This chapter provides a (necessarily) brief introduction to computer networking concepts. For many applications of @command{gawk} to TCP/IP networking, we hope that this is enough. For more advanced tasks, you will need deeper background, and it may be necessary to switch to lower-level programming in C or C++.
There are two real-life models for the way computers send messages to each other over a network. While the analogies are not perfect, they are close enough to convey the major concepts. These two models are the phone system (reliable byte-stream communications) and the postal system (best-effort datagrams).
When you make a phone call, the following steps occur:
The same steps occur in a duplex reliable computer networking connection. There is considerably more overhead in setting up the communications, but once it's done, data moves in both directions, reliably, in sequence.
Suppose you mail three different documents to your office on the other side of the country on two different days. Doing so entails the following.
The important characteristics of datagram communications, like those of the postal system, are thus:
The price the user pays for the lower overhead of datagram communications is exactly the lower reliability; it is often necessary for user-level protocols that use datagram communications to add their own reliability features on top of the basic communications.
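As a minimal sketch of what such user-level reliability features can look like, the following Python fragment tags each message with a sequence number so that the receiver can discard duplicates, restore the original order, and detect losses. Python is used here only for illustration, the function names `make_datagrams` and `reassemble` are hypothetical, and no real network is involved: the hand-built `arrived` list stands in for the behavior of the postal system.

```python
def make_datagrams(messages):
    """Tag each message with a sequence number, as a sender might."""
    return [(seq, msg) for seq, msg in enumerate(messages)]


def reassemble(received, expected_count):
    """Restore order, drop duplicates, and report missing sequence numbers.

    `received` is a list of (seq, msg) tuples in arrival order.
    Returns (ordered_messages, missing_sequence_numbers).
    """
    by_seq = {}
    for seq, msg in received:
        # A duplicate carries a sequence number we already hold; keep the first.
        by_seq.setdefault(seq, msg)
    missing = [seq for seq in range(expected_count) if seq not in by_seq]
    ordered = [by_seq[seq] for seq in sorted(by_seq)]
    return ordered, missing


# Simulated delivery: out of order, one duplicate, one loss (sequence number 2).
sent = make_datagrams(["doc A", "doc B", "doc C", "doc D"])
arrived = [sent[3], sent[0], sent[1], sent[0]]
ordered, missing = reassemble(arrived, expected_count=len(sent))
print(ordered)   # ['doc A', 'doc B', 'doc D']
print(missing)   # [2]
```

A real protocol would go on to ask the sender to retransmit the missing sequence numbers; that step is omitted here.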
The Internet Protocol Suite (usually referred to as just TCP/IP)(1) consists of a number of different protocols at different levels or "layers." For our purposes, three protocols provide the fundamental communications mechanisms. All other defined protocols are referred to as user-level protocols (e.g., HTTP, used later in this book).
All other user-level protocols use either TCP or UDP to do their basic communications. Examples are SMTP (Simple Mail Transfer Protocol), FTP (File Transfer Protocol) and HTTP (HyperText Transfer Protocol).
In the postal system, the address on an envelope indicates a physical location, such as a residence or office building. But there may be more than one person at the location; thus you have to further qualify the recipient by putting a person or company name on the envelope.
In the phone system, one phone number may represent an entire company, in which case you need a person's extension number in order to reach that individual directly. Or, when you call a home, you have to say, "May I please speak to ..." before talking to the person directly.
IP networking provides the concept of addressing. An IP address represents a particular computer, but no more. In order to reach the mail service on a system, or the FTP or WWW service on a system, you have to have some way to further specify which service you want. In the Internet Protocol suite, this is done with port numbers, which represent the services, much like an extension number used with a phone number.
Port numbers are 16-bit integers. Unix and Unix-like systems reserve ports below 1024 for "well known" services, such as SMTP, FTP, and HTTP. Port numbers 1024 and above may be used by any application, although there is no promise made that a particular port number is always available.
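The ranges just described can be sketched in a few lines of Python. This is an illustration only: `classify_port` and the two constants are hypothetical names, not part of any library, and the port numbers in the loop are the standard assignments for SMTP (25), FTP control (21), and HTTP (80).

```python
MAX_PORT = 0xFFFF              # largest value that fits in 16 bits (65535)
FIRST_UNRESERVED_PORT = 1024   # ports below this are reserved on Unix-like systems


def classify_port(port):
    """Return 'reserved' or 'unreserved' for a valid port, else raise ValueError."""
    if not isinstance(port, int) or not 0 <= port <= MAX_PORT:
        raise ValueError(f"not a 16-bit port number: {port!r}")
    return "reserved" if port < FIRST_UNRESERVED_PORT else "unreserved"


# 25, 21, and 80 are the standard ports for SMTP, FTP (control), and HTTP.
for port in (25, 21, 80, 1024, 8080):
    print(port, classify_port(port))
```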
Two terms come up repeatedly when discussing networking: client and server. For now, we'll discuss these terms at the connection level, when first establishing connections between two processes on different systems over a network. (Once the connection is established, the higher-level, or application-level, protocols, such as HTTP or FTP, determine who is the client and who is the server. Often, it turns out that the client and server are the same in both roles.)
The server is the system providing the service, such as the web server or email server. It is the host (system) which is connected to in a transaction. For this to work, though, the server must be expecting connections. Much as there has to be someone at the office building to answer the phone(2), the server process (usually) has to be started first and waiting for a connection.
The client is the system requesting the service. It is the system initiating the connection in a transaction. (Just as when you pick up the phone to call an office or store.)
In the TCP/IP framework, each end of a connection is represented by an (address, port) pair, so the connection as a whole is identified by a pair of such pairs. For the duration of the connection, the ports in use at each end are unique, and cannot be used simultaneously by other processes on the same system. (Only after closing a connection can a new one be built up on the same port. This is contrary to the usual behavior of fully developed web servers which have to avoid situations in which they are not reachable. We have to pay this price in order to enjoy the benefits of a simple communication paradigm in @command{gawk}.)
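To make the (address, port) idea concrete, here is a small sketch in Python (used purely for illustration; it does not show how @command{gawk} itself opens connections). It assumes the loopback address 127.0.0.1 is usable, starts a listening server first, connects a client to it, and prints the pair seen at each end. It then tries to bind a second socket to the server's port, which is expected to fail while the first socket holds it; the exact behavior depends on the operating system and on socket options such as `SO_REUSEADDR`, which is deliberately left unset.

```python
import socket

# The server side must exist first: bind to the loopback address and let the
# operating system pick a free port (port 0 means "any available port").
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
server_addr = server.getsockname()     # (address, port) the server listens on

# The client initiates the connection; the system assigns it a port of its own.
client = socket.create_connection(server_addr)
conn, peer_addr = server.accept()      # the server's end of this one connection

client_end = client.getsockname()      # the client's (address, port)
server_end = conn.getsockname()        # the server's (address, port)
print("client end:", client_end)
print("server end:", server_end)

# While the listener holds the port, a second socket cannot bind to it.
# (SO_REUSEADDR is deliberately left unset here.)
second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind(server_addr)
    port_was_free = True
except OSError:
    port_was_free = False
finally:
    second.close()
print("port still available to others:", port_was_free)

for s in (conn, client, server):
    s.close()
```

The port numbers printed vary from run to run, because the system chooses them.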
Furthermore, once the connection is established, communications are synchronous. I.e., each end waits on the other to finish transmitting, before replying. This is much like two people in a phone conversation. While both could talk simultaneously, doing so usually doesn't work too well.
In the case of TCP, the synchronicity is enforced by the protocol when sending data. Data writes block until the data have been received on the other end. For both TCP and UDP, data reads block until there is incoming data waiting to be read. This is summarized in the following table, where an "X" indicates that the given action blocks.

            TCP    UDP
Writes       X
Reads        X      X
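The blocking behavior of reads can be observed with a short Python sketch (again only an illustration, assuming the loopback address 127.0.0.1 is usable). A timeout is set on the receiving UDP socket solely so that the wait becomes visible instead of lasting forever; without it, the first read would simply block until a datagram arrived.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))   # port 0: let the system choose a free port
receiver.settimeout(0.2)          # give up after 0.2 s instead of waiting forever

try:
    receiver.recvfrom(1024)       # nothing has been sent yet, so this waits
    read_timed_out = False
except socket.timeout:
    read_timed_out = True
print("read timed out with no data:", read_timed_out)

# Once a datagram is waiting, the same call returns without delay.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", receiver.getsockname())  # the UDP write does not wait for the reader
data, source = receiver.recvfrom(1024)
print(data)

sender.close()
receiver.close()
```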
@ifnottex