1 Overview

The server implemented for this project has a simple structure, which can be summarized as follows:

In addition, the server should have the following implementations:

For a detailed explanation, please refer to the demo here.

1.1 Hypertext Transfer Protocol in a Nutshell

HTTP stands for Hypertext Transfer Protocol. It is used to transfer most files over the Web, including text files, PDF documents, and images in various formats. In principle, the objects transferred (called resources) can be anything, including dynamically generated content that is produced on the fly by a program or script.

The HTTP protocol is an example of client-server communication. At the highest level it works as follows:

1. An HTTP client makes a connection to an HTTP server using the TCP/IP protocol.
2. The client sends a request to the server, together with some other information. For example, a request may be to retrieve a file from a client-specified position within the server’s file space.
3. The server sends back a response. If the server accepts the request, the response carries the output of the requested command. If there is an error of some sort, an error response is sent to the client instead.
4. The client can optionally send zero or more further requests (HTTP protocol version 1.1 only).
5. The server closes the connection.

An HTTP message (either request or response) is a text string consisting of a header and a body.
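
The exchange described above can be sketched with Python's standard socket module. This is only a minimal illustration, not the project's server: the function names (serve_once, fetch, demo), the fixed "hello" body, and the loopback address with an OS-chosen port are all assumptions made for the example.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept one connection, answer one request, then close the connection."""
    conn, _addr = listener.accept()
    with conn:
        # Read until the empty line that ends the request header.
        data = b""
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(1024)
            if not chunk:
                break
            data += chunk
        body = b"hello\n"  # hypothetical fixed resource
        head = (
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/plain\r\n"
            f"Content-Length: {len(body)}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        conn.sendall(head.encode("ascii") + body)
    # Leaving the "with" block closes the connection from the server side.


def fetch(host: str, port: int, path: str) -> bytes:
    """Connect, send one GET request, and read until the server closes."""
    with socket.create_connection((host, port), timeout=5) as sock:
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # empty read: the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


def demo() -> bytes:
    """Run the one-shot server in a thread and fetch from it over loopback."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: OS picks a free port
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    worker.start()
    try:
        return fetch("127.0.0.1", port, "/index.html")
    finally:
        worker.join(timeout=5)
        listener.close()


if __name__ == "__main__":
    print(demo().split(b"\r\n", 1)[0].decode("ascii"))
```

The sketch handles a single request per connection; a server supporting several requests on one HTTP/1.1 connection would loop over the read and respond steps before closing.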

The header has the following pairwise attribute-value format:
INITIAL LINE
HEADER1: VALUE1
HEADER2: VALUE2
...
HEADERn: VALUEn
In a request, the initial line consists of three words: the method word, a resource, and the protocol being used, which includes the protocol version. It is terminated by a CRLF. A CRLF is a carriage return character (1 byte, ASCII code 13 decimal), followed by a linefeed character (1 byte, ASCII code 10 decimal).
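
As a small illustration of this format, the following Python sketch splits a request's initial line into its three words. The function name parse_request_line and its error messages are hypothetical, chosen for this example only.

```python
CRLF = "\r\n"  # carriage return (ASCII 13) followed by linefeed (ASCII 10)


def parse_request_line(line: str) -> tuple[str, str, str]:
    """Split 'METHOD RESOURCE PROTOCOL/VERSION' plus CRLF into three words."""
    if not line.endswith(CRLF):
        raise ValueError("initial line must be terminated by CRLF")
    words = line[: -len(CRLF)].split(" ")
    if len(words) != 3:
        raise ValueError("initial line must consist of exactly three words")
    method, resource, protocol = words
    if not protocol.startswith("HTTP/"):
        raise ValueError("third word must name the protocol and its version")
    return method, resource, protocol


if __name__ == "__main__":
    print(parse_request_line("GET /A/B/C/file.html HTTP/1.1" + CRLF))
```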

The remainder of the header contains information about the message. Each additional line consists of a word referring to a property, followed by a colon, followed by whitespace, followed by the value for that property. Property values can contain spaces. An example property of a request is the User-Agent property, whose value includes the program name and version of the client making the request. An example of a property of a response is the Content-Type property, which describes the type of the content of the server response (e.g., text/plain, image/png, video/mpeg, etc.). A property that can occur in either the request or the response is the Content-Length property, whose value is the length in bytes of the message body (if any).
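
The property lines can be read into a dictionary by splitting each line at its first colon. The sketch below is one possible way to do this; the function name parse_header_lines is hypothetical, and storing the names in lower case is a choice made here because HTTP property names are case-insensitive.

```python
def parse_header_lines(lines: list[str]) -> dict[str, str]:
    """Turn 'Name: value' lines (CRLF already removed) into a dictionary."""
    headers: dict[str, str] = {}
    for line in lines:
        # Split at the first colon only: values such as dates contain colons.
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"missing colon in header line: {line!r}")
        # Lower-case the name so lookups do not depend on capitalisation.
        headers[name.strip().lower()] = value.strip()
    return headers


if __name__ == "__main__":
    parsed = parse_header_lines(
        ["Host: www.cs.iastate.edu", "User-Agent: mywebclient/1.0"]
    )
    print(parsed["user-agent"])
```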

An example request is:
GET /A/B/C/file.html HTTP/1.1
Host: www.cs.iastate.edu
User-Agent: mywebclient/1.0
[empty line here]

In this example the client request consists of the GET command, which is used to retrieve the resource http://www.cs.iastate.edu/A/B/C/file.html. This corresponds to a file named file.html which resides at path A/B/C within the web server’s file space. The client also specifies that this request follows the version 1.1 HTTP protocol specification. The remaining header lines contain additional information about the request.

The request body is optional and is separated from the header by an empty line (just a CRLF with no text). In the previous example the body is empty.
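
To make the role of the empty line concrete, the following Python sketch assembles a request and then separates its header from its body at the first empty line. The helper names build_request and split_message are hypothetical and used only for this illustration.

```python
def build_request(method: str, resource: str,
                  headers: dict[str, str], body: bytes = b"") -> bytes:
    """Assemble a request: initial line, property lines, empty line, body."""
    lines = [f"{method} {resource} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # The first CRLF ends the last header line; the second is the empty line.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


def split_message(raw: bytes) -> tuple[bytes, bytes]:
    """Separate header and body at the first empty line (CRLF CRLF)."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no empty line found; the header is incomplete")
    return head, body


if __name__ == "__main__":
    request = build_request(
        "GET", "/A/B/C/file.html",
        {"Host": "www.cs.iastate.edu", "User-Agent": "mywebclient/1.0"},
    )
    head, body = split_message(request)
    print(head.decode("ascii").split("\r\n"))
    print(len(body))
```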

After the request is received, the server sends back a response. The response follows the same template: it consists of a header and a body, with the header consisting of an initial line, a number of lines in the same format as in the request, and an empty line that acts as a terminator.

An example response is:
HTTP/1.1 200 OK
Date: Thu, 05 Apr 2007 05:09:59 GMT
Server: Apache/2.0.40 (Red Hat Linux)
Last-Modified: Thu, 26 Jan 2006 08:44:44 GMT
Content-Length: 5806
Connection: close
Content-Type: text/html; charset=ISO-8859-1
[empty line here]
[body containing a 5806-byte HTML document]
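
A client could take such a response apart roughly as follows. This is a sketch under stated assumptions: the function name parse_response is hypothetical, the header is decoded as ISO-8859-1, and the body is cut to the Content-Length value when that property is present.

```python
def parse_response(raw: bytes) -> tuple[str, int, str, dict[str, str], bytes]:
    """Return (protocol, code, message, headers, body) for one response."""
    head, sep, rest = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no empty line found; the header is incomplete")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Initial line: protocol, numeric code, then a free-form message.
    parts = lines[0].split(" ", 2)
    if len(parts) < 2:
        raise ValueError("initial line needs a protocol and a response code")
    protocol, code = parts[0], int(parts[1])
    message = parts[2] if len(parts) > 2 else ""

    headers: dict[str, str] = {}
    for line in lines[1:]:
        name, _sep, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # Use Content-Length when given; otherwise take everything received.
    length = headers.get("content-length")
    body = rest if length is None else rest[: int(length)]
    return protocol, code, message, headers, body


if __name__ == "__main__":
    sample = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Length: 5\r\n"
        b"Content-Type: text/plain\r\n"
        b"\r\n"
        b"hello"
    )
    print(parse_response(sample)[:3])
```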

In this example the initial line indicates success. The first word on this line is the version of the HTTP specification that the response adheres to. The second word is a numerical response code meant to be easily parsable by the client. The remaining words are an informational message explaining the response; it is meant to be human-readable and may vary from server to server. Depending on their numeric value, response codes can be classified as follows:

1xx indicates an informational message.
2xx indicates a success of some sort.
3xx is a redirection response to another Web address.
4xx indicates an error due to the client.
5xx indicates an error due to the server.

The response codes we’ll be using in the project for our server are:

1.2 Socket

For a client to connect to a web server, both the client and the server must create a socket. Sockets are software constructs that allow two processes to communicate, either on the same machine or across the Internet. The following system calls were used for this project: