HTTP protocol: operating principle

The HTTP protocol is the basis of the Web. It is used to handle hypertext links on the Internet. Here are some details on how it works.

HTTP (HyperText Transfer Protocol) has been the most widely used protocol on the Internet since 1990. Version 0.9 was only intended to transfer data over the Internet (especially web pages written in HTML). Version 1.0 of the protocol (the most widely used) makes it possible to transfer messages with headers describing the content of the message using MIME-type encoding.

The purpose of the HTTP protocol is to allow the transfer of files (essentially in HTML format), located by means of a character string called a URL, between a browser (the client) and a web server (called httpd on UNIX machines, incidentally).

Communication between browser and server

Communication between the browser and the server takes place in two stages:

  • The browser sends an HTTP request.
  • The server processes the request, then sends an HTTP response.
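
As a rough illustration of these two stages, the Python sketch below starts a throwaway server on the local machine and sends it a single request. The handler class `HelloHandler`, the function `demo_exchange` and the page content are invented for this example; it assumes only the standard `http.server` and `http.client` modules.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = b"<html><body>Hello</body></html>"
        self.send_response(200)                        # status line
        self.send_header("Content-Type", "text/html")  # response headers
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()                             # empty line
        self.wfile.write(body)                         # response body

    def log_message(self, format, *args):
        pass  # keep the demo quiet (no access log on stderr)


def demo_exchange():
    """Run one request/response exchange against a local server."""
    # Port 0 lets the operating system pick any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/")        # stage 1: the client sends a request
        response = conn.getresponse()   # stage 2: the server sends a response
        result = (response.status, response.reason, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo_exchange()` returns the status code, the explanatory text and the body of the response as a tuple.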

In reality, the communication involves more steps if we consider how the server processes the request. Since we are only interested in the HTTP protocol here, server-side processing is not covered in this article. If that subject interests you, refer to the article on CGI processing.

HTTP request

An HTTP request is a set of lines sent to the server by the browser. It comprises:

  • A request line: this line specifies the method to apply, the document requested, and the version of the protocol used. It consists of three elements that must be separated by a space:
    • The method
    • The URL
    • The version of the protocol used by the client (usually HTTP/1.0)
  • Request header fields: a set of optional lines that provide additional information about the request and/or the client (browser, operating system, …). Each of these lines consists of a name identifying the type of header, followed by a colon (:) and the value of the header
  • The body of the request: a set of optional lines, which must be separated from the preceding lines by an empty line; it is used, for example, to send form data to the server with the POST method

An HTTP request therefore has the following syntax (<crlf> denotes a carriage return followed by a line feed):

METHOD URL VERSION<crlf>
HEADER: Value<crlf>
.
.
.
HEADER: Value<crlf>
Empty line<crlf>
REQUEST BODY
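
The syntax above can be sketched in Python as a small helper that assembles the request line, the headers, the empty line and the body. The function name `build_request` and the URL `/cgi-bin/form` are hypothetical, chosen only for illustration.

```python
CRLF = "\r\n"  # carriage return followed by line feed


def build_request(method, url, headers=None, body="", version="HTTP/1.0"):
    """Assemble a raw HTTP request: request line, headers, empty line, body."""
    lines = [f"{method} {url} {version}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # The empty line marks the end of the headers, even when there is no body.
    return CRLF.join(lines) + CRLF + CRLF + body


request = build_request(
    "POST",
    "/cgi-bin/form",  # hypothetical URL
    headers={
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": "8",  # length of the body below
    },
    body="name=Bob",
)
print(request)
```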

Here is an example of an HTTP request:

GET https://www.commentcamarche.net/ HTTP/1.0
Accept: text/html
If-Modified-Since: Saturday, 15-January-2000 14:37:11 GMT
User-Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows 95)
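
Going the other way, a server has to split such a request back into its parts. The following Python sketch shows one simple way to do it, assuming a well-formed request; `parse_request` is a hypothetical helper, and the sample text is modeled on the example above with a normalized date.

```python
def parse_request(raw):
    """Split a raw HTTP request into (method, url, version, headers, body)."""
    # The first empty line separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # The request line holds three elements separated by a space.
    method, url, version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only: values such as dates contain colons.
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return method, url, version, headers, body


raw = (
    "GET https://www.commentcamarche.net/ HTTP/1.0\r\n"
    "Accept: text/html\r\n"
    "If-Modified-Since: Sat, 15 Jan 2000 14:37:11 GMT\r\n"
    "User-Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows 95)\r\n"
    "\r\n"
)
method, url, version, headers, body = parse_request(raw)
print(method, url, version)
print(headers["user-agent"])
```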

Methods

Method   Description
GET      Request for the resource located at the specified URL
HEAD     Request for the headers of the resource located at the specified URL
POST     Send data to the program located at the specified URL
PUT      Send data to the specified URL
DELETE   Delete the resource located at the specified URL

Request headers

Header name         Description
Accept              Type of content accepted by the browser (for example text/html). See MIME types
Accept-Charset      Character set expected by the browser
Accept-Encoding     Data encoding accepted by the browser
Accept-Language     Language expected by the browser (English by default)
Authorization       Identification of the browser to the server
Content-Encoding    Request body encoding type
Content-Language    Request body language
Content-Length      Request body length
Content-Type        Request body content type (for example text/html). See MIME types
Date                Data transfer start date
Forwarded           Used by intermediate machines between the browser and the server
From                Allows you to specify the client's email address
If-Modified-Since   Allows you to specify that the document must be sent only if it has been modified since a certain date
Link                Relationship between two URLs
Orig-URL            Original request URL
Referer             URL of the link from which the request was made
User-Agent          String giving information about the client, such as browser name and version, operating system
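
To make a few of these headers concrete, here is a small Python sketch that prepares the headers of a conditional request. The helper `http_date`, the client name `ExampleClient/0.1` and the timestamp are assumptions made for the example; the date formatting relies on the standard `email.utils.format_datetime` function.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def http_date(moment):
    """Format a timezone-aware datetime as an HTTP date in GMT."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


# Hypothetical moment at which the client last fetched the document.
last_fetch = datetime(2000, 1, 15, 14, 37, 11, tzinfo=timezone.utc)

headers = {
    "Accept": "text/html",                      # content type the client accepts
    "Accept-Language": "en",                    # language the client expects
    "If-Modified-Since": http_date(last_fetch),  # only send if changed since then
    "User-Agent": "ExampleClient/0.1",          # hypothetical client name
}

for name, value in headers.items():
    print(f"{name}: {value}")
```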

HTTP Response

An HTTP response is a set of lines sent to the browser by the server. It comprises:

  • A status line: this is a line specifying the version of the protocol used and the status of the processing of the request using a code and an explanatory text. The line consists of three elements that must be separated by a space:
    • The version of the protocol used
    • The status code
    • The meaning of the code
  • Response header fields: a set of optional lines that provide additional information about the response and/or the server. Each of these lines consists of a name identifying the type of header, followed by a colon (:) and the value of the header
  • The response body: it contains the requested document

An HTTP response therefore has the following syntax (<crlf> denotes a carriage return followed by a line feed):

HTTP-VERSION CODE EXPLANATION<crlf>
HEADER: Value<crlf>
.
.
.
HEADER: Value<crlf>
Empty line<crlf>
RESPONSE BODY

Here is an example of an HTTP response:

HTTP/1.0 200 OK
Date: Sat, 15 Jan 2000 14:37:12 GMT
Server: Microsoft-IIS/2.0
Content-Type: text/html
Content-Length: 1245
Last-Modified: Fri, 14 Jan 2000 08:25:13 GMT
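
On the client side, a response like this one has to be split into its status line, headers and body. The Python sketch below is one minimal way to do that, assuming a well-formed response that includes an explanatory text; `parse_response` is a hypothetical helper, and the sample uses a 13-byte body (so Content-Length is 13 rather than 1245) to stay short.

```python
def parse_response(raw):
    """Split raw response bytes into (version, code, reason, headers, body)."""
    # The first empty line separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: version, code, then an explanatory text that may contain spaces.
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # If Content-Length is present, keep only that many bytes of the body.
    length = int(headers.get("content-length", len(body)))
    return version, int(code), reason, headers, body[:length]


raw = (
    b"HTTP/1.0 200 OK\r\n"
    b"Date: Sat, 15 Jan 2000 14:37:12 GMT\r\n"
    b"Server: Microsoft-IIS/2.0\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)
version, code, reason, headers, body = parse_response(raw)
print(version, code, reason)
print(headers["content-type"], len(body))
```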

Response headers

Header name         Description
Content-Encoding    Response body encoding type
Content-Language    Response body language
Content-Length      Response body length
Content-Type        Response body content type (e.g. text/html). See MIME types
Date                Data transfer start date
Expires             Date after which the data should no longer be used
Forwarded           Used by intermediate machines between the browser and the server
Location            Redirection to a new URL associated with the document
Server              Characteristics of the server that sent the response
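
As an illustration of how a client might use one of these headers, the Python sketch below checks the Expires date of a response against the current time. The function `is_expired` and the sample header values are hypothetical, and treating a missing Expires header as "not expired" is a simplifying assumption of this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_expired(headers, now):
    """Return True if the Expires date of a response has passed."""
    expires = headers.get("Expires")
    if expires is None:
        # Assumption for this sketch: no deadline means "not expired".
        return False
    return now >= parsedate_to_datetime(expires)


# Hypothetical response headers.
headers = {
    "Content-Type": "text/html",
    "Expires": "Sun, 16 Jan 2000 14:37:12 GMT",
}

before = datetime(2000, 1, 15, 14, 37, 12, tzinfo=timezone.utc)
after = datetime(2000, 1, 17, 0, 0, 0, tzinfo=timezone.utc)
print(is_expired(headers, before))
print(is_expired(headers, after))
```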

Response Codes

These are the codes you see when the browser is unable to deliver the requested page. A response code consists of three digits: the first indicates the status class, and the following two specify the exact nature of the status or error.

Code  Message                Description
10x   Information message    These codes are not used in version 1.0 of the protocol
20x   Success                These codes indicate that the transaction went smoothly
200   OK                     The request was completed successfully
201   CREATED                Follows a POST command and indicates success; the body of the response is supposed to indicate the URL at which the newly created document can be found
202   ACCEPTED               The request has been accepted, but its processing has not been completed
203   PARTIAL INFORMATION    When this code is received in response to a GET command, it indicates that the response is not complete
204   NO RESPONSE            The server received the request, but there is no information to return
205   RESET CONTENT          The server tells the browser to clear the contents of the fields of a form
206   PARTIAL CONTENT        This is a response to a request containing the Range header. The server must include the Content-Range header
30x   Redirection            These codes indicate that the resource is no longer at the indicated location
301   MOVED                  The requested data has been moved to a new address
302   FOUND                  The requested data is at a new URL, but may have moved since…
303   METHOD                 The client should try a new address, preferably using a method other than GET
304   NOT MODIFIED           Returned when the client has made a conditional GET (asking whether the document has been modified since the last time) and the document has not been modified
40x   Client error           These codes indicate that the request is incorrect
400   BAD REQUEST            The syntax of the request is malformed or impossible to satisfy
401   UNAUTHORIZED           The message parameter specifies the acceptable forms of authorization. The client must resubmit its request with the correct authorization data
402   PAYMENT REQUIRED       The client must resubmit its request with the correct payment data
403   FORBIDDEN              Access to the resource is simply forbidden
404   NOT FOUND              Classic! The server did not find anything at the specified address. Gone without leaving a forwarding address… 🙂
50x   Server error           These codes indicate that there has been an internal server error
500   INTERNAL ERROR         The server encountered an unexpected condition that prevented it from fulfilling the request (proof that things happen to servers too…)
501   NOT IMPLEMENTED        The server does not support the requested service (nobody can know how to do everything…)
502   BAD GATEWAY            The server received an invalid response from the server it was trying to access while acting as a gateway or proxy
503   SERVICE UNAVAILABLE    The server cannot answer you at the moment because the traffic is too heavy (all our lines are busy, please call back later)
504   GATEWAY TIMEOUT        The server took too long to respond compared to the time the gateway was prepared to wait (your allotted time has now expired…)
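
Because the first digit carries the status class, a client can react sensibly even to a code it does not know in detail. A minimal Python sketch of that idea follows; `status_class` and the label strings are hypothetical names chosen to match the table above.

```python
STATUS_CLASSES = {
    1: "Information",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code):
    """Return the class of a three-digit status code from its first digit."""
    return STATUS_CLASSES.get(code // 100, "Unknown")


for code in (200, 304, 404, 503):
    print(code, status_class(code))
```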

For more information on the HTTP protocol, it is best to refer to RFC 1945 explaining the protocol in detail:

  • RFC 1945 – Hypertext Transfer Protocol — HTTP/1.0 (French translation)
  • RFC 1945 – Hypertext Transfer Protocol — HTTP/1.0 (original version)
  • RFC 2616 – Hypertext Transfer Protocol — HTTP/1.1 (original version)
  • Cookies
