The TCP/IP Guide - Version 3.0 (Contents) ` 1581 _ © 2001-2005 Charles M. Kozierok. All Rights Reserved.
Encoding was a significant issue for MIME, because it was created for the specific purpose
of sending non-text data using the old RFC 822 e-mail message standard. RFC 822
imposes several significant restrictions on the messages it carries, the most important of
which is that data must be encoded using 7-bit ASCII. RFC 822 messages are also limited
to lines of no more than 1000 characters, each terminated by a “CRLF” sequence.
These limitations mean that arbitrary binary files, which have no concept of lines and
consist of bytes that can each contain a value from 0 to 255, cannot be sent using RFC
822 in their native format. For MIME to transfer these files, they must be encoded
using a method such as base64, which converts each group of three 8-bit bytes into four 6-bit
values, each of which is represented by a printable ASCII character. When this sort of transformation is done, the
MIME Content-Transfer-Encoding header is included in the message so the recipient can
reverse the encoding to return the data to its normal form.
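To make the three-bytes-to-four-characters step concrete, here is a minimal Python sketch of a base64 encoder. It assumes the standard base64 alphabet and "=" padding; the function name `b64_encode` is hypothetical and written only for illustration, and its output is cross-checked against Python's standard `base64` module, which also performs the reverse (decoding) step a recipient would apply.

```python
import base64

# Standard base64 alphabet: 64 printable ASCII characters, one per 6-bit value.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_encode(data: bytes) -> str:
    """Hypothetical illustrative encoder: processes input three bytes at a time."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to three 8-bit bytes into one 24-bit integer (zero-padded on the right).
        n = int.from_bytes(chunk.ljust(3, b"\x00"), "big")
        # Split the 24 bits into four 6-bit values and map each to an ASCII character.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A short final chunk carries only len(chunk) + 1 meaningful characters;
        # the remaining positions are filled with '=' padding.
        pad = 3 - len(chunk)
        if pad:
            chars[4 - pad:] = ["="] * pad
        out.append("".join(chars))
    return "".join(out)


# Arbitrary binary data, including byte values outside 7-bit ASCII.
sample = bytes([0x00, 0xFF, 0x10, 0x80, 0x7F])
encoded = b64_encode(sample)
assert encoded == base64.b64encode(sample).decode("ascii")
# The recipient reverses the transformation to recover the original bytes.
assert base64.b64decode(encoded) == sample
print("Content-Transfer-Encoding: base64")
print(encoded)
```

For example, the three ASCII bytes "Man" encode to "TWFu", while the shorter inputs "Ma" and "M" encode to "TWE=" and "TQ==" respectively.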
Now, while this technique works, it is less efficient than sending the data directly in binary,
because base64 encoding increases the size of the message by about 33% (every three bytes are
encoded using four ASCII characters, each of which takes one byte to transmit). HTTP
messages are transmitted directly between client and server over a TCP connection, and
do not use the RFC 822 standard. Thus, binary data can be sent between HTTP clients and
servers without the need for base64 encoding or other transformation techniques. Since it is
more efficient to send the data unencoded, this may be one reason why HTTP’s developers
decided not to make the protocol strictly MIME compliant.
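The 33% figure follows directly from the 3-to-4 ratio. A short Python sketch can confirm it against the standard `base64` module; `base64_length` is a hypothetical helper introduced only for this illustration. It counts unwrapped base64 output only, so the line breaks that MIME inserts into encoded bodies would add slightly more overhead than shown here.

```python
import base64


def base64_length(n_bytes: int) -> int:
    # Every group of up to three input bytes becomes exactly four ASCII characters
    # (a short final group is padded with '='), so round up to whole groups.
    return 4 * ((n_bytes + 2) // 3)


for size in (3, 1000, 1_000_000):
    payload = bytes(size)  # zero-filled stand-in for an arbitrary binary file
    encoded = base64.b64encode(payload)
    assert len(encoded) == base64_length(size)
    overhead = (len(encoded) - size) / size * 100
    print(f"{size} bytes -> {len(encoded)} bytes (+{overhead:.1f}%)")
```

For instance, a 1,000-byte file becomes 1,336 characters, an increase of 33.6%, with the extra fraction of a percent coming from padding the last partial group.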
HTTP's Two-Level Encoding Scheme
This would seem to be an area where HTTP is simpler than MIME: since there is no
need to encode the entity, there is no need for the Content-Transfer-Encoding header, and
we have one less thing to worry about. Ha, nice try! ☺ It is true that HTTP could have
simply been designed so that all entities were just sent one byte at a time with no need to
specify encodings. But the developers of the protocol recognized that this would have made
the protocol inflexible. There are situations where it might be useful to transform or encode
an entity or message for transmission, and then reverse the operation upon receipt.
This effort to make HTTP flexible resulted in a system of representing encodings that is
actually more complicated than MIME’s. The key to understanding it is to recognize that
HTTP/1.1 actually splits MIME’s notion of a “content transfer encoding” into two different
encoding levels:
☯ Content Encoding: This is an encoding that is applied specifically to the entity carried
in an HTTP message, to prepare or package it prior to transmission. Content
encodings are said to be “end-to-end”, because the entity is encoded once, before it is
sent by the client or server, and decoded only upon receipt by its ultimate recipient
(the server or client). When this type of encoding is done, the method is
identified in the special Content-Encoding entity header. A client may also specify what
content encodings it can handle, using the Accept-Encoding header, as we will see in
the topic on content negotiation.