Overview of MIME Messages
This section starts by explaining the overall purposes of MIME.
Next it shows a simple MIME message and introduces
the basic parts of a MIME message.
With MIME, you can:
Let's look at a MIME message as it would arrive in your
MH inbox.
The message in the next Example
was encoded by the sender; this
is how it is actually stored on disk by the recipient's MTA (mail
transfer agent), waiting for someone to read it.
This is also how the message will look to a person who does
not have a MIME-capable MUA (mail user agent):
Example: Encoded MIME message
From: Jerry Peek <jerry@ora.com>
To: carlos@entelfam.cl
Subject: Un d'ia dif'icil
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Carlos, estoy en la casa de mi amigo. Pero, =A1qu=E9 d=EDa dif=EDcil!
Tom=E9 un taxi entre al aeropuerto y el hotel. "=A1Tenga cuidado!",
me dijo el chofer. "=A1Esta parte de la ciudad es peligrosa!" Fue
evidente para m=ED. Yo o=EDa que la ciudad tiene partes malas, y esa
parte pareci=F3 as=ED. Los edificios ten=EDan barrotes sobre sus
ventanas, hubo gente sospechosa en la calle, y todas las puertas estaban
cerrada con llave. "=A1Vaya con di=F3s!" =C9l se fue.
Before we dig into that message, here's one thing you should be aware of.
RFC 1521, the MIME specification that MH supports, only tells how to
transfer non-ASCII text and graphics in the message body.
Although RFC 1521 adds fields like MIME-Version: to a message
header, it doesn't tell how to put non-ASCII text in a message
header.
So, in my message Subject:, I had to write the word día
as d'ia.
(In the body, that word would have been encoded d=EDa.)
RFC 1522 -- which MH doesn't support (at least, not yet) -- is a standard
for non-ASCII text in the header.
The header field MIME-Version: 1.0 in
the previous Example
says that the message is in MIME format.
It's a signal to mail programs that the
message meets the requirements of RFC 1521.
(Unfortunately, a few mail programs add that header field to messages
that aren't in MIME format.)
Here's an overview of the other header fields that a MIME message
may have.
I'll toss in some MIME philosophy along the way.
We can't cover everything in the MIME spec; it's close to 100 pages long!
Other sections of this chapter, and sections of later chapters,
have more MIME information.
The Section More About MIME
explains how to find all the gory details.
- Content-type: tells you what kind of data is in the message.
The content can be text, image (still pictures), video,
or audio.
Another content type is message, which means that the content is
structured in the standard RFC 822 format; this can be used for
forwarding messages.
The application content type is for data that is meant to be handed to
an external program -- for example, text for a PostScript printer or viewer.
Finally, a message can be multipart, with several separate body parts.
It's possible for the parts to be of different types (the Section on
Multipart Messages
explains the MIME syntax for multipart messages).
For example, a message could start with a text part to describe what's
in the message, then an audio part with a message from a photographer,
followed by five of the photographer's pictures (in five image parts).
It's also possible for each of the multipart body parts to be another
complete multipart message -- with its own parts.
That is, MIME messages can be recursive.
A Content-type: must have a subtype.
The type and subtype names have a slash (/) between them.
For instance, image/gif is a picture in the GIF format; the type is
image and the subtype is gif.
The MIME Reference Guide lists many of the
common content types and subtypes.
Finally, a Content-type: can have optional parameters at the end,
starting with a semicolon (;).
For example, the charset= parameter in
Content-type: text/plain; charset=iso-8859-1
says that the message body uses the ISO-8859-1 character set.
(The default charset is us-ascii.)
The default Content-type: is text/plain.
- The Content-transfer-encoding: field tells how the message
data was encoded for transfer.
Encoding gets the message safely from the person who sent it, across mail
transfer agents and gateways, into your mailbox.
The Section MIME Encoding
lists the values that can go in the
Content-transfer-encoding: field.
- The Content-ID: and Content-Description: header fields
help to describe what's in the message body.
These two fields are most useful for the parts of a multipart message.
The Content-ID: is a unique string, similar to the RFC 822 field
Message-ID:, that no other message in the world is likely to have.
A typical Content-ID: value is <1283.780402430.1@ora.com>.
Each part of a multipart message has its own Content-ID:.
Its main use is identifying
external parts
and cached body parts.
The Content-Description: field describes the content in words, like
Report on Zeta Meeting.
The RFC 822 Subject: field describes the whole message.
You can add a Content-Description: in the message header,
but it's more useful for describing a part of a multipart message.
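The header fields described above can be explored with a short script. This is a minimal sketch that uses Python's standard email package, not MH; the message text, the variable names, and the placeholder image bytes are made up for illustration. It reads the type, subtype, and charset= parameter from a Content-type: field, shows the default type for a message that has no Content-type: field, and then builds a small multipart message that has another multipart message nested inside it.

```python
from email import message_from_string
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical message that reuses header values quoted in the text.
raw = (
    "MIME-Version: 1.0\n"
    "Content-type: text/plain; charset=iso-8859-1\n"
    "Content-transfer-encoding: quoted-printable\n"
    "Content-Description: Report on Zeta Meeting\n"
    "\n"
    "d=EDa\n"
)
msg = message_from_string(raw)

content_type = msg.get_content_type()    # type and subtype joined by a slash
maintype = msg.get_content_maintype()
subtype = msg.get_content_subtype()
charset = msg.get_param("charset")       # optional parameter after the semicolon
encoding = msg["Content-transfer-encoding"]

# With no Content-type: field, the parser reports the default type.
bare = message_from_string("Subject: hello\n\nplain text\n")
default_type = bare.get_content_type()

# A multipart message whose second part is itself multipart.
outer = MIMEMultipart("mixed")
outer.attach(MIMEText("Photos from the trip", "plain", "us-ascii"))
inner = MIMEMultipart("mixed")
inner.attach(MIMEImage(b"GIF89a", _subtype="gif"))  # placeholder bytes, not a real picture
outer.attach(inner)

# walk() visits the container first, then each part, depth first.
part_types = [part.get_content_type() for part in outer.walk()]

print(content_type, charset, encoding, default_type)
print(part_types)
```

The nested container shows up in the walk as its own multipart/mixed entry, followed by the image/gif part inside it.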
The list of content subtypes changes a lot.
Quite a few subtypes are for non-UNIX computers.
This book doesn't cover every content subtype; examples use some
of the most common subtypes from RFC 1521.
Your MH setup probably doesn't need to support all subtypes:
MIME messages are designed to be readable by all existing
RFC 822-compatible mail programs.
(Although, of course, MUAs that don't understand MIME won't be
able to interpret the MIME-specific parts.)
The messages may be sent through all kinds of networks and gateways.
So, MIME encodes messages that have non-ASCII parts.
The Content-transfer-encoding:
field tells the recipient the way a message
was encoded -- and how to decode it.
(Encoding usually isn't required for plain ASCII text.)
For instance,
the previous Example message,
from me to a friend in Chile, is in Spanish (more or less :-)).
Spanish uses characters like
¡ and ñ that aren't part of the ASCII character set.
My MIME-capable MUA could encode the
message's non-ASCII characters to pass safely through an
ASCII-only mail transfer system.
For instance, the character ¡
was encoded as the three-character sequence =A1.
When the message gets to Chile, my friend's MIME-capable
mail reader will translate the message's encoded characters to the correct
Spanish characters.
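To see what a MIME-capable mail reader does with those encoded characters, here is a minimal sketch in Python (an assumption for illustration; MH itself does not work this way internally). It takes part of a line from the Example message, undoes the quoted-printable encoding with the standard quopri module, and then interprets the resulting bytes as ISO-8859-1, the charset named in the message header.

```python
import quopri

# Part of the first body line of the Example message, as stored on disk.
encoded = b"Pero, =A1qu=E9 d=EDa dif=EDcil!"

# Step 1: undo the transfer encoding (=A1 becomes the single byte 0xA1).
raw_bytes = quopri.decodestring(encoded)

# Step 2: interpret the bytes using the charset from Content-Type:.
text = raw_bytes.decode("iso-8859-1")

print(text)
```

The two steps are separate on purpose: the transfer encoding only recovers the original bytes, and the charset= parameter says which characters those bytes stand for.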
One important feature of MIME's encodings is that they are designed to
leave as much of the message in plain ASCII text as possible.
In general, MIME only translates the characters that some email transfer
systems would munge.
So, if the recipient doesn't have a MIME-capable MUA,
the encoded text in the message will probably still make some sense.
(It's possible to encode text messages so that people can't read
them without decoding.
But, unless you want to hide words or have another reason, one of the
less-severe encodings will probably do the job.
In general, MH chooses human-readable encoding for text messages.)
Of course, when MIME encodes a binary file (like a digitized picture)
that people can't read in the first place, the encoded data won't be
any easier for a person to read.
MIME encoding is designed to get the data safely through almost every
known mail transfer system and gateway.
One of the major wins in MIME is that it was designed to work everywhere,
including "broken" and "brain-damaged" systems.
Instead of trying to impose a new standard on mail transfer systems, MIME
works with existing systems -- and adapts to their eccentricities.
Although you don't need to understand how encoding works to use MIME,
you should have a general idea of the types of encoding.
So, even if you'd like to skip the technical details in the following
section, please do skim it and learn the types of encoding.
There are five encodings:
- 7bit is the default.
It means that the message contents are plain ASCII text.
Lines must be "short" (1000 characters or fewer) and end with
CR-LF (carriage return plus linefeed).
- quoted-printable is used for text that is mostly 7-bit but which has a
small percentage of 8-bit characters.
(There's quoted-printable text in
the Example Encoded MIME message.)
For instance, characters with the eighth bit on in the ISO-8859-n
sets should be encoded as quoted-printable.
Each 8-bit character is encoded into three 7-bit characters: =
(an equals sign) followed by the two-digit hexadecimal value of the character.
So the ISO-8859-1 character ñ,
which has the hex value F1 (that's 11110001 binary),
would be encoded as =F1.
To keep the message readable on non-MIME readers, characters that don't have
the eighth bit set generally shouldn't be encoded.
The = character itself must be encoded, though; it's encoded as
=3D.
Also, space and tab characters at the ends of lines must be encoded
(as =20 and =09, respectively); this keeps broken gateways
from eating them.
If a line ends with = followed by CR-LF, the = and the CR-LF are
dropped when the text is decoded; this lets you continue ("wrap") a long line.
Lines must be no more than 76 characters long, not counting the final
CR-LF.
Longer lines will be broken when the message is encoded and joined
again by decoding.
Quoted-printable text was designed to be (mostly) readable by people
with non-MIME mail programs.
- base64 is used for data and other text that was never meant to be
read by humans -- or must be preserved verbatim.
Every 3 octets (24 bits) are encoded into a 4-character sequence.
The 64-character set was chosen carefully.
It comes from ASCII characters that aren't munged by known gateways or
transfer systems.
- 8bit is data made of 8-bit characters with "short" lines that end
in CR-LF.
This isn't too useful yet because 8bit data can't be shipped reliably
over standard SMTP mail transport.
The new ESMTP standard (RFC 1651) and 8bit-MIMEtransport
extension (RFC 1652) handle 8-bit MIME messages.
- binary is like 8bit, but without CR-LF line boundaries.
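The quoted-printable and base64 rules in the list above can be tried out with a few lines of Python. This is only a sketch under some assumptions: qp_encode_line is a hypothetical helper written for this illustration (it is not part of MH or of any library), it handles a single line without its CR-LF, and it skips soft line breaks, so it does not enforce the 76-character limit. The standard quopri and base64 modules are used to check the results.

```python
import base64
import quopri


def qp_encode_line(line: bytes) -> bytes:
    """Encode one line (without its CR-LF) as quoted-printable.

    Simplified: no soft line breaks, so long lines are not wrapped.
    """
    out = []
    last = len(line) - 1
    for i, byte in enumerate(line):
        # A space or tab at the very end of the line must be protected.
        trailing_blank = byte in (0x20, 0x09) and i == last
        # Control characters other than tab are encoded too.
        control = byte < 0x20 and byte != 0x09
        if byte == 0x3D or byte > 0x7E or control or trailing_blank:
            out.append(b"=%02X" % byte)   # "=" plus two hex digits
        else:
            out.append(bytes([byte]))     # plain ASCII stays readable
    return b"".join(out)


enye = "\u00f1".encode("iso-8859-1")      # the character with hex value F1
print(qp_encode_line(enye))               # the 8-bit character
print(qp_encode_line(b"a=b"))             # the = character itself
print(qp_encode_line(b"end "))            # trailing space
print(qp_encode_line(b"end\t"))           # trailing tab

# The library decoder recovers the original bytes.
sample = b"caf\xe9 = ok "
round_trip = quopri.decodestring(qp_encode_line(sample))

# base64: every 3 octets (24 bits) become 4 characters.
three_octets = b"abc"
four_chars = base64.b64encode(three_octets)
print(four_chars, len(four_chars))
```

Note how the quoted-printable output leaves ordinary ASCII letters alone, while the base64 output bears no visible resemblance to its input.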
One of MIME's main goals is to make different email
programs work with each other.
To make interoperability more likely, the MIME designers tried to avoid
having lots of different content-types.
They tried even harder to avoid lots of different encodings.
MIME content types and subtypes, as well as encodings, are registered with
the IANA (Internet Assigned Numbers Authority).
"Experimental" unofficial values start with X-, like X-pbm.
A few experimental content-types and subtypes are in wide use.
But in general, so that as many people as possible can read your message,
try to avoid inventing new content types and subtypes.
If you get the urge to create, the
comp.mail.mime newsgroup and the info-mime mailing list
are great places to work it out.