Discussion:
First strawman for UTF-8 headers proposal
Paul Hoffman / IMC
2003-11-26 18:57:57 UTC
Greetings again. At the Minneapolis meeting, I proposed that if
people were interested in John's proposal to encode the addresses in
a message as UTF-8, they might be interested in making all the
headers UTF-8. (The proposal was initially sparked by Pete Resnick.)
Following the thread from the past few weeks, I have come up with the
following strawman. If no one finds any huge problems with this, I'll
turn it into an Internet Draft in a few weeks.

All comments welcome!

--Paul Hoffman



- The dual motivations are to allow UTF-8 everywhere in the headers and
to not bounce any messages just because they originated with UTF-8
headers.

- Allows current users who have all-ASCII mailbox names to step up
to UTF-8 headers easily.

- Updated sending MUAs will create all headers in UTF-8.

- Transmission is protected by a new ESMTP command, UTF-8-HEADERS.

- Everyone who has a UTF-8 mailbox name MUST also have an all-ASCII
mailbox name that is equivalent.

- The terminal SMTP server is responsible for knowing whether or not the
message store can handle UTF-8 headers.

- If a receiving SMTP server does not support UTF-8-HEADERS, the sending
SMTP client downgrades all headers and continues to send the message.

- Free text fields are downgraded using quoted-printable encoding; the
charset SHOULD be UTF-8. Downgrading MUST only be done when necessary.

- Downgrading email addresses that only contain UTF-8 in the domain name
is done with IDNA.

- For every address in a message with a UTF-8 mailbox name, the mail
initiator tries to create a mapping in a new header, Address-maps:. A
message only has one Address-map: header; the header has a string of
maps. The header is only for addresses that have a UTF-8 mailbox name;
it SHOULD NOT be used for addresses that have all-ASCII mailbox names,
even if those addresses have UTF-8 domain names.

- If the initiator has a UTF-8 mailbox name, the initiator MUST also
have an all-ASCII mailbox, and the all-ASCII address MUST appear in the
map header.

- If the initiator knows the mapping for any recipient (through caching
or an address book), they SHOULD put it in the map header. If they
don't include a mapping and the message hits a non-UTF-8-HEADERS
SMTP server, the message will bounce.

- The Address-map: header is downgraded using Base64 for mailbox
names, IDNA for domain names.

- Example:
Address-map: José@example.com,jose-***@example.com;
törbjø***@fältström.se,***@fältström.se
If passed to a non-UTF-8-HEADERS system, this header gets downgraded
to:
Address-map: Sm9zw6k=@example.com,jose-***@example.com;
dMO2cmJqw7hybg==@xn--fltstrm-5wa1o.se,***@xn--fltstrm-5wa1o.se

- Intermediate SMTP servers MAY change the values in the Address-map:
header (such as to add one that is missing or to correct a mapping), but
SHOULD only do so for local domains. This might be a bad idea and might
be removed.

- Terminal SMTP servers should write messages addressed to either the
UTF-8 address or the all-ASCII address into the same mailbox, but this
is not mandatory.

- POP and IMAP might be updated to allow one request to bring in two or
more mailboxes; otherwise, users will have to do two separate requests.

- Digital certificates for addresses that have UTF-8 LHSs (left-hand
sides) should contain both addresses; this is already supported in PKIX
and OpenPGP.

- Other headers that include mailbox names and domain names will need
further definition for downgrading.

- MUAs are encouraged to cache address mappings they see, probably with
a user-settable time-to-live.

- Terminal SMTP servers MAY look into the headers of a message to
determine whether they should upgrade a downgraded set of headers to
UTF-8. This is easy to determine: if the Address-map: header contains
only ASCII, it was downgraded. Upgrading is particularly useful on
bounce messages caused by bad mappings.

- It might be good to have a protocol for determining mappings, but it
is not defined here.

- It might be better to have just one mapping per Address-map: header
and have multiple Address-map: headers per message.
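As a concrete illustration of the Base64 downgrading step in the strawman, here is a minimal encoder sketch; the function name `b64_encode` is mine, not part of the proposal. Feeding it the five UTF-8 octets of "José" (4A 6F 73 C3 A9) yields the "Sm9zw6k=" form shown in the Address-map example.

```c
#include <stddef.h>

/* Minimal Base64 encoder (RFC 2045 alphabet), sketching how a UTF-8
   mailbox name would be downgraded.  dst needs 4*((len+2)/3)+1 bytes
   and is NUL-terminated. */
static const char b64tab[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

void b64_encode(const unsigned char *src, size_t len, char *dst)
{
    size_t i = 0;
    for (; i + 3 <= len; i += 3) {      /* full 3-octet groups */
        *dst++ = b64tab[src[i] >> 2];
        *dst++ = b64tab[((src[i] & 0x03) << 4) | (src[i + 1] >> 4)];
        *dst++ = b64tab[((src[i + 1] & 0x0F) << 2) | (src[i + 2] >> 6)];
        *dst++ = b64tab[src[i + 2] & 0x3F];
    }
    if (i < len) {                      /* one or two octets remain */
        *dst++ = b64tab[src[i] >> 2];
        if (len - i == 1) {
            *dst++ = b64tab[(src[i] & 0x03) << 4];
            *dst++ = '=';
        } else {
            *dst++ = b64tab[((src[i] & 0x03) << 4) | (src[i + 1] >> 4)];
            *dst++ = b64tab[(src[i + 1] & 0x0F) << 2];
        }
        *dst++ = '=';
    }
    *dst = '\0';
}
```

The domain part of an address would instead go through IDNA, which is not sketched here.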


--Paul Hoffman, Director
--Internet Mail Consortium
Simon Josefsson
2003-11-27 19:50:27 UTC
Post by Paul Hoffman / IMC
All comments welcome!
- The dual motivations are to allow UTF-8 everywhere in the headers and
to not bounce any messages just because they originated with UTF-8
headers.
...
Post by Paul Hoffman / IMC
- If a receiving SMTP server does not support UTF-8-HEADERS, the sending
SMTP client downgrades all headers and continues to send the message.
Following the example of 8BITMIME, I believe implementations should be
allowed to bounce messages if they do not implement the fallback
mechanism. Otherwise, in 20 years, all systems would still be forced
to implement a downgrade mechanism that nobody uses or tests. Users
will require that implementors support downgrading today, but
eventually they won't have to bother with it.
Post by Paul Hoffman / IMC
- Free text fields are downgraded using quoted-printable encoding; the
charset SHOULD be UTF-8. Downgrading MUST only be done when necessary.
Does this intentionally forbid non-QP RFC 2047 encodings? E.g.,
strings like =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=.
Post by Paul Hoffman / IMC
- The Address-map: header is downgraded using Base64 for mailbox
names, IDNA for domain names.
If passed to a non-UTF-8-HEADERS system, this header gets downgraded
It might be nice to use the RFC 2047 encoding instead, so that the
header is rendered properly in MIME-aware clients. It would make
cut'n'paste of non-ASCII email addresses possible, even from MUAs that
don't support this new standard. Qualify it to MUST use the UTF-8
charset and the "B" encoding if you wish. A possible disadvantage
would be if gateways convert RFC 2047 data from one charset to
another, although I think the advantages are larger.
Post by Paul Hoffman / IMC
- Other headers that include mailbox names and domain names will need
further definition for downgrading.
Here there may be dragons. There are many headers, standard and
non-standard ones, that contain mailboxes, although without using the
RFC 2822 BNF 'mailbox'. References: is one. Various List-* headers
are others.

In general, I think the Address-map idea needs some further pondering,
especially with regard to modifying it in transit and populating it
from address book caches, but also the encoding.

Thanks,
Simon
Paul Hoffman / IMC
2003-11-29 20:43:20 UTC
Post by Simon Josefsson
Post by Paul Hoffman / IMC
All comments welcome!
- The dual motivations are to allow UTF-8 everywhere in the headers and
to not bounce any messages just because they originated with UTF-8
headers.
...
Post by Paul Hoffman / IMC
- If a receiving SMTP server does not support UTF-8-HEADERS, the sending
SMTP client downgrades all headers and continues to send the message.
Following the example of 8BITMIME, I believe implementations should be
allowed to bounce messages if they do not implement the fallback
mechanism. Otherwise, in 20 years, all systems would still be forced
to implement a downgrade mechanism that nobody uses or tests. Users
will require that implementors support downgrading today, but
eventually they won't have to bother with it.
This sounds good. One thing I didn't say in the first message, which
I probably should have, is that it is a fairly-obvious extension of
8BITMIME. All the lessons we have learned in the past decade (!) from
8BITMIME should be applied with whatever I propose here.
Post by Simon Josefsson
Post by Paul Hoffman / IMC
- Free text fields are downgraded using quoted-printable encoding; the
charset SHOULD be UTF-8. Downgrading MUST only be done when necessary.
Does this intentionally forbid non-QP RFC 2047 encodings? E.g.,
strings like =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=.
That was my intention. Maybe it is too drastic.
Post by Simon Josefsson
Post by Paul Hoffman / IMC
- The Address-map: header is downgraded using Base64 for mailbox
names, IDNA for domain names.
If passed to a non-UTF-8-HEADERS system, this header gets downgraded
It might be nice to use the RFC 2047 encoding instead, so that the
header is rendered properly in MIME-aware clients. It would make
cut'n'paste of non-ASCII email addresses possible, even from MUAs that
don't support this new standard. Qualify it to MUST use the UTF-8
charset and the "B" encoding if you wish. A possible disadvantage
would be if gateways convert RFC 2047 data from one charset to
another, although I think the advantages are larger.
As long as we choose UTF-8 for the inner encoding of the QP, that's
OK with me. It just seemed like extra characters, but I'm open to
either. (But I'm not open to "the inner encoding can be anything"
because that leads to the same lack of interop we are battling now.)
Post by Simon Josefsson
Post by Paul Hoffman / IMC
- Other headers that include mailbox names and domain names will need
further definition for downgrading.
Here there may be dragons. There are many headers, standard and
non-standard ones, that contain mailboxes, although without using the
RFC 2822 BNF 'mailbox'. References: is one. Various List-* headers
are others.
Standard ones we can deal with easily (as long as we get all of
them). We will need to have a single way of changing non-standard
names.
Post by Simon Josefsson
In general, I think the Address-map idea needs some further pondering,
especially with regard to modifying it in transit and populating it
from address book caches, but also the encoding.
Fully agree.

--Paul Hoffman, Director
--Internet Mail Consortium
Keith Moore
2003-12-01 01:35:18 UTC
One thing I didn't say in the first message, which I probably should
have, is that it is a fairly-obvious extension of 8BITMIME. All the
lessons we have learned in the past decade (!) from 8BITMIME should be
applied with whatever I propose here.
I'm not sure how much the 8BITMIME experience applies. At the time
8BITMIME was adopted, the Internet was much smaller, there were many
fewer UAs, MTAs, and other mail-handling tools (and thus less variety),
messages travelled a simpler path (fewer firewalls, spam filters, virus
checkers, etc.) and the vast majority of Internet users still spoke
English - though this was quickly changing.

Also, 8BITMIME was a much less drastic change than negotiation of UTF-8
would be now. Partially this is because many mail readers in use at
the time of 8BITMIME introduction were still intended for use with text
terminals or terminal emulators (so MUAs that copied 8bit text to the
screen often "did the right thing" even if only by accident, or because
the user had configured the terminal emulator to use the right
charset). Partially this is because MTAs and other intermediaries that
predated 8BITMIME generally did not look at message bodies - they
looked only at the headers of messages that transited their systems.
Since the headers of 8bit MIME messages are still ASCII, supporting
8BITMIME didn't necessarily require any change to a tool's
header-parsing code.

One simple example. Bernstein and others have pointed out that it's
easier to parse header fields with address lists from the right to the
left rather than from the left to the right, because this requires less
lookahead. It's still possible to do this with UTF-8 (particularly if
you do lexical analysis left-to-right and parsing right-to-left), but
it's probably not a trivial change to existing code.
Martin Duerst
2004-01-02 22:20:28 UTC
Hello Keith,
Post by Keith Moore
One simple example. Bernstein and others have pointed out that it's
easier to parse header fields with address lists from the right to the
left rather than from the left to the right, because this requires less
lookahead. It's still possible to do this with UTF-8 (particularly if you
do lexical analysis left-to-right and parsing right-to-left), but it's
probably not a trivial change to existing code.
Can you give more details? As long as lexing or parsing treats anything
non-ascii the same, things shouldn't change at all (as long as the code
is 8-bit clean). If different non-ASCII characters have to lex or parse
differently, then you have to use tables, do some conversion, or do some
hand-coding with a byte-by-byte approach, and the complexity of this is
virtually the same whether you go one way or the other. If you already
have the UTF-8 forward code, then that's not trivial to change to
reverse scanning code. But if you only have ASCII, the changes to move
to UTF-8 are about the same for both directions, except that you
probably have a bigger chance to find already existing code that
goes forward.


Regards, Martin.
Keith Moore
2004-01-03 02:53:31 UTC
Post by Martin Duerst
Hello Keith,
Post by Keith Moore
One simple example. Bernstein and others have pointed out that it's
easier to parse header fields with address lists from the right to
the left rather than from the left to the right, because this
requires less lookahead. It's still possible to do this with UTF-8
(particularly if you do lexical analysis left-to-right and parsing
right-to-left), but it's probably not a trivial change to existing
code.
Can you give more details?
yes. when parsing ASCII you can look at one octet at a time. so when
parsing

To: Martin Duerst <***@w3.org>

right to left the parser sees ">" then "g", then "r", etc. as soon as
the parser sees ">" it knows that this is a production of the form

[ phrase ] "<" addr-spec ">"

(forgive me for using 822 rather than 2822 - I have never memorized the
latter)

if you're parsing utf-8 then you can't look at one octet at a time -
you first have to parse octets into characters. you can do it, but
it's more of a pain - for instance, you have to do more checking for
boundary conditions. it's certainly not as simple as something like

    if (ptr <= bufstart)
        break;
    c = *ptr--;

i.e. it's not a trivial change to code written to assume that a
character is a fixed width and fits into a single octet.
Post by Martin Duerst
As long as lexing or parsing treats anything
non-ascii the same, things shouldn't change at all (as long as the code
is 8-bit clean). If different non-ASCII characters have to lex or parse
differently, then you have to use tables, do some conversion, or do some
hand-coding with a byte-by-byte approach, and the complexity of this is
virtually the same whether you go one way or the other. If you already
have the UTF-8 forward code, then that's not trivial to change to
reverse scanning code. But if you only have ASCII, the changes to move
to UTF-8 are about the same for both directions, except that you
probably have a bigger chance to find already existing code that
goes forward.
uh, no. not even close. and experience with 2047 indicates that
people don't want to make large changes to their existing codebases.
Keld Jørn Simonsen
2004-01-03 15:33:26 UTC
Post by Keith Moore
Post by Martin Duerst
Hello Keith,
Post by Keith Moore
One simple example. Bernstein and others have pointed out that it's
easier to parse header fields with address lists from the right to
the left rather than from the left to the right, because this
requires less lookahead. It's still possible to do this with UTF-8
(particularly if you do lexical analysis left-to-right and parsing
right-to-left), but it's probably not a trivial change to existing
code.
Can you give more details?
yes. when parsing ASCII you can look at one octet at a time. so when
parsing
right to left the parser sees ">" then "g", then "r", etc. as soon as
the parser sees ">" it knows that this is a production of the form
[ phrase ] "<" addr-spec ">"
(forgive me for using 822 rather than 2822 - I have never memorized the
latter)
if you're parsing utf-8 then you can't look at one octet at a time -
you first have to parse octets into characters. you can do it, but
it's more of a pain - for instance, you have to do more checking for
boundary conditions. it's certainly not as simple as something like
    if (ptr <= bufstart)
        break;
    c = *ptr--;
i.e. it's not a trivial change to code written to assume that a
character is a fixed width and fits into a single octet.
Well, UTF-8 is made so that all characters in the 7-bit ASCII range
have the same codes as in ASCII, so if your grammar only has ASCII
meta-characters, then you can parse the UTF-8 string as ASCII.
This was a design goal for UTF-8.

Best regards
Keld
Keith Moore
2004-01-03 16:05:56 UTC
Post by Keld Jørn Simonsen
Well, UTF-8 is made so that all characters in the 7-bit ASCII range
have the same codes as in ASCII, so if your grammar only has ASCII
meta-characters, then you can parse the UTF-8 string as ASCII.
no you can't, because none of the octets in a multiple-octet utf-8
character are valid components of an atom.
Pete Resnick
2004-01-03 16:32:03 UTC
Post by Keith Moore
Post by Keld Jørn Simonsen
Well, UTF-8 is made so that all characters in the 7-bit ASCII range
have the same codes as in ASCII, so if your grammar only has ASCII
meta-characters, then you can parse the UTF-8 string as ASCII.
no you can't, because none of the octets in a multiple-octet utf-8
character are valid components of an atom.
One simple example. Bernstein and others have pointed out that it's
easier to parse header fields with address lists from the right to
the left rather than from the left to the right, because this
requires less lookahead. It's still possible to do this with UTF-8
(particularly if you do lexical analysis left-to-right and parsing
right-to-left), but it's probably not a trivial change to existing
code.
If we're talking about "trivial changes to existing code", then yes,
the change is trivial: You add 128-255 to comment, atom, and
quoted-string (or more specifically in 2822, atext, ctext, dtext,
qtext, and text) and you're done. You can still treat the field
contents as octets. And in fact, if your code is just looking for
specials and has an 'else' clause for all the other octets, it might
need no coding changes at all.

Of course, if you want to treat the field as characters, you'll need
some buffering for UTF-8, but you can still parse right-to-left: Any
octet with '10' in the high two bits is a trailing octet in a UTF-8
sequence, and any octet with '11' in the high two bits is the first
octet in a UTF-8 sequence. That change is non-trivial, but that's not
what you asked.
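Pete's bit test translates directly into a backward-stepping routine. The following is a sketch under my own naming (`utf8_prev` is not from the thread), assuming well-formed UTF-8 input:

```c
/* Move back to the start of the previous character in well-formed
   UTF-8: skip octets with '10' in the two high bits (trailing octets)
   until an octet with any other top-bit pattern (a lead octet or a
   plain ASCII octet) is reached. */
const unsigned char *utf8_prev(const unsigned char *p,
                               const unsigned char *start)
{
    if (p <= start)
        return start;
    p--;
    while (p > start && (*p & 0xC0) == 0x80)
        p--;
    return p;
}
```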

pr
--
Pete Resnick <http://www.qualcomm.com/~presnick/>
QUALCOMM Incorporated - Direct phone: (858)651-4478, Fax: (858)651-1102
Keith Moore
2004-01-03 16:52:30 UTC
Post by Pete Resnick
Post by Keith Moore
One simple example. Bernstein and others have pointed out that it's
easier to parse header fields with address lists from the right to
the left rather than from the left to the right, because this
requires less lookahead. It's still possible to do this with UTF-8
(particularly if you do lexical analysis left-to-right and parsing
right-to-left), but it's probably not a trivial change to existing
code.
If we're talking about "trivial changes to existing code", then yes,
the change is trivial: You add 128-255 to comment, atom, and
quoted-string (or more specifically in 2822, atext, ctext, dtext,
qtext, and text) and you're done. You can still treat the field
contents as octets. And in fact, if your code is just looking for
specials and has an 'else' clause for all the other octets, it might
need no coding changes at all.
yes, this will work in some cases, though you might get bitten if some
kinds of atoms (or atext, whatever) can contain utf-8 and other kinds
cannot. it even appears to work for gb18030.
Martin Duerst
2004-01-04 17:41:31 UTC
Post by Keith Moore
If we're talking about "trivial changes to existing code", then yes, the
change is trivial: You add 128-255 to comment, atom, and quoted-string
(or more specifically in 2822, atext, ctext, dtext, qtext, and text) and
you're done. You can still treat the field contents as octets. And in
fact, if your code is just looking for specials and has an 'else' clause
for all the other octets, it might need no coding changes at all.
yes, this will work in some cases, though you might get bitten if some
kinds of atoms (or atext, whatever) can contain utf-8 and other kinds
cannot. it even appears to work for gb18030.
No, it does not at all work for GB18030. GB18030 uses virtually all of
the US-ASCII bytes not only for denoting US-ASCII characters, but also
in the second (or fourth) position when encoding other characters.
See e.g. the section "Structure" in
http://www-106.ibm.com/developerworks/unicode/library/u-china.html?dwzone=unicode

Regards, Martin.
Keith Moore
2004-01-05 00:08:41 UTC
Post by Martin Duerst
Post by Keith Moore
Post by Pete Resnick
If we're talking about "trivial changes to existing code", then yes,
the change is trivial: You add 128-255 to comment, atom, and
quoted-string (or more specifically in 2822, atext, ctext, dtext,
qtext, and text) and you're done. You can still treat the field
contents as octets. And in fact, if your code is just looking for
specials and has an 'else' clause for all the other octets, it might
need no coding changes at all.
yes, this will work in some cases, though you might get bitten if
some kinds of atoms (or atext, whatever) can contain utf-8 and other
kinds cannot. it even appears to work for gb18030.
No, it does not at all work for GB18030. GB18030 uses virtually all of
the US-ASCII bytes not only for denoting US-ASCII characters, but also
in the second (or fourth) position when encoding other characters.
See e.g. the section "Structure" in
http://www-106.ibm.com/developerworks/unicode/library/u-china.html?dwzone=unicode
you're right - I misread this earlier. the four octet sequences are
okay because the 2nd and 4th positions are in the range 30-39 (ascii
digits). but the two octet sequences can have values from 40-7e as
their 2nd octet, which includes several 2822 specials.

even if you're scanning utf-8 you can't just scan individual octets if
any of the characters outside the repertoire have special meaning -
e.g. if full width at sign is taken as an equivalent for '@' then the
scanner needs to be able to recognize this as a multiple octet
character.
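Keith's point can be made concrete with a naive octet scanner; `find_at` is a hypothetical helper of mine, and 0x81 0x40 is a valid GB18030 two-octet character whose second octet happens to be the '@' code. The same scan is safe on UTF-8, since multi-octet UTF-8 characters never reuse octets below 0x80:

```c
#include <stddef.h>

/* Naive octet-at-a-time scan for '@'.  Safe for UTF-8 (multi-octet
   characters use only octets >= 0x80), but on GB18030 text the second
   octet of a two-octet character may fall in 0x40-0x7E, so this scan
   can report a '@' that is really the middle of a character. */
int find_at(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (s[i] == '@')
            return (int)i;
    return -1;
}
```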

Adam M. Costello
2003-11-28 00:45:25 UTC
Post by Paul Hoffman / IMC
- The dual motivations are to allow UTF-8 everywhere in the headers
and to not bounce any messages just because they originated with UTF-8
headers.
- If the initiator knows the mapping for any recipient (through
caching or an address book), they SHOULD put it in the map header. If
they don't include a mapping and the message hits a non-UTF-8-HEADERS
SMTP server, the message will bounce.
I don't like that bouncing.

I don't understand why there should be such different policies for the
domain part and the local part. For the domain part, your proposal is
willing to downgrade to an ACE rather than bounce the message. We might
as well define an ACE for the local part too, so that there would never
be a need to bounce messages.

On the other hand, your proposal is willing to map local parts, so
that recipients who can't handle non-ASCII local parts can see a
human-friendly (non-ACE) ASCII local part. But they still have to see
an ugly ACE domain part. If the mapping feature is there anyway, we
might as well allow it to be used for the whole address. The proposed
syntax could support that.
Post by Paul Hoffman / IMC
- Updated sending MUAs will create all headers in UTF-8.
What exactly does that mean? There are details to be worked out.
Given an old Foo: header with its old ASCII grammar, what exactly is
the new grammar? Will canonically (or compatibly) equivalent strings
necessarily be parsed the same way?
Post by Paul Hoffman / IMC
- Transmission is protected by a new ESMTP command, UTF-8-HEADERS.
Every protocol that carries messages will need an analogous tagging
mechanism.
Post by Paul Hoffman / IMC
- The terminal SMTP server is responsible for knowing whether or not
the message store can handle UTF-8 headers.
Maybe the message store can handle them (or doesn't care), but what
about the things that retrieve messages from the message store? Or
manipulate messages in the message store? If the message store is a
plain text file, what chaos might ensue? Perhaps the UTF-8 headers
should be segregated somehow, so they don't accidentally fool old
software into thinking it knows what to do with them. For example, they
could use different field names, or they could be inside a shim header.
Post by Paul Hoffman / IMC
- Free text fields are downgraded using quoted-printable encoding;
SHOULD be into UTF-8 charset. Downgrading MUST only be done if
necessary.
I assume you mean encoded-words, which can use either Q or B encoding.

There is something that encoded-words can do that your UTF-8 header
proposal cannot do: encoded-words can indicate the language of the
text. A single field can contain multiple encoded-words, each tagged
with a different language. If one goal of the UTF-8 proposal is
to make encoded-words unnecessary, does the UTF-8 proposal need a
language-tagging capability? Or is this not useful enough to warrant
the added complexity?

Here's one idea I had: Rather than introduce yet another escaping
mechanism (to escape the language tags), extend an existing escaping
mechanism: the folding mechanism. Everywhere a field is folded it could
change the language, something like this:

Subject;en: hello,
  jp: konnichi wa,
  fr: bonjour

(I'm not bothering to use the proper non-ASCII characters, because
that's irrelevant to this example.)

If the language tag is absent, there is no change. The colon is still
required. You can tell which fields use the extended folding mechanism
because they have a semicolon in the field name (which has always been
allowed, but has never appeared in practice).

Because of the explicit terminator (the colon), the extended folding
mechanism can allow folding anywhere, not just before white space:

Subject;en: supercalifra
  :gilisticexpialidocious

But maybe there would still be a recommendation that folding should not
happen within words/atoms when it can be avoided.
Post by Simon Josefsson
It might be nice to use the RFC 2047 encoding instead, so that the
header is rendered properly in MIME-aware clients.
Would it be? I think a MIME-compliant MUA cannot decode things that
look like encoded-words in unrecognized header fields, because the
client has no way of knowing whether encoded-words are allowed in that
field.

AMC
John Cowan
2003-11-28 03:39:02 UTC
Post by Adam M. Costello
If the message store is a
plain text file, what chaos might ensue? Perhaps the UTF-8 headers
should be segregated somehow, so they don't accidentally fool old
software into thinking it knows what to do with them.
Note that "plain text" should not be used as a synonym for "ASCII
plain text"; UTF-8 files are plain text if they contain a sequence of
characters each used for its ordinary meaning, neither binary nor
markup.
Post by Adam M. Costello
There is something that encoded-words can do that your UTF-8 header
proposal cannot do: encoded-words can indicate the language of the
text.
Does anyone actually do anything useful with this part of RFC 2231?
Post by Adam M. Costello
If one goal of the UTF-8 proposal is
to make encoded-words unnecessary, does the UTF-8 proposal need a
language-tagging capability? Or is this not useful enough to warrant
the added complexity?
Language-tagging plain text is rarely necessary. When necessary, it is
rarely sufficient.
--
With techies, I've generally found John Cowan
If your arguments lose the first round http://www.reutershealth.com
Make it rhyme, make it scan http://www.ccil.org/~cowan
Then you generally can ***@reutershealth.com
Make the same stupid point seem profound! --Jonathan Robie
Mark Davis
2003-11-29 00:01:38 UTC
ditto on both

Mark
__________________________________
http://www.macchiato.com
► शिष्यादिच्छेत्पराजयम् ◄

----- Original Message -----
From: "John Cowan" <***@mercury.ccil.org>
To: "IETF IMAA list" <ietf-***@imc.org>
Sent: Thu, 2003 Nov 27 19:39
Subject: Re: First strawman for UTF-8 headers proposal
Post by John Cowan
Post by Adam M. Costello
If the message store is a
plain text file, what chaos might ensue? Perhaps the UTF-8 headers
should be segregated somehow, so they don't accidentally fool old
software into thinking it knows what to do with them.
Note that "plain text" should not be used as a synonym for "ASCII
plain text"; UTF-8 files are plain text if they contain a sequence of
characters each used for its ordinary meaning, neither binary nor
markup.
Post by Adam M. Costello
There is something that encoded-words can do that your UTF-8 header
proposal cannot do: encoded-words can indicate the language of the
text.
Does anyone actually do anything useful with this part of RFC 2231?
Post by Adam M. Costello
If one goal of the UTF-8 proposal is
to make encoded-words unnecessary, does the UTF-8 proposal need a
language-tagging capability? Or is this not useful enough to warrant
the added complexity?
Language-tagging plain text is rarely necessary. When necessary, it is
rarely sufficient.
--
With techies, I've generally found John Cowan
If your arguments lose the first round http://www.reutershealth.com
Make it rhyme, make it scan http://www.ccil.org/~cowan
Make the same stupid point seem profound! --Jonathan Robie
Adam M. Costello
2003-11-29 02:08:49 UTC
Post by John Cowan
If the message store is a plain text file, what chaos might ensue?
Perhaps the UTF-8 headers should be segregated somehow, so they
don't accidentally fool old software into thinking it knows what to
do with them.
Note that "plain text" should not be used as a synonym for "ASCII
plain text"; UTF-8 files are plain text if they contain a sequence
of characters each used for its ordinary meaning, neither binary nor
markup.
Sorry, the "plain text" part was irrelevant to the point I was trying to
make. If the message store is a passive file (as opposed to an active
database) then it is not able to negotiate with the program accessing
it; there is nothing analogous to the proposed ESMTP UTF-8-HEADERS
extension to verify that the recipient (the program accessing the file)
understands the new syntax. If an existing program is pointed at an
mbox file and it sees a header fields named "From", "To", etc, it's
going to assume that the field values ought to obey the RFC-822 syntax
for such fields (which allows only ASCII characters). If the field
values violate that syntax, who knows what will happen?

Hence I think it might be a good idea to use new field-names for the
UTF-8-enabled fields, to reduce the chance of accidentally misleading
old software. If there is an algorithmic way to determine the syntax
of the new-style Foo field given the syntax of the old-style Foo field,
then there should also be a way to algorithmically associate the name of
the new-style Foo field with the name Foo.

AMC
Paul Hoffman / IMC
2003-11-29 20:50:24 UTC
Post by Adam M. Costello
Hence I think it might be a good idea to use new field-names for the
UTF-8-enabled fields, to reduce the chance of accidentally misleading
old software. If there is an algorithmic way to determine the syntax
of the new-style Foo field given the syntax of the old-style Foo field,
then there should also be a way to algorithmically associate the name of
the new-style Foo field with the name Foo.
This might be OK, but it makes message processing by displaying MUAs
and by other applications much harder.

--Paul Hoffman, Director
--Internet Mail Consortium
Keith Moore
2003-12-01 01:19:43 UTC
If you're going to use new field names, you need to include the old
fields (with ASCII equivalent addresses) also, for compatibility with
existing mail handling tools.

And if you're going to do that, you might as well encode the UTF-8
fields somehow, to keep them from causing trouble with existing tools
(though fewer in number) that barf on currently-illegal input even in
header fields that they do not use.

Keith
Post by Adam M. Costello
Hence I think it might be a good idea to use new field-names for the
UTF-8-enabled fields, to reduce the chance of accidentally misleading
old software. If there is an algorithmic way to determine the syntax
of the new-style Foo field given the syntax of the old-style Foo field,
then there should also be a way to algorithmically associate the name of
the new-style Foo field with the name Foo.
Steve Hole
2003-12-01 15:59:30 UTC
Permalink
Post by Keith Moore
And if you're going to do that, you might as well encode the UTF-8
fields somehow, to keep them from causing trouble with existing tools
(though fewer in number) that barf on currently-illegal input even in
header fields that they do not use.
This is really the issue with the UTF-8 proposal. Anything that
*requires* a new SMTP extension is going to take a LONG TIME to deploy on
the internet. You should really seek to keep the solution space in the
land of the MUA and possibly the delivery agent, as much as possible.

Cheers.

---
Steve Hole
Chief Technology Officer - Billing and Payment Systems
ACI Worldwide
<mailto:***@ACIWorldwide.com>
Phone: 780-424-4922
Paul Hoffman / IMC
2003-11-29 21:08:07 UTC
Permalink
Post by Adam M. Costello
Post by Paul Hoffman / IMC
- The dual motivations are to allow UTF-8 everywhere in the headers
and to not bounce any messages just because they originated with UTF-8
headers.
- If the initiator knows the mapping for any recipient (through
caching or an address book), they SHOULD put it in the map header. If
they don't include a mapping and the message hits a non-UTF-8-HEADERS
SMTP server, the message will bounce.
I don't like that bouncing.
I don't understand why there should be such different policies for the
domain part and the local part.
As John Klensin has said before, email message processing is *very*
different than DNS name lookups.
Post by Adam M. Costello
For the domain part, your proposal is
willing to downgrade to an ACE rather than bounce the message. We might
as well define an ACE for the local part too, so that there would never
be a need to bounce messages.
If we add that to my current proposal, then there are *three*
possible names that a mailbox might have; two of them are readable,
one of them isn't. I think that is too complicated.
Post by Adam M. Costello
Post by Paul Hoffman / IMC
- Updated sending MUAs will create all headers in UTF-8.
What exactly does that mean? There are details to be worked out.
Given an old Foo: header with its old ASCII grammar, what exactly is
the new grammar? Will canonically (or compatibly) equivalent strings
necessarily be parsed the same way?
An "old" header has no grammar that produces non-ASCII characters.
New headers will have new rules.
Post by Adam M. Costello
Post by Paul Hoffman / IMC
- Transmission is protected by a new ESMTP command, UTF-8-HEADERS.
Every protocol that carries messages will need an analogous tagging
mechanism.
I'm not clear what you mean here. Please elucidate.
Post by Adam M. Costello
There is something that encoded-words can do that your UTF-8 header
proposal cannot do: encoded-words can indicate the language of the
text.
I agree with the other folks who said that language tagging is
probably not needed.

--Paul Hoffman, Director
--Internet Mail Consortium
Adam M. Costello
2003-11-30 02:28:18 UTC
Permalink
For the domain part, your proposal is willing to downgrade to an ACE
rather than bounce the message. We might as well define an ACE for
the local part too, so that there would never be a need to bounce
messages.
If we add that to my current proposal, then there are *three* possible
names that a mailbox might have; two of them are readable, one of them
isn't.
Sorry, I didn't follow that. Could you please spell it out for me?
Every protocol that carries messages will need an analogous tagging
mechanism.
I'm not clear what you mean here.
POP and IMAP, for example. They transfer messages from one agent
to another, like SMTP does. Therefore, if SMTP needs a negotiation
mechanism (UTF-8-HEADERS) to verify that the receiving agent can
handle the new header format, then POP and IMAP will need an analogous
negotiation mechanism for the same reason. Maybe NNTP too, though I'm
not clear on the relationship between news article headers and mail
headers. And any other protocol that transfers mail messages from one
agent to another.

AMC
Simon Josefsson
2003-11-30 13:14:12 UTC
Permalink
Post by Adam M. Costello
Post by Paul Hoffman / IMC
Post by Adam M. Costello
Every protocol that carries messages will need an analogous tagging
mechanism.
I'm not clear what you mean here.
POP and IMAP, for example. They transfer messages from one agent
to another, like SMTP does. Therefore, if SMTP needs a negotiation
mechanism (UTF-8-HEADERS) to verify that the receiving agent can
handle the new header format, then POP and IMAP will need an analogous
negotiation mechanism for the same reason. Maybe NNTP too, though I'm
not clear on the relationship between news article headers and mail
headers. And any other protocol that transfers mail messages from one
agent to another.
Non-ASCII (even raw binary) is safe in NNTP, and putting UTF-8 in the
"Newsgroups" header has been tested in many clients and servers. I
don't believe NNTP would need a negotiation mechanism; rather, it
would be sufficient to document that headers are UTF-8 in the next
Usenet message syntax document (if the IETF ever manages to publish it).

As for IMAP, there have been good discussions about this on the IMAP
list, because IMAP is also used to access netnews. I could not find
any up to date mailing list archive of the IMAP list, but if you can
find it, look for the threads 'IMAP and Netnews' by Charles Lindsey
(<***@clw.cs.man.ac.uk>) and 'Unicode newsgroup name
options' by Russ Allbery (<***@windlord.stanford.edu>).

Both IMAP and POP(3) support capabilities, so it would not be
difficult to add a UTF8HEADER capability. I'm skeptical about that
approach, though: there are many protocols that transfer e-mail, and
adding a capability negotiation mechanism, and a new capability, to
all of them is not practical.
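
As a sketch of what the client-side check might look like, assuming the hypothetical UTF8HEADER capability name from this thread (Python's imaplib exposes a server's advertised capabilities as a tuple of upper-case strings after connecting):

```python
def supports_utf8_headers(capabilities) -> bool:
    # `capabilities` is a tuple of upper-case capability names, as
    # imaplib.IMAP4.capabilities provides after the CAPABILITY exchange.
    # "UTF8HEADER" is this thread's hypothetical name, not a registered
    # extension.
    return "UTF8HEADER" in capabilities
```

A client would check this once per session and fall back to downgraded (ASCII) headers otherwise.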

Perhaps some protocols are not worth over-engineering a solution for.

If RFC 2822bis says headers are UTF-8, which I understand is the point
of this proposal even though it has been focused on the SMTP
consequences, then if POP3 servers start to send 2822bis messages, the
clients will be updated. Adding a UTF8HEADER capability and computing
downgrades for existing POP3 clients would probably lead to worse
results overall, even though it would prevent breakage of someone's
hand-tailored POP client from '85.

Thanks,
Simon
Keith Moore
2003-12-01 01:46:52 UTC
Permalink
Post by Adam M. Costello
POP and IMAP, for example. They transfer messages from one agent
to another, like SMTP does. Therefore, if SMTP needs a negotiation
mechanism (UTF-8-HEADERS) to verify that the receiving agent can
handle the new header format, then POP and IMAP will need an analogous
negotiation mechanism for the same reason. Maybe NNTP too, though I'm
not clear on the relationship between news article headers and mail
headers. And any other protocol that transfers mail messages from one
agent to another.
I was going to point that out, but you beat me to it.

Yes, POP and IMAP servers would need to be able to tell whether their
clients supported UTF-8 headers on a per-session basis (since a lot of
people use multiple mail clients) and to perform appropriate
translation.

Then again, a lot of mail transfer protocols don't have any way to do
negotiation at all. Batch mail transmission (like UUCP) is still used
in some places, and there are a lot of UNIX-style filters in use (like
procmail), where mail is piped to or through a filter with no good way
of doing such negotiation.

This is part of why I claim that if you're going to make that drastic a
change to the message format, you need to change the format so much
that it will be obvious to everyone that it's a completely different
format that has to be handled with a completely different signal path.
(Personally I'd prefer a regular, binary format that was designed for
easy processing and extensibility. I'm sure a lot of people would
prefer XML, which would still require us to encode non-text body parts.)

Again, I really don't think having UTF-8 headers puts us much closer to
a solution to the problem at hand - which is to allow multiple
representations of addresses in different languages and scripts. (to
which I might add -- without significant disruption of the mail
system). At best, providing unencoded UTF-8 headers would be
orthogonal to a solution to the problem - actually I suspect it would
impede adoption of a solution.
Simon Josefsson
2003-12-01 02:56:23 UTC
Permalink
Post by Keith Moore
Again, I really don't think having UTF-8 headers puts us much closer
to a solution to the problem at hand - which is to allow multiple
representations of addresses in different languages and scripts. (to
which I might add -- without significant disruption of the mail
system). At best, providing unencoded UTF-8 headers would be
orthogonal to a solution to the problem - actually I suspect it would
impede adoption of a solution.
Could you define the problem you are thinking of here, more closely?
Being able to send UTF-8 in headers, after "fixing" SMTP, POP3 etc,
between aware applications, would appear to give me non-ASCII e-mail
addresses (and also get rid of RFC 2047, which is a nice side effect).
If this can be made to work, it would solve my internationalization
needs for e-mail, but it sounds as if it wouldn't satisfy your needs.

You say you want multiple representations of addresses in different
languages and scripts. Is the "multiple" a goal in itself, that must
be present at the protocol level? Why do you want to support multiple
scripts? What is missing from Unicode that warrants the added
complexities of character-set tagging of data? Applications on
non-Unicode platforms can convert to and from their native encoding.

As for language tagging, I'm not sure I see the benefits from language
tagging e-mail addresses. They are normally treated by humans as
identifiers. However, it wouldn't be difficult to add a language tag
to the UTF-8 strings. I wonder if this is a critical feature though.
There are many human languages that use ASCII or trivial extensions of
ASCII (e.g., most European languages), and language tagging ASCII
strings in headers isn't a popular request, even though e-mail has
been in use in those languages for years. So I agree with the other
people earlier in this thread, who suggested language tagging in RFC
2047 isn't critical to the problem discussed here.

Language tagging sounds like over-engineering to me, at this point.
Let's support UTF-8 directly first. If users come screaming for
language tagging, it can be added. Unless, of course, someone can
provide more insight as to why language tagging is critical to
non-ASCII e-mail addresses...

Thanks,
Simon
Keith Moore
2003-12-01 05:26:34 UTC
Permalink
Post by Simon Josefsson
Post by Keith Moore
Again, I really don't think having UTF-8 headers puts us much closer
to a solution to the problem at hand - which is to allow multiple
representations of addresses in different languages and scripts. (to
which I might add -- without significant disruption of the mail
system). At best, providing unencoded UTF-8 headers would be
orthogonal to a solution to the problem - actually I suspect it would
impede adoption of a solution.
Could you define the problem you are thinking of here, more closely?
I did so a couple of weeks ago in a thread called "what is the real
problem?"
Post by Simon Josefsson
Being able to send UTF-8 in headers, after "fixing" SMTP, POP3 etc,
between aware applications, would appear to give me non-ASCII e-mail
addresses (and also get rid of RFC 2047, which is a nice side effect).
Non-ASCII email addresses are worse than useless if you can't
transcribe them - which means at a minimum being able to display them,
read them, write them down, and type them back in. So either you have
to use different addresses depending on whom you're corresponding with
(and you need a way to keep track of who can use which address), or you
need a means for mapping between different addresses for the same
mailbox. Without a means for mapping between equivalent addresses,
non-ASCII addresses would essentially be used only by people who can be
confident that ALL of their correspondents can display, read, write,
and type those addresses. This could exclude, for instance, your kids
who happen to be studying in another country whose computers use
different keyboards. It certainly makes them impractical for use in
most international businesses.

The problem of providing multiple forms of an address is the same
regardless of whether you encode the addresses in UTF-8 or in some
other (say ASCII-compatible) encoding. In other words, encoding
addresses in raw UTF-8 doesn't help you solve this problem at all. All
it does is impose additional barriers to adoption and cause additional
failures.

(and no, it doesn't even get rid of RFC 2047, however nice that might
be, because it will still be necessary to read old messages long after
all MUAs support the new format)
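
For reference, the RFC 2047 machinery that would remain necessary for reading old messages; a minimal example using Python's standard email.header module:

```python
from email.header import decode_header, make_header

# An RFC 2047 encoded-word as it appears in a legacy Subject: field.
raw = "=?UTF-8?Q?caf=C3=A9?="

# decode_header splits the value into (bytes, charset) pairs;
# make_header reassembles them into a displayable Unicode string.
decoded = str(make_header(decode_header(raw)))
print(decoded)  # café
```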

And even if you argue that the address mapping isn't needed, addresses
encoded in ASCII are still more universally transcribable than
addresses encoded in raw UTF-8. I suspect that such addresses are too
ugly to use and that we'll want a mapping service that will translate
between UTF-8 and "less ugly" ASCII equivalents. But either way this
is a lot simpler than upgrading or replacing every single part of the
email system, which is what going to raw UTF-8 implies.

You might say that MUAs can display an ASCII-encoded version of the
UTF-8 address if the recipient doesn't understand that language. But
then you would be proposing to upgrade every mail handling program in
the Internet just to get a functionality that could be had much more
easily and quickly, and with far less expense and disruption, simply by
encoding the addresses in the message header.
Post by Simon Josefsson
You say you want multiple representations of addresses in different
languages and scripts. Is the "multiple" a goal in itself, that must
be present at the protocol level?
The goal is to allow every recipient of the message to see each address
in the message header in a form that he/she can remember and/or
transcribe. This won't happen, of course, unless there is such a form
of the address for each recipient, and unless there is some way of
providing a suitable form to each recipient.
Post by Simon Josefsson
Why do you want to support multiple scripts?
Because some languages use more than one script, and I'm assuming that
it might be useful to map between addresses that are in the same
language but written in different scripts. Saying "languages and
scripts" is more general than just saying "languages". As I see it, if
you can supply alternates in different languages, you can supply
alternates in different scripts for the same language just as easily.
Post by Simon Josefsson
What is missing from Unicode, that warrant the added
complexities of character set tagging of data? Applications on
non-Unicode platforms can convert to and from their native encoding.
I didn't say anything about character set tagging, I've been assuming
the Unicode repertoire is sufficient (maybe it is, maybe not, but I'm
assuming it is for now). You do need language tags because the
decision of which address to present to a recipient should probably be
based on language, and you can't always infer language from looking at
the sequence of Unicode characters.
Steve Hole
2003-12-01 16:12:15 UTC
Permalink
Post by Simon Josefsson
Both IMAP and POP(3) support capabilities, so it would not be
difficult to add a UTF8HEADER capability. I'm skeptical about that
approach though, there are many protocols that transfer e-mail, adding
a capability negotiation mechanism, and a new capability, to all of
them is not practical.
Yes, it would be difficult. Just because a protocol supports extensions
does not mean that it is easy to extend ... particularly in ways like
this. It will take a long time to deploy code that both handles the
display issues (which we have to do no matter what) AND requires protocol
extensions. This is because both the client and the server have to be
appropriately extended. In 13 years of implementation experience with
IMAP clients and servers, not one extension has been deployed either
quickly or universally.

You would be much better served to work within the constraints of the
existing protocol if possible. I'm sure that could be done with IMAP
which already supports UTF-7 encodings. You'd likely have to do a
mapping of some kind. Good luck with POP.

Please don't get me wrong. An extension is possible and maybe is the
right thing to do. But please don't pretend that it is either simple
to engineer or (more importantly) easy to deploy, because it isn't.
The same goes for SMTP as well ... even more so.

Cheers.
---
Steve Hole
Chief Technology Officer - Billing and Payment Systems
ACI Worldwide
<mailto:***@ACIWorldwide.com>
Phone: 780-424-4922
Paul Hoffman / IMC
2003-12-01 18:30:57 UTC
Permalink
Post by Steve Hole
You would be much better served to work within the constraints of the
existing protocol if possible. I'm sure that could be done with IMAP
which already supports UTF-7 encodings. You'd likely have to do a
mapping of some kind. Good luck with POP.
Given the massive preference for POP over IMAP in the marketplace, it
would be unwise to adopt something that assumes wide use of IMAP.
Post by Steve Hole
Please don't get me wrong. An extension is possible and maybe is the
right thing to do. But please don't pretend that it is either simple to
engineer or (more importantly) to deploy, because they aren't. The same
goes for SMTP as well ... even more so.
No one is pretending any of that. All IMA options will require lots
of upgrading, and all have side-effects. The question is whether we
can pick a reduction in side-effects that doesn't come at too high a
cost. That's why I have two proposals on the table.

--Paul Hoffman, Director
--Internet Mail Consortium
Martin Duerst
2004-01-02 22:16:50 UTC
Permalink
Post by Adam M. Costello
For the domain part, your proposal is
willing to downgrade to an ACE rather than bounce the message. We might
as well define an ACE for the local part too, so that there would never
be a need to bounce messages.
If we add that to my current proposal, then there are *three* possible
names that a mailbox might have; two of them are readable, one of them
isn't. I think that is too complicated.
I'm thinking about the tradeoff mentioned here a lot. I haven't made
up my mind, but I'm currently leaning to agreeing with Adam on this
point. The main reason is that it should significantly reduce
bounces, which I think is very important for acceptance of the
new protocol.

As for three vs. two mailbox names, I'm not sure that's that bad.
First, it can provide an obvious choice for downgrading for cases
where people don't care at all about ASCII-only alternative mailbox
names. Second, there is the usual saying "zero, one, or many", i.e. for
many issues, the difference between two and three will be minimal,
once we get from one to two (i.e. many).

I would also like to note that this is something we have to consider
extremely carefully. We can always add a special lookup protocol for
alternative addresses (as proposed by Keith) later in the game if it
turns out that there is a need. But we can't add an ACE conversion
for downgrading to avoid bouncing later in the game, because it would
be very difficult to go out to everybody and tell them to change their
settings (receiving SMTP alias,...) from two addresses to three.

As for the argument that people don't want to see ACE, I fully and
totally agree with that. But I think there is a big difference if
you can tell them "this is only temporary, it will really truly
go away" and "if you upgrade your software, it will completely
go away".

The disadvantages are that it will lower the pressure on upgrades,
and we might get halfway implementations (in particular, MUAs may
get quite a bit more sloppy at providing Address-map headers),
because you cannot go somewhere and tell them "if we want this
mail to go through, we have to upgrade".

Again, as I said, I haven't made up my mind on this really, yet.

Regards, Martin.
Thomas Roessler
2003-11-28 09:33:38 UTC
Permalink
Post by Paul Hoffman / IMC
- If a receiving SMTP server does not support UTF-8-HEADERS, the
sending SMTP client downgrades all headers and continues to send
the message.
...
Post by Paul Hoffman / IMC
- If the initiator knows the mapping for any recipient (through caching
or an address book), they SHOULD put it in the map header. If they
don't include a mapping and the message hits a non-UTF-8-HEADERS
SMTP server, the message will bounce.
What happens to the envelope? I'm reading the current proposal to
mean that mail transfer agents would have to rewrite the envelope
based on parsing the address-map header in the message, and bounce
if no address-map is present.

What to do about BCCs, then? Adapted mapping headers for each
instance, in order to avoid information leakage?


I'm also having some doubts about what will happen when messages are
sent to a mixed-universe (utf8/non-utf8) recipient set (or are
automatically bounced to systems outside the utf8 universe; think
webmail systems). I'll try to elaborate on this later today.


Regards,
--
Thomas Roessler · Personal soap box at <http://log.does-not-exist.org/>.
Arnt Gulbrandsen
2003-11-28 12:48:14 UTC
Permalink
What to do about BCCs, then? Adapted mapping headers for each
instance, in order to avoid information leakage?
Just send to the ascii address right away (if available to the sender).

For a bcc'd recipient, an address is not displayed. Because it's not
displayed anywhere, whether ascii or utf-8 is used is irrelevant.

--Arnt
Paul Hoffman / IMC
2003-11-29 20:48:34 UTC
Permalink
Post by Thomas Roessler
Post by Paul Hoffman / IMC
- If a receiving SMTP server does not support UTF-8-HEADERS, the
sending SMTP client downgrades all headers and continues to send
the message.
...
Post by Paul Hoffman / IMC
- If the initiator knows the mapping for any recipient (through caching
or an address book), they SHOULD put it in the map header. If they
don't include a mapping and the message hits a non-UTF-8-HEADERS
SMTP server, the message will bounce.
What happens to the envelope? I'm reading the current proposal to
mean that mail transfer agents would have to rewrite the envelope
based on parsing the address-map header in the message, and bounce
if no address-map is present.
That was my intention, yes.
Post by Thomas Roessler
What to do about BCCs, then? Adapted mapping headers for each
instance, in order to avoid information leakage?
The originating client would just do their own local fallback. If
none is available, the message doesn't get sent.

Yes, we would need to deal with Bcc handling carefully.
Post by Thomas Roessler
I'm also having some doubts about what will happen when messages are
sent to a mixed-universe (utf8/non-utf8) recipient set (or are
automatically bounced to systems outside the utf8 universe; think
webmail systems). I'll try to elaborate on this later today.
Please do. I didn't see a problem with this in my first guesses, but
I easily could have missed something.

--Paul Hoffman, Director
--Internet Mail Consortium
Adam M. Costello
2003-12-01 04:23:08 UTC
Permalink
Paul's strawman proposal avoids defining an ACE form for local
parts. I'd like to remind everyone that UTF-8-supporting MTAs and
address-mapping servers are harder to deploy than MUAs, and hence there
would be a price to pay for not defining ACE local parts.

Suppose I'm not familiar with ASCII characters, so you tell me your
non-ASCII address, which I remember and later type into my MUA. If
there is no ACE form for my MUA to use, then the only way for that
message to find you is if you can associate your domain name with a mail
exchanger that supports non-ASCII directly, or with an address-mapping
server. Either way, you need to wait for some sort of new server to
appear before you can have a usable non-ASCII address to tell me. (And
even then, if I'm behind a firewall, I might not have access to those
servers; if I can access only a local SMTP gateway then you'll have to
wait for that to be upgraded.)

If an ACE form is defined, then you can register an IDN and give it
an MX record pointing at any existing mail-hosting service, and it
will just work (assuming I have a new MUA, but that's needed in any
approach). You don't need to wait for any new servers to appear.

Address-mapping and UTF-8 headers can add value beyond what ACE
provides, but they don't render ACE superfluous, because they don't
match its ease of incremental deployment.
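
The domain-part ACE that the proposal already relies on exists as IDNA/Punycode; Python's built-in "idna" codec (RFC 3490) shows the downgrade, here on the example domain bücher.example:

```python
# IDNA downgrade of a non-ASCII domain to its ACE form; only the
# non-ASCII label gains the "xn--" Punycode prefix.
ace = "bücher.example".encode("idna")
print(ace)  # b'xn--bcher-kva.example'

# The downgrade is reversible, which is what makes ACE deployable
# without waiting for any new servers to appear.
assert ace.decode("idna") == "bücher.example"
```

An analogous ACE for the local part, as Adam suggests, would let an all-ASCII mail path carry the address the same way.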

AMC
Paul Hoffman / IMC
2003-12-01 18:25:31 UTC
Permalink
Post by Adam M. Costello
Suppose I'm not familiar with ASCII characters, so you tell me your
non-ASCII address, which I remember and later type into my MUA. If
there is no ACE form for my MUA to use, then the only way for that
message to find you is if you can associate your domain name with a mail
exchanger that supports non-ASCII directly, or with an address-mapping
server.
Er, no. If your MUA is enabled for UTF-8 headers, it works fine,
assuming that the intervening SMTP servers also support UTF-8
headers. You don't *need* to have a map, you only need one if you
want to assure no bounces.

--Paul Hoffman, Director
--Internet Mail Consortium