if you really want utf-8 headers...

Discussion:

Keith Moore

2004-01-01 06:34:38 UTC

Okay, I still see zero justification for utf-8 headers. The
improvement in transmission and storage efficiency is minuscule. They
make both user agents and mail transports more complex and less
reliable, because MTAs need to have conversion code (which will break
messages and cause delivery failures) and UAs need to be able to handle
old messages that use RFC 2047 (resulting in multiple code paths and
additional failure modes).

(That, and they don't address the problem that this group is trying to
solve...)

But if you believe that the very long term benefit of utf-8 headers (by
which I mean that whatever benefit might result from using utf-8 - and
it's by no means certain - won't be realized for a very long time)
somehow outweighs the very high near-term cost, then may I suggest that
the place to do the upgrade and negotiation is not in the mail
transport, but at the message store and message submission.

That is, the major benefit of using utf-8 headers would be to make life
easier for user agents and IMAP servers (for searching). They don't
benefit the transport at all. But I could imagine POP and IMAP options
that said "give me utf-8 headers instead of headers with RFC 2047
and/or IMAAs in them", and I could imagine simplified UAs that would
only talk to POP and IMAP servers that implemented that option. (I'd
hate the lack of interoperability between new simplified UAs and old
POP and IMAP servers, but there's already some precedent for UAs
insisting on nonstandard or optional features in POP and IMAP.)
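
To make the "give me utf-8 headers" option concrete, here is a minimal sketch of the conversion a POP or IMAP server would have to perform, assuming Python's standard email.header module. The function name header_to_utf8 is hypothetical, and the sketch covers only RFC 2047 encoded-words; it does nothing about IMAAs.

```python
from email.header import decode_header, make_header


def header_to_utf8(raw_value: str) -> bytes:
    """Decode any RFC 2047 encoded-words and return the value as utf-8 bytes."""
    # decode_header splits the value into (chunk, charset) pairs;
    # make_header reassembles them into one Unicode string.
    text = str(make_header(decode_header(raw_value)))
    return text.encode("utf-8")
```

For example, header_to_utf8("=?iso-8859-1?q?Gr=FC=DFe?=") yields the utf-8 encoding of "Grüße", while a plain ASCII value passes through unchanged.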

Message stores could implement this in a variety of ways - they could
store the message as received and convert on-the-fly as necessary; they
could convert the header to utf-8 on receipt; etc. I could also
imagine a SUBMISSION server option that said "translate utf-8 headers
to proper on-the-wire format before forwarding them to their
destination" and UAs that would only submit messages to SUBMISSION
servers that advertised that option via EHLO. Messages sent through
SMTP or other transports would still, for the time being, be in ASCII.
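
The submission side can be sketched the same way. This is only an illustration: the EHLO keyword X-UTF8-HEADER-CONVERT is made up for the example (no such extension is defined here), and the function names are hypothetical. It shows a UA checking the EHLO reply for the option, and the server-side translation of a utf-8 header value into RFC 2047 form using Python's email.header module.

```python
from email.header import Header

# Hypothetical EHLO keyword; no such extension is defined in this discussion.
CONVERT_KEYWORD = "X-UTF8-HEADER-CONVERT"


def ehlo_keywords(ehlo_reply: str) -> set:
    """Collect the extension keywords from a multi-line 250 EHLO reply."""
    keywords = set()
    lines = ehlo_reply.splitlines()
    for line in lines[1:]:  # the first line is the server greeting
        rest = line[4:].strip()  # drop the "250-" or "250 " prefix
        if rest:
            keywords.add(rest.split()[0].upper())
    return keywords


def may_submit_utf8(ehlo_reply: str) -> bool:
    """True if the server advertised the (hypothetical) conversion option."""
    return CONVERT_KEYWORD in ehlo_keywords(ehlo_reply)


def header_to_wire(value: str) -> str:
    """Return an ASCII-only header value, using RFC 2047 where needed."""
    try:
        value.encode("ascii")
        return value  # already safe on the wire
    except UnicodeEncodeError:
        return Header(value, "utf-8").encode()
```

A UA following this scheme would refuse to submit when may_submit_utf8 returns False, and the submission server would run header_to_wire on each unstructured header value before relaying over SMTP.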

I see several "nice" things about doing it this way:
- it isolates the complexity to portions of the system (the message
store and submission server) that are "close to" the portions of the
system (UAs and message stores) that benefit the most, which means that
users who benefit (if they do realize a benefit) will be in a better
position to get those portions upgraded.
- it is less disruptive because it affects fewer components of the mail
system at once.
- it isolates conversion to a small number of interfaces rather than
allowing conversion to potentially occur at any interface between one
MTA, gateway, firewall, filter, etc. and another, some of which offer
no opportunity for feature negotiation.
- it bounds the number of conversions that a message will undergo, and
thus bounds the potential for delivery failure and message corruption.
- it's easy to try on an experimental basis without impacting the
infrastructure.

And if you also wanted to experiment with transporting utf-8
end-to-end, you could always define a SRV record for "direct utf-8 mail
delivery" and have the utf-8 SUBMISSION servers be aware of it, using
that in preference to MX. You could even use this as a means to
replace SMTP with something simpler, rather than making SMTP more
complex.
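
As a rough sketch of the "SRV in preference to MX" idea, the selection logic might look like the following. The record types, the mode labels, and the function name are all hypothetical; the DNS lookups themselves are left to the caller, and weighted random selection among SRV records of equal priority is omitted for brevity.

```python
from typing import List, NamedTuple, Tuple


class Srv(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


class Mx(NamedTuple):
    preference: int
    host: str


def delivery_targets(
    srv: List[Srv], mx: List[Mx]
) -> Tuple[str, List[Tuple[str, int]]]:
    """Return (mode, [(host, port), ...]) with SRV preferred over MX."""
    if srv:
        # Lowest priority value first; higher weight first within a priority.
        ordered = sorted(srv, key=lambda r: (r.priority, -r.weight))
        return "utf8-direct", [(r.target, r.port) for r in ordered]
    # No SRV record published: fall back to ordinary ASCII delivery via MX.
    ordered_mx = sorted(mx, key=lambda r: r.preference)
    return "smtp-ascii", [(r.host, 25) for r in ordered_mx]
```

A utf-8 SUBMISSION server would call this with whatever SRV and MX records it found for the recipient domain, and convert headers to the on-the-wire format only when the result is the MX fallback.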

Keith

p.s. I said something like this in Minneapolis but some amplification
might be useful. I still think that even in the long term there's very
marginal benefit in going to utf-8 headers as long as we've got so many
other baroque irregularities in 2822 and MIME. Of course I understand
the potential for second-system effect, but if you just do utf-8
headers without changing anything else in the message format you're
paying a lot in upgrade cost to only simplify one fairly minor aspect
of the system.

Universal adoption of IMAAs is anything but assured. The largest age
group of the world population is fairly young (say, less than 21 years
old). Many of these people have grown up with cheap travel and good
communications, and an international popular culture. They are used
to dealing with people from other countries, and in multiple languages.
Many of these people may find that IMAAs don't benefit them so much
and that it's easier to get all email at an ASCII address (or for that
matter at an E164 number using ENUM) than it is to deal with IMAAs.

I'm not trying to argue that we shouldn't try to define IMAAs - clearly
they will be useful to some people - I'm saying that IMAAs by
themselves probably don't justify a vast upgrade to the infrastructure.

John C Klensin

2004-01-01 17:40:59 UTC