Can we back up a bit and ask some basic questions? An alternate model

Discussion:

Roy Badami

2003-02-15 15:40:40 UTC

Your document is well argued. We certainly shouldn't blindly assume
that just because the ACE vs just-send-8 issue was argued to death in
the IDN WG, the trade-offs between the two approaches when applied to
IMAs will automatically be the same as those for IDNs. (Though I can
also understand that this group probably really doesn't want to go
there again.)

But I'd urge you to consider the four scenarios I just put forward in
the thread "What is IMAA: some scenarios for deployment"

Scenario 1a: with IMAA there's a reasonable hope of basic support with
just an updated mail client, and better support with minor updates to
the ISP's sign-up systems. With UTF8ADDRESSES this will require in
addition a major upgrade to the ISP's mail infrastructure.

Scenario 1b: IMAA and UTF8ADDRESSES both require a major upgrade to
the ISP's infrastructure.

Scenario 2a: IMAA requires only an upgrade to the mail clients;
UTF8ADDRESSES requires an upgrade to the clients, the organization's
MTA, and the ISP's mail infrastructure (so that the backup MX will
continue to work).

Scenario 2b: IMAA requires upgrades to the mail clients and, as
currently specified in the draft, an upgrade to the organization's MTA
(though the need to upgrade the MTA might disappear depending on the
design decisions we take in IMAA). UTF8ADDRESSES requires upgrades to
the clients, the MTA and the ISP's infrastructure.

-roy

Roy Badami

2003-02-15 17:48:04 UTC

I'd like to make a further comment on the UTF8ADDRESSES proposal.

I don't necessarily think that UTF8ADDRESSES is a bad idea. There is
far more scope for both IMAA and UTF8ADDRESSES to co-exist than there
was for two approaches to coexist in IDNs.

The e-mail infrastructure is far more easily extensible than the DNS
infrastructure. RFC (2)821 allows us the ESMTP extension mechanism,
and RFC (2)822 allows us to add additional headers. These mechanisms
are both powerful and general, and may well be useful in furthering
the goal of IMAs in a number of ways (about which I have more to say,
but that will have to wait for another message).

Compare it with the situation for mail bodies. The original MIME WG
chose not to extend SMTP to require it to be 8-bit-clean, but to
define quoting mechanisms to allow the transport of 8-bit mail over a
7-bit transport.

Subsequently, 8BITMIME was defined, to allow consenting MTAs to
exchange 8-bit mail unencoded over 8-bit-clean channels.
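To make the MIME/8BITMIME analogy concrete, here is a minimal Python sketch of the per-hop choice an MTA makes for a message body: pass the octets through unencoded when the next hop has announced 8BITMIME, otherwise fall back to a 7-bit quoting mechanism. The helper name `prepare_body` and the choice of quoted-printable (rather than base64) are illustrative assumptions; a real MTA would also have to rewrite the Content-Transfer-Encoding header, which this sketch leaves out.

```python
import quopri


def prepare_body(body: bytes, peer_supports_8bitmime: bool) -> tuple[bytes, str]:
    """Return (wire_body, content_transfer_encoding) for one SMTP hop.

    Hypothetical helper, for illustration only.
    """
    if body.isascii():
        # Pure 7-bit data needs no special handling on any hop.
        return body, "7bit"
    if peer_supports_8bitmime:
        # Consenting, 8-bit-clean peer: send the octets unencoded.
        return body, "8bit"
    # 7-bit-only peer: apply a MIME quoting mechanism instead.
    return quopri.encodestring(body), "quoted-printable"


if __name__ == "__main__":
    text = "Grüße".encode("utf-8")
    print(prepare_body(text, peer_supports_8bitmime=True))
    print(prepare_body(text, peer_supports_8bitmime=False))
```

The point of the analogy is that the same message takes a different wire form depending on what the next hop has announced.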

I think that there will clearly be a demand for both mechanisms to
exist, and I'm sure both kinds of mechanisms will be standardised. As
with MIME, initial deployment will probably primarily use an ASCII
encoding, but over time the native 8-bit approach will become more and
more prevalent.

It's unclear to me whether this group should confine itself to the
ASCII-encoded protocol, or should address both approaches
simultaneously.

I realize that many will probably wish to confine themselves to IMAA,
but I think there is a benefit in releasing both solutions to the
world simultaneously, so that people can choose which they wish to
deploy.

-roy

Paul Hoffman / IMC

2003-02-15 18:37:46 UTC

At 5:48 PM +0000 2/15/03, Roy Badami wrote:
>I realize that many will probably wish to confine themselves to IMAA,
>but I think there is a benefit in releasing both solutions to the
>world simultaneously, so that people can choose which they wish to
>deploy.

The IETF has a long history of very bad outcomes when we release two
very different ways to do the same thing. In the email world, the
fact that S/MIME and PGP are both IETF standards has pretty much
prevented any sender from being able to assume what the recipient can
receive. The fact that there are two standardized PKIX certificate
enrollment protocols has caused it to be almost impossible to roll
out secure email or VPNs in a reasonable fashion. And so on.

--Paul Hoffman, Director
--Internet Mail Consortium

Roy Badami

2003-02-15 19:32:21 UTC

> The IETF has a long history of very bad outcomes when we release two
> very different ways to do the same thing. In the email world, the
> fact that S/MIME and PGP are both IETF standards has pretty much
> prevented any sender from being able to assume what the recipient can
> receive. The fact that there are two standardized PKIX certificate
> enrollment protocols has caused it to be almost impossible to roll
> out secure email or VPNs in a reasonable fashion. And so on.

I'm not sure that you understand what I was proposing. A better
analogy would be to pose the question: would it have been a good thing
for ESMTP and 8BITMIME to have been defined concurrently with the base
MIME standards? (That isn't intended as a loaded question; "No, it
wouldn't have made any difference" is a perfectly valid answer.)

I'm proposing that the two solutions are defined as part of an
encompassing IMA Architecture, not independently without regard to
interoperability.

This is what I imagine the world will be like a few years from now:

All IMA-aware systems will support IMAA (this will be a mandatory part
of some IMA Architecture standard).

Many systems, particularly those in parts of the world where IMAs are
popular, will use ESMTP extensions to exchange IMAs in native UTF-8,
and to exchange messages in an extended format that allows native UTF-8
addresses in the headers.

Systems that receive a message with UTF-8 addresses and need to relay
it to a system that doesn't support the requisite ESMTP extensions
will need to apply ToASCII to both the envelope and header addresses
before forwarding the message. This is analogous to the (admittedly
inconsistently implemented) requirement that a system which receives
an 8BITMIME message converts it to a suitable 7-bit encoding if the
destination system doesn't support 8BITMIME.
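A minimal Python sketch of the relay rule described above, under loud assumptions: `UTF8ADDRESSES` stands in for whatever ESMTP keyword such an extension would actually use (none is defined), header addresses are modelled as bare `local@domain` strings rather than full RFC 2822 address lists, and only the domain half of ToASCII is implemented, using Python's built-in `idna` codec (IDNA2003). There is no standardized ACE for local-parts to call here, so the sketch refuses to downgrade a non-ASCII local-part rather than guess at the IMAA encoding.

```python
NATIVE_EXTENSION = "UTF8ADDRESSES"  # hypothetical ESMTP keyword


def domain_to_ascii(domain: str) -> str:
    # Python's "idna" codec applies IDNA2003 ToASCII to each label.
    return domain.encode("idna").decode("ascii")


def downgrade_address(address: str) -> str:
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a local@domain address: {address!r}")
    if not local.isascii():
        # Placeholder for IMAA's local-part ToASCII, which this sketch does not implement.
        raise NotImplementedError("IMAA ToASCII for local-parts is not implemented here")
    return f"{local}@{domain_to_ascii(domain)}"


def prepare_for_next_hop(envelope, headers, next_hop_extensions):
    """Return (envelope, headers) in a form the next hop can accept."""
    if NATIVE_EXTENSION in next_hop_extensions:
        # Next hop accepts native UTF-8 addresses: relay unchanged.
        return list(envelope), list(headers)
    # Otherwise downgrade BOTH envelope and header addresses before relaying.
    return ([downgrade_address(a) for a in envelope],
            [downgrade_address(a) for a in headers])


if __name__ == "__main__":
    env, hdr = prepare_for_next_hop(["user@bücher.example"],
                                    ["user@bücher.example"],
                                    {"SIZE", "8BITMIME"})
    print(env, hdr)
```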

Maybe defining a complete IMA Architecture is too much to tackle at
once. As I envision it, IMAA will probably be the only mandatory
part of the IMA Architecture, so maybe there's a benefit in getting
that done quickly.

However, I think there may be benefits in putting some thought into the
bigger picture of the IMA Architecture, even if we decide to
concentrate on IMAA for the present.

One example: if we were for sake of argument to decide that ILPs
should be case insensitive, but that it would be desirable (but not
absolutely essential) to make them case preserving, our decision as to
whether to put any effort into making IMAA preserve the case of
local-parts might be strongly influenced by whether we believe that
another component of the ultimate IMA Architecture would generally
carry e-mail addresses in ILP-aware and IDN-aware slots.

Only half jokingly, I propose that we repurpose the acronym IMAA to
mean IMA Architecture, and come up with a new name for the protocol
described in the base document. Part of the reason for this is that I
don't actually think that IMAA is a good name for the base document
protocol, since it has consequences that go beyond end-user
applications.

-roy

John C Klensin

2003-02-15 21:46:35 UTC

--On Saturday, 15 February, 2003 19:32 +0000 Roy Badami
<***@gnomon.org.uk> wrote:

> I'm not sure that you understand what I was proposing. A
> better analogy would be to pose the question: would it have
> been a good thing for ESMTP and 8BITMIME to have been defined
> concurrently with the base MIME standards? (That isn't
> intended as a loaded question; "No, it wouldn't have made any
> difference" is a perfectly valid answer.)

How would "basically, they were" grab you as an answer?

> I'm proposing that the two solutions are defined as part of an
> encompassing IMA Architecture, not independently without
> regard to interoperability.
>
> This is what I imagine the world will be like a few years from
> now:
>
> All IMA-aware systems will support IMAA (this will be a
> mandatory part of some IMA Architecture standard).
>
> Many systems, particularly those in parts of the world where
> IMAs are popular, will use ESMTP extensions to exchange IMAs
> in native UTF-8, and to exchange messages in an extended
> format that allows native UTF-8 addresses in the headers.
>
> Systems that receive a message with UTF-8 addresses and need
> to relay it to a system that doesn't support the requisite
> ESMTP extensions will need to apply ToASCII to both the
> envelope and header addresses before forwarding the message.
> This is analogous to the (admittedly inconsistently
> implemented) requirement that a system which receives an
> 8BITMIME message converts it to a suitable 7-bit encoding if
> the destination system doesn't support 8BITMIME.
>...

Ok, I see where you are headed. Let me try to summarize many
months of moaning in the IDN WG, plus some email experience,
including with 8BITMIME downgrading, which I'm glad you cited.
Disclaimer: unlike Paul and Adam, I'm not an IDNA co-author, and
am widely believed even to be an IDNA-hater (not true), so you
don't get to assume that I'm biased in favor of IDNA-derived
solutions.

* UTF-8, while more common and better known than
punycode, is really not a very efficient encoding,
especially for Asian languages. Indeed, under a number
of conditions, it is a less efficient encoding. So,
other than aesthetics and the belief that large benefits
will accrue from its being closer to the internal form
used by several (many?) systems, there is no really
strong case for using it instead of punycode.

* There are more efficient encodings than either, and
they use all eight bits of octets, but they are even
more strange (less familiar and used in other places)
than punycode. Several of them are members of the
"start from 16-bit UCS-2 Unicode (or 32-bit UCS-4 10646)
strings and compress" family.

* Where we have two ways to do something, bad things
often happen. Paul identified one of them -- industry
looks at the two possibilities, throws up its collective
hands about interoperability and either does nothing or
does something that won't interwork with many systems.
The other is that they get mixed up. The scenario you
outline is a nearly-guaranteed recipe (as the
complexities of 8BITMIME downgrading have been) for an
over-clever MTA author to say "if I can send IMAA
without negotiation, and negotiation fails, I can either
go to all that downgrading trouble, which might not work
anyway, or I can just send the 8bit stuff, which might
get through. The latter is a lot less work, so..."
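The efficiency point in the first bullet is easy to check for any given string. A small Python sketch using only standard-library codecs: it reports the UTF-8 octet count, the raw Punycode (RFC 3492) length, and the length of the full ACE label with its `xn--` prefix as produced by Python's IDNA2003 `idna` codec. The sample strings are arbitrary choices; which encoding comes out shorter depends on the script and on how close together the code points are, so the sketch does not assume a winner.

```python
def encoded_sizes(label: str) -> dict[str, int]:
    """Compare the length of one label under several encodings."""
    return {
        # Octets on the wire if the label is sent as native UTF-8.
        "utf-8": len(label.encode("utf-8")),
        # Raw Punycode (RFC 3492): no nameprep, no "xn--" prefix.
        "punycode": len(label.encode("punycode")),
        # Full IDNA2003 ACE label, including the "xn--" prefix.
        "ace": len(label.encode("idna")),
    }


if __name__ == "__main__":
    for sample in ["bücher", "日本語", "中文电子邮件"]:
        print(sample, encoded_sizes(sample))
```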

If negotiation is needed, then we should negotiate, regardless
of the agreed mail transport format. If the best mail transport
format is punycode, we should use it, whether the transport
environment permits 8bit or not. But alternate ways to do the
same thing, especially when they don't provide significantly
different functionality, tend to cause far more problems than
they are worth.

I covered another aspect of this in an off-list note to you and
Paul a short time ago. If either of you believe that it
contains any profound insights that would be helpful to others,
please feel free to forward it to the list.

john

Roy Badami

2003-02-15 22:18:05 UTC

> > I'm not sure that you understand what I was proposing. A
> > better analogy would be to pose the question: would it have
> > been a good thing for ESMTP and 8BITMIME to have been defined
> > concurrently with the base MIME standards? (That isn't
> > intended as a loaded question; "No, it wouldn't have made any
> > difference" is a perfectly valid answer.)
>
> How would "basically, they were" grab you as an answer?

Thanks, that's interesting. I'd always assumed that 8BITMIME was
defined much later (probably because MTAs get updated more slowly than
MUAs, so I first encountered it much later).

For the record, I'm largely happy with both IDNA and the base IMAA proposal.

> * UTF-8, while more common and better known than
> punycode, is really not a very efficient encoding [...]

> * There are more efficient encodings than either [...]

I have to say that I don't believe coding efficiency is incredibly
important to e-mail, particularly coding efficiency of addresses
(except insofar as we need to allow useful IMAs within existing
protocols that contain length restrictions).
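The exception for length-restricted protocols can be made concrete. The sketch below checks an address against the RFC 2821 size limits (64 characters for a local-part, 255 for a domain, section 4.5.3.1) after the domain has been converted to its ACE form, so the count is taken on what actually goes over the wire. It assumes the local-part has already been put into ASCII form by some means outside the sketch; `fits_rfc2821_limits` is a hypothetical name.

```python
MAX_LOCAL_PART = 64   # RFC 2821, section 4.5.3.1
MAX_DOMAIN = 255      # RFC 2821, section 4.5.3.1


def fits_rfc2821_limits(local_ascii: str, domain: str) -> bool:
    """True if the address fits RFC 2821 size limits once the domain is ACE-encoded."""
    try:
        # Assumes the local-part is already ASCII; a non-ASCII one fails here.
        local_octets = local_ascii.encode("ascii")
        # The idna codec also rejects empty labels and labels over 63 octets.
        ace_domain = domain.encode("idna")
    except UnicodeError:
        return False
    return len(local_octets) <= MAX_LOCAL_PART and len(ace_domain) <= MAX_DOMAIN
```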

The main reason why I think this will inevitably happen in the future
(regardless of whether this forum mandates it) is that in the long
term we will move to a message body which (by default) is just a block
of UTF-8, with no requirements for special coding in any headers or in
the body. Once this happens, and punycode is unnecessary within the
message, it would seem to me to make sense to eliminate it from
(2)821, too.

I guess this is outside the scope of the IMA list, however...

> over-clever MTA author to say "if I can send IMAA
> without negotiation, and negotiation fails, I can either
> go to all that downgrading trouble, which might not work
> anyway, or I can just send the 8bit stuff, which might
> get through. The latter is a lot less work, so..."

I'm not sure that the situations are comparable. The reason that MTA
authors do this with 8-bit to 7-bit conversion is partly that it
almost invariably works, so they can get away with it. Indeed, Dan
Bernstein has some arguments (that I don't entirely agree with) that
it works _better_ than following the RFCs (ie at least one author of a
major MTA made a considered decision to disregard this particular
requirement because in his opinion his approach interoperated better
with the rest of the Internet).

I don't believe the situation would be the same with IMAs -- I
strongly suspect that just-send-8 for RFC-(2)821 commands and
RFC-(2)822 headers simply won't work in most cases...

-roy

Martin Duerst

2003-02-15 23:37:24 UTC

At 22:18 03/02/15 +0000, Roy Badami wrote:

>I have to say that I don't believe coding efficiency is incredibly
>important to e-mail, particularly coding efficiency of addresses
>(except insofar as we need to allow useful IMAs within existing
>protocols that contain length restrictions).
>
>The main reason why I think this will inevitably happen in the future
>(regardless of whether this forum mandates it) is that in the long
>term we will move to a message body which (by default) is just a block
>of UTF-8,

I think this is the direction we are moving to, but not very
quickly. Similarly for the WWW, we are seeing more and more UTF-8,
but again, not extremely quickly.

>with no requirements for special coding in any headers or in
>the body.

Something like Content-Type: text/plain;charset=utf-8
will be present for a VERY long time. But maybe that's
not what you mean by encoding.

>Once this happens, and punycode is unnecessary within the
>message, it would seem to me to make sense to eliminate it from
>(2)821, too.

I think this is one way to argue, but a) I don't think there
is any plan for using ACE explicitly within the message body
(it can always be used, but it will be just a random sequence
of ASCII letters); b) The motivation for uniform encoding
is much stronger in the headers than in the body (I'm very
happy that nobody has brought up proposals yet for using
a variety of legacy encodings, with labeling, in the header);
c) If we think we have a good feel about where we are going,
then it may be a lot cheaper to try to go there faster and
on the most direct way we can find rather than waste time.

Regards, Martin.

Roy Badami

2003-02-16 13:48:42 UTC

>>The main reason why I think this will inevitably happen in the future
>>(regardless of whether this forum mandates it) is that in the long
>>term we will move to a message body which (by default) is just a block
>>of UTF-8,
>
>I think this is the direction we are moving to, but not very
>quickly. Similarly for the WWW, we are seeing more and more UTF-8,
>but again, not extremely quickly.

Sorry, there was a typo in my comment above. I meant to say:

we will move to a _message_ which (by default) is just a block of UTF-8

i.e., not only will our content-transfer-encoding be 8-bit, but we'll
dispense with the escaping mechanism in subject and other headers. At
this point it makes sense to dispense with the ACE in the headers,
too. A message once again becomes just a piece of text you can view
(most of) in a text editor.
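For contrast, this is the escaping mechanism in the Subject and other headers that a native UTF-8 message would dispense with: RFC 2047 encoded-words. A Python sketch using the standard `email.header` module; `encode_subject` and `decode_subject` are illustrative names. In the native format described above, the header would simply carry the UTF-8 string itself.

```python
from email.header import Header, decode_header


def encode_subject(subject: str) -> str:
    """Escape non-ASCII header text as RFC 2047 encoded-words (current practice)."""
    return Header(subject, "utf-8").encode()


def decode_subject(raw: str) -> str:
    """Undo RFC 2047 encoding, returning the original Unicode text."""
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)


if __name__ == "__main__":
    escaped = encode_subject("Grüße")
    print(escaped)                  # RFC 2047 encoded-word form
    print(decode_subject(escaped))  # the text a native UTF-8 header would carry as-is
```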

>>with no requirements for special coding in any headers or in
>>the body.
>
>Something like Content-Type: text/plain;charset=utf-8
>will be present for a VERY long time. But maybe that's
>not what you mean by encoding.

Indeed. I meant that quoted-printable and base64 will go away, at
least for plain text.

>I think this is one way to argue, but a) I don't think there
>is any plan for using ACE explicitly within the message body
>(it can always be used, but it will be just a random sequence
>of ASCII letters);

Sorry, I intended to refer to the entire message, where there is a
clear plan to use ACE.

>b) The motivation for uniform encoding is much stronger in the
>headers than in the body (I'm very happy that nobody has brought up
>proposals yet for using a variety of legacy encodings, with labeling,
>in the header);

Agreed.

>c) If we think we have a good feel about where we are
>going, then it may be a lot cheaper to try to go there faster and on
>the most direct way we can find rather than waste time.

Having thought about it further, the kind of solution I was
envisioning would have to wait for a new message format to be defined,
in which the headers were 8-bit. Making this change just for
addresses doesn't make sense, and defining the native UTF-8 message
format is clearly outside the scope of the present discussions.

-roy

Martin Duerst

2003-02-16 15:33:21 UTC

At 13:48 03/02/16 +0000, Roy Badami wrote:

>Sorry, there was a typo in my comment above. I meant to say:
>
> we will move to a _message_ which (by default) is just a block of UTF-8

Thanks for the clarification.

>i.e., not only will our content-transfer-encoding be 8-bit, but we'll
>dispense with the escaping mechanism in subject and other headers. At
>this point it makes sense to dispense with the ACE in the headers,
>too. A message once again becomes just a piece of text you can view
>(most of) in a text editor.

Yes, therefore allowing people around the globe to do all those
things with emails that people in the ASCII-only world have done
all the time: use text editors (as you say), write simple scripts
(with the emphasis on simple) to process their email, and so on.
Great!

>Having thought about it further, the kind of solution I was
>envisioning would have to wait for a new message format to be defined,
>in which the headers were 8-bit. Making this change just for
>addresses doesn't make sense, and defining the native UTF-8 message
>format is clearly outside the scope of the present discussions.

Well, I agree that we should concentrate on addresses here,
but looking ahead is part of good engineering. So even if
this happens in two steps (UTF8ADDRESS and UTF8HEADER),
we can think about the interactions. And if we find
out that it would be almost as easy to do both at the same time,
and maybe just as one extension, then I don't think we
should feel restricted to not do it.

Actually, my current guess is that it's almost as much
effort to do both things in one extension as to do them
separately:

- Widening code paths to 8 bits has to be done only once,
and can be done completely.
- Negotiation is done on one item, rather than on several.
This significantly reduces code complexity.

The main problem is how to distinguish between header parts
that are addresses and those that are other text.
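The distinction raised here can be sketched at the level of whole header fields. The Python below sorts field names into address-bearing fields (the originator, destination and resent fields of RFC 2822 section 3.6), unstructured text fields, and everything else. It is deliberately coarse and the category names are assumptions: inside an address field, the display name and any comments are still "other text", which is exactly the harder part of the problem and is not handled here.

```python
# Address-bearing fields from RFC 2822, section 3.6 (compared case-insensitively).
ADDRESS_FIELDS = frozenset({
    "from", "sender", "reply-to", "to", "cc", "bcc",
    "resent-from", "resent-sender", "resent-to", "resent-cc", "resent-bcc",
})
# Fields whose bodies are unstructured text, where encoded-words commonly appear today.
UNSTRUCTURED_FIELDS = frozenset({"subject", "comments"})


def classify_field(name: str) -> str:
    """Return 'address', 'text' or 'other' for a header field name."""
    key = name.strip().lower()
    if key in ADDRESS_FIELDS:
        return "address"
    if key in UNSTRUCTURED_FIELDS:
        return "text"
    return "other"
```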

Regards, Martin.

Jeffrey J Zahari

2003-02-19 03:51:13 UTC

----- Original Message -----
From: "Martin Duerst" <***@w3.org>
To: "Roy Badami" <***@gnomon.org.uk>
Cc: <john-***@jck.com>; <ietf-***@imc.org>
Sent: Monday, February 17, 2003 12:33 AM
Subject: Re: Can we back up a bit and ask some basic questions? An alternate model

>
> At 13:48 03/02/16 +0000, Roy Badami wrote:
>
> >Sorry, there was a typo in my comment above. I meant to say:
> >
> > we will move to a _message_ which (by default) is just a block of UTF-8
> >Having thought about it further, the kind of solution I was
> >envisioning would have to wait for a new message format to be defined,
> >in which the headers were 8-bit. Making this change just for
> >addresses doesn't make sense, and defining the native UTF-8 message
> >format is clearly outside the scope of the present discussions.
>
> Well, I agree that we should concentrate on addresses here,
> but looking ahead is part of good engineering. So even if
> this happens in two steps (UTF8ADDRESS and UTF8HEADER),
> we can think about the interactions. And if we find
> out that it would be almost as easy to do both at the same time,
> and maybe just as one extension, then I don't think we
> should feel restricted to not do it.
>
> Actually, my current guess is that it's almost as much
> effort to do both things in one extension as to do them
> separately:
>

What you're envisioning is something brought up previously, that IMAA should
update 2821/2822.

I assume UTF8ADDRESS refers to 2821 level email addresses and UTF8HEADER
refers to email addresses within 2822 headers. What happens if an
intermediate legacy smtp server cannot handle UTF8ADDRESS, and a receiver's
MUA cannot handle messages with UTF8HEADER?

With the IMAA-ACE approaches (or until 2821/2 is altered/implemented),
unless the MTA requires a non-opaque LHS, this seems like the most efficient
path to internationalised emails.

jeffrey j zahari

Martin Duerst

2003-02-19 14:45:48 UTC

At 12:51 03/02/19 +0900, Jeffrey J Zahari wrote:

>----- Original Message -----
>From: "Martin Duerst" <***@w3.org>

> > Well, I agree that we should concentrate on addresses here,
> > but looking ahead is part of good engineering. So even if
> > this happens in two steps (UTF8ADDRESS and UTF8HEADER),
> > we can think about the interactions. And if we find
> > out that it would be almost as easy to do both at the same,
> > and maybe just as one extension, then I don't think we
> > should feel restricted to not do it.
> >
> > Actually, my current guess is that it's almost as much
> > effort to do both things in one extension as to do them
> > separately:

>What you're envisioning is something brought up previously, that IMAA should
>update 2821/2822.

Well, as far as I understand, defining an SMTP service extension
does not constitute an update to 2821/2822 itself.

>I assume UTF8ADDRESS refers to 2821 level email addresses and UTF8HEADER
>refers to email addresses within 2822 headers.

Well, overall, there would actually be three things:
- 2821 email addresses
- 2822 email addresses
- 2822 other text (where encoded words are used currently)
[there may also be 2821 other text, but I'm not aware of such]

So overall, we might need three different extensions. But as I said,
I don't think it makes too much sense to allow 2822 email addresses
in UTF-8 but to restrict other text to encoded words (although
I have heard others think about such proposals). Of course, because
our main focus is on email addresses, it also doesn't make much
sense to solve the problem for 2822 other text, but not for 2822
email addresses.

I understand less about the relationship between 2821 and 2822
functionality, but it may also turn out that they are related
enough that it doesn't make sense to define two different extensions.

>What happens if an
>intermediate legacy smtp server cannot handle UTF8ADDRESS, and a receiver's
>MUA cannot handle messages with UTF8HEADER?

In the scenarios we are discussing here, that would be negotiated
as any other SMTP extension. The only problem may be that there is
not really a negotiation between the last MTA and the receiving MUA.
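A Python sketch of the hop-by-hop negotiation referred to here: pull the extension keywords out of a multi-line EHLO reply and decide whether the next hop can take the native form. `UTF8ADDRESS` and `UTF8HEADER` are the hypothetical keyword names used in this thread, not registered extensions, and the sample reply is made up. The two calls contrast requiring a single extension with requiring two separately negotiated ones, the situation a single combined extension would avoid. Nothing here covers the final step from the last MTA to the recipient's MUA, where no such negotiation takes place.

```python
def ehlo_keywords(reply_lines: list[str]) -> set[str]:
    """Extract extension keywords from a multi-line EHLO reply."""
    keywords = set()
    # The first line ("250-host greeting") names the server, not an extension.
    for line in reply_lines[1:]:
        if not line.startswith("250") or len(line) < 5:
            continue
        # Remaining lines look like "250-KEYWORD params" (last one: "250 KEYWORD params").
        fields = line[4:].split()
        if fields:
            keywords.add(fields[0].upper())
    return keywords


def choose_mode(advertised: set[str], required: set[str]) -> str:
    """Send natively only if every required extension was advertised."""
    return "native" if required <= advertised else "downgrade"


if __name__ == "__main__":
    reply = [
        "250-mx.example.com Hello",
        "250-SIZE 10240000",
        "250-8BITMIME",
        "250 UTF8ADDRESS",
    ]
    advertised = ehlo_keywords(reply)
    print(choose_mode(advertised, {"UTF8ADDRESS"}))
    print(choose_mode(advertised, {"UTF8ADDRESS", "UTF8HEADER"}))
```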

>With the IMAA-ACE approaches ( or until 2821/2 is altered/implemented ),
>unless the MTA requires non opaque lhs, this seems like the most efficient
>path to internationalised emails.

Well, adding another layer of encoding such as proposed in IMAA-ACE
can in some way be quite efficient, but it complicates things
forever if there is no alternative. Also, it's not clear how
quickly we will arrive at a solution.

If I think about IDNs, then my original proposal was published in
December 1996, and it took overall more than six years to come to
the point where actual deployment can now start. So I'm not so
confident that this one will be so very quick.

Regards, Martin.

Paul Hoffman / IMC

2003-02-19 16:47:11 UTC

At 9:45 AM -0500 2/19/03, Martin Duerst wrote:
>Well, as far as I understand, defining an SMTP service extension
>does not constitute an update to 2821/2822 itself.

It requires an update to 2822 if that extension will change the rules
for message format.

>>What happens if an
>>intermediate legacy smtp server cannot handle UTF8ADDRESS, and a receiver's
>>MUA cannot handle messages with UTF8HEADER?
>
>In the scenarios we are discussing here, that would be negotiated
>as any other SMTP extension. The only problem may be that there is
>not really a negotiation between the last MTA and the receiving MUA.

That's not a small "only problem". It means that messages might be
rejected or accepted at different times and therefore the sender will
not be able to predict whether or not he can send a message. Note
that this problem manifests not only for To: addresses, but also
From: and Cc: addresses (and probably others...). You can't predict
whether or not your inclusion of your own IMAA in the From: field or
the Cc: field will cause a message to be bounced.

--Paul Hoffman, Director
--Internet Mail Consortium

Roy Badami

2003-02-19 20:03:13 UTC

> That's not a small "only problem". It means that messages might be
> rejected or accepted at different times and therefore the sender will
> not be able to predict whether or not he can send a message. Note
> that this problem manifests not only for To: addresses, but also
> From: and Cc: addresses (and probably others...). You can't predict
> whether or not your inclusion of your own IMAA in the From: field or
> the Cc: field will cause a message to be bounced.

I think the only way this would be viable is if it was mandatory to
convert to IMAA-ACE rather than bounce. The MUA issue would be up to
local sites. Until they'd upgraded all their MUAs, they could simply
convert all mail to IMAA-ACE, or they could make decisions on a per
mailbox basis.
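The local-site options for final delivery can be expressed as a tiny policy function. This is a hypothetical Python sketch: the policy names `ace-all` and `per-mailbox` and the set of upgraded mailboxes are illustrative assumptions, and the IMAA-ACE conversion itself is out of scope; the function only decides which form a given mailbox should receive.

```python
def delivery_form(mailbox: str, policy: str, upgraded_mailboxes: set[str]) -> str:
    """Decide whether final delivery uses native UTF-8 or IMAA-ACE form."""
    if policy == "ace-all":
        # Until every MUA is upgraded, convert all mail to IMAA-ACE.
        return "ace"
    if policy == "per-mailbox":
        # Deliver natively only to mailboxes whose MUA is known to cope.
        return "native" if mailbox in upgraded_mailboxes else "ace"
    raise ValueError(f"unknown policy: {policy!r}")
```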

The point is that it *is* viable to regard IMAA-ACE as a transition
strategy to a native UTF-8 format (at least as far as SMTP based mail
goes) because we have a robust negotiation mechanism. In this way,
IMAs are different from IDNs. Support for the IMAA-ACE would still
have to remain for a *long* time, of course (perhaps decades).

I'm not necessarily saying it's a good idea, just noting that it is
possible.

-roy

Paul Hoffman / IMC

2003-02-19 21:40:34 UTC

At 8:03 PM +0000 2/19/03, Roy Badami wrote:
>I think the only way this would be viable is if it was mandatory to
>convert to IMAA-ACE rather than bounce.

So we need two mechanisms instead of one? And the advantage of that is...?

For those of you who didn't follow the IDN WG for the past few years,
this is highly analogous to the debate that happened there. The whole
idea of a "transition" sounds great until you realize that the second
format is going to be with us forever. Given that the transition
strategy is harder than simply going with IMAA-ACE, there has to be a
good reason for it.

I don't consider "UTF-8 is good" to be a good enough reason. (And
before anyone here calls me "anti-UTF-8", please look at the top of
the first page of the UTF-8 RFC.)

>The point is that it *is* viable to regard IMAA-ACE as a transition
>strategy to a native UTF-8 format (at least as far as SMTP based mail
>goes) because we have a robust negotiation mechanism. In this way,
>IMAs are different from IDNs.

Right: we were smart enough not to do that in the IDN WG.

>I'm not necessarily saying it's a good idea, just noting that it is
>possible.

Generally, a stronger argument than that is needed.

--Paul Hoffman, Director
--Internet Mail Consortium

Jeffrey J Zahari

2003-02-20 03:39:10 UTC

----- Original Message -----
From: "Paul Hoffman / IMC" <***@imc.org>
To: <ietf-***@imc.org>
Sent: Thursday, February 20, 2003 6:40 AM
Subject: Re: Can we back up a bit and ask some basic questions? An alternate model

>
> At 8:03 PM +0000 2/19/03, Roy Badami wrote:
> >I think the only way this would be viable is if it was mandatory to
> >convert to IMAA-ACE rather than bounce.
>
> So we need two mechanisms instead of one? And the advantage of that is...?
>
> For those of you who didn't follow the IDN WG for the past few years,
> this is highly analogous to the debate that happened there. The whole
> idea of a "transition" sounds great until you realize that the second
> format is going to be with us forever. Given that the transition
> strategy is harder than simply going with IMAA-ACE, there has to be a
> good reason for it.
>
> I don't consider "UTF-8 is good" to be a good enough reason. (And
> before anyone here calls me "anti-UTF-8", please look at the top of
> the first page of the UTF-8 RFC.)
>
> >The point is that it *is* viable to regard IMAA-ACE as a transition
> >strategy to a native UTF-8 format (at least as far as SMTP based mail
> >goes) because we have a robust negotiation mechanism. In this way,
> >IMAs are different from IDNs.
>
> Right: we were smart enough not to do that in the IDN WG.
>

The query-and-reply process for domain names differs from the mechanism
provided by 2821 in that there is an opportunity for the sending MTA to
negotiate the encoding of destination mailbox names with the receiving MTA.
In that sense, 2821 UTF8ADDRESS can exist as a separate proposal to IMAA.

Here is how both can exist: because the IMAA-ACE approach implicitly assumes
2821 IMAA-ACE, a sender/receiver MTA can, using smtp extensions, specify
UTF8ADDRESS or IMAAADDRESS, leaving it up to the intermediate MTA to do the
appropriate conversions if necessary. It is assumed that servers advertising
UTF8ADDRESS have the wherewithal to IDNA the RHS for dns resolution.

jeffrey j zahari

Marc Mutz

2003-02-20 13:19:54 UTC

On Thursday 20 February 2003 04:39, Jeffrey J Zahari wrote:
<snip>
> Here is how both can exist: because IMAA-ACE approach implicitly
> assumes 2821 IMAA-ACE, a sender/receiver MTA can, using smtp
> extensions, specify UTF8ADDRESS or IMAAADDRESS, leaving it up to the
> intermediate MTA to do the appropriate conversions if necessary. It
> is assumed that servers advertising UTF8ADDRESS have the wherewithal
> to IDNA the RHS for dns resolution.
<snip>

Any UTF8ADDRESS extension to SMTP is a way to make the SMTP local part
(and domain?) slots explicitly IMA-aware. This is completely orthogonal
to IMAA-ACE. A server supporting UTF8ADDRESS would be required to
encode all addresses in IMAA-ACE if the next hop doesn't announce the
UTF8ADDRESS extension.

A UTF8ADDRESS SMTP extension doesn't solve the problem for any other
IMA-unaware slot. To that extent its usefulness is limited (though the same
can be said about 8BITMIME, of course). It's a convenience for... yes, for
whom? While 8BITMIME corresponds to the "8bit" CTE in MIME messages and
thus saves (if supported) applying a CTE at the MUA level, UTF8ADDRESS
lacks such support outside of SMTP, since rfc2822 slots will still be
IMAA-unaware.

So: Who is going to benefit from UTF8ADDRESS (other than aesthetics)?

Marc

--
A fundamental right to security is deliberately not part of the constitution.
-- Sabine Leutheusser-Schnarrenberger (former Federal Minister of Justice)

J-F C. (Jefsey) Morfin

2003-02-20 15:44:05 UTC

At 14:19 20/02/03, Marc Mutz wrote:
>Who is going to benefit from UTF8ADDRESS (other than aesthetics)?

Users.
It happens that they do not care about UTF8ADDRESS, but they DO care about
aesthetics.

Paul Hoffman / IMC

2003-02-20 16:58:15 UTC

This thread has gone towards making guesses about how IMAA-UTF8 would
be specified, and different people have different guesses.

When there is a complete Internet Draft on IMAA-UTF8, we can discuss
it sensibly; until then, we can't. If John or Martin or some other
proponent of the idea wants to make a draft, please do so. It would
be quite appropriate to discuss it on this mailing list. But until
then, could we curtail the guessing?

--Paul Hoffman, Director
--Internet Mail Consortium

Edmon Chung

2003-02-20 17:32:53 UTC

I do have a draft for it, actually, based on ESMTP.
Should I just send it to this list, or do we have a draft archive? Or
should I send it to the IETF for archival?
Edmon

Martin Duerst

2003-02-20 18:28:52 UTC

Great! Please submit it as an Internet-Draft and copy this list.

Regards, Martin.

Edmon Chung

2003-02-20 20:58:27 UTC

I have just submitted the draft on IMA based on SMTP and POP extensions to
the IETF. You could also check it out at
http://www.dnsii.org/draft-ietf-chung-imax-00.txt

Because I wrote it about 2.5 years ago, some of it might need updating.
Anyway, comments and discussion would be very much appreciated.
:-)

Edmon

Simon Josefsson

2003-02-20 21:41:03 UTC

"Edmon Chung" <***@neteka.com> writes:

> I have just submitted the draft on IMA based on SMTP and POP extensions to
> the IETF. You could also check it out at
> http://www.dnsii.org/draft-ietf-chung-imax-00.txt

This looks good IMHO.

Apparently, this is a backwards-compatible way to make SMTP
and POP3 accept internationalized mail addresses, thus making it
possible to gradually phase out the 7-bit, legacy-compatible punycode
hack. I'm not sure the M-* headers are such a good idea, though;
perhaps it is better to simply make this a way to enable non-ASCII in
SMTP and POP3 and leave header internationalization to another
standard. The M-From header also seems to break the dichotomy between the
SMTP envelope and the RFC 822 header (MAIL FROM doesn't need to be the
same as From:).

Add an IMAP capability in the same vein and we are set.

Edmon Chung

2003-02-20 23:06:56 UTC

Good to hear from you.
I actually agree with you about the M- headers thing... I wasn't so sure to
begin with.
But in order to phase out the ACE fallback, there need to be new header
fields that can carry email addresses in forms other than ACE... What are
your thoughts?

Edmon

Simon Josefsson

2003-02-20 23:19:09 UTC

"Edmon Chung" <***@neteka.com> writes:

> Good to hear from you.
> I actually agree with you about the M- headers thing... I wasn't so sure to
> begin with.
> But in order to phase out the ACE fallback, there need to be new header
> fields that can carry email addresses in forms other than ACE... What are
> your thoughts?

I don't think the issues need to be linked -- it would be possible to
phase out punycoded addresses in SMTP MAIL FROM but keep them in message
headers. It seems to me that internationalization of RFC 2821,
POP3 and IMAP is independent of internationalization of RFC 2822. It
seems unlikely that it will be possible to move away from punycode in
RFC 2822 soon, because it is a stored format rather than an interactive
protocol like RFC 2821. In a protocol you can agree on new,
non-backwards-compatible behaviour for a single session (such as your
proposal) because all parties that are interested in the session are
present and can negotiate. In a storage format, any new features
should either be developed within the original specification, or a
completely new version of the format should be developed, because not
all parties that will see a given RFC 2822 message are present and
able to interact with the sender to negotiate what features to use.

A separate issue: I think your document should say that the strings
passed in MAIL FROM (whether as ACE or UTF-8) should be processed by
the IMA stringprep profile.

Jeffrey J Zahari

2003-02-21 02:52:33 UTC

The charset identifier is redundant in the examples. The use of Q/B
encoded-word-like strings as MAIL FROM arguments already identifies the
charset. Similarly, the xn-- prefix identifies the ACE. Unless 8 bit is
used, the identifier is not needed.
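To illustrate the redundancy argument, here is a sketch that inspects a local part and reports how it already identifies its own encoding. It assumes RFC 2047 encoded words and an `xn--` ACE prefix, and it uses `email.header.decode_header` from Python's standard library. Only raw 8-bit input carries no charset of its own.

```python
from email.header import decode_header

# Hypothetical ACE prefix for local parts, following the "xn--" mentioned above.
ACE_PREFIX = "xn--"


def classify_local_part(token):
    """Work out how a MAIL FROM local part identifies its own encoding."""
    if not token.isascii():
        # Raw 8-bit: nothing in the token itself names the charset,
        # so this is the one case where a separate identifier is needed.
        return ("8bit", None)
    parts = decode_header(token)
    if len(parts) == 1 and parts[0][1] is not None:
        # An RFC 2047-style encoded word carries its own charset.
        return ("encoded-word", parts[0][1].lower())
    if token.lower().startswith(ACE_PREFIX):
        return ("ace", None)
    return ("ascii", None)
```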

The M- headers should be renamed to X- headers for experimental
usage. But isn't the split between 2821 and 2822 a design decision to keep
the message payload separate from the transport mechanism? This looks like
a tie-in between 2821 & 2822.

In general, SMTP implementations should keep data from 2821 separate from
the 2822 message object, storing it internally as some form of metadata.
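Here is one way to keep the 2821 data as metadata, sketched with hypothetical class names. The envelope travels alongside the stored RFC 2822 bytes, and any address conversion for the next hop rewrites only the envelope. The `convert` callable is a placeholder, for example an ACE downgrade.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List


@dataclass(frozen=True)
class Envelope:
    """RFC 2821 transaction data, held as metadata next to (not inside) the message."""
    mail_from: str
    rcpt_to: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class QueuedMessage:
    envelope: Envelope
    content: bytes  # the RFC 2822 message, stored byte-for-byte


def with_converted_envelope(msg: QueuedMessage, convert: Callable[[str], str]) -> QueuedMessage:
    """Rewrite only the envelope addresses; the message content is never touched."""
    env = msg.envelope
    new_env = replace(env, mail_from=convert(env.mail_from),
                      rcpt_to=[convert(r) for r in env.rcpt_to])
    return replace(msg, envelope=new_env)
```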

jeffrey j zahari

Simon Josefsson

2003-02-21 12:16:39 UTC

Simon Josefsson <***@extundo.com> writes:

> A separate issue: I think your document should say that the strings
> passed in MAIL FROM (whether as ACE or UTF-8) should be processed by
> the IMA stringprep profile.

Sorry for following up to myself, but on second thought I think this
was a poor suggestion. The stringprep processing should be performed
by the receiver. The sender may do it, but shouldn't be required to.

My main reason is that stringprep is Unicode specific, and this
proposal parametrizes the character set, which is a good property.
Thus, stringprep would only be applicable to the UTF-8 case, which
would be confusing. The alternative, restricting the proposal to
UTF-8 only, would be less useful. Another alternative would be to add a
meta-charset "IMAA" which is UTF-8 with IMAA stringprep processing,
but I see no gain from it.

Another reason is that a robust receiver will perform stringprep
processing anyway, to be sure to catch input from non-IMAA-aware clients,
as when someone uses e.g. telnet to connect to an SMTP port.
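Here is a sketch of receiver-side preparation under one stated assumption. The IMA stringprep profile is not reproduced in this thread, so Nameprep (RFC 3491), available in Python as `encodings.idna.nameprep`, stands in for it. Skipping ASCII local parts is a design choice made here, not something the proposal specifies.

```python
from encodings.idna import nameprep


def prepare_received_local_part(local: str) -> str:
    """Normalize a non-ASCII local part on receipt, regardless of what the sender did.

    Stand-in assumption: Nameprep (RFC 3491) replaces the IMA stringprep
    profile. It case-folds, applies NFKC and raises UnicodeError on
    prohibited code points.
    """
    if local.isascii():
        # Leave legacy ASCII local parts alone: RFC 2821 treats local parts as
        # case-sensitive, and Nameprep would fold their case.
        return local
    return nameprep(local)
```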

James Seng

2003-02-24 04:49:04 UTC

Few comments:

1. Please submit this as a proper IETF I-D, thank you.

2. There are two separate issues, 2821 & 2822. IMAA deals with 2822 (and
probably above it) whereas your proposal deals with 2821 specifically.
Please don't violate layering.

3. Your proposal does not address 2821 servers that do not respond with
IMAX.

4. Don't forget the lessons of 8BITMIME and/or the lack of it.

-James Seng

Edmon Chung

2003-02-24 07:05:44 UTC

Hi James,

----- Original Message -----
From: "James Seng" <***@pobox.org.sg>
> 1. Please submit this as a proper IETF I-D, thank you.

It has been done, please check:
http://www.ietf.org/internet-drafts/draft-chung-imax-00.txt

> 2. There are two separate issues, 2821 & 2822. IMAA deals with 2822 (and
> probably above it) whereas your proposal deals with 2821 specifically.
> Please don't violate layering.

Yup, a couple of people have raised this.
It will be fixed in -01, where section 3 will be eliminated altogether.

> 3. Your proposal does not address 2821 servers that do not respond with
> IMAX.

You are right. I forgot to add it in; it would be the same as the second
example in Section 2.2, though. But yes, I will add it in.

> 4. Don't forget the lessons of 8BITMIME and/or the lack of it.

I was actually thinking that IMAX would encourage servers to support and
announce ESMTP+8BITMIME. :-)

Edmon

Paul Hoffman / IMC

2003-02-24 15:59:54 UTC

At 2:05 AM -0500 2/24/03, Edmon Chung wrote:
>http://www.ietf.org/internet-drafts/draft-chung-imax-00.txt

This doesn't address any of the issues that I raised with John's
proposal of IMAA-UTF8, and in fact brings in many more horrible
problems like bad charset mappings and forcing a client fallback.

What is the actual deployment advantage of an ESMTP extension over
just plain IMAA-ACE? If the MUA or the MTA that is about to write to
the message store needs to be able to act differently based on
different input, which is true in all the proposals so far, wouldn't
a solution that has zero effect on SMTP be better than one that
requires a complete infrastructure upgrade (IMAA-UTF8) or one that
has an optional infrastructure upgrade and requires an IMAA-ACE
"fallback"?

--Paul Hoffman, Director
--Internet Mail Consortium

Simon Josefsson

2003-02-24 18:53:25 UTC

Paul Hoffman / IMC <***@imc.org> writes:

> At 2:05 AM -0500 2/24/03, Edmon Chung wrote:
>>http://www.ietf.org/internet-drafts/draft-chung-imax-00.txt
>
> This doesn't address any of the issues that I raised with John's
> proposal of IMAA-UTF8, and in fact brings in many more horrible
> problems like bad charset mappings and forcing a client fallback.

Is that a problem? Not supporting anything other than UTF-8 is a
problem with IDNA and IMAA in the real world. The bad charset
mappings exist with IDNA/IMAA too; they are just disguised by an
assumption in the specifications. As long as not every machine on the
Internet uses UTF-8, you must handle conversion and fallback at some
point. I'd advocate solutions that try to face and solve that
problem, instead of hiding it by assuming the real world only
uses Unicode.

(I'm not saying the proposed fallback mechanism is the perfect one
though, I'm sure it can be improved.)

> What is the actual deployment advantage of an ESMTP extension over just
> plain IMAA-ACE?

It makes SMTP support non-ASCII for email addresses. Compared to
using IMAA-ACE, the advantage is that an ESMTP extension doesn't
require the use of a punycode encoder/decoder. What would the actual
deployment advantage of using IMAA-ACE over an ESMTP extension with an
ACE fallback be?

> If the MUA or the MTA that is about to write to the message store
> needs to be able to act differently based on different input, which
> is true in all the proposals so far, wouldn't a solution that has
> zero effect on SMTP be better than one that requires a complete
> infrastructure upgrade (IMAA-UTF8) or one that has an optional
> infrastructure upgrade and requires an IMAA-ACE "fallback"?

If you want to deploy as fast as possible and only require changes in
transport end points, yes. If you want to upgrade to a clean and
easily maintained design in the long term, no.

An optional infrastructure upgrade and ACE fallback sounds good to me.
Then in 20 years' time, when all machines use Unicode, we can relax
the ACE fallback into a MAY and eventually get rid of it.

Paul Hoffman / IMC

2003-02-24 19:14:15 UTC

At 7:53 PM +0100 2/24/03, Simon Josefsson wrote:
>> What is the actual deployment advantage of an ESMTP extension over just
>> plain IMAA-ACE?
>
>It makes SMTP support non-ASCII for email addresses. Compared to
>using IMAA-ACE, the advantage is that an ESMTP extension doesn't
>require the use of a punycode encoder/decoder.

Are you saying that using a punycode decoder when writing to a
message store is *harder* than doing an ESMTP extension that might
involve bouncing or dropping mail? That seems kind of extreme, given
that the punycode decoding is completely optional. And I don't
understand why you talk about a punycode encoder; that is never
needed by the SMTP server in IMAA-ACE.

> What would the actual
>deployment advantage of using IMAA-ACE over an ESMTP extension with an
>ACE fallback be?

That there would be no required change to the deployed base of SMTP
servers out there, some of which are in hardware and cannot be
upgraded. Internet mail is already deployed; forcing a change when it
isn't needed is just plain bad design. Localizing the protocol change
to one place makes it easier to deploy and makes it more predictable
for end users.

Do I need to go on?

>If you want to deploy as fast as possible and only require changes in
>transport end points, yes. If you want to upgrade to a clean and
>easily maintained design in the long term, no.

In what way is using a new ESMTP extension more "easily maintained"?
That certainly is not the experience in the SMTP world so far.

Clean is in the eye of the beholder. You and I like UTF-8, but many
people don't. Forcing them to use our preferred charset isn't a good
practice if it can be avoided.

--Paul Hoffman, Director
--Internet Mail Consortium

Simon Josefsson

2003-02-24 19:53:07 UTC

Paul Hoffman / IMC <***@imc.org> writes:

> At 7:53 PM +0100 2/24/03, Simon Josefsson wrote:
>>> What is the actual deployment advantage of an ESMTP extension over just
>>> plain IMAA-ACE?
>>
>>It makes SMTP support non-ASCII for email addresses. Compared to
>>using IMAA-ACE, the advantage is that an ESMTP extension doesn't
>>require the use of a punycode encoder/decoder.
>
> Are you saying that using a punycode decoder when writing to a message
> store is *harder* than doing an ESMTP extension that might involve
> bouncing or dropping mail? That seems kind of extreme, given that the
> punycode decoding is completely optional. And I don't understand why
> you talk about a punycode encoder; that is never needed by the SMTP
> server in IMAA-ACE.

I'm saying that when implementing an MTA it is easier if I don't have
to implement punycode in order to support non-ASCII.

How would an ESMTP extension with an ACE fallback (i.e., IMAX) involve
bouncing or dropping mail?

Punycode decoding is not optional if the MTA wants to support
non-ASCII. If the MTA doesn't want to support non-ASCII (for logging,
for aliases, for routing, etc.), none of this is relevant anyway and
the MTA can continue to live in the old 7-bit world and no one would
notice or care.

A punycode encoder is required if the MTA handles non-ASCII data in
decoded, normal format, as in the user interface for /etc/aliases,
/etc/mail/virtusertable, etc. If it doesn't handle non-ASCII in normal
format, it might as well not support non-ASCII at all, since the user
would never notice the difference. In theory, I agree that a
(probably) compliant MTA could be developed that didn't include a
punycode encoder, but it would be limited.
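The alias case can be made concrete. If the alias table is kept in Unicode, an MTA receiving ACE local parts needs punycode in one direction or the other: it can encode the table's keys once, or decode each incoming address. The sketch below uses a hypothetical table and a hypothetical `xn--` prefix.

```python
# Hypothetical ACE prefix for local parts (the IMAA local-part prefix is not given here).
ACE_PREFIX = "xn--"

# Hypothetical alias table, as an administrator would type it: in Unicode.
ALIASES = {"bücher": "books-team", "alice": "alice"}


def ace_encode(local):
    if local.isascii():
        return local
    return ACE_PREFIX + local.encode("punycode").decode("ascii")


def ace_decode(local):
    if local.lower().startswith(ACE_PREFIX):
        return local[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return local


def build_ace_index(aliases):
    """Encoder route: convert the Unicode keys once, then match incoming ACE directly."""
    return {ace_encode(key): target for key, target in aliases.items()}


def lookup_via_decoder(incoming_local, aliases):
    """Decoder route: convert incoming ACE back to Unicode before the lookup."""
    return aliases.get(ace_decode(incoming_local))
```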

>> What would the actual
>>deployment advantage of using IMAA-ACE over an ESMTP extension with an
>>ACE fallback be?
>
> That there would be no required change to the deployed base of SMTP
> servers out there, some of which are in hardware and cannot be
> upgraded. Internet mail is already deployed; forcing a change when it
> isn't needed is just plain bad design. Localizing the protocol change
> to one place makes it easier to deploy and makes it more predictable
> for end users.
>
> Do I need to go on?

Yes, please. Why would an ESMTP extension with an ACE fallback (e.g.,
IMAX) require any changes to the deployed base of SMTP servers?

>>If you want to deploy as fast as possible and only require changes in
>>transport end points, yes. If you want to upgrade to a clean and
>>easily maintained design in the long term, no.
>
> In what way is using a new ESMTP extension more "easily maintained"?
> That certainly is not the experience in the SMTP world so far.

It appears easier to implement an MTA that handles non-ASCII data than
to implement an MTA that handles non-ASCII data AND punycode
decoding/encoding of that data.

> Clean is in the eye of the beholder. You and I like UTF-8, but many
> people don't. Forcing them to use our preferred charset isn't a good
> practice if it can be avoided.

I agree completely. This is one of my problems with IDNA and IMAA, it
forces Unicode on everyone.

Paul Hoffman / IMC

2003-02-24 23:24:57 UTC

At 8:53 PM +0100 2/24/03, Simon Josefsson wrote:
>I'm saying that when implementing an MTA it is easier if I don't have
>to implement punycode in order to support non-ASCII.

And you don't have to. IMAA-ACE works with no changes to the MTA.
IMAX forces changes, including changing the maximum line lengths for
the MAIL FROM and RCPT TO commands. That's pretty non-trivial.
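For context on the length question, here is a sketch that measures local parts against the RFC 2821 limits: 64 for the local part, 255 for the domain, and 256 for the whole path including punctuation. Counting octets rather than characters for non-ASCII data is an assumption, as is the `xn--` prefix used for the ACE length.

```python
# Limits from RFC 2821, section 4.5.3.1. Counting them in octets is an assumption:
# the RFC was written for 7-bit ASCII, where characters and octets coincide.
LOCAL_PART_MAX = 64
DOMAIN_MAX = 255
PATH_MAX = 256  # includes the angle brackets and the "@"


def local_part_octets(local):
    """Return (raw UTF-8 length, ACE length) of a local part; 'xn--' is a hypothetical prefix."""
    utf8_len = len(local.encode("utf-8"))
    if local.isascii():
        return utf8_len, utf8_len
    return utf8_len, len("xn--") + len(local.encode("punycode"))


def path_fits(local_octets, domain_octets):
    """Check the three RFC 2821 size limits for '<' local '@' domain '>'."""
    path_octets = 1 + local_octets + 1 + domain_octets + 1
    return (local_octets <= LOCAL_PART_MAX
            and domain_octets <= DOMAIN_MAX
            and path_octets <= PATH_MAX)
```

For example, a 40-letter Cyrillic local part is 80 octets in UTF-8, which already exceeds 64. `bücher` grows from 7 octets in UTF-8 to 13 in ACE form.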

>How would an ESMTP extension with an ACE fallback (i.e., IMAX) involve
>bouncing or dropping mail?

The second paragraph of section 2.3 sure sounds like it would bounce
things instead of doing an ACE fallback.

>Punycode decoding is not optional if the MTA wants to support
>non-ASCII.

Where in the IMAA document does it say that? I believe you are
completely wrong here.

>A punycode encoder is required if the MTA handles non-ASCII data in
>decoded, normal, format. Like in the user interface for /etc/aliases,
>/etc/mail/virtusertable etc.

Neither of those are controlled by the MTA. This is getting pretty silly.

> If it doesn't handle non-ASCII in normal
>format, it might as well not support non-ASCII at all, since the user
>would never notice the difference.

You are mixing up the MTA and the MUA.

> In theory, I agree that a
>(probably) compliant MTA could be developed that didn't include a
>punycode encoder, but it would be limited.

You have mixed up compliance with marketability.

>> In what way is using a new ESMTP extension more "easily maintained"?
>> That certainly is not the experience in the SMTP world so far.
>
>It appears easier to implement an MTA that handles non-ASCII data than
>to implement an MTA that handles non-ASCII data AND punycode
>decoding/encoding of that data.

But you keep talking about the need to handle fallback. Handling two
protocols is not easier than handling one in any universe.

> > Clean is in the eye of the beholder. You and I like UTF-8, but many
> > people don't. Forcing them to use our preferred charset isn't a good
> > practice if it can be avoided.
>
>I agree completely. This is one of my problems with IDNA and IMAA, it
>forces Unicode on everyone.

Unicode is not a charset.

--Paul Hoffman, Director
--Internet Mail Consortium

Edmon Chung

2003-02-25 00:03:11 UTC

Hi Paul,

----- Original Message -----
From: "Paul Hoffman / IMC" <***@imc.org>
> And you don't have to. IMAA-ACE works with no changes to the MTA.
> IMAX forces changes, including changing the maximum line lengths for
> the MAIL FROM and RCPT TO commands. That's pretty non-trivial.

If you think that changing the max line lengths is too big a change, I will take it
out. I just thought that it would also be a good time to upgrade that part,
especially due to the use of Punycode, in fact! ;-)
It really isn't a "MUST".

> >How would an ESMTP extension with an ACE fallback (i.e., IMAX) involve
> >bouncing or dropping mail?
>
> The second paragraph of section 2.3 sure sounds like it would bounce
> things instead of doing an ACE fallback.

This describes the situation today! That is, an IDN/IMA-unaware client
tries to send mail to/from a multilingual address. It has nothing to do
with the IMAX architecture. I just hoped to make people aware of this
reality rather than shy away from it. If you think it is actually more
confusing, I will take the description out.

> >Punycode decoding is not optional if the MTA wants to support
> >non-ASCII.
>
> Where in the IMAA document does it say that? I believe you are
> completely wrong here.

Yes, it does say that support for ACE (to be updated to Punycode in -01 or
when the RFC is out; as I said, I wrote this 2 years ago...) is mandatory.
ACE and UTF-8 are both mandatory. Please refer to the last paragraph of section 2.2.

Edmon

Paul Hoffman / IMC

2003-02-25 00:33:22 UTC

At 7:03 PM -0500 2/24/03, Edmon Chung wrote:
>If you think that changing the max line lengths is too big a change, I will take it
>out.

Gratuitous changes to deployed standards are never appreciated.

> > The second paragraph of section 2.3 sure sounds like it would bounce
> > things instead of doing an ACE fallback.
>
>This describes the situation today!

No, it doesn't. The situation today is that sending non-ASCII in SMTP
is forbidden.

> > Where in the IMAA document does it say that? I believe you are
>> completely wrong here.
>
>Yes, it does say that support for ACE (to be updated to Punycode in -01 or
>when the RFC is out; as I said, I wrote this 2 years ago...) is mandatory.
>ACE and UTF-8 are both mandatory. Please refer to the last paragraph of section 2.2.

I was asking about IMAA, not IMAX. Simon claims that Punycode is
required for IMAA, and I asked where in IMAA it says that.

--Paul Hoffman, Director
--Internet Mail Consortium

Edmon Chung

2003-02-25 01:06:36 UTC

----- Original Message -----
From: "Paul Hoffman / IMC" <***@imc.org>
> Gratuitous changes to deployed standards are never appreciated.

ok.

> > > The second paragraph of section 2.3 sure sounds like it would bounce
> > > things instead of doing an ACE fallback.
> >
> >This describes the situation today!
>
> No, it doesn't. The situation today is that sending non-ASCII in SMTP
> is forbidden.

I think that SMTP states that a server "MAY" / "SHOULD" reject, rather
than "MUST". It also specifies that the local part MUST be interpreted by
the host.

Anyway, I understand where you are coming from, so I would change the
sentence to:

If no charset is specified, the server SHOULD assume that the client is not
IMAX compliant.

Would this be ok?

Edmon
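The proposed rule could be applied on the server side roughly as follows. `CHARSET` is a hypothetical parameter name standing in for however IMAX actually marks the charset. The parsing follows the general `keyword[=value]` shape of ESMTP parameters (RFC 1869) and does not handle quoted local parts.

```python
def parse_mail_from(line):
    """Split 'MAIL FROM:<path> KEY=VALUE ...' into the path and a dict of ESMTP parameters.

    Assumes the path itself contains no spaces (no quoted local parts).
    """
    prefix = "MAIL FROM:"
    if not line.upper().startswith(prefix):
        raise ValueError("not a MAIL FROM command: %r" % line)
    rest = line[len(prefix):].strip()
    path, _, params_text = rest.partition(" ")
    params = {}
    for item in params_text.split():
        keyword, _, value = item.partition("=")
        params[keyword.upper()] = value or None
    return path, params


def client_is_imax(params):
    # "CHARSET" is a hypothetical keyword. Proposed rule: if no charset is
    # specified, assume the client is not IMAX compliant.
    return "CHARSET" in params
```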

Simon Josefsson

2003-02-25 12:50:24 UTC

Paul Hoffman / IMC <***@imc.org> writes:

> At 8:53 PM +0100 2/24/03, Simon Josefsson wrote:
>>I'm saying that when implementing a MTA it is easier if I don't have
>>to implement punycode in order to support non-ASCII.
>
> And you don't have to. IMAA-ACE works with no changes to the MTA. IMAX
> forces changes, including changing the maximum line lengths for the
> MAIL FROM and RCPT TO commands. That's pretty non-trivial.

Perhaps IMAX can be modified so it doesn't require those changes?

>>How would an ESMTP extension with an ACE fallback (i.e., IMAX) involve
>>bouncing or dropping mail?
>
> The second paragraph of section 2.3 sure sounds like it would bounce
> things instead of doing an ACE fallback.

I don't get that impression. It sounds to me like, unless IMAX is
used, the interpretation and handling of the mail addresses is out of
the scope of IMAX. Perhaps it would be good to clarify that section so
that, whatever the intention was, it is made explicit?

>>Punycode decoding is not optional if the MTA wants to support
>>non-ASCII.
>
> Where in the IMAA document does it say that? I believe you are
> completely wrong here.

Are you saying that if I implement an MTA and want to support non-ASCII
mail addresses in the places where MTAs use ASCII mail addresses
today, that MTA need not implement punycode decoding?

If so, how would you translate an incoming punycoded string into
non-ASCII data that is stored in the log file, for instance?

If you are saying that the MTA should put the IMAA encoded mail
address in the log file, I'd say then that MTA doesn't support
non-ASCII. An essential feature of supporting non-ASCII is to make it
possible for the user of the application to actually see the
characters. ASCII encoding them and displaying them to the user
doesn't make the application support non-ASCII in practice. It would
be like claiming to support Unicode in a terminal emulator when it
only displayed Base64 encoding of the UTF-8 encoded Unicode code
points.

>>A punycode encoder is required if the MTA handle non-ASCII data in
>>decoded, normal, format. Like in the user interface for /etc/aliases,
>>/etc/mail/virtusertable etc.
>
> Neither of those are controlled by the MTA. This is getting pretty silly.

That was not a generic example; it was an example for one MTA
implementation, Sendmail, which uses and controls those files.

>> If it doesn't handle non-ASCII in normal
>>format, it might as well not support non-ASCII at all since the user
>>would never notice the different.
>
> You are mixing up the MTA and the MUA.

I wasn't clear. I meant the user of the MTA, i.e., the administrator.
Administrators have non-ASCII requirements too.

>> In theory, I agree that a
>>(probably) compliant MTA could be developed that didn't include a
>>punycode encoder, but it would be limited.
>
> You have mixed up compliance with marketability.

Perhaps. I'd rather describe it as being open to whatever practical
requirements exist before designing a solution.

Neither MTA implementations nor internationalization solutions for MTAs
exist in a vacuum. If it is impossible to implement an
internationalized product and be compliant, the specification has a
problem.

>> > In what way is using a new ESMTP extension more "easily maintained"?
>>> That certainly is not the experience in the SMTP world so far.
>>
>>It appears easier to implement a MTA that handle non-ASCII data, than
>>to implement a MTA that handle non-ASCII data AND punycode
>>decoding/encoding of that data.
>
> But you keep talking about the need to handle fallback. Handling two
> protocols is not easier than handling one in any universe.

True. Yes, the fallback is a problem. Hm. Perhaps those interested
in non-ASCII need to require the use of modern software at both the
receiver and the sender; then implementations wouldn't need to
implement the fallback case.

>> > Clean is in the eye of the beholder. You and I like UTF-8, but many
>> > people don't. Forcing them to use our preferred charset isn't a good
>> > practice if it can be avoided.
>>
>>I agree completely. This is one of my problems with IDNA and IMAA, it
>>forces Unicode on everyone.
>
> Unicode is not a charset.

I'm not sure if you genuinely missed my point due to this
misunderstanding, but assuming you did, let me correct myself: replace
"Unicode" with "Any charset encoding format of Unicode". I'm sorry
that I cannot express this in any clearer way, perhaps someone who
manages to comprehend what I mean can formulate this in a more precise
way so my point gets through.

I note that IMAA talks about representing characters using the Unicode
"character set". I use charset as shorthand for character set, but
apparently that must be wrong if IMAA and what you say are consistent.
What distinct definitions of "character set" and "charset" do you use?

Edmon Chung

2003-02-25 15:42:10 UTC

----- Original Message -----
From: "Simon Josefsson" <***@extundo.com>
> Paul Hoffman / IMC <***@imc.org> writes:
>
> > At 8:53 PM +0100 2/24/03, Simon Josefsson wrote:
> >>I'm saying that when implementing a MTA it is easier if I don't have
> >>to implement punycode in order to support non-ASCII.
> >
> > And you don't have to. IMAA-ACE works with no changes to the MTA. IMAX
> > forces changes, including changing the maximum line lengths for the
> > MAIL FROM and RCPT TO commands. That's pretty non-trivial.
>
> Perhaps IMAX can be modified so it doesn't require those changes?

Absolutely. I just thought it would be good to lengthen the fields...
actually for punycode... It is not critical to IMAX, and I will take it out.
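The line-length question turns on octet counts. As a rough sketch (Python, assuming the RFC 2821 section 4.5.3.1 limits of 64 characters for a local part and 255 for a domain, read here as octets), the following compares the raw UTF-8 size of an address with the size of its IDNA (ACE) domain form. The helper names are hypothetical, and the local part's ACE form is deliberately not modelled, since IMAA's encoding is still a draft.

```python
# Limits from RFC 2821, section 4.5.3.1. The RFC states them in characters;
# for ASCII-only SMTP a character is one octet, which is the reading used here.
MAX_LOCAL_PART = 64
MAX_DOMAIN = 255


def octet_sizes(local_part: str, domain: str) -> dict:
    """Return octet counts for the raw UTF-8 form and the ACE domain form."""
    return {
        "local_utf8": len(local_part.encode("utf-8")),
        "domain_utf8": len(domain.encode("utf-8")),
        # Python's built-in "idna" codec (IDNA 2003) yields the xn-- form.
        "domain_ace": len(domain.encode("idna")),
    }


def fits_unchanged_limits(local_part: str, domain: str) -> bool:
    """True if the raw UTF-8 octets fit the existing RFC 2821 limits."""
    sizes = octet_sizes(local_part, domain)
    return (sizes["local_utf8"] <= MAX_LOCAL_PART
            and sizes["domain_utf8"] <= MAX_DOMAIN)


if __name__ == "__main__":
    print(octet_sizes("b\u00fccher", "b\u00fccher.example"))
    # 22 CJK characters take 3 octets each in UTF-8: 66 octets, over 64.
    print(fits_unchanged_limits("\u4e2d" * 22, "example.com"))
```

Note that a local part of 22 characters can exceed a 64-octet limit once sent as raw UTF-8, which is one reason length limits come up at all.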

> >>How would an ESMTP extension with an ACE fallback (i.e., IMAX) involve
> >>bouncing or dropping mail?
> >
> > The second paragraph of section 2.3 sure sounds like it would bounce
> > things instead of doing an ACE fallback.
>
> I don't get that impression. It sounds to me that unless IMAX is
> used, the interpretation and handling of the mail addresses is out of
> scope of IMAX. Perhaps it would be good to clarify that section so
> whatever the intention was, it is made specific?

I have changed section 2.3 to reflect this. I will submit an updated
draft based on our discussions so far.

Edmon

Paul Hoffman / IMC

2003-02-25 16:37:48 UTC

At 1:50 PM +0100 2/25/03, Simon Josefsson wrote:
> > Where in the IMAA document does it say that? I believe you are
>> completely wrong here.
>
>Are you saying that if I implement a MTA and want to support non-ASCII
>mail addresses in the places where MTAs use ASCII mail addresses
>today, that MTA need not implement punycode decoding?

You are (again) confusing the protocol with the implementation. The
protocol does not require these things; the implementation might.

>If so, how would you translate an incoming punycoded string into
>non-ASCII data that is stored in the log file, for instance?

MTA implementations that want to write into log files already need
Punycode decoding for the host names. Your complaint here is invalid.

>If you are saying that the MTA should put the IMAA encoded mail
>address in the log file, I'd say then that MTA doesn't support
>non-ASCII.

You are free to say that. Others would disagree. In the case of IMAX,
what would you want in your log file? All UTF-8? That means you need
converters from every accepted charset to UTF-8. Careful sysadmins
would probably want to know *exactly* what came in, not some
converted form, but that means that their log file would have
multiple charsets in it, which would make display a mess. A
reasonable option is to store the addresses as ACE and to have a
log-file viewer that converts on display (and has an option for not
converting).

Again, this is an implementation issue, not a protocol issue.
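The suggestion above (store ACE in the log, convert only on display, with an option not to convert) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of any draft: it decodes only domain labels carrying the IDNA "xn--" prefix, using Python's built-in IDNA 2003 codec, and leaves local parts untouched because IMAA's local-part ACE form is not modelled. The function name is hypothetical.

```python
import re

# An ASCII domain label carrying the IDNA ACE prefix.
ACE_LABEL = re.compile(r"\bxn--[a-z0-9-]+", re.IGNORECASE)


def render_log_line(line: str, convert: bool = True) -> str:
    """Show a log line stored in ACE form, decoding xn-- labels for display.

    With convert=False the stored text is returned unchanged, so an
    administrator can still see exactly what came in.
    """
    if not convert:
        return line

    def decode_label(match: "re.Match[str]") -> str:
        label = match.group(0)
        try:
            return label.encode("ascii").decode("idna")
        except UnicodeError:
            # Malformed ACE: display it unchanged rather than guess.
            return label

    return ACE_LABEL.sub(decode_label, line)


if __name__ == "__main__":
    stored = "from=<user@xn--bcher-kva.example> status=sent"
    print(render_log_line(stored))
    print(render_log_line(stored, convert=False))
```

Keeping the stored form in ASCII means the log never mixes charsets; the conversion cost is paid only by whoever views it.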

> An essential feature of supporting non-ASCII is to make it
>possible for the user of the application to actually see the
>characters. ASCII encoding them and displaying them to the user
>doesn't make the application support non-ASCII in practice. It would
>be like claiming to support Unicode in a terminal emulator when it
>only displayed Base64 encoding of the UTF-8 encoded Unicode code
>points.

IMAA describes in detail when and how to display the Unicode form to
the user; IMAX mostly glosses over this.

> >>A punycode encoder is required if the MTA handle non-ASCII data in
>>>decoded, normal, format. Like in the user interface for /etc/aliases,
>>>/etc/mail/virtusertable etc.
>>
>> Neither of those are controlled by the MTA. This is getting pretty silly.
>
>That was not a generic example, it was an example for one MTA
>implementation: Sendmail. It uses and control those files.

And, again, you are mixing up protocols with implementations.

> >> If it doesn't handle non-ASCII in normal
>>>format, it might as well not support non-ASCII at all since the user
>>>would never notice the different.
>>
>> You are mixing up the MTA and the MUA.
>
>I wasn't clear. I meant the user of the MTA, i.e., the administrator.
>Administrators have non-ASCII requirements too.

Correct, and IMAA describes when and how to convert for display.

>MTA implementations, nor internationalization solutions for MTAs,
>exist in a vacuum. If it is impossible to implement an
>internationalized product and being compliant, the specification has a
>problem.

Of course. Nothing in IMAA makes it "impossible to implement an
internationalized product".

> > But you keep talking about the need to handle fallback. Handling two
>> protocols is not easier than handling one in any universe.
>
>True. Yes, the fallback is a problem. Hm. Perhaps those interested
>in non-ASCII need to require the use of modern software at the
>receiver and the sender, then implementations doesn't need to
>implement the fall back case.

That's not what the IMAX document says. If you want to propose an
ESMTP extension with no fallback, either change IMAX or create your
own Internet Draft. In either case, you will have to say explicitly
how this will interact with SMTP servers that do not support the new
protocol, how bounces would be handled, how users would know if they
could send a message, and so on. I think when you write that, if you
do so honestly, you will see that it would be silly to propose such a
solution.

> >> > Clean is in the eye of the beholder. You and I like UTF-8, but many
>>> > people don't. Forcing them to use our preferred charset isn't a good
>>> > practice if it can be avoided.
>>>
> >>I agree completely. This is one of my problems with IDNA and IMAA, it
>>>forces Unicode on everyone.
>>
>> Unicode is not a charset.
>
>I'm not sure if you genuinely missed my point due to this
>misunderstanding, but assuming you did, let me correct myself: replace
>"Unicode" with "Any charset encoding format of Unicode".

I think I hear you saying that you think that the protocols should
allow any repertoire and any encoding of those repertoires. If so, we
certainly disagree. The IETF is not very keen on creating protocols
for which there would be limited and unpredictable interoperability.
Other standards groups might not be so picky.

--Paul Hoffman, Director
--Internet Mail Consortium

Simon Josefsson

2003-02-25 21:13:56 UTC

Paul Hoffman / IMC <***@imc.org> writes:

> At 1:50 PM +0100 2/25/03, Simon Josefsson wrote:
>>>> Punycode decoding is not optional if the MTA wants to support
>>>> non-ASCII.
>>> Where in the IMAA document does it say that? I believe you are
>>> completely wrong here.
>>
>>Are you saying that if I implement a MTA and want to support non-ASCII
>>mail addresses in the places where MTAs use ASCII mail addresses
>>today, that MTA need not implement punycode decoding?
>
> You are (again) confusing the protocol with the implementation. The
> protocol does not require these things; the implementation might.

Right, I was talking about the implementation, I tried to make that
clear by saying "the MTA" rather than "the specification". Isn't (one
of) the goal of the IMAA protocol to make it possible for MTA
implementations to support non-ASCII? Then whether the implementation
or the specification is generating the requirement seems like an
academic point. The end result is that punycode decoding is required
in the implementation, which is what I consider the problem. If a
solution that didn't involve encoding techniques such as punycode
could be developed, I think that should be preferred.

>>If so, how would you translate an incoming punycoded string into
>>non-ASCII data that is stored in the log file, for instance?
>
> MTA implementations that want to write into log files already need
> Punycode decoding for the host names. Your complaint here is invalid.

Obviously we are interpreting IMAX differently, or you wouldn't say
that. Now that you write this I would agree that IMAX is unclear on
one thing: does IMAX make the RHS of the email address an IDN-aware
domain name slot (in IDNA terminology)? I think it should. It doesn't
make sense to negotiate non-ASCII and then not take advantage of it,
using IDNA for the RHS and treating it as an IDN-unaware domain name
slot.

IMAX authors, perhaps add an example (and text to go with it) that
illustrates a non-ASCII RHS too:

MAIL FROM:<UTF-8=E4=E8@=E6=E96=E87.com>

if this is what you intend? The alternative would be

MAIL FROM:<UTF-8=E4=***@xn--foo-bar.com>

but then IMHO the whole point of IMAX is lost: that you can support
non-ASCII using raw charset encodings instead of application-specific
encodings.

I interpreted IMAX as providing an IDN-aware domain name slot for the
RHS too, where you could send non-punycoded data.

>>If you are saying that the MTA should put the IMAA encoded mail
>>address in the log file, I'd say then that MTA doesn't support
>>non-ASCII.
>
> You are free to say that. Others would disagree. In the case of IMAX,
> what would you want in your log file. All UTF-8? That means you need
> converters from every accepted charset to UTF-8. Careful sysadmins
> would probably want to know *exactly* what came in, not some converted
> form, but that means that their log file would have multiple charsets
> in it, which would make display a mess. A reasonable option is to
> store the addresses as ACE and to have a log-file viewer that converts
> on display (and has an option for not converting).
>
> Again, this is an implementation issue, not a protocol issue.

Yes. But it is an important point. An internationalization solution
that doesn't consider these practical issues is of only theoretical
value.

I would want the log file to contain characters that can be read
without special IDNA/IMAA/IMAX aware programs. I.e., if the system
uses UTF-8 as the system encoding, I'd want the log file to be in
UTF-8. If the system uses ISO-8859-1, the log file should be in
ISO-8859-1 (and the application must cope with data that can't be
represented somehow).

Yes, the application must know how to convert alien (but charset-
tagged) data into the system charset. But IDNA and IMAA have the same
characteristic: they require the application to convert Unicode (which
is the only charset IDNA/IMAA accept) to the system charset. So where
does the big difference lie?

I agree careful sysadmins want to see exactly what came in. The only
way to represent that, unless the system uses the same charset as the
data that came in, is to print the charset of the incoming data and
the byte sequence. The same is true today on an ISO-8859-1 system that
receives Unicode via IDNA.
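The two log-file requirements above (readable text in the system charset, and an exact record of what arrived) could look like this in Python. It is a sketch under assumptions: ISO-8859-1 stands in for the system charset, unrepresentable characters become backslash escapes instead of being dropped, and both function names are hypothetical.

```python
def to_system_charset(text: str, system_charset: str = "iso-8859-1") -> str:
    """Render text in the system charset for a plain-text log.

    Characters the system charset cannot represent become backslash
    escapes (for example \\u4e2d), so nothing is lost silently.
    """
    encoded = text.encode(system_charset, errors="backslashreplace")
    return encoded.decode(system_charset)


def exact_record(data: bytes, charset: str) -> str:
    """Record exactly what came in: the declared charset and the raw bytes."""
    return f"charset={charset} bytes={data.hex(' ')}"


if __name__ == "__main__":
    print(to_system_charset("b\u00fccher@example.com"))
    print(to_system_charset("\u4e2d@example.com"))
    print(exact_record("b\u00fc".encode("utf-8"), "utf-8"))
```

Whichever wire format is chosen, some conversion step like `to_system_charset` sits between the protocol and a plain-text log; the choice of error handler decides what the administrator sees for unrepresentable characters.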

It seems we disagree that it is reasonable to require users to use
special applications to view log files, or edit configuration files,
etc. Personally, I don't use applications that have configuration
files or log files that can't be manipulated using text operations. I
do suppose many Microsoft Windows users would find your approach
acceptable though, since that's what they are accustomed to. IMHO a
solution must be able to accommodate both kinds of users.

>> An essential feature of supporting non-ASCII is to make it
>>possible for the user of the application to actually see the
>>characters. ASCII encoding them and displaying them to the user
>>doesn't make the application support non-ASCII in practice. It would
>>be like claiming to support Unicode in a terminal emulator when it
>>only displayed Base64 encoding of the UTF-8 encoded Unicode code
>>points.
>
> IMAA describes in detail when and how to display the Unicode form to
> the user; IMAX mostly glosses over this.

Yes, IMAX is not a final document, so this isn't surprising. For IMAX,
though, those issues are simpler, since IMAX allows implementations
to use charsets that the system already supports natively.

>> >>A punycode encoder is required if the MTA handle non-ASCII data in
>>>>decoded, normal, format. Like in the user interface for /etc/aliases,
>>>>/etc/mail/virtusertable etc.
>>>
>>> Neither of those are controlled by the MTA. This is getting pretty silly.
>>
>>That was not a generic example, it was an example for one MTA
>>implementation: Sendmail. It uses and control those files.
>
> And, again, you are mixing up protocols with implementations.

I'm sorry, I'll try to make it more clear when I talk about the
implementation or the specification. If you are saying that we should
simply ignore all implementation related aspects in a proposed
solution, then I guess I simply don't agree with that. I'll continue
to relate a proposal to the real world.

>> >> If it doesn't handle non-ASCII in normal
>>>>format, it might as well not support non-ASCII at all since the user
>>>>would never notice the different.
>>>
>>> You are mixing up the MTA and the MUA.
>>
>>I wasn't clear. I meant the user of the MTA, i.e., the administrator.
>>Administrators have non-ASCII requirements too.
>
> Correct, and IMAA describes when and how to convert for display.

Right. This is what causes the dependence on punycode decoding. Since
administrators not only view non-ASCII but also input it, punycode
encoding is required too.

>>MTA implementations, nor internationalization solutions for MTAs,
>>exist in a vacuum. If it is impossible to implement an
>>internationalized product and being compliant, the specification has a
>>problem.
>
> Of course. Nothing in IMAA makes it "impossible to implement an
> internationalized product".

Cool. Then, perhaps, what we have is two solutions that can implement
an internationalized product. I'm trying to convince myself which of
them is the better approach.

>> > But you keep talking about the need to handle fallback. Handling two
>>> protocols is not easier than handling one in any universe.
>>
>>True. Yes, the fallback is a problem. Hm. Perhaps those interested
>>in non-ASCII need to require the use of modern software at the
>>receiver and the sender, then implementations doesn't need to
>>implement the fall back case.
>
> That's not what the IMAX document says.

Right, I proposed something new.

> If you want to propose a ESMTP extension with no fallback, either
> change IMAX or create your own Internet Draft. In either case, you
> will have to say explicitly how this will interact with SMTP servers
> that do not support the new protocol, how bounces would be handled,
> how users would know if they could send a message, and so on. I
> think when you write that, if you do so honestly, you will see that
> it would be silly to propose such a solution.

Discarding it as silly seems a bit premature to me. Having such a
proposal, one that discusses all the consequences you mention, seems
like a valuable contribution to this discussion. But I guess it is
easier to advocate one solution if the competition is discarded early
on...

>> >> > Clean is in the eye of the beholder. You and I like UTF-8, but many
>>>> > people don't. Forcing them to use our preferred charset isn't a good
>>>> > practice if it can be avoided.
>>>>
>> >>I agree completely. This is one of my problems with IDNA and IMAA, it
>>>>forces Unicode on everyone.
>>>
>>> Unicode is not a charset.
>>
>>I'm not sure if you genuinely missed my point due to this
>>misunderstanding, but assuming you did, let me correct myself: replace
>>"Unicode" with "Any charset encoding format of Unicode".
>
> I think I hear you saying that you think that the protocols should
> allow any repertoire and any encoding of those repertoires. If so, we
> certainly disagree. The IETF is not very keen on creating protocols
> for which there would be limited and unpredictable
> interoperability. Other standards group might not be so picky.

That is stretching it a bit, I think. I believe that a solution worth
its salt should consider existing habits, and whether we like it or
not, there is more than one charset used on the Internet. MIME appears
to acknowledge this and is rather successful. HTML acknowledges this
and is rather successful. Same for HTTP. Come to think of it, I can't
recall any successful internationalization product the IETF has
produced to counter my examples; can you help me?

If you are speaking for IETF, I find it interesting that RFC 2277
"IETF Policy on Character Sets and Languages" says that protocols MAY
allow use of any repertoire. It doesn't say that it is a bad idea to
allow more than one charset. I agree with that document, let's
require the use of UTF-8 in protocols, but allow negotiation of other
charsets to smooth transition and deployment.

Paul Hoffman / IMC

2003-02-25 22:56:03 UTC

At 10:13 PM +0100 2/25/03, Simon Josefsson wrote:
>Isn't (one
>of) the goal of the IMAA protocol to make it possible for MTA
>implementations to support non-ASCII?

Asking ludicrous questions is not good form in technical
discussions. Of course that is a goal. And IMAA does that already.

>The end result is that punycode decoding is required
>in the implementation, which is what I consider the problem. If a
>solution that didn't involve encoding techniques such as punycode
>could be developed, I think that should be preferred.

Then you are not talking about IMAX. If you have some other protocol
in mind that doesn't require punycode but still guarantees mail
delivery, please write an Internet Draft for it. (If you are thinking
of a protocol that doesn't require punycode but would instead simply
bounce or lose mail that was sent to MTAs that didn't understand the
new protocol, please don't bother writing an Internet draft...)

>Now that you write this I would agree that IMAX is unclear on
>one thing:

One of many things....

> > You are free to say that. Others would disagree. In the case of IMAX,
>> what would you want in your log file. All UTF-8? That means you need
>> converters from every accepted charset to UTF-8. Careful sysadmins
>> would probably want to know *exactly* what came in, not some converted
>> form, but that means that their log file would have multiple charsets
>> in it, which would make display a mess. A reasonable option is to
>> store the addresses as ACE and to have a log-file viewer that converts
>> on display (and has an option for not converting).
>>
>> Again, this is an implementation issue, not a protocol issue.
>
>Yes. But it is an important point. A internationalization solution
>that doesn't consider these practical issues is of only theoretical
>value.

So the current SMTP, POP, IMAP, and HTTP protocols are only of
theoretical value. Oh, well.

>I would want the log file to contain...

Fine. Ask your vendor to include that feature. This is not part of a
protocol specification.

>It seems we disagree that it is reasonable to require users to use
>special applications to view log files, or edit configuration files,
>etc.

No. We disagree as to whether this is part of the protocol. Few (if
any) IETF protocols cover this.

> > IMAA describes in detail when and how to display the Unicode form to
>> the user; IMAX mostly glosses over this.
>
>Yes, IMAX is not a final document so this isn't surprising.

Neither is IMAA. At least the IMAA authors admit where the open issues are.

>I'm sorry, I'll try to make it more clear when I talk about the
>implementation or the specification. If you are saying that we should
>simply ignore all implementation related aspects in a proposed
>solution, then I guess I simply don't agree with that. I'll continue
>to relate a proposal to the real world.

This is a discussion of a potential IETF protocol. Please hold your
discussion to things that could be included in an IETF protocol. If
you don't like the way the IETF makes protocols, there are other
standards organizations in which you might want to be active instead
of the IETF. Or, you can take your concerns about the way we create
protocols to the main IETF mailing list and see if there is enough
support for your views to change the way the IETF works.

> > Correct, and IMAA describes when and how to convert for display.
>
>Right. This is what cause the dependence on punycode decoding. Since
>administrators not only view non-ASCII but input non-ASCII too,
>punycode encoding is required too.

Wrong, yet again. There is nothing in the IMAA document about how the
administrator views documents. IMAA is about mail transport, not
system administration.

> > If you want to propose a ESMTP extension with no fallback, either
>> change IMAX or create your own Internet Draft. In either case, you
>> will have to say explicitly how this will interact with SMTP servers
>> that do not support the new protocol, how bounces would be handled,
> > how users would know if they could send a message, and so on. I
>> think when you write that, if you do so honestly, you will see that
>> it would be silly to propose such a solution.
>
>Discarding it as silly seems a bit premature to me. Having such a
>proposal, that discusses all the consequences you mention seems like a
>valuable contribution to this discussion. But I guess it is easier to
>advocate one solution if the competition are discarded early on...

We disagree here. This discussion has been happening for over 10
years with respect to ESMTP extensions. I don't consider that to be
"early on".

> > I think I hear you saying that you think that the protocols should
>> allow any repertoire and any encoding of those repertoires. If so, we
>> certainly disagree. The IETF is not very keen on creating protocols
>> for which there would be limited and unpredictable
>> interoperability. Other standards group might not be so picky.
>
>That is stretching it a bit, I think. I believe that a solution worth
>its salt should consider existing habits, and whether we like it or
>not there is more than charset used on the Internet. MIME appears to
>acknowledge this and is rather successful. HTML acknowledge this and
>is rather successful. Same for HTTP. Come to think of it, I can't
>recall any successful internationalization product the IETF has
>produced to counter my examples, can you help me?

You just helped yourself.

-MIME headers (RFC 2047) often display unreadable gibberish for
charsets that the recipient can't decode, even when using
quoted-printable (commonly called "quoted-unreadable" by people in the
mail world).

-HTML shows unintelligible gibberish if the charset used and stated
in the document cannot be displayed by the user. People see this
every day.

-HTTP fails completely if the client lists charsets that it can read
and none of those are charsets that the server can write. Such
failures are uncommon only because the feature is rarely used, given
its high failure rate.

The third case is analogous to what you are proposing with "fail to
deliver if the charset is not supported".
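The HTTP failure mode described above comes from Accept-Charset negotiation finding no overlap. A simplified Python sketch (it handles q-values but ignores the "*" wildcard and other details of RFC 2616 section 14.2; the function name is hypothetical, and server charset names are assumed to be lowercase) shows how an empty intersection leaves the server nothing acceptable to send:

```python
from typing import List, Optional


def negotiate_charset(accept_charset: str, server_charsets: List[str]) -> Optional[str]:
    """Pick the client's most preferred charset that the server can write.

    Returns None when there is no overlap; RFC 2616 says the server SHOULD
    then answer 406 (Not Acceptable).
    """
    offered = []
    for item in accept_charset.split(","):
        name, _, params = item.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            quality = float(params[2:])
        if quality > 0:
            offered.append((quality, name.strip().lower()))

    # sorted() is stable, so charsets with equal q keep the client's order.
    for _, name in sorted(offered, key=lambda pair: -pair[0]):
        if name in server_charsets:
            return name
    return None


if __name__ == "__main__":
    server = ["utf-8", "iso-8859-1"]
    print(negotiate_charset("iso-8859-1;q=0.5, utf-8", server))
    print(negotiate_charset("iso-8859-5, unicode-1-1;q=0.8", server))
```

The second call returns None: the same dead end a mail sender would hit if delivery required a charset the receiver does not support.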

>If you are speaking for IETF,

I'm not, and you should assume that anyone other than Harald
Alvestrand or Leslie Daigle who speaks for the IETF is bluffing or
lying.

> I find it interesting that RFC 2277
>"IETF Policy on Character Sets and Languages" says that protocols MAY
>allow use of any repertoire.

Really? It says that? Could you quote the sentence for us here? I
couldn't find the word "repertoire" anywhere in RFC 2277.

> It doesn't say that it is a bad idea to
>allow more than one charset. I agree with that document, let's
>require the use of UTF-8 in protocols, but allow negotiation of other
>charsets to smooth transition and deployment.

You should take this up with Harald Alvestrand, the author of RFC
2277. Note that IDN chose not to use UTF-8, and Harald (as chair of
the IESG) approved it to be on standards track.

--Paul Hoffman, Director
--Internet Mail Consortium

Mark Davis

2003-02-25 23:36:10 UTC

> > It doesn't say that it is a bad idea to
> >allow more than one charset. I agree with that document, let's
> >require the use of UTF-8 in protocols, but allow negotiation of other
> >charsets to smooth transition and deployment.

> You should take this up with Harald Alvestrand, the author of RFC
> 2277. Note that IDN chose not to use UTF-8, and Harald (as chair of
> the IESG) approved it to be on standards track.

I want to point out a very important feature here. While IDN does not use
UTF-8, the contents are algorithmically mappable to UTF-8. That is *very*
different from allowing arbitrary charsets.

There is a huge problem with using arbitrary charsets; they don't
interoperate well. They may not be supported on the recipient platform, or
if supported, even the 'same' charset (such as SJIS) is interpreted in
different ways on different platforms. If the on-the-wire protocol is UTF-8
(or algorithmically mappable to UTF-8) then senders and recipients only need
to deal with one charset.
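This distinction can be checked directly in Python. The first function shows that Punycode (the encoding inside IDNA's ACE labels) round-trips losslessly to Unicode; the rest decodes the same two Shift_JIS bytes with Python's "shift_jis" and "cp932" codecs, which follow different mapping tables. The specific byte pair (0x81 0x60, the well-known wave dash case) is an illustrative choice, not an example from the thread.

```python
def punycode_round_trips(label: str) -> bool:
    """Punycode is reversible: decoding the encoded label restores it."""
    return label.encode("punycode").decode("punycode") == label


# One Shift_JIS byte pair, read through two codecs that both claim to be SJIS.
WAVE_DASH_BYTES = b"\x81\x60"
as_shift_jis = WAVE_DASH_BYTES.decode("shift_jis")  # JIS X 0208 table: U+301C WAVE DASH
as_cp932 = WAVE_DASH_BYTES.decode("cp932")          # Windows code page 932: U+FF5E FULLWIDTH TILDE

if __name__ == "__main__":
    print("b\u00fccher".encode("punycode"))  # b'bcher-kva'
    print(punycode_round_trips("b\u00fccher"))
    print(hex(ord(as_shift_jis)), hex(ord(as_cp932)))
```

The Punycode form carries no charset label at all, yet both ends recover identical code points; the Shift_JIS bytes carry a charset label and still decode to two different characters.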

Mark

Edmon Chung

2003-02-26 02:20:19 UTC

Hi Mark,

I think you are right. That is why in the IMAX description, UTF8 is
mandated. The thinking is similar to XML among other things. And in order
to not reinvent the wheel, a fall back to punycode is suggested. What are
your thoughts overall on the doc?
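The approach described above (UTF-8 when the receiver supports the extension, a Punycode fallback otherwise) suggests a sender-side decision like the Python sketch below. Everything specific here is hypothetical: the EHLO keyword "IMAX" is a placeholder, the draft's optional CHARSET parameter is not modelled, and only the domain is converted on fallback (via IDNA), because the local-part ACE form belongs to IMAA and is left out.

```python
from typing import Set


def build_mail_from(local_part: str, domain: str, ehlo_keywords: Set[str]) -> bytes:
    """Build a MAIL FROM command line for the capabilities a server advertised.

    "IMAX" is a placeholder keyword for this sketch, not a registered one.
    """
    if "IMAX" in ehlo_keywords:
        # Receiver opted in: send the address as raw UTF-8 octets.
        return f"MAIL FROM:<{local_part}@{domain}>\r\n".encode("utf-8")

    # Fallback for receivers without the extension: ASCII only on the wire.
    if not local_part.isascii():
        raise ValueError("non-ASCII local part needs an ACE form (not modelled here)")
    ace_domain = domain.encode("idna").decode("ascii")
    return f"MAIL FROM:<{local_part}@{ace_domain}>\r\n".encode("ascii")


if __name__ == "__main__":
    print(build_mail_from("user", "b\u00fccher.example", {"IMAX", "8BITMIME"}))
    print(build_mail_from("user", "b\u00fccher.example", {"8BITMIME"}))
```

The sketch makes the cost discussed in this thread visible: a sender implementing both branches needs the UTF-8 path and the ACE conversion.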

BTW, I have updated the draft to -01 and changed a number of things,
most notably taking out section 3 as suggested by everyone... including
myself :-)

You can find it at: http://www.dnsii.org/draft-ietf-chung-imax-01.txt

(Paul, I haven't changed the optional parameter word "CHARSET" yet, but I
think you are right and I will do so in the next version)

(James, I have sent it to the IETF, but I don't know when they will get it
posted... just in case you ask.)

Edmon

----- Original Message -----
From: "Mark Davis" <***@jtcsv.com>
To: <ietf-***@imc.org>; "Paul Hoffman / IMC" <***@imc.org>
Sent: Tuesday, February 25, 2003 6:36 PM
Subject: Re: Problems of Internationalized Mail Address eXtensions (IMAX)

>
> > > It doesn't say that it is a bad idea to
> > >allow more than one charset. I agree with that document, let's
> > >require the use of UTF-8 in protocols, but allow negotiation of other
> > >charsets to smooth transition and deployment.
>
> > You should take this up with Harald Alvestrand, the author of RFC
> > 2277. Note that IDN chose not to use UTF-8, and Harald (as chair of
> > the IESG) approved it to be on standards track.
>
> I want to point out a very important feature here. While IDN does not use
> UTF-8, the contents are algorithmically mappable to UTF-8. That is *very*
> different from allowing arbitrary charsets.
>
> There is a huge problem with using arbitrary charsets; they don't
> interoperate well. They may not be supported on the recipient platform, or
> if supported, even the 'same' charset (such as SJIS) is interpreted in
> different ways on different platforms. If the on-the-wire protocol is
UTF-8
> (or algorithmically mappable to UTF-8) then senders and recipients only
need
> to deal with one charset.
>
> Mark
>
>

Mark Davis

2003-02-26 03:08:29 UTC

> What are your thoughts overall on the doc?

Sadly, I am up to my ears in Unicode 4.0 work right now, and am only able to
keep half an ear open to this mailing list. I should have more time in a
couple of weeks.

Mark
________
***@jtcsv.com
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799

----- Original Message -----
From: "Edmon Chung" <***@neteka.com>
To: "Mark Davis" <***@jtcsv.com>; <ietf-***@imc.org>; "Paul Hoffman / IMC" <***@imc.org>
Sent: Tuesday, February 25, 2003 18:20
Subject: Re: Problems of Internationalized Mail Address eXtensions (IMAX)

>
> Hi Mark,
>
> I think you are right. That is why in the IMAX description, UTF8 is
> mandated. The thinking is similar to XML among other things. And in
order
> to not reinvent the wheel, a fall back to punycode is suggested. What are
> your thoughts overall on the doc?
>
> BTW, I have updated the draft to -01 and changed a number of stuff. Most
> notably taking out section 3 as suggested by everyone... including myself
> :-)
>
> You can find it at: http://www.dnsii.org/draft-ietf-chung-imax-01.txt
>
> (Paul, I havent changed the optional parameter word "CHARSET" yet, but I
> think you are right and I will do so in the next version)
>
> (James, I have sent it to the IETF, but I dont know when they will get it
> posted... just in case you ask.)
>
> Edmon
>
>
>
> ----- Original Message -----
> From: "Mark Davis" <***@jtcsv.com>
> To: <ietf-***@imc.org>; "Paul Hoffman / IMC" <***@imc.org>
> Sent: Tuesday, February 25, 2003 6:36 PM
> Subject: Re: Problems of Internationalized Mail Address eXtensions (IMAX)
>
>
> >
> > > > It doesn't say that it is a bad idea to
> > > >allow more than one charset. I agree with that document, let's
> > > >require the use of UTF-8 in protocols, but allow negotiation of other
> > > >charsets to smooth transition and deployment.
> >
> > > You should take this up with Harald Alvestrand, the author of RFC
> > > 2277. Note that IDN chose not to use UTF-8, and Harald (as chair of
> > > the IESG) approved it to be on standards track.
> >
> > I want to point out a very important feature here. While IDN does not
use
> > UTF-8, the contents are algorithmically mappable to UTF-8. That is
*very*
> > different from allowing arbitrary charsets.
> >
> > There is a huge problem with using arbitrary charsets; they don't
> > interoperate well. They may not be supported on the recipient platform,
or
> > if supported, even the 'same' charset (such as SJIS) is interpreted in
> > different ways on different platforms. If the on-the-wire protocol is
> UTF-8
> > (or algorithmically mappable to UTF-8) then senders and recipients only
> need
> > to deal with one charset.
> >
> > Mark
> >
> >
>
>

Martin Duerst

2003-02-26 17:59:47 UTC

At 21:20 03/02/25 -0500, Edmon Chung wrote:

>Hi Mark,
>
>I think you are right. That is why in the IMAX description, UTF8 is
>mandated. The thinking is similar to XML among other things.

There is a huge difference between headers (where having
a single encoding is very important, because the tagging
overhead is high, there are conversion problems,...) and
bodies. Parallels to XML work for other body formats, but
are really not adequate for headers.

Also, XML is already 5 years old. Something new should look
ahead, and not try to carry too much unnecessary old stuff.

Regards, Martin.
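To make Martin's point about tagging overhead concrete: in today's headers, non-ASCII text travels as RFC 2047 encoded-words, each of which carries its own charset label. The Python sketch below builds a 'B' (base64) encoded-word by hand and decodes it with the standard library's `email.header.decode_header`; the sample name is illustrative only.

```python
import base64
from email.header import decode_header


def encoded_word(text: str, charset: str = "UTF-8") -> str:
    """Build an RFC 2047 'B' encoded-word: =?charset?B?base64-data?=."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


raw = "Müller"
word = encoded_word(raw)
print(word)                  # =?UTF-8?B?TcO8bGxlcg==?=
print(len(raw), len(word))   # 6 24

# The receiver can only decode because the label travels inside the word.
[(data, charset)] = decode_header(word)
print(data.decode(charset))  # Müller
```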

Edmon Chung

2003-02-26 23:38:43 UTC

So you think it's better to mandate some other encoding rather than UTF8?
Edmon

Martin Duerst

2003-02-27 15:59:45 UTC

At 18:38 03/02/26 -0500, Edmon Chung wrote:
>So you think it's better to mandate some other encoding rather than UTF8?
>Edmon

No, what I mean is that nothing other than UTF-8 should be used,
except for something like punycode for downgrading. So nothing
like Big5, no negotiation on charsets, and so on.

Regards, Martin.

Edmon Chung

2003-02-27 16:58:20 UTC

I see. The reason support for other charsets seems to make sense is that I
can see that a lot of times IMA would be used within the same region, say
China MTA to China MTA, which means that simply using GB would make it
easier in most cases. It is therefore going to be true that in most cases
using the local encoding will be a much more efficient transport than UTF8.
Because the IMAX-capable MTA should really announce which encodings it
supports, in a real transaction there wouldn't really be negotiation.
Perhaps I should change it so that if there is no announcement of additional
charset support then the MUA MUST use UTF8 to start with and avoid having to
negotiate further. Would that be better?
Edmon
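Edmon's proposed rule can be sketched in Python as a client-side decision. Everything protocol-specific here is an assumption for illustration: the EHLO keyword `IMAX` and its space-separated list of charset names are hypothetical and are not taken from the IMAX draft, and EHLO parsing is simplified to well-formed `250-`/`250 ` lines.

```python
from typing import Dict, List, Optional


def parse_ehlo(lines: List[str]) -> Dict[str, List[str]]:
    """Map each EHLO keyword (upper-cased) to its parameters."""
    keywords: Dict[str, List[str]] = {}
    for line in lines[1:]:          # the first line only greets the client
        parts = line[4:].split()    # drop the "250-" or "250 " prefix
        if parts:
            keywords[parts[0].upper()] = parts[1:]
    return keywords


def choose_charset(ehlo_lines: List[str], preferred: str) -> Optional[str]:
    """Pick the charset for the envelope, or None if the extension is absent."""
    kw = parse_ehlo(ehlo_lines)
    if "IMAX" not in kw:             # hypothetical keyword
        return None                  # server lacks the extension: fall back
    announced = {c.upper() for c in kw["IMAX"]}
    if preferred.upper() in announced:
        return preferred.upper()
    return "UTF-8"                   # no matching announcement: MUST use UTF-8


print(choose_charset(["250-mx.example.cn", "250 IMAX GB2312"], "GB2312"))  # GB2312
print(choose_charset(["250-mx.example", "250 IMAX"], "GB2312"))           # UTF-8
print(choose_charset(["250-mx.example", "250 8BITMIME"], "GB2312"))       # None
```

Note that the UTF-8 branch and the no-extension branch are needed in every case; the announced charsets only add a third path.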

Edmon Chung

2003-02-27 19:42:56 UTC

I see. But why would someone convert from charset A to charset B, except
from (or to) a particular local encoding to UTF8? Since the registered
charsets should be convertible to and from ISO 10646, there shouldn't be
any problem.
Edmon

----- Original Message -----
From: "Paul Hoffman / IMC" <***@imc.org>
To: "Edmon Chung" <***@neteka.com>
Sent: Thursday, February 27, 2003 1:42 PM
Subject: Re: Problems of Internationalized Mail Address eXtensions (IMAX)

> At 1:30 PM -0500 2/27/03, Edmon Chung wrote:
> >I am not sure what you mean.
> >The document did refer to using the charsets registered at IANA for MIME.
> >Are the charsets registered there not a good set to work with?
>
> If you expect someone to convert from charset A to charset B
> reliably, you have to standardize the mapping table. There are lots
> of examples of mapping tables in use today that differ. Without that,
> someone sending with one charset and expecting the mail to be put in
> exactly the mailbox they mean might be in for a bad surprise.
>
> This was discussed on the IDN mailing list.
>
> --Paul Hoffman, Director
> --Internet Mail Consortium
>

Mark Davis

2003-02-27 22:42:41 UTC

Just because a charset is registered does not mean it is always interpreted
the same way: in fact, quite the contrary. See
http://oss.software.ibm.com/icu/charset/index.html for examples.

Mark
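The kind of divergence Mark and Paul describe can be reproduced with Python's own codecs. The sketch decodes the same two bytes with `shift_jis` and with `cp932` (Microsoft's variant, which many systems label simply as Shift_JIS) and shows how a mailbox lookup keyed on the decoded name could then miss; the mailbox table is hypothetical.

```python
raw = b"\x81\x60"                    # one double-byte character

jis_view = raw.decode("shift_jis")   # mapping used by Python's shift_jis codec
ms_view = raw.decode("cp932")        # Microsoft's mapping for the same bytes
print(f"U+{ord(jis_view):04X} vs U+{ord(ms_view):04X}")

# Hypothetical mailbox table built by an MTA that used one mapping...
mailboxes = {"a" + jis_view + "b": "mailbox-1"}

# ...and the name the sender meant, as understood through the other mapping.
print(mailboxes.get("a" + ms_view + "b"))  # None: same bytes, different name
```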

Edmon Chung

2003-02-27 19:45:58 UTC

I don't agree with the forward-looking part.
The big reason, as far as I believe, that there should be a charset parameter
is to be forward-looking. Today we move from English only to
multilingual... if the design in the first place had allowed the charset
parameter, we wouldn't have the problem at all... realizing that, adding the
feature would allow a better transition to whatever we might face in terms of
charsets in the future. Your thoughts on this?
Edmon

----- Original Message -----
From: "Martin Duerst" <***@w3.org>
To: "Paul Hoffman / IMC" <***@imc.org>; <ietf-***@imc.org>
Sent: Thursday, February 27, 2003 1:37 PM
Subject: Re: Problems of Internationalized Mail Address eXtensions (IMAX)

>
> At 09:54 03/02/27 -0800, Paul Hoffman / IMC wrote:
>
> >At 11:58 AM -0500 2/27/03, Edmon Chung wrote:
> >>I see. The reason support for other charsets seems to make sense is that I
> >>can see that a lot of times IMA would be used within the same region, say
> >>China MTA to China MTA, which means that simply using GB would make it
> >>easier in most cases.
> >
> >And much more difficult and prone to errors in the others unless you
> >standardize all of the charset mappings. (Which I assume you aren't
> >proposing to do....)
>
> I agree with Paul that having a large variety of charsets is a big
> problem.
> But I don't think the problem of mappings is the main aspect of it.
> The mapping differences usually apply for characters that are not that
> much used in the specific encoding.
> The main problem with using legacy encodings is that it's not at all
> forward-looking. It's introducing more legacy instead of removing
> legacy.
>
> Regards, Martin.
>
>

Claus Färber

2003-02-28 00:00:00 UTC

Edmon Chung <***@neteka.com> schrieb/wrote:

> I don't agree with the forward-looking part.
> The big reason, as far as I believe, that there should be a charset parameter
> is to be forward-looking. Today we move from English only to
> multilingual... if the design in the first place had allowed the charset
> parameter, we wouldn't have the problem at all...

No, we would have a lot of problems now.

For example, someone wants to send me a mail to cfä***@muc.de, but his
mailer uses ISO-8859-15, which the MDA of my ISP does not know (although
it knows ISO-8859-1 and UTF-8). The message bounces.

Claus
--
http://www.faerber.muc.de/
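Claus's failure case can be sketched as follows. The MDA here is hypothetical: it accepts only the charset labels it knows (ISO-8859-1 and UTF-8, as in his example) and bounces anything else. The sketch also shows that the two Latin charsets agree on 'ü' but not on every byte, which is why an MDA cannot simply treat an unknown label as a known one.

```python
KNOWN_CHARSETS = {"iso-8859-1", "utf-8"}  # what this hypothetical MDA supports


class Bounce(Exception):
    """Raised when the MDA cannot interpret the address."""


def mda_decode(local_part: bytes, charset_label: str) -> str:
    """Decode a local part, bouncing if the charset label is unknown."""
    if charset_label.lower() not in KNOWN_CHARSETS:
        raise Bounce(f"unknown charset {charset_label}")
    return local_part.decode(charset_label)


wire = "müller".encode("iso-8859-15")         # the sender's mailer uses Latin-9
print(wire == "müller".encode("iso-8859-1"))  # True: 'ü' is 0xFC in both

try:
    mda_decode(wire, "ISO-8859-15")
except Bounce as err:
    print("bounced:", err)                    # bounced: unknown charset ISO-8859-15

# Guessing is not safe either: byte 0xA4 differs between the two charsets.
print(b"\xa4".decode("iso-8859-1"), b"\xa4".decode("iso-8859-15"))  # ¤ €
```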

Edmon Chung

2003-03-01 00:09:52 UTC

Why would the message bounce?
His mailer would use UTF8.
Edmon

Martin Duerst

2003-02-27 19:47:53 UTC

Hello Edmon,

At 11:58 03/02/27 -0500, Edmon Chung wrote:
>I see. The reason support for other charsets seems to make sense is that I
>can see that a lot of times IMA would be used within the same region, say
>China MTA to China MTA, which means that simply using GB would make it
>easier in most cases. It is therefore going to be true that in most cases
>using the local encoding will be a much more efficient transport than UTF8.

'easier' and 'efficient' are not very clear to me here. If we want IMAX
to be really interoperable, we have to have at least one encoding that
every IMAX-aware piece of software supports. The obvious choice for this
is UTF-8. [Even if punycode is the only encoding that we require every
IMAX-aware piece of software to support (for backwards compatibility),
the software has to support Unicode.]

So just doing things in UTF-8 looks easier to me than also using
all these legacy encodings.

Of course some people will say that just doing punycode only is even easier.
The point is that with UTF-8, we are investing in our future, and
creating a clear upgrade path, whereas bringing in legacy encodings
is more like 'sidegrading'.

>Because the IMAX-capable MTA should really announce which encodings it
>supports, in a real transaction there wouldn't really be negotiation.
>Perhaps I should change it so that if there is no announcement of additional
>charset support then the MUA MUST use UTF8 to start with and avoid having to
>negotiate further. Would that be better?

That would be a step in the right direction. But I suggest going directly
to a model where UTF-8 and punycode (or whatever fallback we choose)
are the only choices.

Regards, Martin.
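Martin's "UTF-8 or punycode, nothing else" model leaves the sender with at most two outcomes per address. A minimal Python sketch, assuming a boolean capability flag in place of whatever signal the protocol would define, and borrowing IDNA's `xn--` prefix purely as a placeholder for a mail ACE prefix:

```python
ACE_PREFIX = "xn--"  # placeholder only; not an agreed prefix for local parts


def wire_form(local_part: str, peer_accepts_utf8: bool) -> bytes:
    """Return the bytes to send for a local part under the two-choice model."""
    if local_part.isascii():
        return local_part.encode("ascii")        # plain address, unchanged
    if peer_accepts_utf8:
        return local_part.encode("utf-8")        # the single native encoding
    ace = ACE_PREFIX + local_part.encode("punycode").decode("ascii")
    return ace.encode("ascii")                   # downgrade for legacy peers


print(wire_form("bücher", True))   # b'b\xc3\xbccher'
print(wire_form("bücher", False))  # b'xn--bcher-kva'
print(wire_form("info", False))    # b'info'
```

There is no branch for Big5, GB or any other legacy charset, and no negotiation beyond the single capability bit; that is the simplification Martin argues for.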

Edmon Chung

2003-02-27 19:59:22 UTC

I see. You probably haven't seen my latest reply on the "forward-looking"
part yet.
However, I know where you are coming from. Perhaps we should introduce the
"charset" parameter but limit it to only UTF8 for now; that way we will have
the flexibility to add others in the future... it will also distinguish these
commands from the regular MAIL FROM and RCPT TO commands.
Your thoughts?
Edmon
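Edmon's suggestion, a `CHARSET` parameter that for now admits only UTF-8, could look roughly like this on the receiving side. The parameter name comes from his message; the parsing is a simplified, hypothetical sketch (it assumes `>` does not occur inside the address and ignores other RFC 2821 details), and the reply codes are illustrative only.

```python
from typing import Dict

ALLOWED_CHARSETS = {"UTF-8"}  # could be widened later without a new keyword


def esmtp_params(command: str) -> Dict[str, str]:
    """Extract 'KEY=VALUE' parameters that follow the closing '>'."""
    _, _, rest = command.partition(">")
    params: Dict[str, str] = {}
    for token in rest.split():
        key, _, value = token.partition("=")
        params[key.upper()] = value
    return params


def check_envelope(command: str) -> str:
    """Accept a MAIL FROM / RCPT TO command according to its CHARSET value."""
    charset = esmtp_params(command).get("CHARSET")
    if charset is None:
        return "250 OK (regular command)"
    if charset.upper() not in ALLOWED_CHARSETS:
        return "501 CHARSET value not supported"
    return "250 OK (internationalized address)"


print(check_envelope("MAIL FROM:<user@example.org>"))
print(check_envelope("RCPT TO:<bücher@example.org> CHARSET=UTF-8"))
print(check_envelope("RCPT TO:<bücher@example.org> CHARSET=GB2312"))
```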

Simon Josefsson

2003-02-26 11:38:00 UTC

Paul Hoffman / IMC <***@imc.org> writes:

> At 10:13 PM +0100 2/25/03, Simon Josefsson wrote:
>>Isn't (one
>>of) the goal of the IMAA protocol to make it possible for MTA
>>implementations to support non-ASCII?
>
> Asking ludicrous questions is not a good form in technical
> discussions. Of course that is a goal. And IMAA does that already.

My question was sincere. IMAX appears to be a solution for
internationalization of MTAs, at the SMTP layer. It does not propose
solving the internationalization problem for MUAs. SMTP is an
interactive protocol between two end-entities, and can therefore
negotiate non-ASCII support, which is different from RFC (2)822 where
the entities that will handle the stored data are not able to interact
with the creator of that data to negotiate non-ASCII. IMAX takes
advantage of this difference. I believe it would be possible to
design an internationalization solution for RFC (2)822 that would be
distinct from an SMTP internationalization solution. Those two
approaches could be investigated in parallel and evaluated on their
own merits. If you think this is ludicrous and want this to be a
productive discussion, please take the question seriously and explain
in technical terms why your proposal is better.

>>The end result is that punycode decoding is required
>>in the implementation, which is what I consider the problem. If a
>>solution that didn't involve encoding techniques such as punycode
>>could be developed, I think that should be preferred.
>
> Then you are not talking about IMAX. If you have some other protocol
> in mind that doesn't require punycode but still guarantees mail
> delivery, please write an Internet Draft for it. (If you are thinking
> of a protocol that doesn't require punycode but would instead simply
> bounce or lose mail that was sent to MTAs that didn't understand the
> new protocol, please don't bother writing an Internet draft...)

Why not? That seems to be one serious alternative solution to IMAA.
I can only interpret your dismissal of alternative solutions without
a serious analysis as meaning that you either have done this analysis
already and know the answers or that you don't want to see alternative
ideas discussed. In the former case, I think it would be useful to read
your analysis.

>>Now that you write this I would agree that IMAX is unclear on
>>one thing:
>
> One of many things....

Perhaps your experience in this area could be applied to improving the
specification? I'm sure having two serious alternatives to look at
would help the discussion.

>>> You are free to say that. Others would disagree. In the case of IMAX,
>>> what would you want in your log file. All UTF-8? That means you need
>>> converters from every accepted charset to UTF-8. Careful sysadmins
>>> would probably want to know *exactly* what came in, not some converted
>>> form, but that means that their log file would have multiple charsets
>>> in it, which would make display a mess. A reasonable option is to
>>> store the addresses as ACE and to have a log-file viewer that converts
>>> on display (and has an option for not converting).
>>>
>>> Again, this is an implementation issue, not a protocol issue.
>>
>>Yes. But it is an important point. An internationalization solution
>>that doesn't consider these practical issues is of only theoretical
>>value.
>
> So the current SMTP, POP, IMAP, and HTTP protocols are only of
> theoretical value. Oh, well.

Those protocols do consider practical issues. A simple proof that
they aren't of theoretical value is that they are used in practice.
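Paul's suggestion of storing addresses in ACE form and converting only when displaying can be sketched as a small log viewer. This is a hypothetical Python sketch: it assumes ACE tokens start with IDNA's `xn--` prefix (a placeholder for whatever prefix IMAA settles on), decodes them with the built-in `punycode` codec, leaves anything that fails to decode untouched, and offers the raw view Paul mentions.

```python
import re

# Placeholder prefix followed by letters, digits and hyphens.
ACE_TOKEN = re.compile(r"xn--([A-Za-z0-9-]+)")


def _to_unicode(match) -> str:
    try:
        return match.group(1).encode("ascii").decode("punycode")
    except ValueError:               # UnicodeError is a subclass of ValueError
        return match.group(0)        # not valid punycode: show it as logged


def display(log_line: str, convert: bool = True) -> str:
    """Render a log line, optionally converting ACE tokens to Unicode."""
    return ACE_TOKEN.sub(_to_unicode, log_line) if convert else log_line


line = "from=<xn--bcher-kva@example.org> status=sent"
print(display(line))                 # from=<bücher@example.org> status=sent
print(display(line, convert=False))  # from=<xn--bcher-kva@example.org> status=sent
```

The log file itself stays pure ASCII and records exactly what came in on the wire; only the viewer changes.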

>>I would want the log file to contain...
>
> Fine. Ask your vendor to include that feature. This is not part of a
> protocol specification.

I'm the vendor, and I'm here to understand how to implement it. If
the protocol specification doesn't give guidance or consider
how it will be implemented, I fear it will not work. If you take this
fear seriously and want to prove me wrong, please explain how it can
be done in the real world. This exercise would be useful when
comparing IMAA with IMAX since it would give a complete picture.

>>It seems we disagree that it is reasonable to require users to use
>>special applications to view log files, or edit configuration files,
>>etc.
>
> No. We disagree as to whether this is part of the protocol. Few (if
> any) IETF protocols cover this.

Few (if any) IETF protocols have designs that make this a problem. An
ESMTP extension with tagged charsets might not make this a problem,
but IMAA does. If the IMAA design makes you propose that a reasonable
approach is to implement special applications for viewing log files or
editing configurations, a valid critique of IMAA would be that
alternative solutions would not generate these problems. Dismissing
this critique because the IMAA protocol doesn't clearly state the
consequences of its design or declare it out of scope isn't
productive.

>>> IMAA describes in detail when and how to display the Unicode form to
>>> the user; IMAX mostly glosses over this.
>>
>>Yes, IMAX is not a final document so this isn't surprising.
>
> Neither is IMAA. At least the IMAA authors admit where the open issues are.

Are you accusing the IMAX authors of holding out on what the open
issues are? I can't tell, but of course as an evaluator of both
specifications I'd appreciate it if you could disclose those problems.

>>I'm sorry, I'll try to make it more clear when I talk about the
>>implementation or the specification. If you are saying that we should
>>simply ignore all implementation related aspects in a proposed
>>solution, then I guess I simply don't agree with that. I'll continue
>>to relate a proposal to the real world.
>
> This is a discussion of a potential IETF protocol. Please hold your
> discussion to things that could be included in an IETF protocol. If
> you don't like the way the IETF makes protocols, there are other
> standards organizations in which you might want to be active instead
> of the IETF. Or, you can take your concerns about the way we create
> protocols to the main IETF mailing list and see if there is enough
> support for your views to change the way the IETF works.

I'm sorry, I was under the impression that the IETF worried about how
protocols are implemented in practice too.

>>> Correct, and IMAA describes when and how to convert for display.
>>
>>Right. This is what causes the dependence on punycode decoding. Since
>>administrators not only view non-ASCII but input non-ASCII too,
>>punycode encoding is required too.
>
> Wrong, yet again. There is nothing in the IMAA document about how the
> administrator views documents. IMAA is about mail transport, not
> system administration.

You said that a reasonable approach to implement a non-ASCII solution
based on IMAA was to implement special applications for viewing log
files and editing configuration files. I don't consider this a
reasonable solution, and thus object to IMAA based on this. In a
discussion, the productive response to this criticism would be to
explain how IMAA can accommodate other views as well. If IMAA cannot
accommodate this view, all it would take is to say so, and I'll be
enlightened. I agree the IMAA document doesn't answer my question,
that's why I brought it up.

>>> If you want to propose an ESMTP extension with no fallback, either
>>> change IMAX or create your own Internet Draft. In either case, you
>>> will have to say explicitly how this will interact with SMTP servers
>>> that do not support the new protocol, how bounces would be handled,
>>> how users would know if they could send a message, and so on. I
>>> think when you write that, if you do so honestly, you will see that
>>> it would be silly to propose such a solution.
>>
>>Discarding it as silly seems a bit premature to me. Having such a
>>proposal, one that discusses all the consequences you mention, seems like a
>>valuable contribution to this discussion. But I guess it is easier to
>>advocate one solution if the competition is discarded early on...
>
> We disagree here. This discussion has been happening for over 10 years
> with respect to ESMTP extensions. I don't consider that to be "early
> on".

If an idea can't be dismissed after 10 years of discussion, perhaps
there is some merit in that idea. Since technical solutions are
continuously proposed based on the idea, perhaps it would be useful to
document why that idea is a bad one, if you believe that.

>>> I think I hear you saying that you think that the protocols should
>>> allow any repertoire and any encoding of those repertoires. If so, we
>>> certainly disagree. The IETF is not very keen on creating protocols
>>> for which there would be limited and unpredictable
>>> interoperability. Other standards groups might not be so picky.
>>
>>That is stretching it a bit, I think. I believe that a solution worth
>>its salt should consider existing habits, and whether we like it or
>>not there is more than one charset used on the Internet. MIME appears to
>>acknowledge this and is rather successful. HTML acknowledges this and
>>is rather successful. Same for HTTP. Come to think of it, I can't
>>recall any successful internationalization product the IETF has
>>produced to counter my examples, can you help me?
>
> You just helped yourself.
>
> -MIME headers (RFC 2047) often display unreadable gibberish for
> charsets that the recipient can't decode, even when using
> quoted-printable (commonly called "quoted-unreadable" by people in the
> mail world).
>
> -HTML shows unintelligible gibberish if the charset used and stated in
> the document cannot be displayed by the user. People see this every
> day.
>
> -HTTP fails completely if the client lists charsets that it can read
> and none of those are charsets that the server can write. This is
> uncommon because this feature is rarely used because of the high
> failure rate.
>
> The third case is analogous to what you are proposing with "fail to
> deliver if the charset is not supported".

So you are saying that the IETF is "not very keen on creating
protocols" like MIME, HTML and HTTP? That's an interesting
proposition.
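The HTTP failure mode Paul describes, and the reason a single mandatory charset avoids it, can be shown with a small negotiation function. This is a simplified Python sketch (no q-values or wildcards, unlike real `Accept-Charset` handling); under HTTP/1.1 (RFC 2616, section 14.2) a server with no acceptable charset SHOULD answer 406 (Not Acceptable).

```python
from typing import List, Optional


def negotiate(client_accepts: List[str], server_writes: List[str]) -> Optional[str]:
    """Return the first client-listed charset the server can produce, if any."""
    server = {c.lower() for c in server_writes}
    for charset in client_accepts:
        if charset.lower() in server:
            return charset
    return None  # empty intersection: nothing can be delivered


# Arbitrary charsets: the two sides may simply not overlap.
print(negotiate(["iso-8859-15", "koi8-r"], ["utf-8", "iso-8859-1"]))  # None

# If both sides are required to support UTF-8, the overlap is never empty.
print(negotiate(["koi8-r", "utf-8"], ["iso-8859-1", "utf-8"]))        # utf-8
```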

>>If you are speaking for IETF,
>
> I'm not, and you should assume that anyone other than Harald
> Alvestrand or Leslie Daigle who speaks for the IETF is bluffing or
> lying.

Right, I didn't want to make that assumption in this case as it would
be offensive.

>> I find it interesting that RFC 2277
>>"IETF Policy on Character Sets and Languages" says that protocols MAY
>>allow use of any repertoire.
>
> Really? It says that? Could you quote the sentence for us here? I
> couldn't find the word "repertoire" anywhere in RFC 2277.

The word "repertoire" is indeed not used. The following section
(§3.1, page 3 of the document, if you want to look it up) says that
other charsets or other encoding schemes may be used.

,----
| Protocols MAY specify, in addition, how to use other charsets or
| other character encoding schemes for ISO 10646, such as UTF-16, but
| lack of an ability to use UTF-8 is a violation of this policy; such a
| violation would need a variance procedure ([BCP9] section 9) with
| clear and solid justification in the protocol specification document
| before being entered into or advanced upon the standards track.
`----

I note that punycode is an encoding scheme, and thus IDNA and IMAA
violate this by lacking an ability to use UTF-8.

>> It doesn't say that it is a bad idea to
>>allow more than one charset. I agree with that document, let's
>>require the use of UTF-8 in protocols, but allow negotiation of other
>>charsets to smooth transition and deployment.
>
> You should take this up with Harald Alvestrand, the author of RFC
> 2277. Note that IDN chose not to use UTF-8, and Harald (as chair of
> the IESG) approved it to be on standards track.

Perhaps he is busy with other things, but I will ask if the policy in
RFC 2277 doesn't apply any more, or where the variance procedure steps
for the IDN working group are documented. Thanks for the suggestion.

Paul Hoffman / IMC

2003-02-26 20:57:34 UTC

At 12:38 PM +0100 2/26/03, Simon Josefsson wrote:
>My question was sincere. IMAX appears to be a solution for
>internationalization of MTAs, at the SMTP layer. It does not propose
>solving the internationalization problem for MUAs.

Yes, it does. It shows exactly how an MUA should display ACE names.

> SMTP is an
>interactive protocol between two end-entities, and can therefore
>negotiate non-ASCII support, which is different from RFC (2)822 where
>the entities that will handle the stored data are not able to interact
>with the creator of that data to negotiate non-ASCII. IMAX takes
>advantage of this difference. I believe it would be possible to
>design an internationalization solution for RFC (2)822 that would be
>distinct from an SMTP internationalization solution.

We did that with IMAA. If you have a different proposal, please write
an Internet Draft for it.

> Those two
>approaches could be investigated in parallel and evaluated on their
>own merits.

Yes, but we need Internet Drafts before we can do that.

> If you think this is ludicrous and want this to be a
>productive discussion, please take the question seriously and explain
>in technical terms why your proposal is better.

How many times should this be done? IMAA is certainly going to be
simpler than any proposal that requires changes to both MTAs and MUAs
because it localizes the changes to one place (the MUA). It allows
other entities in the Internet Mail system to easily use the
internationalized email addresses without having to know anything
about multiple charsets and repertoires.
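Paul's argument that IMAA confines the change to the MUA can be illustrated by converting a whole address to ASCII before submission. This Python sketch uses the standard library's `idna` codec (IDNA 2003, RFC 3490) for the domain; the local-part handling is an assumption: it reuses Punycode with the placeholder prefix `xn--` and skips any Nameprep-like preparation IMAA may specify.

```python
LOCAL_ACE_PREFIX = "xn--"  # placeholder, not IMAA's actual prefix


def to_ascii_address(address: str) -> str:
    """Convert user@domain to an all-ASCII form any RFC 2821 MTA can relay."""
    local, _, domain = address.rpartition("@")
    if not local.isascii():
        local = LOCAL_ACE_PREFIX + local.encode("punycode").decode("ascii")
    ascii_domain = domain.encode("idna").decode("ascii")  # RFC 3490 ToASCII
    return f"{local}@{ascii_domain}"


converted = to_ascii_address("müller@bücher.example")
print(converted)  # xn--mller-kva@xn--bcher-kva.example
assert converted.isascii()  # nothing downstream needs to know about Unicode
```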

>> (If you are thinking
>> of a protocol that doesn't require punycode but would instead simply
>> bounce or lose mail that was sent to MTAs that didn't understand the
>> new protocol, please don't bother writing an Internet draft...)
>
>Why not?

Because no one who cares about Internet mail wants to start bouncing
mail messages unpredictably.

Seriously, if you want to do that, don't do it here. Start your own
mailing list. I'm quite willing to have folks who propose different
solutions that are as reliable as IMAA-ACE discuss them here, because
then we can pick just one. But people proposing to make Internet mail
unreliable aren't welcome.

>I can only interpret your dismissal of alternative solutions without
>a serious analysis to mean that you either have done this analysis
>already and know the answers or that you don't want to see alternative
>ideas discussed.

The former.

> In the former case, I think it would be useful to read
>your analysis.

No analysis needed. A "new and improved" mail system that is less
reliable is a non-starter.

>>>I would want the log file to contain...
>>
>> Fine. Ask your vendor to include that feature. This is not part of a
>> protocol specification.
>
>I'm the vendor, and I'm here to understand how to implement it. If
>the protocol specification doesn't give guidance or hasn't considered
>how it will be implemented, I fear it will not work.

Then you're not a useful vendor. Others will be able to easily figure
out where they want to write raw ACE blobs and where they want to
convert them into Unicode characters (and, hopefully, which encoding
to use for the Unicode characters).

>I note that punycode is an encoding scheme, and thus IDNA and IMAA
>violate this by lacking an ability to use UTF-8.

Right.

>> You should take this up with Harald Alvestrand, the author of RFC
>> 2277. Note that IDN chose not to use UTF-8, and Harald (as chair of
>> the IESG) approved it to be on standards track.
>
>Perhaps he is busy with other things,

He posted 152 messages to the IDN WG mailing list, some of which were
on this very topic. It seems likely that he was paying attention...

> but I will ask if the policy in
>RFC 2277 doesn't apply any more, or where the variance procedure steps
>for the IDN working group are documented. Thanks for the suggestion.

Let us know what you find out.

--Paul Hoffman, Director
--Internet Mail Consortium

Simon Josefsson

2003-02-26 23:10:32 UTC

Paul Hoffman / IMC <***@imc.org> writes:

> At 12:38 PM +0100 2/26/03, Simon Josefsson wrote:
>>My question was sincere. IMAX appears to be a solution for
>>internationalization of MTAs, at the SMTP layer. It does not propose
>>solving the internationalization problem for MUAs.
>
> Yes, it does. It shows exactly how an MUA should display ACE names.

Are you referring to the M-* headers? Those were (rightly) dropped
from IMAX, I believe. IMAX doesn't mention the term "MUA" at all.
The section regarding M-* headers definitely does not show "exactly"
how an MUA should display the ACE names.

>> Those two
>>distinctions could be investigated in parallel and evaluated on their
>>own merits.
>
> Yes, but we need Internet Drafts before we can do that.

There are two Internet Drafts, with different approaches. We don't
need more to investigate these two.

>> If you think this is ludicrous and want this to be a
>>productive discussion, please take the question seriously and explain
>>in technical terms why your proposal is better.
>
> How many times should this be done?

The IMAX draft is only a few weeks old; if you have discussed it many
times before, please provide a reference.

> IMAA is certainly going to be simpler than any proposal that
> requires changes to both MTAs and MUAs because it localizes the
> changes to one place (the MUA).

Earlier you said my question whether IMAA was an internationalization
solution for MTAs was ludicrous, yet you now say IMAA doesn't require
any changes to the MTA. Clearly, if you want internationalization
support in the MTA, you will have to modify it. Let's take a step
back:

Compare the situation for the MTA with IMAX: if you want
internationalization support in the MTA, you can implement IMAX;
if you don't want or care about it, don't implement it. Neither choice
will disrupt existing Internet mail services.

Having asserted that an MTA without support for IMAA or IMAX will not
disrupt existing services, for the remaining discussion we can assume
that the MTA does want to be an internationalized product. I'll call it
an I18NMTA to help keep things apart. What I'm trying to understand
now is whether IMAA or IMAX is the better choice for the I18NMTA. Some
propositions:

* IMAA requires the I18NMTA to implement punycode. IMAX doesn't
(assuming my suggested clarification about treating the RHS as an
IDNA-aware domain name slot is adopted).

* You claim that under the IMAA design it is reasonable to implement
separate applications for viewing log files and editing
configuration files in the I18NMTA. IMAX doesn't require this as it
uses the system's native character set.

* IMAA requires the I18NMTA to support Unicode. While Unicode is a
good thing, it can be difficult to implement in existing systems.
It is potentially disruptive to the Internet Mail system, to use your
terminology. My idea of using an IMAX solution without fallback does
not require this. No, I haven't described this idea in an Internet
Draft, so you don't have to challenge the proposition, but I'd
appreciate it if you did.

You are welcome to add propositions that are to IMAA's advantage.

> It allows other entities in the Internet Mail system to easily use
> the internationalized email addresses without having to know
> anything about multiple charsets and repertoires.

That isn't true. Not all systems are using Unicode, but IMAA requires
that they implement Unicode. Clearly that is forcing them to know
about multiple charsets.

>>>(If you are thinking
>>> of a protocol that doesn't require punycode but would instead simply
>>> bounce or lose mail that was sent to MTAs that didn't understand the
>>> new protocol, please don't bother writing an Internet draft...)
>>
>>Why not?
>
> Because no one who cares about Internet mail wants to start bouncing
> mail messages unpredictably.

Of course not, that is obvious. How did you infer the bouncing would
be unpredictable?

> Seriously, if you want to do that, don't do it here. Start your own
> mailing list. I'm quite willing to have folks who propose different
> solutions that are as reliable as IMAA-ACE discuss them here, because
> then we can pick just one. But people proposing to make Internet mail
> unreliable aren't welcome.

If you believe IMAX would make Internet mail unreliable, please
explain why.

>>I can only interpret your dismissal of alternative solutions without
>>a serious analysis to mean that you either have done this analysis
>>already and know the answers or that you don't want to see alternative
>>ideas discussed.
>
> The former.
>
>> In the former case, I think it would be useful to read
>>your analysis.
>
> No analysis needed. A "new and improved" mail system that is less
> reliable is a non-starter.

"New and improved" is a loose term. IMAA could be considered a "new
and improved" mail system. I believe analysis is needed if you want
to make good decisions.

>>>>> You are free to say that. Others would disagree. In the case of IMAX,
>>>>> what would you want in your log file? All UTF-8? That means you need
>>>>> converters from every accepted charset to UTF-8. Careful sysadmins
>>>>> would probably want to know *exactly* what came in, not some converted
>>>>> form, but that means that their log file would have multiple charsets
>>>>> in it, which would make display a mess. A reasonable option is to
>>>>> store the addresses as ACE and to have a log-file viewer that converts
>>>>> on display (and has an option for not converting).
>>>>>
>>>>> Again, this is an implementation issue, not a protocol issue.
>>>>
>>>> Yes. But it is an important point. An internationalization solution
>>>> that doesn't consider these practical issues is of only theoretical
>>>> value.
>>>>
>>>> I would want the log file to contain characters that can be read
>>>> without special IDNA/IMAA/IMAX aware programs. I.e., if the system
>>>> uses UTF-8 as the system encoding, I'd want the log file to be in
>>>> UTF-8. If the system uses ISO-8859-1, the log file should be in
>>>> ISO-8859-1 (and the application must cope with data that can't be
>>>> represented somehow).
>>>>
>>> Fine. Ask your vendor to include that feature. This is not part of a
>>> protocol specification.
>>
>>I'm the vendor, and I'm here to understand how to implement it. If
>>the protocol specification doesn't give guidance or hasn't considered
>>how it will be implemented, I fear it will not work.
>
> Then you're not a useful vendor. Others will be able to easily figure
> out where they want to write raw ACE blobs and where they want to
> convert them into Unicode characters (and, hopefully, which encoding
> to use for the Unicode characters).

And convert them into the system's native character set too, I'm sure.

>>>>> I think I hear you saying that you think that the protocols should
>>>>> allow any repertoire and any encoding of those repertoires. If so, we
>>>>> certainly disagree. The IETF is not very keen on creating protocols
>>>>> for which there would be limited and unpredictable
>>>>> interoperability. Other standards groups might not be so picky.
>>>>
>>>> That is stretching it a bit, I think. I believe that a solution worth
>>>> its salt should consider existing habits, and whether we like it or
>>>> not there is more than one charset used on the Internet. MIME appears to
>>>> acknowledge this and is rather successful. HTML acknowledges this and
>>>> is rather successful. Same for HTTP. Come to think of it, I can't
>>>> recall any successful internationalization product the IETF has
>>>> produced to counter my examples. Can you help me?
>>>>
>>>> If you are speaking for the IETF, I find it interesting that RFC 2277
>>>> "IETF Policy on Character Sets and Languages" says that protocols MAY
>>>> allow use of any repertoire. It doesn't say that it is a bad idea to
>>>> allow more than one charset. I agree with that document: let's
>>>> require the use of UTF-8 in protocols, but allow negotiation of other
>>>> charsets to smooth transition and deployment.
>>>
>>> You should take this up with Harald Alvestrand, the author of RFC
>>> 2277. Note that IDN chose not to use UTF-8, and Harald (as chair of
>>> the IESG) approved it to be on standards track.
>>
>>Perhaps he is busy with other things,
>
> He posted 152 messages to the IDN WG mailing list, some of which were
> on this very topic. It seems likely that he was paying attention...
>
>> but I will ask if the policy in
>>RFC 2277 doesn't apply any more, or where the variance procedure steps
>>for the IDN working group are documented. Thanks for the suggestion.
>
> Let us know what you find out.

I found out that RFC 2277 hasn't been obsoleted. So you were wrong in
saying (see the first paragraph of the quoted text) that the IETF is
not keen on creating protocols that allow any repertoire and any
encoding. The quoted text from RFC 2277 I provided earlier says that
they MAY do this.

For reference, Harald Tveit Alvestrand <***@alvestrand.no> writes:

> simon,
> the "escape clause", if you want one, is that DNS names are not, in
> many senses of the word, text; they're names.
> And RFC 2277 says, in extenso:
>
> 2. Where to do internationalization
>
> Internationalization is for humans. This means that protocols are not
> subject to internationalization; text strings are. Where protocol
> elements look like text tokens, such as in many IETF application
> layer protocols, protocols MUST specify which parts are protocol and
> which are text. [WR 2.2.1.1]
>
> Names are a problem, because people feel strongly about them, many of
> them are mostly for local usage, and all of them tend to leak out of
> the local context at times. RFC 1958 [RFC 1958] recommends US-ASCII
> for all globally visible names.
>
> This document does not mandate a policy on name internationalization,
> but requires that all protocols describe whether names are
> internationalized or US-ASCII.
>
> So IDN is really carrying internationalization outside the scope of
> RFC 2277.
>
> The more basic reason is that the IETF is about doing what's right,
> not what the rules say you have to do - in this particular case, using
> UTF-8 was debated up the wazoo and far beyond, and the group concluded
> that using Punycode rather than raw UTF-8 encoding was the Right
> Decision, and the IESG backed them on that.
>
> Using up processing time to write a BCP to cover this variance from
> RFC 2277 would not be useful - especially since RFC 2277 can be read
> to say that they didn't have to do this anyway.
>
> Feel free to forward this message wherever you feel like...
>
> Harald
>
> --On 26. februar 2003 12:51 +0100 Simon Josefsson <***@extundo.com> wrote:
>
>> Harald,
>>
>> I'm sure you are busy, but I'd appreciate if you could take time to
>> answer this question. It was suggested on the IMAA list by Paul
>> Hoffman to ask you how to reconcile RFC 2277 with the approval of IDN.
>> In particular, RFC 2277 says:
>>

Jeffrey J Zahari

2003-02-27 08:00:51 UTC

----- Original Message -----
From: "Simon Josefsson" <***@extundo.com>
To: "Paul Hoffman / IMC" <***@imc.org>
Cc: <ietf-***@imc.org>
Sent: Thursday, February 27, 2003 7:10 AM
Subject: Re: Problems of Internationalized Mail Address eXtensions (IMAX)
>
> Paul Hoffman / IMC <***@imc.org> writes:
>
> > At 12:38 PM +0100 2/26/03, Simon Josefsson wrote:
> >>My question was sincere. IMAX appears to be a solution for
> >>internationalization of MTAs, at the SMTP layer. It does not propose
> >>solving the internationalization problem for MUAs.
> >
> > Yes, it does. It shows exactly how an MUA should display ACE names.
>
> * IMAA requires the I18NMTA to implement punycode. IMAX doesn't
> (assuming my suggested clarification about treating the RHS as an
> IDNA-aware domain name slot is adopted).
>
> * You claim that under the IMAA design it is reasonable to implement
> separate applications for viewing log files and editing
> configuration files in the I18NMTA. IMAX doesn't require this as it
> uses the system's native character set.
>
> * IMAA requires the I18NMTA to support Unicode. While Unicode is a
> good thing, it can be difficult to implement in existing systems.
> It is potentially disruptive to the Internet Mail system, to use your
> terminology. My idea of using an IMAX solution without fallback does
> not require this. No, I haven't described this idea in an Internet
> Draft, so you don't have to challenge the proposition, but I'd
> appreciate it if you did.
>

I actually have some trouble understanding this. IMAA would not require
an I18NMTA. It uses ACE straight off the bat, and the implementation would
require updates to the MUA only, not the MTA. The only problem anyone
would face would be to ensure that the names entered into whatever
mapping tables/control files are in the correct ACE format.

> You are welcome to add propositions that are to IMAA's advantage.
>
> > It allows other entities in the Internet Mail system to easily use
> > the internationalized email addresses without having to know
> > anything about multiple charsets and repertoires.
>
> That isn't true. Not all systems are using Unicode, but IMAA requires
> that they implement Unicode. Clearly that is forcing them to know
> about multiple charsets.
>

Because IMAA is done right at the start of the email process, before the
first SMTP MUA-MTA transaction, the addresses are already in ACE. Once again,
the only problem anyone would face would be to ensure that the names entered
into whatever mapping tables/control files are in the correct ACE
format. The only other way this could be a problem would be if an
implementation required these files to be in UTF-8 or a similar
format.
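The control-file check described above can be made concrete with a small sketch. It assumes a hypothetical control file whose domain entries must be stored in ACE, and it handles domain names only, using Python's built-in `idna` codec (IDNA 2003); the IMAA local-part encoding is not covered.

```python
# Sketch: check that domain entries in a (hypothetical) MTA control file
# are already in ACE form, as the IMAA approach expects. Domain names
# only; Python's built-in "idna" codec implements IDNA 2003 (RFC 3490).


def is_valid_ace_domain(entry: str) -> bool:
    """Return True if entry is pure ASCII and every xn-- label decodes."""
    if not entry.isascii():
        return False  # raw non-ASCII (e.g. UTF-8) where ACE was expected
    try:
        # ToUnicode runs on each label; malformed "xn--" labels raise.
        entry.encode("ascii").decode("idna")
    except UnicodeError:
        return False
    return True


entries = ["xn--bcher-kva.example", "bücher.example", "example.org"]
for entry in entries:
    print(entry, is_valid_ace_domain(entry))
```

Here the second entry is rejected because it was typed in UTF-8 rather than converted to ACE, which is exactly the input mistake the paragraph warns about.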

jeffrey j zahari

D. J. Bernstein

2003-02-25 02:44:17 UTC

Simon Josefsson writes:
> Like in the user interface for /etc/aliases, /etc/mail/virtusertable etc.

Right.

Addresses are stored in many different locations, possibly with many
different encodings. When a program copies an address from one location
to another location with a different encoding, or compares addresses in
two locations with different encodings, it has to convert between the
encodings. Two examples:

* An SMTP server compares an SMTP RCPT to a configuration file
specifying acceptable RCPTs. If the address is encoded as GoofyCode
in SMTP, but as UTF-8 in the file, then the SMTP server has to
convert between UTF-8 and GoofyCode.

* A system administrator uses the ``more'' program to feed the
configuration file to ``xterm,'' which displays it on the screen.
If the address is actually encoded as GoofyCode in the file, but
UTF-8 in the xterm input, then the ``more'' program has to convert
between GoofyCode and UTF-8.

Any failure to do these conversions---or confusion over which encoding
is used for a particular location---produces failures for the users.
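The first example above can be sketched in code. This is an illustration under stated assumptions: the RCPT domain arrives in ACE (standing in for "GoofyCode"), the accepted-domains list is a hypothetical configuration typed in UTF-8, and only domain names are handled, using Python's built-in `idna` codec (IDNA 2003).

```python
# Sketch: an SMTP server deciding whether to accept an RCPT domain.
# Assumptions (illustrative only): the RCPT domain may arrive in ACE
# ("xn--" form), while the accepted-domains list was typed by an
# administrator in UTF-8. Python's "idna" codec implements IDNA 2003.


def to_ace(domain: str) -> str:
    """Map a Unicode or already-ACE domain name to lowercase ACE."""
    return domain.encode("idna").decode("ascii").lower()


def rcpt_domain_accepted(rcpt: str, accepted: list[str]) -> bool:
    """True if the domain part of rcpt matches an accepted domain."""
    _, _, domain = rcpt.rpartition("@")
    wanted = to_ace(domain)
    # Every comparison needs a conversion step on both sides.
    return any(to_ace(entry) == wanted for entry in accepted)


accepted = ["bücher.example", "example.org"]  # hypothetical UTF-8 config

print(rcpt_domain_accepted("info@xn--bcher-kva.example", accepted))  # True
print("xn--bcher-kva.example" in accepted)  # False: naive comparison fails
```

The last line shows the failure mode described above: a server that skips the conversion simply fails to match the address.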

The cost of programming and deploying all these conversions is _the_
fundamental obstacle to moving beyond ASCII. Novice programmers might
not be aware of how many different locations and conversions we're
talking about here, so I've given a partial list of locations at the
end of this message.

The big advantage of UTF-8 is that it can and will be used everywhere,
eliminating all of these conversions. Non-universal encodings such as
GoofyCode don't have the same benefit; GoofyCode can't and won't be used
as the xterm input format, for example.

---D. J. Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago

$DEFAULT in MDAs: partial mailbox name
$EXT in MDAs: mailbox name
$HOST in MDAs: domain name
$LOCALDOMAIN in DNS clients: domain name
$SENDER in MDAs: domain name
$SENDER in MDAs: mailbox name
.fetchmailrc, various locations: domain name
.fetchmailrc, various locations: login name (often includes mailbox name)
.qmail*: file names (partial mailbox names)
.ssh/known_hosts: domain name
/etc/aliases, various locations: domain name
/etc/aliases, various locations: mailbox name
/etc/hosts second column: domain name
/etc/hosts.allow, various locations: domain name
/etc/hosts.allow, various locations: port-113 name (often mailbox name)
/etc/namedb/named.conf, various locations: domain name
/etc/resolv.conf, search line: domain name
/etc/virtusertable, various locations: mailbox name
/public/file: domain name
/service/dnscache/root/servers: domain name
/service/tinydns/root/data, SOA hostmaster address: mailbox name
/service/tinydns/root/data, various locations: domain name
BIND log files, various locations: domain name
DNS packet, SOA hostmaster address: domain name
DNS packet, SOA hostmaster address: mailbox name (attached to domain name)
DNS packet: query domain name
DNS packet: record domain name
DNS registration form: domain name
HTTP Host field: domain name
IMAP messages, To and Cc and so on: domain name
IMAP messages, To and Cc and so on: mailbox name
POP USER commands: login name (often includes mailbox name and domain name)
POP messages, To and Cc and so on: domain name
POP messages, To and Cc and so on: mailbox name
SMTP HELO commands: domain name
SMTP MAIL and RCPT commands: domain name
SMTP MAIL and RCPT commands: mailbox name
SMTP messages, To and Cc and so on: domain name
SMTP messages, To and Cc and so on: mailbox name
add-host command line: domain name
add-mx command line: domain name
add-ns command line: domain name
dig command line: domain name
dig output, in SOA hostmaster address: mailbox name
dig output: domain name
dnscache log files, various locations: domain name
ezmlm subscription UI: mailbox name (includes domain name)
gethostbyname(), first argument: domain name
h_name and h_aliases: domain name
host command line: domain name
host output, in SOA hostmaster address: mailbox name
host output: domain name
http URLs: domain name
httpd.conf, various locations: domain name
lynx.cfg, various locations: domain name
mail.local command line: mailbox name
mailq output, various locations: domain name
mailq output, various locations: mailbox name
mailto URLs: domain name
mailto URLs: mailbox name
mutt command line: domain name
mutt command line: mailbox name
named zone files, SOA hostmaster address: mailbox name
named zone files, various locations: domain name
named zone files: file name (often includes domain name)
ndc command line: domain name
nsupdate command line: domain name
pine command line: domain name
pine command line: mailbox name
praliases output, various locations: domain name
praliases output, various locations: mailbox name
qmail-inject command line: domain name
qmail-inject command line: mailbox name
qmail-queue envelope input: domain name
qmail-queue envelope input: mailbox name
sendmail command line: domain name
sendmail command line: mailbox name
ssh command line: domain name
telnet command line: domain name
tinydns log files, various locations: domain name

D. J. Bernstein

2003-02-25 02:57:46 UTC

Paul Hoffman / IMC writes:
> SMTP servers out there, some of which are in hardware and cannot be
> upgraded.

As usual, specifics would be helpful: which SMTP servers you're talking
about, and what costs you're actually referring to when you say ``cannot
be upgraded.''

In particular, if these SMTP servers have any trouble handling 8-bit
data, I'd like to document that fact inside http://pi.cr.yp.to.

> forcing a change when it isn't needed is just plain bad design

The users are demanding a change. They're trying non-ASCII characters
and screaming in anguish at the results. You can hear them, can't you?

---D. J. Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago

Martin Duerst

2003-02-20 00:07:19 UTC

At 13:40 03/02/19 -0800, Paul Hoffman / IMC wrote:

>At 8:03 PM +0000 2/19/03, Roy Badami wrote:
>>I think the only way this would be viable is if it was mandatory to
>>convert to IMAA-ACE rather than bounce.
>
>So we need two mechanisms instead of one? And the advantage of that is...?

So we needed two different mechanisms (QP/base64 and 8-bit MIME)
for body parts. I assume these things were not created without
some good advantages in mind.

>For those of you who didn't follow the IDN WG for the past few years, this
>is highly analogous to the debate that happened there. The whole idea of a
>"transition" sounds great until you realize that the second format is
>going to be with us forever. Given that the transition strategy is harder
>than simply going with IMAA-ACE, there has to be a good reason for it.
>
>I don't consider "UTF-8 is good" to be a good enough reason.

Of course just saying 'UTF-8 is good' doesn't cut it.
But the same goes for 'ACE is good'.

The best way to explain the advantages of UTF-8, in my view,
is to look at how to work on email data (mailboxes) with
scripts and tools. While this is not laid down in any standard,
and is usually not considered too much in discussions like these,
the whole area of scripts and tools is very important for the
success of a technology. And this definitely was the case for
Internet mail (as opposed, e.g., to some ISO projects in the
same area).

Now being able to more/grep/less/awk/sed/perl/... through
a mailbox is extremely easy as long as everything relevant
stays in US-ASCII. It would also be very easy as long as
everything relevant is in UTF-8. But as soon as things such
as RFC 2047 and ACE come in, things get extremely complicated.
Searching for 'Paul' or 'Hoffman' in email headers is trivial.
Searching for (the native character equivalents of) 'Taro'
or 'Suzuki' in the same headers turns into a major engineering
project. It doesn't need to stay that way.
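The searching problem can be illustrated with a small sketch. The header lines are hypothetical, and because Python's standard library implements only the IDNA domain-name ACE (not any IMAA local-part encoding), the non-ASCII text is placed in the domain part.

```python
import re

# Hypothetical header lines: the same address, once with the domain in
# UTF-8 and once in ACE (IDNA "xn--" form).
utf8_header = "To: taro@bücher.example"
ace_header = "To: taro@xn--bcher-kva.example"

needle = "bücher"

# Plain substring search, like grep: only the UTF-8 form matches.
print(needle in utf8_header)
print(needle in ace_header)

# To find the ACE form, every candidate domain must be decoded first.
DOMAIN = re.compile(r"@([A-Za-z0-9.-]+)")


def decode_ace_domains(line: str) -> str:
    """Replace each ASCII domain after '@' with its ToUnicode form."""
    return DOMAIN.sub(
        lambda m: "@" + m.group(1).encode("ascii").decode("idna"), line
    )


print(needle in decode_ace_domains(ace_header))
```

With UTF-8 in the header, a plain substring test is enough; with ACE, every search tool needs the extra decoding pass.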

A different way to explain things:

Ulrich Drepper, of glibc fame, once put it very clearly that
in order to move internationalization forward, he and other
people with his knowledge would work on getting the basics
done (i.e. moving things to UTF-8 or equivalent), and the
people on the other side of the globe could then work on
top of that (localizing applications, language-specific
search,...). The IETF community has formalized this kind
of thinking in a BCP, http://www.ietf.org/rfc/rfc2277.txt,
(and look who is at the top of that document!)
which says:

"Protocols MUST be able to use the UTF-8 charset,"...

While the IETF is very well known for its flexibility,
RFC 2277 should not be something that is easily dismissed.
Indeed, the burden should be on people to prove that UTF-8
does not work at all (which I haven't seen argued here yet),
without constantly trying to turn around the burden of proof.

>(And before anyone here calls me "anti-UTF-8", please look at the top of
>the first page of the UTF-8 RFC.)

Sorry, but http://www.ietf.org/rfc/rfc2279.txt lists Francois Yergeau
as its only author. Same for the next version that is in the works
(http://www.ietf.org/internet-drafts/draft-yergeau-rfc2279bis-03.txt).
That one lists you in the acknowledgements.

You have co-authored the RFC on UTF-16 (http://www.ietf.org/rfc/rfc2781.txt).
That's something different.

Regards, Martin.

Marc Mutz

2003-02-15 20:58:41 UTC

On Saturday 15 February 2003 16:40, Roy Badami wrote:
<snip>
> But I'd urge you to consider the four scenarios I just put forward in
> the thread "What is IMAA: some scenarios for deployment"

I'd like to extend this to include IMAA-ACE-opaque, IMAA-ACE-split and
IMAA-UTF8:

> Scenario 1a:
(ISP IMAA-unaware, user wants to use IMAs with her local MUA):
> with IMAA there's a reasonable hope of basic support
> with just an updated mail client, and better support with minor
> updates to the ISPs sign-up systems. With UTF8ADDRESSES this will
> require in addition a major upgrade to the ISP's mail infrastructure.

IMAA-ACE-* are no different here.

> Scenario 1b:
(1a with webmail)
> IMAA and UTF8ADDRESSES both require a major upgrade to
> the ISP's infrastructure.

IMAA-ACE-* yes, IMAA-UTF8 probably not. It might suffice to add (or
change an existing tag to)
<meta http-equiv="content-type" content="text/html; charset=utf-8">
in the delivered HTML page.

> Scenario 2a:
(IMAA-unaware ISP and company mail server, IMA/IDN use)
> IMAA requires only an upgrade to the mail clients;
> UTF8ADDRESSES requires an upgrade to the clients, the organization's
> MTA, and the ISP's mail infrastructure (so that the backup MX will
> continue to work).

No change here either.

> Scenario 2b: IMAA requires upgrades to the mail clients and, as
> currently specified in the draft, an upgrade to the organization's
> MTA (though the need to upgrade the MTA might disappear depending on
> the design decisions we take in IMAA). UTF8ADDRESSES requires
> upgrades to the clients, the MTA and the ISP's infrastructure.
<snip>

IMAA-ACE-opaque requires that.
IMAA-ACE-split requires no modification on the server side, just like
2a.

Marc

--
'When you see the ping of death, duck and cover.'
-- Bruce Schneier, Crypto-Gram Oct 2002

Roy Badami

2003-02-15 22:27:52 UTC

> > Scenario 1b:
> (1a with webmail)
> > IMAA and UTF8ADDRESSES both require a major upgrade to
> > the ISP's infrastructure.
>
> IMAA-ACE-* yes, IMAA-UTF8 probably not. It might suffice to add (or
> change an existing tag to)
> <meta http-equiv="content-type" content="text/html; charset=utf-8">
> in the delivered HTML page.

I'm not sure how IMAA-UTF8 differs from John Klensin's UTF8ADDRESSES
proposal, but this would appear to apply equally to both. I'm not
sure I believe that it really would be that simple to upgrade web mail
systems to support UTF-8 addresses, but it's an interesting
suggestion.

> > Scenario 2b: IMAA requires upgrades to the mail clients and, as
> > currently specified in the draft, an upgrade to the organization's
> > MTA (though the need to upgrade the MTA might disappear depending on
> > the design decisions we take in IMAA). UTF8ADDRESSES requires
> > upgrades to the clients, the MTA and the ISP's infrastructure.
> <snip>
>
> IMAA-ACE-opaque requires that.
> IMAA-ACE-split requires no modification on the server side, just like
> 2a.

That's what I intended by saying that the requirement to upgrade the
MTA might disappear depending on the design decisions we make in IMAA
(splitting is an open issue within IMAA).

-roy

Marc Mutz

2003-02-15 22:55:35 UTC

On Saturday 15 February 2003 23:27, Roy Badami wrote:
<snip>
> I'm not sure how IMAA-UTF8 differs from John Klensin's UTF8ADDRESSES
> proposal,

They're the same.

> but this would appear to apply equally to both. I'm not
> sure I believe that it really would be that simple to upgrade web
> mail systems to support UTF-8 addresses, but it's an interesting
> suggestion.

It's _potentially_ that simple, not necessarily.
OTOH, punycode necessarily is _not_ that simple.

Just an observation. UTF-8 has other problems, that others have already
mentioned.

Marc

--
If privacy is outlawed, only outlaws will have privacy.
-- Phil Zimmermann

Martin Duerst

2003-02-16 00:30:34 UTC

At 15:40 03/02/15 +0000, Roy Badami wrote:

>Your document is well argued. We certainly shouldn't blindly assume
>that just because the ACE vs just-send-8 issue was argued to death in
>the IDN WG, the trade-offs between the two approaches when applied to
>IMAs will automatically be the same as those for IDNs. (Though I can
>also understand that this group probably really doesn't want to go
>there again.)

Please note that John never proposed 'just-send-8'. 'just-send-8'
is different from ESMTP UTF8ADDRESS.

Regards, Martin.

Dan Kohn

2003-02-26 23:54:37 UTC

Simon, I feel that Paul is showing a Sisyphean level of patience here,
but I know it can't continue. I believe you understand that very
similar issues were hashed out, and an ASCII Compatible Encoding (ACE)
solution was adopted in IDNA. I think you understand that this does not
foreclose someone from eventually standardizing true UTF-8 support for
DNS (probably using EDNS), but I suspect that no one ever will, because
it's a huge amount of implementation work for no meaningful gain (since
IDNA code paths will still have to be supported forever).

The exact same logic holds for IMAA, and is why an IMAX ESMTP extension
simply adds no meaningful value over IMAA for the only folks who care
about i18n, which is the end-users.

Simon Josefsson wrote:

> Earlier you said my question whether IMAA was an internationalization
> solution for MTAs was ludicrous, yet you now say IMAA doesn't require
> any changes to the MTA. Clearly, if you want internationalization
> support in the MTA, you will have to modify it. Let's take a step
> back:

The whole point of IMAA, as I'm sure you know, is that we don't want
i18n support in MTAs. Why bother? It's a huge amount of work, and any
mail admin who really cares about the LHS can still treat it as an
opaque string (as the standard says it is). Other than the specific
issue of sub-addressing, the whole concept of an I18NMTA is a huge
amount of work for no value.

If you do need to implement Unicode and nameprep support on an MTA to
support sub-addressing (and that's not clear yet), then additionally
adding punycode is a minor step.

>> It allows other entities in the Internet Mail system to easily use
>> the internationalized email addresses without having to know
>> anything about multiple charsets and repertoires.

> That isn't true. Not all systems are using Unicode, but IMAA requires
> that they implement Unicode. Clearly that is forcing them to know
> about multiple charsets.

No, no, no. IMAA requires nameprep and punycode for i18n-capable MUAs.
But the installed base of 500+ million MUAs out there today can continue to
interact perfectly normally with any IMAA address. They just don't see
the i18n version of the IMAA LHS, which is fine, because it's an opaque
string. And, MTAs don't need to be upgraded. This is the whole point
of IMAA (and IDNA).

> If you believe IMAX would make Internet mail unreliable, please
> explain why.

Because most MTAs don't support IMAX, and so every IMAX-capable MUA and
MTA would always have to be downgrading to IMAA (or have to bounce the
message), in which case no value has been added, but a lot of additional
work has been done. Since there will always be some non-IMAX-capable
MTAs and MUAs, IMAA will always have to be around, so every one of those
IMAX MUAs and MTAs will still need to implement punycode. Or they can
bounce the message, which is what Paul was referring to as a
non-starter.
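To make the trade-off Dan describes concrete, here is a minimal Python sketch of the sender-side decision. It assumes a hypothetical EHLO keyword `IMAX` and a hypothetical ACE prefix for local parts (`xn--` is borrowed from IDNA purely for illustration; neither value is taken from the IMAA or IMAX drafts), and it skips nameprep on the local part. What it shows is that the ACE downgrade path has to exist whether or not a peer advertises the extension.

```python
from typing import Iterable, Set

# Hypothetical values for illustration only; not taken from the drafts.
IMAX_KEYWORD = "IMAX"
LOCAL_ACE_PREFIX = "xn--"


def ehlo_keywords(reply_lines: Iterable[str]) -> Set[str]:
    """Collect extension keywords from EHLO reply lines such as '250-8BITMIME'.

    The first reply line carries the server's greeting, not an extension.
    """
    keywords = set()
    for line in list(reply_lines)[1:]:
        text = line[4:].strip()  # drop the "250-" or "250 " prefix
        if text:
            keywords.add(text.split()[0].upper())
    return keywords


def to_ace_address(address: str) -> str:
    """Downgrade an address to ASCII: Punycode the local part, IDNA the domain."""
    local, _, domain = address.rpartition("@")
    if not local.isascii():
        # Sketch only: no nameprep or length checks on the local part.
        local = LOCAL_ACE_PREFIX + local.encode("punycode").decode("ascii")
    return local + "@" + domain.encode("idna").decode("ascii")


def mail_from_argument(address: str, server_keywords: Set[str]) -> str:
    """Send the raw UTF-8 address only if the peer advertises the extension."""
    if IMAX_KEYWORD in server_keywords:
        return address
    return to_ace_address(address)  # the ACE path is needed regardless
```

With a server that advertises only 8BITMIME and SIZE, mail_from_argument("bob@α.example", keywords) falls back to bob@xn--mxa.example.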

Simon, I've seen this movie before, and I know how it ends. The ACE
wins.

If you want to go forward with IMAX, you may even succeed in getting it
published as Experimental, though I doubt it. But no one will implement
it, since it adds lots of effort but no value over IMAA, so why bother?

- dan
--
Dan Kohn <mailto:***@dankohn.com>
<http://www.dankohn.com/> <tel:+1-650-327-2600>

Edmon Chung

2003-02-27 01:21:34 UTC

Hi Dan,

----- Original Message -----
From: "Dan Kohn" <***@dankohn.com>
> Because most MTAs don't support IMAX, and so every IMAX-capable MUA and
> MTA would always have to be downgrading to IMAA (or have to bounce the
> message), in which case no value has been added, but a lot of additional
> work has been done. Since there will always be some non-IMAX-capable
> MTAs and MUAs, IMAA will always have to be around, so every one of those
> IMAX MUAs and MTAs will still need to implement punycode. Or they can
> bounce the message, which is what Paul was referring to as a
> non-starter.

That has been true ever since ESMTP was first proposed, and ESMTP was
designed precisely so that features could be added this way.

> If you want to go forward with IMAX, you may even succeed in getting it
> published as Experimental, though I doubt it. But no one will implement
> it, since it adds lots of effort but no value over IMAA, so why bother?

This is not a fair statement. We have already implemented the IMAX extension
as an experiment! :-)
Of course, that's because we devised it... But discussions with other peer
vendors indicate that there is interest in this direction. I think
Experimental is interesting. Perhaps we should pursue that... I think
vendors would be interested in adding the feature, and we can argue till the
end of time, but we won't know unless we publish it.

Your thoughts?

Edmon

Lawrence Greenfield

2003-02-27 03:48:53 UTC

Date: Wed, 26 Feb 2003 15:54:37 -0800
From: "Dan Kohn" <***@dankohn.com>
> [...]
> work has been done. Since there will always be some non-IMAX-capable
> MTAs and MUAs, IMAA will always have to be around, so every one of those
> IMAX MUAs and MTAs will still need to implement punycode. Or they can
> bounce the message, which is what Paul was referring to as a
> non-starter.
>
> Simon, I've seen this movie before, and I know how it ends. The ACE
> wins.

Dan,

I think you're being a bit disingenuous here. Are you against the
8BITMIME extension? Do you think it is a failure? Was it a mistake?

Larry

D. J. Bernstein

2003-02-27 05:12:07 UTC

Dan Kohn writes:
> standardizing true UTF-8 support for DNS (probably using EDNS)

You don't know what you're talking about. The DNS protocol is already
8-bit clean. DNS servers and caches can already handle 8-bit data.

> it's a huge amount of implementation work for no meaningful gain (since
> IDNA code paths will still have to be supported forever)

If IDNA were a complete solution, if the massive costs of implementing
and deploying that solution had already been incurred, and if Internet
software development came to a complete halt so that we didn't have to
worry about costs imposed on future implementors, then I'd agree.

But IDNA isn't a complete solution; only a tiny fraction of the IDNA
costs have been incurred; and, most importantly, Internet development
shows no signs of coming to a halt.

All the short-term upgrade costs that we're considering, no matter how
huge they might seem, are tiny compared to the long-term costs of the
character-set mess. Do you want implementors in ten years, or twenty
years, or fifty years, to be continuing to worry about conversions from
one character encoding to another? I want them to be spending the same
time providing _new_ features for the users.

---D. J. Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago

Martin Duerst

2003-02-27 16:42:07 UTC

The long-term cost argument is very well put below.

Regards, Martin.

At 05:12 03/02/27 +0000, D. J. Bernstein wrote:

>Dan Kohn writes:

> > it's a huge amount of implementation work for no meaningful gain (since
> > IDNA code paths will still have to be supported forever)
>
>If IDNA were a complete solution, if the massive costs of implementing
>and deploying that solution had already been incurred, and if Internet
>software development came to a complete halt so that we didn't have to
>worry about costs imposed on future implementors, then I'd agree.
>
>But IDNA isn't a complete solution; only a tiny fraction of the IDNA
>costs have been incurred; and, most importantly, Internet development
>shows no signs of coming to a halt.
>
>All the short-term upgrade costs that we're considering, no matter how
>huge they might seem, are tiny compared to the long-term costs of the
>character-set mess. Do you want implementors in ten years, or twenty
>years, or fifty years, to be continuing to worry about conversions from
>one character encoding to another? I want them to be spending the same
>time providing _new_ features for the users.
>
>---D. J. Bernstein, Associate Professor, Department of Mathematics,
>Statistics, and Computer Science, University of Illinois at Chicago

Claus Färber

2003-02-27 00:00:00 UTC

Dan Kohn <***@dankohn.com> schrieb/wrote:
> If you do need to implement Unicode and nameprep support on an MTA to
> support sub-addressing (and that's not clear yet), then additionally
> adding punycode is a minor step.

You don't need nameprep just for subaddressing, only a Punycode decoder.
The output of the Punycode decoder consists of normalised and
nameprepped Unicode character sequences that can be compared directly.
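Claus's point can be sketched in a few lines of Python: decode with Punycode only, then split and compare. The sketch assumes the encoder already applied nameprep (as Claus describes), uses a hypothetical `xn--` prefix for local parts, and treats "+" as the sub-address separator, which is a common convention rather than anything the drafts specify.

```python
from typing import Optional, Tuple

# Hypothetical ACE prefix for local parts, borrowed from IDNA for illustration.
LOCAL_ACE_PREFIX = "xn--"


def decode_local_part(local: str) -> str:
    """Decode an ACE local part with Punycode only.

    No nameprep is applied here, on the assumption that the encoder
    already nameprepped the string before Punycode-encoding it.
    """
    if local.lower().startswith(LOCAL_ACE_PREFIX):
        return local[len(LOCAL_ACE_PREFIX):].encode("ascii").decode("punycode")
    return local


def split_subaddress(local: str, separator: str = "+") -> Tuple[str, Optional[str]]:
    """Split 'user+detail' after decoding; the parts can be compared directly."""
    base, found, detail = decode_local_part(local).partition(separator)
    return base, (detail if found else None)
```

For example, an encoded local part built as "xn--" + "jörg+lists".encode("punycode").decode("ascii") splits into ("jörg", "lists").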

Claus
--
http://www.faerber.muc.de/

Edmon Chung

2003-02-27 17:59:03 UTC

I guess part of the concern is that nameprep might not be right for IMA.
For example, some mail servers are case sensitive in the local part... and
that should be perfectly fine.
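Edmon's concern can be illustrated with Python's built-in `idna` codec, which implements IDNA2003 ToASCII including nameprep. This is only a sketch of what would happen if a local part went through the same nameprep step (an assumption; the thread has not settled that): nameprep case-folds non-ASCII labels, so two local parts that differ only in case collapse to one ACE form, while pure-ASCII labels skip nameprep and keep their case.

```python
def ace_label(label: str) -> bytes:
    """IDNA2003 ToASCII for one label, via Python's built-in 'idna' codec.

    Non-ASCII labels are nameprepped first, which includes case folding;
    pure-ASCII labels skip nameprep and keep their case.
    """
    return label.encode("idna")


def collide_under_nameprep(a: str, b: str) -> bool:
    """True if two distinct strings end up with the same ACE form."""
    return a != b and ace_label(a) == ace_label(b)


if __name__ == "__main__":
    print(collide_under_nameprep("Jörg", "jörg"))  # True
    print(collide_under_nameprep("Bob", "bob"))    # False
```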
Edmon

----- Original Message -----
From: "Claus Färber" <list-ietf-i18n-***@faerber.muc.de>
To: <ietf-***@imc.org>
Sent: Wednesday, February 26, 2003 7:00 PM
Subject: Re: Problems of Internationalized Mail Address eXtensions (IMAX)

>
> Dan Kohn <***@dankohn.com> schrieb/wrote:
> > If you do need to implement Unicode and nameprep support on an MTA to
> > support sub-addressing (and that's not clear yet), then additionally
> > adding punycode is a minor step.
>
> You don't need nameprep just for subaddressing, only a Punycode decoder.
> The output of the Punycode decoder consists of normalised and
> nameprepped Unicode character sequences that can be compared directly.
>
> Claus
> --
> http://www.faerber.muc.de/
>

Dan Kohn

2003-02-27 04:37:17 UTC

Lawrence Greenfield wrote:

> I think you're being a bit disingenuous here. Are you against the
> 8BITMIME extension? Do you think it is a failure? Was it a mistake?

It's a fair criticism, based on the fact that I just proposed a new CTE
that requires 8BitMIME on email. The difference, I believe, is that
8BitMIME provides a 33% bandwidth reduction when it can be used
end-to-end, at the cost of requiring base64 transformations when
encountering a non-compliant MTA or MUA.
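The bandwidth figure follows from base64 emitting four output characters for every three input octets, so a base64 body is roughly one-third larger than the same body sent as 8bit (equivalently, sending 8bit saves about a quarter of the base64 size). A quick check with Python's standard `base64` module, ignoring the line breaks MIME adds every 76 characters:

```python
import base64


def base64_expansion(payload: bytes) -> float:
    """Ratio of base64 output size to input size, ignoring MIME line breaks."""
    return len(base64.b64encode(payload)) / len(payload)


if __name__ == "__main__":
    body = bytes(range(256)) * 12  # 3072 octets of arbitrary 8-bit data
    ratio = base64_expansion(body)
    print(f"base64 is {ratio:.3f}x the size of the 8bit body")
    print(f"8bit saves {1 - 1 / ratio:.0%} relative to base64")
```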

By contrast, I don't think IMAX offers any bandwidth, complexity, or
usability advantages, while still requiring a lot of additional
complexity in implementation. In that way, as I said in the original
message, I believe it is much more analogous to the (since withdrawn)
proposal to implement IDNs using EDNS and UTF-8. One could certainly
design something that would work, but it would require servers to be
upgraded, would offer no more functionality than IDNA, and would
increase complexity. Why bother?

- dan
--
Dan Kohn <mailto:***@dankohn.com>
<http://www.dankohn.com/> <tel:+1-650-327-2600>

Jeffrey J Zahari

2003-02-27 07:30:14 UTC

Actually, Dan, though it has been brought up before that 8-bit DNS is frowned
upon, as of bind 9, anyone could have already set up a working DNS/mail
infrastructure using bind 9 and qmail using UTF-8.

However, IDNA was chosen because of better compression compared to UTF-8
and, most importantly, because the use of this "shim" meant that all legacy
implementations would not have to be upgraded. So, while ACE will probably
be around for an extremely long time to come, it may be a bit premature to
say that investigations into divesting the 7-bit legacy are too bothersome
to be attempted.

jeffrey j zahari

----- Original Message -----
From: "Dan Kohn" <***@dankohn.com>
To: "Lawrence Greenfield" <leg+@andrew.cmu.edu>
Cc: <ietf-***@imc.org>
Sent: Thursday, February 27, 2003 12:37 PM
Subject: RE: Problems of Internationalized Mail Address eXtensions (IMAX)

>
> By contrast, I don't think IMAX offers any bandwidth, complexity, or
> usability advantages, while still requiring a lot of additional
> complexity in implementation. In that way, as I said in the original
> message, I believe it is much more analogous to the (since withdrawn)
> proposal to implement IDNs using EDNS and UTF-8. One could certainly
> design something that would work, but it would require servers to be
> upgraded, would offer no more functionality than IDNA, and would
> increase complexity. Why bother?
>
> - dan
> --
> Dan Kohn <mailto:***@dankohn.com>
> <http://www.dankohn.com/> <tel:+1-650-327-2600>
>

D. J. Bernstein

2003-02-27 10:22:49 UTC

Jeffrey J Zahari writes:
> anyone could have already set up a working DNS/mail
> infrastructure using bind 9 and qmail using UTF-8

Here you're talking about a massive change to the Internet as if it were
some trivial overnight task.

> However, IDNA was chosen

Here you're talking about even more massive change to the Internet as if
it were already done.

> better compression compared to UTF-8

Here you're wildly exaggerating the importance of a ludicrously small
issue.

> all legacy implementations would not have to be upgraded

Here you're making the content-free observation that we don't have to
do anything if we don't want to accomplish anything.

Are you an implementor? Do you speak any languages other than English?
What are your goals here?

---D. J. Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago

Jeffrey J Zahari

2003-02-28 05:10:58 UTC

----- Original Message -----
From: "D. J. Bernstein" <***@cr.yp.to>
To: <ietf-***@imc.org>
Sent: Thursday, February 27, 2003 6:22 PM
Subject: Re: Problems of Internationalized Mail Address eXtensions (IMAX)
>
> Jeffrey J Zahari writes:
> > anyone could have already set up a working DNS/mail
> > infrastructure using bind 9 and qmail using UTF-8
>
> Here you're talking about a massive change to the Internet as if it were
> some trivial overnight task.
>

I am not advocating that everyone should go down this path, merely
pointing out that efforts have been made to allow 8-bit data on the Internet.

> > However, IDNA was chosen
>
> Here you're talking about even more massive change to the Internet as if
> it were already done.
>

IDNA was chosen, yes, by the IDN WG.

> > better compression compared to UTF-8
>
> Here you're wildly exaggerating the importance of a ludicrously small
> issue.
>

There have been no relaxations of the domain label length restrictions,
nor recommendations to cater for variable-width encodings, so these limits
would need due consideration.
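The label-length point can be checked directly. This sketch assumes the 63-octet per-label limit from RFC 1035 and uses Python's built-in `idna` codec (IDNA2003) for the ACE form. For CJK text, UTF-8 spends three octets per character, so a raw 8-bit label reaches the limit at 21 characters.

```python
MAX_LABEL_OCTETS = 63  # per-label limit from RFC 1035


def label_sizes(label: str) -> dict:
    """Compare raw UTF-8 size with the IDNA2003 ACE size of one DNS label."""
    utf8_len = len(label.encode("utf-8"))
    try:
        ace_len = len(label.encode("idna"))  # raises if the ACE form is too long
    except UnicodeError:
        ace_len = None
    return {
        "utf8": utf8_len,
        "utf8_fits": utf8_len <= MAX_LABEL_OCTETS,
        "ace": ace_len,
    }


if __name__ == "__main__":
    for label in ["中文", "中" * 22]:
        print(label, label_sizes(label))
```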

> > all legacy implementations would not have to be upgraded
>
> Here you're making the content-free observation that we don't have to
> do anything if we don't want to accomplish anything.
>

Existing DNS implementations would not need any upgrades. The tools used to
manipulate the zone files on the other hand, would need helper applications,
or punycode support to do the necessary conversions. Protocols requiring the
use of domain names as their protocol elements would likewise need some
reworking to cater to the impact of ACE IDNs. However, one could argue the
scope of DNS per se does not need to cover this.

> Are you an implementor? Do you speak any languages other than English?
> What are your goals here?
>

Yes, I speak Mandarin, a smattering of dialects from Guangdong and Fujian,
and Malay, a language which originally had no written form but has adopted
US-ASCII (no i18n problems there). And I presume all our goals are to
present balanced opinions on the solutions at hand?

> ---D. J. Bernstein, Associate Professor, Department of Mathematics,
> Statistics, and Computer Science, University of Illinois at Chicago
>

jeffrey j zahari

J-F C. (Jefsey) Morfin

2003-02-27 14:18:02 UTC

Has anyone documented:
- the real obsolescence time of the obsolete solutions one has to support
(vs. OS version, hardware)?
- possible strategies to force updates through partial compatibility or
versioning?
- when the massive changes must have occurred (technical necessity due to
complexity, for example)?

All products have version incompatibilities. I certainly understand that
stability must be protected, but aren't we favoring the past over the
future? I mean: which is to prevail, compatibility with the last 20 years
(ASCII only for everyone, and 600M users) or compatibility with the 1000
years to come (everyone's vernacular supported, and 6B users)?

At 11:22 27/02/03, D. J. Bernstein wrote:
>Jeffrey J Zahari writes:
> > anyone could have already set up a working DNS/mail
> > infrastructure using bind 9 and qmail using UTF-8
>
>Here you're talking about a massive change to the Internet as if it were
>some trivial overnight task.
>
> > However, IDNA was chosen
>Here you're talking about even more massive change to the Internet as if
>it were already done.

The key deployment issue (ITLDs) is not supported.

Thank you.
jfc

Marc Mutz

2003-02-27 19:13:50 UTC

On Thursday 27 February 2003 05:37, Dan Kohn wrote:
<snip>
> The difference, I believe, is
> that 8BitMIME provides a 33% bandwidth reduction when it can be used
> end-to-end, at the cost of requiring base64 transformations when
> encountering a non-compliant MTA or MUA.
<snip>

The big difference of 8bitmime vs. any IMA-enabling SMTP extension is
that the former has 8bit CTE as it's companion in rfc2822
serializations, while the latter has nothing like that.

Also, 8BITMIME makes it easier for the MIA[1] (doesn't need to apply CTE
in certain situations anymore), while IMA-SMTP extensions that exist in
a universe of their own (ie. not backed by the same structure in
rfc2822 serializations) don't. If a MIA wants to use the IMA-SMTP
extension, then it doesn't _save_ a conversion (8bit->qp/b64 as is the
case in 8BITMIME), but needs to _add_ one (IMAA->IMAX). Why would a MIA
want to use the extension if it was more work?

[1] MIA = Message Injection Agent

--
The illegal we do immediately.
The unconstitutional takes a bit longer. -- Henry Kissinger

Martin Duerst

2003-02-28 18:58:01 UTC

At 20:13 03/02/27 +0100, Marc Mutz wrote:
>On Thursday 27 February 2003 05:37, Dan Kohn wrote:
><snip>
> > The difference, I believe, is
> > that 8BitMIME provides a 33% bandwidth reduction when it can be used
> > end-to-end, at the cost of requiring base64 transformations when
> > encountering a non-compliant MTA or MUA.
><snip>
>
>The big difference of 8bitmime vs. any IMA-enabling SMTP extension is
>that the former has 8bit CTE as its companion in rfc2822
>serializations, while the latter has nothing like that.

I agree that it works better if we have both.
That just may mean that we need both.

Regards, Martin.

>Also, 8BITMIME makes it easier for the MIA[1] (doesn't need to apply CTE
>in certain situations anymore), while IMA-SMTP extensions that exist in
>a universe of their own (ie. not backed by the same structure in
>rfc2822 serializations) don't. If a MIA wants to use the IMA-SMTP
>extension, then it doesn't _save_ a conversion (8bit->qp/b64 as is the
>case in 8BITMIME), but needs to _add_ one (IMAA->IMAX). Why would a MIA
>want to use the extension if it was more work?
>
>[1] MIA = Message Injection Agent
>
>--
>The illegal we do immediately.
>The unconstitutional takes a bit longer. -- Henry Kissinger

Martin Duerst

2003-02-27 16:32:02 UTC

Hello Dan,

At 15:54 03/02/26 -0800, Dan Kohn wrote:

>Simon, I feel that Paul is showing a Sisyphean level of patience here,
>but I know it can't continue. I believe you understand that very
>similar issues were hashed out, and an ASCII Compatible Encoding (ACE)
>solution was adopted in IDNA.

In some ways the issues were very similar, and in others they are
probably quite different. I know that Paul may easily get tired
because he went through similar arguments before, and probably
also because one of the purposes of working on IDN was to get some
solution for IMAs. But some people (such as John Klensin) think that
there are important differences between IDN and IMA. So closing
the discussion early just because one is tired from a previous
discussion doesn't seem appropriate.

>The exact same logic holds for IMAA, and is why an IMAX ESMTP extension
>simply adds no meaningful value over IMAA for the only folks who care
>about i18n, which is the end-users.

"End-users" should be understood quite broadly here. Somebody who is
trying to write a new and interesting spam filter may be an end user.
Email isn't only passed around by MTAs and then read by people using
MUAs; there are all kinds of other ways in which it is processed.

>The whole point of IMAA, as I'm sure you know, is that we don't want
>i18n support in MTAs. Why bother? It's a huge amount of work, and any
>mail admin who really cares about the LHS can still treat it as an
>opaque string (as the standard says it is). Other than the specific
>issue of sub-addressing, the whole concept of an I18NMTA is a huge
>amount of work for no value.

Let's look at it. Making MTAs work with 8-bit headers is in many ways
actually rather trivial. Doing negotiation isn't exactly trivial,
but it is done already for 8BITMIME, and probably for other extensions
(which is a huge difference from the IDN situation, as John noted).

The main problem isn't the amount of work it takes in each instance,
it is that due to various historical accidents, we are currently
in a situation that is significantly suboptimal for everybody.
We can either say "we are deep in this mess, let's dig deeper",
or we can say "let's think about how we might dig into a
direction where we might get out of this mess".

> > If you believe IMAX would make Internet mail unreliable, please
> > explain why.
>
>Because most MTAs don't support IMAX, and so every IMAX-capable MUA and
>MTA would always have to be downgrading to IMAA (or have to bounce the
>message), in which case no value has been added, but a lot of additional
>work has been done. Since there will always be some non-IMAX-capable
>MTAs and MUAs, IMAA will always have to be around, so every one of those
>IMAX MUAs and MTAs will still need to implement punycode.

This situation seems to be quite similar to 8BITMIME. Still,
8BITMIME is used a lot.

>Simon, I've seen this movie before, and I know how it ends. The ACE
>wins.

So you are concluding from a sample of 1?

>If you want to go forward with IMAX, you may even succeed in getting it
>published as Experimental, though I doubt it. But no one will implement
>it, since it adds lots of effort but no value over IMAA, so why bother?

By your argumentation, 8BITMIME would have become experimental, and
nobody would have implemented it. Why do you think reality as we
see it today is different?

Regards, Martin.

Martin Duerst

2003-02-27 16:58:43 UTC

At 20:37 03/02/26 -0800, Dan Kohn wrote:

>Lawrence Greenfield wrote:
>
> > I think you're being a bit disingenuous here. Are you against the
> > 8BITMIME extension? Do you think it is a failure? Was it a mistake?
>
>It's a fair criticism, based on the fact that I just proposed a new CTE
>that requires 8BitMIME on email. The difference, I believe, is that
>8BitMIME provides a 33% bandwidth reduction when it can be used
>end-to-end, at the cost of requiring base64 transformations when
>encountering a non-compliant MTA or MUA.
>
>By contrast, I don't think IMAX offers any bandwidth,

Agreed that this is negligible and irrelevant.

>complexity, or usability advantages,

Disagreed. Let's say I were a procurer/sysadmin in Japan
and had to procure MTAs and MUAs for a whole department.
I would clearly buy a solution that lets me look at
all the stuff that's going on without needing hundreds of
tools to make sure that I see the right punycode
decoding in the right place (rather than having to stare
at punycode everywhere). That would strongly reduce complexity
and increase usability for me, and improve service for my
user base.

>while still requiring a lot of additional
>complexity in implementation. In that way, as I said in the original
>message, I believe it is much more analogous to the (since withdrawn)
>proposal to implement IDNs using EDNS and UTF-8. One could certainly
>design something that would work, but it would require servers to be
>upgraded, would offer no more functionality than IDNA, and would
>increase complexity. Why bother?

Why think about the long-term future of the internet?
Anyway, one of the problems in IDN was that there were so many
different ways that something could be negotiated/distinguished,
and none of them very well established. For SMTP, this is clearly
different. Also, the DNS is so low-level that it is in fact
possible to hide the ugliness of IDNA quite easily (look e.g.
at idnkit, http://www.nic.ad.jp/ja/idn/mdnkit/download/#sources).
It actually allows patching binary applications in some cases.
This is quite different for IMAA; its ugliness will show in
many more places.

Regards, Martin.

Dan Kohn

2003-02-27 22:35:20 UTC

Martin Duerst wrote:

> Disagreed. Let's say I were a procurer/sysadmin in Japan
> and had to procure MTAs and MUAs for a whole department.
> I would clearly buy a solution that lets me look at
> all the stuff that's going on without needing hundreds of
> tools to make sure that I see the right punycode
> decoding in the right place (rather than having to stare
> at punycode everywhere). That would strongly reduce complexity
> and increase usability for me, and improve service for my
> user base.

I believe sysadmins who care can use a punycode decoding tool (or Emacs
macro) and that the vast majority won't care. But, as I said, if you
really think there's a market for I18NMTAs, please go forward with a
standard for it (maybe it will even get on the standards rather than the
experimental track). But, every IMAX-capable MUA and MTA will need to
implement IMAA for at least the next 50 years (IMHO), and I think most
implementers will not bother with both.
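The "punycode decoding tool" Dan mentions could be as small as the Python sketch below, which rewrites ACE labels found in log or header text into Unicode. It assumes labels use IDNA's `xn--` prefix and relies on Python's built-in `idna` codec; a local-part ACE form with a different prefix would need its own pattern.

```python
import re

# Matches IDNA ACE labels such as "xn--mxa" inside arbitrary text.
ACE_LABEL = re.compile(r"xn--[a-z0-9-]+", re.IGNORECASE)


def decode_ace_in_text(text: str) -> str:
    """Replace each decodable ACE label in text with its Unicode form."""

    def replace(match):
        label = match.group(0)
        try:
            return label.lower().encode("ascii").decode("idna")
        except UnicodeError:
            return label  # not valid ACE; leave it as it was

    return ACE_LABEL.sub(replace, text)


if __name__ == "__main__":
    print(decode_ace_in_text("RCPT TO:<user@xn--mxa.example>"))
```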

- dan
--
Dan Kohn <mailto:***@dankohn.com>
<http://www.dankohn.com/> <tel:+1-650-327-2600>

D. J. Bernstein

2003-02-27 23:21:35 UTC

Dan Kohn writes:
> I believe sysadmins who care can use a punycode decoding tool (or
> Emacs macro) and that the vast majority won't care.

Won't care? _Won't care_?

Do you think that moving beyond ASCII is some sort of theoretical game?
You declare that the string "xyzzy" is a Greek alpha, you put "xyzzy" on
the screen, and you expect the user to pretend he's seeing an alpha?

Imagine that your computer's UI used octal instead of ABCDE etc. If you
want an A, you type 101, and see 101 on your screen. If you want a B,
you type 102, and see 102 on your screen. Maybe you have a special mail
viewer that shows you A and B and so on, but everything else you do with
the computer is in octal. How would you like that?

I expect that you'll respond by saying that the user interface is ``out
of scope.'' But declaring the failures of your solution to be ``out of
scope'' doesn't make them disappear; it simply makes you look foolish.

---D. J. Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago

Claus Färber

2003-02-28 00:00:00 UTC

D. J. Bernstein <***@cr.yp.to> schrieb/wrote:
> Do you think that moving beyond ASCII is some sort of theoretical game?
> You declare that the string "xyzzy" is a Greek alpha, you put "xyzzy" on
> the screen, and you expect the user to pretend he's seeing an alpha?

Someone who does not know the Greek language and can't type Greek
characters might actually prefer entering xn--mxa (which is Nameprep and
Punycode of a single alpha); it's even easier than using ISO 14755 for
longer name components.
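Claus's example can be reproduced with Python's built-in codecs: `punycode` is raw RFC 3492 Punycode, while `idna` applies IDNA2003 ToASCII (nameprep, then Punycode, then the `xn--` prefix). The capital letter shows nameprep case-folding before encoding.

```python
alpha = "\u03b1"          # GREEK SMALL LETTER ALPHA
capital_alpha = "\u0391"  # GREEK CAPITAL LETTER ALPHA

print(alpha.encode("punycode"))      # b'mxa'      (raw Punycode)
print(alpha.encode("idna"))          # b'xn--mxa'  (nameprep + Punycode + prefix)
print(capital_alpha.encode("idna"))  # b'xn--mxa'  (nameprep case-folds first)
print(b"xn--mxa".decode("idna"))     # α
```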

While Greek and Cyrillic are quite familiar to people used to Latin,
other scripts are certainly not; most people could not compare such
strings if they were displayed in different fonts or even by different
rendering engines.

Claus
--
http://www.faerber.muc.de/

Martin Duerst

2003-02-28 16:30:52 UTC

At 00:00 03/02/28 +0000, Claus Faerber wrote:

>Someone who does not know the Greek language and can't type Greek
>characters might actually prefer entering xn--mxa (which is Nameprep and
>Punycode of a single alpha); it's even easier than using ISO 14755 for
>longer name components.

Somebody who doesn't know Greek probably doesn't care to type Greek
at all. I guess we should design IMAs so that Greek IMAs work best
for Greeks, and Japanese IMAs work best for Japanese, and so on.
Designing IMAs so that Greek IMAs work best for Japanese and Japanese
IMAs work best for Greeks doesn't make sense.

Regards, Martin.

Claus Färber

2003-02-28 00:00:00 UTC

Martin Duerst <***@w3.org> schrieb/wrote:
> Designing IMAs so that Greek IMAs work best for Japanese and Japanese
> IMAs work best for Greeks doesn't make sense.

So it's ok if someone in Greece can't write down the email address of a
friend in Japan on paper?

Although most people will continue to have ASCII-only addresses in
addition to their non-ASCII addresses, they won't switch the From line
every time they mail foreign people.

Claus
--
http://www.faerber.muc.de/

Martin Duerst

2003-02-28 22:33:33 UTC

At 00:00 03/02/28 +0000, Claus Faerber wrote:

>So it's ok if someone in Greece can't write down the email address of a
>friend in Japan on paper?

It's okay that someone in Greece can't write down an email with
Japanese characters. It's okay that somebody in Greece doesn't
understand Japanese. Would be nice if we could change that,
but that won't happen soon.

>Although most people will continue to have ASCII-only addresses in
>addition to their non-ASCII addresses, they won't switch the From line
>every time they mail foreign people.

They won't want to do that. Tools will do that for them, pretty easily.

Regards, Martin.

Tan Tin Wee

2003-03-01 03:23:09 UTC

but it is NOT ok, if someone in Japan who understands Japanese
cannot write down an email address or send email to another
Japanese person who can understand Japanese.
This needs to happen SOON.

And it would be a big tragedy if 1 billion Chinese persons
cannot write down their email addresses in Chinese and
communicate that to the other 1 billion minus one Chinese
persons using the Internet because we are worried that
someone in Greece can't write down an email with
Japanese characters, or because we are still hoping that
all of them will master English first. We have gone through
these arguments before on the IDN mailing list, in case the
IMAA mailing list folks are new to these issues.

So we should move on, and focus on going forward
as Martin and others point out.

bestrgds
tin wee

Martin Duerst wrote:

>
> At 00:00 03/02/28 +0000, Claus Faerber wrote:
>
>> So it's ok if someone in Greece can't write down the email address of a
>> friend in Japan on paper?
>
>
> It's okay that someone in Greece can't write down an email with
> Japanese characters. It's okay that somebody in Greece doesn't
> understand Japanese. Would be nice if we could change that,
> but that won't happen soon.
>
>
>> Although most people will continue to have ASCII-only addresses in
>> addition to their non-ASCII addresses, they won't switch the From line
>> every time they mail foreign people.
>
>
> They won't want to do that. Tools will do that for them, pretty easily.
>
> Regards, Martin.
>
>

Claus Färber

2003-03-02 00:00:00 UTC

Tan Tin Wee <***@bic.nus.edu.sg> schrieb/wrote:
> And it would be a big tragedy if 1 billion Chinese persons
> cannot write down their email addresses in Chinese and
> communicate that to the other 1 billion minus one Chinese
> persons using the Internet because we are worried that
> someone in Greece can't write down an email with
> Japanese characters, or because we are still hoping that
> all of them will master English first. We have gone through
> these arguments before in the IDN mailing list in case
> IMAA mailing list folks are new to these issues.

The interesting part is that the IDNA WG seems to have missed the
main advantage of their own IDNA design:
IDNAs have a Unicode version (e.g. for the 1 billion Chinese people who
understand Chinese characters) and an ASCII-only version (for the Greek
person who does not).

The IDNA spec says: ``ACE labels are unsuitable for display to users.''
But that is not true: They are perfectly suitable for display to users
who don't know the Unicode characters encoded within an ACE label.

Claus
--
http://www.faerber.muc.de/

Adam M. Costello

2003-03-03 00:19:01 UTC

Claus Färber <list-ietf-i18n-***@faerber.muc.de> wrote:

> The interesting part is that the IDNA WG seems to have missed the
> main advantage of their own IDNA design:
> IDNAs have a Unicode version (e.g. for the 1 billion Chinese people who
> understand Chinese characters) and an ASCII-only version (for the Greek
> person who does not).

I don't know if I'd call it the "main" advantage, but I agree that it's
handy, and it was discussed by people in the working group.

> The IDNA spec says: ``ACE labels are unsuitable for display to
> users.'' But that is not true: They are perfectly suitable for
> display to users who don't know the Unicode characters encoded within
> an ACE label.

Yeah, somehow that idea didn't make it into the spec. I don't remember
if that was deliberate, or merely because we never thought to put it in.

In any case, it's a user-interface issue, and the spec does permit
applications to show the ACE at the user's request, so it's no tragedy
that the spec neglects to mention this reason why users might sometimes
ask to see the ACE.
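As a sketch of the user-interface behavior Adam describes, an application can keep both forms of a name and show the ACE only on request. The `show_ace` preference is a hypothetical setting, and the conversion relies on Python's built-in `idna` codec.

```python
from typing import Tuple


def domain_forms(domain: str) -> Tuple[str, str]:
    """Return (Unicode form, ACE form) for a domain given in either form."""
    ace = domain.encode("idna").decode("ascii")
    return ace.encode("ascii").decode("idna"), ace


def display_domain(domain: str, show_ace: bool = False) -> str:
    """Show the Unicode form by default and the ACE form when the user asks."""
    unicode_form, ace_form = domain_forms(domain)
    return ace_form if show_ace else unicode_form
```

For example, display_domain("xn--mxa.example") gives α.example, and display_domain("α.example", show_ace=True) gives xn--mxa.example.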

AMC

Martin Duerst

2003-03-03 03:04:15 UTC

At 00:19 03/03/03 +0000, Adam M. Costello wrote:

> > The IDNA spec says: ``ACE labels are unsuitable for display to
> > users.'' But that is not true: They are perfectly suitable for
> > display to users who don't know the Unicode characters encoded within
> > an ACE label.
>
>Yeah, somehow that idea didn't make it into the spec. I don't remember
>if that was deliberate, or merely because we never thought to put it in.

I guess it was deliberate. I think it was very important to
make clear that IDNA was developed for the users who would
actually want to see the real stuff. And I'm not sure that
ACE labels are 'perfectly suitable' for users who don't know the
characters. I don't think humans are good at dealing with
random sequences of characters, even if it's characters they
are familiar with. My guess is that sooner rather than later,
you will find out for yourselves.

Regards, Martin.

Martin Duerst

2003-03-03 03:12:15 UTC

At 00:00 03/03/02 +0000, Claus Färber wrote:

>The interesting part is that the IDNA WG seems to have missed the
>main advantage of their own IDNA design:
>IDNAs have a Unicode version (e.g. for the 1 billion Chinese people who
>understand Chinese characters) and an ASCII-only version (for the Greek
>person who does not).

I don't think that advantage was missed. But I think that for many
participants in the IDN WG, it was more implicit than explicit.
Many people seem to have some kind of primal fear (Urangst) of dealing
with characters they don't know and therefore don't feel they have
under control. To openly admit this to the WG, or even just to
admit it to themselves (and then get over it), didn't seem possible.

Regards, Martin.

Dan Kohn

2003-02-28 03:12:05 UTC

Dan Kohn wrote:

>> I believe sysadmins who care can use a punycode decoding tool (or
>> Emacs macro) and that the vast majority won't care.

D. J. Bernstein writes:

> Won't care? _Won't care_?

Yes, won't care. The difference between <display-name> and <addr-spec>
(using the constructs from Section 3.4 of RFC 2822) is that the former
is unambiguously user text (in the RFC 2277 sense) and the latter can be
treated much more like a protocol element. I agree that it is nice to
often be able to guess usernames or use heuristics about their meaning,
but it's not necessary. (One could argue that it would also be nice to
know what MAIL FROM or Content-Disposition means in your native tongue
rather than treating them as abstract protocol elements, which they
obviously are.)

Anyway, I'm not making an absolutist argument here, that things are only
protocol or text and never anything in between. I'm just pointing out
that a Japanese mail admin certainly *could* get by just fine treating
the LHS and RHS of <addr-spec> as opaque text, as they would have to do
anyway for many of their users' correspondents' ASCII addresses.

- dan
--
Dan Kohn <mailto:***@dankohn.com>
<http://www.dankohn.com/> <tel:+1-650-327-2600>
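For readers wondering what the "punycode decoding tool" mentioned above might look like, here is a minimal Python sketch (not something proposed in the thread). It assumes that only the domain part of an address carries ACE (`xn--`) labels and relies on Python's built-in `idna` codec, which implements IDNA 2003 ToUnicode. How IMAA would encode the local part is not modeled, so the LHS is passed through unchanged. The function name `decode_address_for_display` is hypothetical.

```python
def decode_address_for_display(addr: str) -> str:
    """Hypothetical helper: decode ACE labels in the domain of an ASCII addr-spec.

    Assumption: only the domain part (RHS) is ACE-encoded; the local
    part (LHS) is returned unchanged.
    """
    local, sep, domain = addr.rpartition("@")
    if not sep:
        # No "@" at all: not an addr-spec, so leave it untouched.
        return addr
    # The "idna" codec splits on dots and applies ToUnicode to each label;
    # labels without the "xn--" prefix come back as-is.
    return local + "@" + domain.encode("ascii").decode("idna")
```

For example, `decode_address_for_display("bob@xn--mxa.example")` should return `"bob@α.example"`, while a plain ASCII address such as `joe@example.com` comes back unchanged.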

Lawrence Greenfield

2003-02-28 03:32:34 UTC

[...]
> Won't care? _Won't care_?

Yes, won't care. The difference between <display-name> and <addr-spec>
(using the constructs from Section 3.4 of RFC 2822) is that the former
is unambiguously user text (in the RFC 2277 sense) and the latter can be
treated much more like a protocol element. I agree that it is nice to
often be able to guess usernames or use heuristics about their meaning,
but it's not necessary. (One could argue that it would also be nice to
know what MAIL FROM or Content-Disposition means in your native tongue
rather than treating them as abstract protocol elements, which they
obviously are.)

E-mail administrators are frequently looking at logs to track what
happened to certain e-mail messages. They're given e-mail addresses
along with reports like "bob sent this" or "joe was expecting this".

Any MTA that didn't give facilities to do this sort of tracking
(either through a program or reading logs or whatever) is clearly
substandard. MTAs written after Punycode or whatever has come into
widespread use would have to decode it for administrators, because
administrators will not be given Punycode addresses by users. (Well,
maybe sometimes they will? But one hopes not by non-technical users.)

It is silly to think that administrators can do their job effectively
by treating them as "abstract protocol elements".

MTA administrators dealing with addresses outside their native
language/alphabet may well prefer to deal with Punycode
addresses. Designing a usable system for an administrator to deal with
many languages/scripts is probably pretty hard.

Now, this fact doesn't mean that we _have_ to use UTF-8 in SMTP or in
message headers. It just means that MTAs and other internal
infrastructure _will_ become aware of any encodings.

Larry

Martin Duerst

2003-02-28 16:26:32 UTC

At 19:12 03/02/27 -0800, Dan Kohn wrote:

>Dan Kohn wrote:
>
> >> I believe sysadmins who care can use a punycode decoding tool (or
> >> Emacs macro) and that the vast majority won't care.
>
>D. J. Bernstein writes:
>
> > Won't care? _Won't care_?
>
>Yes, won't care. The difference between <display-name> and <addr-spec>
>(using the constructs from Section 3.4 of RFC 2822) is that the former
>is unambiguously user text (in the RFC 2277 sense) and the latter can be
>treated much more like a protocol element. I agree that it is nice to
>often be able to guess usernames or use heuristics about their meaning,
>but it's not necessary.

We are not talking about guessing here. Having somebody tell you
that there is a problem with her mail (address FOO), and then, as
a sysadmin, going to look for that address FOO in some of
your files has nothing to do with guessing.
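Martin's scenario (a user reports an address, and the administrator searches the files for it) can be sketched in Python as follows. This is only an illustration under stated assumptions: it searches for the address both as the user typed it and with its domain converted to ACE form using Python's built-in `idna` codec (IDNA 2003 Nameprep plus Punycode); it does not model any ACE encoding of the local part. The names `address_forms` and `find_in_log` are hypothetical, and the sample log lines are made up.

```python
from __future__ import annotations


def address_forms(addr: str) -> set[str]:
    """Spellings of addr to look for: as given, and with an ACE-encoded domain.

    Assumption: only the domain part is converted; the local part is kept.
    """
    forms = {addr}
    local, sep, domain = addr.rpartition("@")
    if sep:
        # str.encode("idna") applies ToASCII (Nameprep + Punycode + "xn--")
        # to each non-ASCII label and returns bytes.
        forms.add(local + "@" + domain.encode("idna").decode("ascii"))
    return forms


def find_in_log(addr: str, lines: list[str]) -> list[str]:
    """Return the log lines that mention any spelling of addr."""
    forms = address_forms(addr)
    return [line for line in lines if any(form in line for form in forms)]


# Made-up log lines for illustration only.
log = [
    "msg 1: from=<joe@example.com> status=sent",
    "msg 2: to=<bob@xn--mxa.example> status=deferred",
]
print(find_in_log("bob@\u03b1.example", log))
```

The point of the sketch is that the lookup only works because the tool knows the encoding; an administrator grepping by eye for the address the user actually reported would not find it.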

And the argument about protocol elements doesn't really
work. Of course it is a protocol element, but the whole effort
we are making here is about making it readable for users
(at all levels, not just the end recipients).

>(One could argue that it would also be nice to
>know what MAIL FROM or Content-Disposition means in your native tongue
>rather than treating them as abstract protocol elements, which they
>obviously are.)

Administrators (not end users) can and obviously do learn these.
And there is a big difference between something that is English and
always the same, and something that is actually, e.g., Japanese
but mutilated beyond recognition.

Just think about it the other way round: If you were a sysadmin
and everything was in Chinese, you would probably pick up on the
character for 'MAIL FROM' rather quickly, but then if user
'John Smith' with address ***@foo.com came over with
a problem, you wouldn't want to have to put ***@foo.com
into a tool and then go hunt for the problem with some Chinese
characters.

>Anyway, I'm not making an absolutist argument here, that things are only
>protocol or text and never anything in between.

Rather than 'between', I'd say 'overlap'.

>I'm just pointing out
>that a Japanese mail admin certainly *could* get by just fine treating
>the LHS and RHS of <addr-spec> as opaque text, as they would have to do
>anyway for many of their users' correspondents' ASCII addresses.

Japanese don't treat ASCII email addresses as opaque. If you have
an address such as ***@dankohn.com, they don't have that many
problems recognizing the elements in the address. It will take them
a lot longer than it will take the average native reader, but it's
not really a problem. And even Japanese users do feel much
better with ***@suzuki.name than with $%&^*@#%$((^&%(.
So why should we work on a system that makes things worse for them?

Regards, Martin.

tedd

2003-02-28 18:34:36 UTC

>At 00:00 03/02/28 +0000, Claus Faerber wrote:
>
>>Someone who does not know the Greek language and can't type Greek
>>characters might actually prefer entering xn--mxa (which is Nameprep and
>>Punycode of a single alpha); it's even easier than using ISO 14755 for
>>longer name components.
>
>Somebody who doesn't know Greek probably doesn't care to type Greek
>at all. I guess we should design IMAs so that Greek IMAs work best
>for Greeks, and Japanese IMAs work best for Japanese, and so on.
>Designing IMAs so that Greek IMAs work best for Japanese and Japanese
>IMAs work best for Greeks doesn't make sense.
>
>Regards, Martin.
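Claus's example can be checked with a few lines of Python, assuming Python's built-in `idna` codec (an implementation of the IDNA 2003 ToASCII/ToUnicode operations, including Nameprep) and its `punycode` codec (raw Punycode without the `xn--` prefix).

```python
# GREEK SMALL LETTER ALPHA and GREEK CAPITAL LETTER ALPHA, written as escapes.
small_alpha = "\u03b1"
capital_alpha = "\u0391"

# ToASCII: Nameprep, then Punycode, then the "xn--" ACE prefix.
print(small_alpha.encode("idna"))  # b'xn--mxa'
# Nameprep case-folds capital alpha first, so both yield the same ACE label.
print(capital_alpha.encode("idna"))  # b'xn--mxa'
# Raw Punycode of the same character, without the prefix.
print(small_alpha.encode("punycode"))  # b'mxa'
# ToUnicode reverses the mapping.
print(b"xn--mxa".decode("idna") == small_alpha)  # True
```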

Martin:

What about char sets that are not language specific, like
mathematical symbols? The Unicode character database does provide
many char sets that are not language specific -- look at symbol fonts
and even dingbats for sake of argument.

Shouldn't Greeks and Japanese (as well as everyone else) have easy
and equal access to those characters -- and to all other char sets
in the Unicode database as well? Designing things on a language
specific basis looks like a "no matter how many times you cut it,
it's still too short" type of thing.

tedd
--
http://sperling.com/

Martin Duerst

2003-02-28 19:12:04 UTC

At 13:34 03/02/28 -0500, tedd wrote:

[reordered]

>What about char sets that are not language specific, like mathematical
>symbols? The Unicode character database does provide many char sets that
>are not language specific -- look at symbol fonts and even dingbats for
>sake of argument.
>
>Shouldn't Greeks and Japanese (as well as everyone else) have easy and
>equal access to those characters -- and to all other char sets in the
>Unicode database as well? Designing things on a language specific basis
>looks like a "no matter how many times you cut it, it's still too short"
>type of thing.

Sorry for having created confusion by maybe stating my opinion in a
somewhat simplified fashion. Of course, if we went so far
as to create different designs for Greek and Japanese, and so on,
we would end up in deep chaos. The idea is just that the system
is designed so that everybody can easily deal with the characters
they mostly use. If somebody is familiar with math symbols and
wants to use them for mail addresses (I personally doubt that
there will be much such use, but that's not the issue), then
they should be able to just use them, without having to go
through an ACE.

Regards, Martin.

>>At 00:00 03/02/28 +0000, Claus Faerber wrote:
>>
>>>Someone who does not know the Greek language and can't type Greek
>>>characters might actually prefer entering xn--mxa (which is Nameprep and
>>>Punycode of a single alpha); it's even easier than using ISO 14755 for
>>>longer name components.
>>
>>Somebody who doesn't know Greek probably doesn't care to type Greek
>>at all. I guess we should design IMAs so that Greek IMAs work best
>>for Greeks, and Japanese IMAs work best for Japanese, and so on.
>>Designing IMAs so that Greek IMAs work best for Japanese and Japanese
>>IMAs work best for Greeks doesn't make sense.
>>
>>Regards, Martin.

tedd

2003-03-01 15:25:02 UTC

>>Shouldn't Greeks and Japanese (as well as everyone else) have easy
>>and equal access to those characters -- and to all other char sets
>>in the Unicode database as well? Designing things on a language
>>specific basis looks like a "no matter how many times you cut it,
>>it's still too short" type of thing.
>
>Sorry for having created confusion by maybe stating my opinion in a
>somewhat simplified fashion. Of course, if we went so far
>as to create different designs for Greek and Japanese, and so on,
>we would end up in deep chaos. The idea is just that the system
>is designed so that everybody can easily deal with the characters
>they mostly use. If somebody is familiar with math symbols and
>wants to use them for mail addresses (I personally doubt that
>there will be much such use, but that's not the issue), then
>they should be able to just use them, without having to go
>through an ACE.
>
>Regards, Martin.

Martin:

I believe that the answer will come from commercial software
designers and not from the creators of ACE (or whatever the end
algorithm will be). I believe that whatever encoding is used (e.g.,
xx--whatever) to stand in for a seven-bit to eight-bit conversion
will be converted to/from the user via Internet end-user software. I
don't think the end-user will ever have to see, or understand, what
an ACE-like encoded string is.

Hardware and software developers clearly want a global market and
adopting a Unicode-like database is one way, if not the only way, to
accomplish that goal. Likewise, adopting an ACE-like algorithm for
delivering an eight-bit message via a seven-bit medium has been the
only way to fulfill that goal without trashing the net in the process.

The end result I envision, as do many, is a Chinese sitting before
his keyboard using a Chinese char set to converse with his brethren
with absolutely no regard for, nor reliance upon, English -- all
AMC-like mechanics (i.e., the Latin alphanumerics) will be completely
transparent to him. As a side note, setting his preferred language
char set will be as easy for him as it is for us to change
from a Helvetica to a Times font. Please note that this will be done
via end-user software and not through the efforts of this group or
any group like it.

Eventually, I believe that the net will adopt an eight-bit format (or
greater) and all this punycode and other such "make-fit" conversions
will be nothing more than an uncomfortable growing-pains footnote in
history. But until then, we will have to make do with what's
available, and in doing so, provide opportunity for others to solve
and meet end-user needs.

Now with that said, this list is a discussion as to what to do with
the LHS of email addresses. I claim that keeping/treating both sides
the same will simplify and speed the process for developers and will
get this global opportunity to the end-user sooner. However, this
will (and may unnecessarily) limit opportunities for the global end
user. Keep in mind that case considerations made by us (the
English-speaking people) do not have the same implications as they do
for the rest of the world. In other words, we have made a distinction
that UC/LC means something and have extended, or rather imposed, that
limitation globally. So, the question to this list is -- should we
continue to impose the same restraints on the LHS as we have for the
RHS -- or should we consider the LHS different and treat it with
less restriction and thus more opportunity -- opportunity, I might
add, which is not without problems in implementation.

tedd
--
http://sperling.com/

J-F C. (Jefsey) Morfin

2003-03-01 20:51:31 UTC

At 16:25 01/03/03, tedd wrote:
>So, the question to this list is -- should we continue to impose the same
>restraints on the LHS as we have for the RHS --

No.

> or should we consider the LHS different and treat it with less
> restriction and thus more opportunity -- opportunity, I
> might add, which is not without problems in implementation.

Definitely yes.
This is the users' only interest in this discussion (users being end users
and application/inter-application developers).
jfc
