Discussion:
Protection of hyphen
Roy Badami
2003-04-22 01:19:08 UTC
If it's worth protecting punctuation characters at all, then I think
hyphen comes right at the top of the list of characters that it is
useful to protect (probably tying with plus in importance).

Indeed, the draft gives the example of owner-listname (1.3) which isn't
actually protected by the current proposal.

One simple approach would be to leave underscore unprotected instead,
use xn__ as the ACE prefix and to use _ as the delimiter in
bootstring. Underscore is one of the few characters that would be
safe to use in this way, since it has longstanding use in e-mail
addresses of the form initials_surname.
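
A minimal sketch of how the underscore variant might look, for
illustration only. It approximates "bootstring with _ as the delimiter"
by post-processing the output of Python's standard punycode codec, and
it assumes the segment being encoded contains no hyphen (hyphen being
protected and therefore split out beforehand). The function names and
the sample string (example Q from the Punycode spec) are not part of
the proposal.

```python
ACE_PREFIX = "xn__"  # prefix suggested above; everything else here is hypothetical


def underscore_encode(segment: str) -> str:
    """Punycode-encode a segment, using '_' instead of '-' as the delimiter."""
    puny = segment.encode("punycode").decode("ascii")
    # Punycode puts its delimiter after the basic (ASCII) characters, if any.
    head, hyphen, tail = puny.rpartition("-")
    body = head + "_" + tail if hyphen else puny
    return ACE_PREFIX + body


def underscore_decode(encoded: str) -> str:
    """Reverse underscore_encode()."""
    if not encoded.startswith(ACE_PREFIX):
        raise ValueError("not an ACE segment")
    body = encoded[len(ACE_PREFIX):]
    # The last underscore is the delimiter; earlier ones are literal characters.
    head, underscore, tail = body.rpartition("_")
    puny = head + "-" + tail if underscore else body
    return puny.encode("ascii").decode("punycode")


sample = "\u30d1\u30d5\u30a3\u30fcde\u30eb\u30f3\u30d0"  # <pafii>de<runba>
print(underscore_encode(sample))
```

Because only the last underscore is treated as the delimiter, literal
underscores earlier in the segment survive a round trip, but they are
not protected: they end up inside the encoded form.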

(I confess however that I don't understand the explanation of the
infix solution to this problem mentioned in the draft; perhaps an
example would be helpful?)

Of course, others may object that underscore is also used in structured
localparts, but it's undoubtedly less common than hyphen.

-roy
Paul Hoffman / IMC
2003-04-22 01:38:13 UTC
Post by Roy Badami
If it's worth protecting punctuation characters at all, then I think
hyphen comes right at the top of the list of characters that it is
useful to protect (probably tying with plus in importance).
Indeed, the draft gives the example of owner-listname (1.3) which isn't
actually protected by the current proposal.
Note that "owner-listname" is a prime example of why we might *not*
want to protect the hyphen. The hyphen in that name is not for making
subaddresses: it is just there for preventing the two names from
smooshing together.
Post by Roy Badami
One simple approach would be to leave underscore unprotected instead,
use xn__ as the ACE prefix and to use _ as the delimiter in
bootstring. Underscore is one of the few characters that would be
safe to use in this way, since it has longstanding use in e-mail
addresses of the form initials_surname.
"longstanding use" is probably an overstatement. Some people use it
for that, others don't. It can be said to be a rarely-used character,
and is certainly not used in normal non-computer writing.

The advantage of keeping the hyphen is again simplicity and
parallelism with IDNA. Do we lean in one direction and move away from
IDNA in a possibly-confusing fashion, or do we lean in the other
direction and look very much like IDNA but possibly lose some
perceived functionality? Right now, we have taken the second
approach, but we are open to hearing other opinions.

--Paul Hoffman, Director
--Internet Mail Consortium
Roy Badami
2003-04-22 11:27:55 UTC
Post by Paul Hoffman / IMC
Note that "owner-listname" is a prime example of why we might *not*
want to protect the hyphen. The hyphen in that name is not for making
subaddresses: it is just there for preventing the two names from
smooshing together.
But some software (eg sendmail) has special knowledge of the
owner-listname construct.

I wonder how much mailing list management software has things like
owner-listname or listname-request hardwired? (Note: I'm not saying
that this is a big problem in practice; I'm just posing the question.)

Also hyphen is the default delimiter for subaddressing in qmail
(though this is easily changed, so may not matter so much).
Post by Paul Hoffman / IMC
Post by Roy Badami
One simple approach would be to leave underscore unprotected instead,
use xn__ as the ACE prefix and to use _ as the delimiter in
bootstring. Underscore is one of the few characters that would be
safe to use in this way, since it has longstanding use in e-mail
addresses of the form initials_surname.
"longstanding use" is probably an overstatement. Some people use it
for that, others don't. It can be said to be a rarely-used character,
and is certainly not used in normal non-computer writing.
My point was that underscore was common enough in e-mail addresses
long enough ago that underscore is unlikely to be incompatible with
existing e-mail gateways or non-Internet e-mail clients. In
particular, I seem to recall regularly seeing initials_surname in the
days before initials.surname became popular.

If ToASCII is going to introduce punctuation that wasn't present in
the original address, it is highly desirable that this does not
involve unusual characters that may be incompatible with existing
software. (Hyphen or underscore are unlikely to pose problems; other
choices may be less safe.)
Post by Paul Hoffman / IMC
The advantage of keeping the hyphen is again simplicity and
parallelism with IDNA. Do we lean in one direction and move away from
IDNA in a possibly-confusing fashion, or do we lean in the other
direction and look very much like IDNA but possibly lose some
perceived functionality? Right now, we have taken the second
approach, but we are open to hearing other opinions.
I'm currently undecided on the issue of structured local parts. The
proposed solution just struck me as a slightly odd compromise, in that
its stated aim is to protect structured local parts, but it fails to
protect one of the more common separators.

I can certainly see the benefits in the current proposal, though, even
if protecting hyphen is considered too much work for too little gain.

In particular, preserving % and ! addressing formats (as the current
proposal does) may be important to some people. Even though such
constructs are considered obsolete on the networks that most of us
use, they _may_ be needed (I don't know) by people in countries where
the network infrastructure is less developed. (Then again, maybe IMAs
may not be considered a priority by users of such networks?)

-roy
Martin Duerst
2003-04-23 01:35:49 UTC
Post by Roy Badami
Post by Paul Hoffman / IMC
Note that "owner-listname" is a prime example of why we might *not*
want to protect the hyphen. The hyphen in that name is not for making
subaddresses: it is just there for preventing the two names from
smooshing together.
But some software (eg sendmail) has special knowledge of the
owner-listname construct.
I wonder how much mailing list management software has things like
owner-listname or listname-request hardwired? (Note: I'm not saying
that this is a big problem in practice; I'm just posing the question.)
Also hyphen is the default delimiter for subaddressing in qmail
(though this is easily changed, so may not matter so much).
Can it be changed on a per-installation basis, or on a per-user
basis, or on a per-main-address basis? Can it be changed by the
user, or does it need system privileges?

If we need a sysadmin to change it, and it applies to all users
and all existing and future addresses, then this is a serious problem.
If each user can change it, per address (many users will want to
have both a traditional ASCII address and an IMA), then that's
probably okay.


Regards, Martin.
John C Klensin
2003-04-23 16:28:26 UTC
--On Tuesday, 22 April, 2003 21:35 -0400 Martin Duerst
Post by Martin Duerst
Post by Roy Badami
Post by Paul Hoffman / IMC
Note that "owner-listname" is a prime example of why we
might *not* want to protect the hyphen. The hyphen in that
name is not for making subaddresses: it is just there for
preventing the two names from smooshing together.
But some software (eg sendmail) has special knowledge of the
owner-listname construct.
I wonder how much mailing list management software has things
like owner-listname or listname-request hardwired? (Note:
I'm not saying that this is a big problem in practice; I'm
just posing the question.)
Also hyphen is the default delimiter for subaddressing in
qmail (though this is easily changed, so may not matter so
much).
Can it be changed on a per-installation basis, or on a per-user
basis, or on a per-main-address basis? Can it be changed by the
user, or does it need system privileges?
If we need a sysadmin to change it, and it applies to all users
and all existing and future addresses, then this is a serious
problem. If each user can change it, per address (many users
will want to have both a traditional ASCII address and an
IMA), then that's probably okay.
Martin,

It is wired into code in many places, i.e., not even the
sysadmin can change it without obtaining source and recompiling.
In many systems, establishing a mail alias (whether for a list
or something else) and its definition are necessarily a sysadmin
function, since such addresses are essentially equivalent to
local-system accounts -- while different lists could, in
principle, have different conventions, differences in
conventions drive sysadmins crazy and are unlikely to be adopted
unless there is compelling need. Similarly, the
listname-request convention is all over the network. It is one
of our oldest naming conventions -- certainly predating
firstname.lastname, initial_lastname, and even ftp.foo.bar.
Worse, both the owner-listname (and listname-owner) strings are
processed by things that are not strictly MTAs (in the narrow
definition of being at one side or the other of SMTP
transactions). They are, instead, embedded in users' minds,
macro functions in address books, server-side (e.g., SIEVE,
procmail, and dozens of arrangements for which there has been no
attempt at standardization) and client-side mail-filtering
subsystems.

In some of these cases, the local parts will have been decoded
from (or to) the ACE forms outside the processing that will need
to interpret the convention and, hence, it just won't make any
difference. But, to the extent that any of these things are
handled as MTA functions -- either the relevant strings and
delimiters will need to be "protected", or the MTAs will need to
be hacked up sufficiently that the advantages of an MUA-based
system will largely disappear.

regards,
john

Adam M. Costello
2003-04-22 03:44:06 UTC
Post by Paul Hoffman / IMC
Note that "owner-listname" is a prime example of why we might *not*
want to protect the hyphen. The hyphen in that name is not for making
subaddresses: it is just there for preventing the two names from
smooshing together.
I don't understand your point. If foo is an ACE, and if hyphen is not
protected, then owner-foo is not going to be displayed intelligibly.

Maybe that's not a compelling problem, but it is at least a small reason
in favor of protecting hyphen, not a reason against.

(By the way, for anyone not familiar with it, the owner-foo convention
is built into various mailing list servers and into sendmail. When
sending mail to an alias foo, they often put owner-foo as the return
path.)
Post by Roy Badami
One simple approach would be to leave underscore unprotected
instead, use xn__ as the ACE prefix and to use _ as the delimiter in
bootstring.
(I confess however that I don't understand the explanation of the
infix solution to this problem mentioned in the draft; perhaps an
example would be helpful?)
Suppose we want to protect all nonalphanumeric ASCII characters. The
infix could be, say, 8iesg8 (which contains no protected characters).
(Of course iesg is a placeholder for something shorter to be chosen
later.)

Now suppose we have the local part <pafii>de<runba> (example Q from the
Punycode spec). The Punycode encoder outputs de-jg4avhby1noc0d. We
change the hyphen to the infix, yielding de8iesg8jg4avhby1noc0d. To
decode that, we search for the infix, change it to a hyphen, and apply
the Punycode decoder.

Now consider <sono><supiido><de> (example R from the Punycode spec).
The Punycode encoder outputs d9juau41awczczp. That contains no hyphen,
so we prepend the infix, resulting in 8iesg8d9juau41awczczp. To decode
that, we search for the infix, notice that it appears at the very
beginning, and therefore simply remove it, then apply the Punycode
decoder.
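
The two cases above can be sketched in a few lines of Python using the
standard punycode codec. This is only an illustration, not part of any
draft: the function names are made up, 8iesg8 is the placeholder infix
from the example, and the sketch assumes the segment contains no hyphen
of its own (hyphen being protected) and does not deal with the infix
string happening to occur elsewhere in the encoded text.

```python
INFIX = "8iesg8"  # placeholder infix; contains no protected characters


def infix_encode(segment: str) -> str:
    """Punycode-encode a segment, then hide the delimiter hyphen."""
    puny = segment.encode("punycode").decode("ascii")
    head, hyphen, tail = puny.rpartition("-")
    if hyphen:
        # Delimiter present: change the hyphen to the infix.
        return head + INFIX + tail
    # No hyphen in the Punycode output: prepend the infix.
    return INFIX + puny


def infix_decode(encoded: str) -> str:
    """Reverse infix_encode()."""
    pos = encoded.find(INFIX)
    if pos < 0:
        raise ValueError("infix not found")
    if pos == 0:
        # Infix at the very beginning: simply remove it.
        puny = encoded[len(INFIX):]
    else:
        # Otherwise change the infix back to a hyphen.
        puny = encoded[:pos] + "-" + encoded[pos + len(INFIX):]
    return puny.encode("ascii").decode("punycode")


example_q = "\u30d1\u30d5\u30a3\u30fcde\u30eb\u30f3\u30d0"  # <pafii>de<runba>
example_r = "\u305d\u306e\u30b9\u30d4\u30fc\u30c9\u3067"    # <sono><supiido><de>
print(infix_encode(example_q))  # de8iesg8jg4avhby1noc0d
print(infix_encode(example_r))  # 8iesg8d9juau41awczczp
```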

The underscore solution is a little simpler, but it fails to protect
underscore. Maybe no one would care about protecting underscore, I
don't know.

AMC
Dan Oscarsson
2003-04-23 06:13:04 UTC
Post by Roy Badami
But some software (eg sendmail) has special knowledge of the
owner-listname construct.
I wonder how much mailing list management software has things like
owner-listname or listname-request hardwired? (Note: I'm not saying
that this is a big problem in practice; I'm just posing the question.)
The "owner-" construct is used by smart auto reply software to avoid
giving an auto reply to mailing lists.

To avoid breaking things in use I would recommend encoding each
segment composed of ascii letters, digits and all non-ascii characters,
or each character separately.
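
As a rough illustration of the first option (per-segment encoding), the
sketch below only shows the splitting step: it breaks a local part into
runs of "letters, digits and non-ASCII" and runs of other ASCII
characters, and marks which runs would need an ACE form. How a marked
run is then encoded is left open, and the function names are invented
for the example.

```python
from itertools import groupby


def is_protected(ch: str) -> bool:
    """ASCII characters other than letters and digits are left untouched."""
    return ch.isascii() and not ch.isalnum()


def plan_segments(local_part: str) -> list[tuple[str, bool]]:
    """Split a local part into runs and flag the runs that need encoding."""
    plan = []
    for protected, run in groupby(local_part, key=is_protected):
        text = "".join(run)
        # Only runs that actually contain non-ASCII characters need an ACE form.
        needs_ace = (not protected) and not text.isascii()
        plan.append((text, needs_ace))
    return plan


# "owner-" followed by a non-ASCII list name (sample characters only)
for text, needs_ace in plan_segments("owner-\u30d1\u30d5\u30a3"):
    print(repr(text), "encode" if needs_ace else "keep")
```

With this split, the "owner" and "-" runs pass through unchanged, so
software that looks for the "owner-" prefix would still find it.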

Dan