This message responds to messages by Roy Badami and Claus Färber.
Post by Roy Badami
When dequoting/requoting localparts, should we consider recognizing
fullwidth double quotes and fullwidth backslash (and any other
double-quote-like and backslash-like characters)?
It seems to me that the arguments for this are similar to those for
fullwidth dot and fullwidth at, and once we decide to recognize
metacharacters in fullwidth form, we should apply this consistently to
*all* metacharacters.
I don't think the arguments are sufficiently similar.
For one thing, the dots and at-signs that delimit a mail address are
not metacharacters. They are part of the address, and they serve a
standard function in all mail addresses in all contexts. Metacharacters
are characters that are not actually part of the string they appear
in. Examples are quote characters, wildcard characters, macro-expansion
characters, etc.
The motivating example for requiring the recognition of various dots
and at-signs as separators in IDNs and IMAs is this: If I can type an
address into my IMAA-aware application and it works, then I expect to be
able to type the address into a message body, mail it to you, and have
you paste it into your IMAA-aware application, and have it work.
We cannot guarantee success, but standardizing the most common dots and
at-signs gets us 99% of the way there.
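As a rough illustration of what recognizing the common dots and at-signs could look like, here is a minimal Python sketch. The dot list is the one IDNA (RFC 3490) names as label separators; the at-sign list, the function name split_address, and the choice to split at the last at-sign are assumptions made only for this example, and the local part is assumed to need no quotation.

```python
# Dot variants that IDNA (RFC 3490) treats as label separators.
DOTS = "\u002E\u3002\uFF0E\uFF61"
# Assumed at-sign variants (ASCII and fullwidth); illustrative only.
AT_SIGNS = "\u0040\uFF20"

def split_address(address):
    """Split at the last at-sign variant, then split the domain on dot variants."""
    at_pos = max(address.rfind(ch) for ch in AT_SIGNS)
    if at_pos < 0:
        raise ValueError("no at-sign found")
    local_part, domain = address[:at_pos], address[at_pos + 1:]
    labels = []
    current = ""
    for ch in domain:
        if ch in DOTS:
            # A dot variant ends the current label.
            labels.append(current)
            current = ""
        else:
            current += ch
    labels.append(current)
    return local_part, labels

# The same address typed with ASCII or with fullwidth/ideographic separators.
ascii_form = split_address("user@example.com")
pasted_form = split_address("user\uFF20example\u3002com")
```

Both calls yield the same local part and the same list of labels, which is the "type it, mail it, paste it" property described above.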
But local parts that require quotation are fundamentally more difficult,
even with today's ASCII local parts. Although there is a standard
quotation mechanism for local parts in message headers and SMTP
commands, there is no standard quotation mechanism for user interfaces.
Some user agents might copy the user input directly into the header
(relying on the user to supply any needed quotation), others might
assume the user input is literal and add more quotation if needed,
and others might allow users to use some other quotation mechanism
altogether, which the agent undoes before applying the 822-style
quotation. There's no standard, so we can't expect local parts
requiring quotation to be mailable and paste-able, even in today's ASCII
world. It would be a wasted effort to try to standardize the Unicode
variants of non-standard ASCII metacharacters.
Post by Roy Badami
quotes can appear on business cards.
They can, but anyone who puts such an address on a business card must
not be very concerned about being reachable (for the reasons above).
Aside from the futility argument, it would probably be overstepping our
authority to try to standardize Unicode variants of metacharacters.
It's not hard to imagine that local parts might be found in contexts
where dequoting them involves undoing %hex escapes or &ent; escapes.
Should we try to insist that fullwidth % and fullwidth & should be
recognized as introducing those escape sequences? Of course not, that
would almost surely contradict the relevant standards.
Post by Roy Badami
Just do a NFKC normalisation at the very beginning
Not before dequoting, for the reason given in the preceding paragraph.
Metacharacters are context-dependent and out of our jurisdiction, and
need to be removed before we even have a string to work with.
Applying NFKC after dequoting, but before subdividing the local part, is
okay.
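The ordering can be sketched in Python as follows. This is only a sketch under assumptions: the context is taken to be 822-style quotation (ASCII double quotes and backslash quoted-pairs), the helper names dequote_local_part and prepare_local_part are hypothetical, and the later subdivision of the local part is left out.

```python
import unicodedata

def dequote_local_part(raw):
    """Undo 822-style quotation: strip the outer ASCII double quotes and
    drop the backslash of each quoted pair. Unquoted input is returned as-is."""
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        out = []
        chars = iter(raw[1:-1])
        for ch in chars:
            if ch == "\\":
                ch = next(chars, "")  # keep the escaped character itself
            out.append(ch)
        return "".join(out)
    return raw

def prepare_local_part(raw):
    # Step 1: remove the context-dependent metacharacters (ASCII only).
    literal = dequote_local_part(raw)
    # Step 2: only now apply NFKC to the literal string.
    return unicodedata.normalize("NFKC", literal)

# ASCII quotation is undone; the escaped quote survives as content.
quoted = prepare_local_part('"a\\"b"')
# Fullwidth quotes (U+FF02) are not metacharacters here, so they stay part of
# the string and NFKC later turns them into literal ASCII quote characters.
fullwidth = prepare_local_part("\uFF02ab\uFF02")
```

Had NFKC run first, the fullwidth quotes in the second call would have become ASCII quotes before dequoting and would then have been stripped as if they were metacharacters.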
Post by Roy Badami
For IMAA, it suffices to specify that implementations MUST accept
all characters as delimiters that decompose to one of our delimiters
during NFKC-with-U+3002-to-U+002E normalisation and that the
delimiters MUST be normalised.
The easiest way to implement this is an additional normalisation at
the very beginning.
I'm not confident that the first paragraph is exactly equivalent to
the second. Normalization is very subtle. If the latter is what you
have in mind, it might be best to specify that, and leave it up to the
optimizers to prove the existence of a shortcut if there is one.
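One way to see the subtlety is to probe individual characters with Python's unicodedata module. The sketch below assumes the delimiters are "." and "@" and reads "NFKC-with-U+3002-to-U+002E" as NFKC followed by mapping U+3002 to U+002E; both readings, and the function names, are assumptions for illustration.

```python
import unicodedata

DELIMITERS = {".", "@"}

def nfkc_with_ideographic_stop(text):
    # Assumed reading: apply NFKC, then map ideographic full stop to ".".
    return unicodedata.normalize("NFKC", text).replace("\u3002", ".")

def maps_exactly_to_delimiter(ch):
    """True if the character normalises to a single delimiter and nothing else."""
    return nfkc_with_ideographic_stop(ch) in DELIMITERS

def contains_delimiter_after_normalisation(ch):
    """True if a delimiter appears anywhere in the normalised form."""
    normalised = nfkc_with_ideographic_stop(ch)
    return any(d in normalised for d in DELIMITERS)

# U+FF0E fullwidth full stop, U+FF61 halfwidth ideographic full stop,
# U+2488 digit one full stop, U+2026 horizontal ellipsis.
report = {
    ch: (maps_exactly_to_delimiter(ch), contains_delimiter_after_normalisation(ch))
    for ch in ["\uFF0E", "\uFF61", "\u2488", "\u2026"]
}
```

U+2488 and U+2026 do not decompose to a delimiter, yet normalising the whole string first would introduce one or more dots where they appear, so the two formulations treat such input differently.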
By the way, I'm not sure the CJK community would want ideographic full
stop mapped to full stop inside the local part. They might prefer the
ability to have genuine ideographic full stops in there.
Post by Roy Badami
Are you saying we can do a normalization of the entire e-mail address
without violating IDNA (which specifies that the domain be split on
dot-like characters before normalization)?
IDNA requires that normalization happen as part of the processing of
each individual label, but it doesn't say there must never have been a
previous normalization step. IDNA does not specify exactly how a domain
name is split into labels (because it depends on context). In some
situations normalization could be a part of, or a precursor to, that
splitting operation.
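A small Python sketch of the two orderings, for one example domain only. Plain NFKC stands in for the per-label processing here, which is a simplification (IDNA's Nameprep does more than NFKC), and the function names are hypothetical.

```python
import re
import unicodedata

# Dot variants that IDNA (RFC 3490) treats as label separators.
IDNA_DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")

def split_then_normalise(domain):
    # Split on dot variants first, then normalise each label.
    return [unicodedata.normalize("NFKC", label) for label in IDNA_DOTS.split(domain)]

def normalise_then_split(domain):
    # Normalise the whole string as a precursor to splitting, then run the
    # per-label normalisation anyway; the second pass changes nothing here.
    pre = unicodedata.normalize("NFKC", domain)
    return [unicodedata.normalize("NFKC", label) for label in IDNA_DOTS.split(pre)]

# "example.com" written with fullwidth letters and a fullwidth full stop.
fullwidth_domain = "\uFF45\uFF58\uFF41\uFF4D\uFF50\uFF4C\uFF45\uFF0E\uFF43\uFF4F\uFF4D"
labels_a = split_then_normalise(fullwidth_domain)
labels_b = normalise_then_split(fullwidth_domain)
```

For this input both orderings give the same labels. That is not a proof for all inputs: characters whose compatibility decomposition contains a dot can make the two orderings diverge.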
AMC