Dan Oscarsson
2003-04-23 06:53:07 UTC
Reading through the IMAA draft I get the same mess of protocol and UI
that I got in IDNA. Also the focus is very much on legacy handling.
I would like to start with an international focus.
I can see three basic areas:
- legacy protocol context (current ASCII context)
- international protocol context
- user interface
I will not say anything more about user interface. The important area to
start with is the international context, at the protocol level.
When I look at mail addresses I start looking at what is needed
to make mail addresses used in an international context as easy as
in the legacy context. When that is done, I can start looking at how
to encode mail addresses from international context to be sent through
legacy context.
In an international context we have: Standard mail address
in the legacy context: Legacy mail address
The Standard mail address must be as easy to use as a legacy mail address:
- local-***@domain
- UCS
- precomposed required if available for a character
- unambiguous code points
- simple case-insensitive matching
The above does not mean NFKC. It means something closer to NFC, with every
character that has multiple code points in UCS reduced to one (or with the
alternative forms forbidden). It also means that case-insensitive matching
is done by simple one-to-one character mapping, probably including the
SC/TC matching. NFKC cannot be used as it destroys data.
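To illustrate the difference between the two normalization forms, here is a
minimal Python sketch using the standard unicodedata module. The helper
names to_nfc and to_nfkc are only for illustration. It shows NFC composing
a base letter and combining mark into the precomposed character, while NFKC
additionally rewrites compatibility characters so that the original can no
longer be recovered.

```python
import unicodedata

def to_nfc(s):
    """Canonical composition: precomposed form where one exists."""
    return unicodedata.normalize("NFC", s)

def to_nfkc(s):
    """Compatibility composition: also folds compatibility characters."""
    return unicodedata.normalize("NFKC", s)

decomposed = "e\u0301"    # 'e' followed by COMBINING ACUTE ACCENT
ligature = "\ufb01"       # LATIN SMALL LIGATURE FI
superscript = "\u00b2"    # SUPERSCRIPT TWO

# NFC only picks the precomposed code point; nothing is lost.
assert to_nfc(decomposed) == "\u00e9"
assert to_nfc(ligature) == ligature
assert to_nfc(superscript) == superscript

# NFKC replaces the characters themselves; the distinction is gone.
assert to_nfkc(ligature) == "fi"
assert to_nfkc(superscript) == "2"
```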
The above means that there is only the ASCII @ code point separating
local-part from domain part. And no full width dots separating domain
labels. Only ONE code point per character. Use of full width (or green colour)
is a user interface matter and does not belong in a protocol context.
With the above requirements, mail addresses in an international context are
as easy to handle as in the legacy context.
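As a rough sketch of how simple parsing becomes under these requirements,
the following Python function splits a Standard mail address at the ASCII @
and the ASCII dot only. The function name and the small set of rejected
look-alike code points are my own assumptions for illustration; a real
specification would have to enumerate the forbidden alternative forms.

```python
import unicodedata

# Illustrative, non-exhaustive set of alternative forms (an assumption,
# not taken from any specification).
FORBIDDEN_ALTERNATES = {
    "\uff20",  # FULLWIDTH COMMERCIAL AT
    "\uff0e",  # FULLWIDTH FULL STOP
    "\u3002",  # IDEOGRAPHIC FULL STOP
    "\uff61",  # HALFWIDTH IDEOGRAPHIC FULL STOP
}

def split_standard_address(addr):
    """Return (local_part, domain_labels) or raise ValueError."""
    # Precomposed form is required, so the input must already be NFC.
    if unicodedata.normalize("NFC", addr) != addr:
        raise ValueError("address is not in precomposed (NFC) form")
    # Alternative forms of separators are a UI matter and are rejected.
    if FORBIDDEN_ALTERNATES.intersection(addr):
        raise ValueError("address contains an alternative separator form")
    # Only the ASCII @ separates local part from domain part.
    local, at, domain = addr.rpartition("@")
    if not at or not local or not domain:
        raise ValueError("expected local part, '@', and domain")
    # Only the ASCII dot separates domain labels.
    return local, domain.split(".")
```

For example, split_standard_address("jos\u00e9@b\u00fccher.example") gives
the local part and the two domain labels, while the same address written
with a full width @ is rejected.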
When you have to change between international and legacy context you have
to encode non-ASCII into ASCII or the other way around. An ACE needs to
be used. It need not be Punycode, which is complex. It could be SCSU with
hex encoding, which also gives a fairly compact encoding. For the
domain part IDNA will have to be used even if it cannot encode all
domain names.
During encoding into ASCII a Standard mail address may not be changed,
neither lower-cased nor altered in any other way, so that all semantics
are preserved.
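A minimal Python sketch of a lossless ASCII encoding for the local part
could look like the following. Note the assumptions: Python has no SCSU
codec in its standard library, so UTF-8 bytes are hex encoded instead, and
the "hx--" prefix and the function names are hypothetical. It only shows
the round-trip property: nothing is lower-cased or otherwise altered.

```python
ACE_PREFIX = "hx--"  # hypothetical marker, not from any draft

def encode_local(local):
    """Encode a local part to ASCII without changing its content."""
    if local.isascii():
        # Already legacy-safe; passed through untouched, case included.
        # (A real scheme must also handle ASCII input that happens to
        # start with the prefix; this sketch ignores that case.)
        return local
    # Stand-in for "SCSU with hex encoding": hex of the UTF-8 bytes.
    return ACE_PREFIX + local.encode("utf-8").hex()

def decode_local(encoded):
    """Reverse encode_local, restoring the exact original string."""
    if not encoded.startswith(ACE_PREFIX):
        return encoded
    return bytes.fromhex(encoded[len(ACE_PREFIX):]).decode("utf-8")

original = "Jos\u00e9"
encoded = encode_local(original)
assert encoded.isascii()
assert decode_local(encoded) == original  # case and characters preserved
```

Hex of UTF-8 is less compact than SCSU would be for most scripts; the point
here is reversibility, not size.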
Two mail addresses are equal if they are equal in Standard mail address
form.
Using the above form there are no problems in having both case-sensitive
and case-insensitive mail addresses.
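The comparison rule can be sketched in Python as below. This is an
approximation under stated assumptions: the standard library does not
expose Unicode simple case folding, so simple_fold lower-cases a character
only when the result is a single code point, and the SC/TC matching is not
covered. The function names are mine.

```python
import unicodedata

def simple_fold(s):
    """Approximate one-to-one case folding: one code point in, one out."""
    return "".join(c.lower() if len(c.lower()) == 1 else c for c in s)

def addresses_equal(a, b, case_sensitive=False):
    """Compare two addresses after bringing both to precomposed form."""
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    if case_sensitive:
        return a == b
    return simple_fold(a) == simple_fold(b)

# Same address, once precomposed and once with a combining accent.
assert addresses_equal("Jos\u00e9@example.org", "JOSE\u0301@EXAMPLE.ORG")
# A case-sensitive system can use the same form and skip the folding.
assert not addresses_equal("Jos\u00e9@example.org", "JOS\u00c9@EXAMPLE.ORG",
                           case_sensitive=True)
```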
I hope this can get us to start at the needs of the international
user instead of the needs of the legacy protocols and applications.
Dan
NOTE: I do not use the words "internationalised mail address" or
"traditional mail address" used in the IMAA draft.
The traditional mail address could be "local-***@domain" and not restricted
to ASCII. Internationalised is often used about applications and means
"make it possible to handle international characters". You could call
the IDNA form of domain names internationalised legacy domain names.
But the mail address, domain name or URL are all in international format
from the beginning. They cannot be internationalised because they already
are.