what is the real problem?

Discussion:

Keith Moore

2003-11-11 20:14:52 UTC

Here I attempt to define the problems that IMAA needs to solve from a
user's perspective:

1. Users need to have email addresses that are easy to remember and can
be reliably transcribed from memory. That's the reason we like to use
people's names in email addresses - because it makes them easy to
remember. ASCII addresses are not adequate because the ASCII
repertoire is not sufficient to express people's names in most of the
world's languages.

2. Users need to have email addresses that can easily be transcribed
from written or printed form, so they can be copied from business
cards, handwriting on paper, etc.
ASCII addresses do not always suffice because these characters may be
difficult to generate on the keyboards that are commonly in use in some
parts of the world, and because people unused to generating Latin
characters on their keyboards can easily confuse Latin characters with
similar characters in Greek, Cyrillic, or other alphabets.

3. Users need to have email addresses that can be transcribed from
sounds - e.g. read over a telephone. This is harder in some languages
than in others, but even in most languages where this works, ASCII is
not adequate because people may not know or recognize the names of
Latin or special characters (even if they could type them).

So, given a suitable input device, a user who is skilled in a language
and a writing system for that language should be able to:

a. With a high probability of success, correctly transcribe an email
address that uses a person's name and a well-known domain string from
that language and writing system,

b. With a high probability of success, correctly transcribe an email
address from that language and writing system that is written or
printed on paper,

c. With a high probability of success, correctly transcribe an email
address from sounds spoken over the telephone by another user with
adequate skills in that language, approximately to the extent that
those users could successfully transcribe the same address over the
telephone onto paper

Other constraints:

4. IMAs must have a low probability of causing operational failures
with existing mail software - MTAs, MUAs, mailing lists, automatic
responders, etc. - and other software which uses email addresses
(directories, address books, security protocols, etc.).

5. All ordinary mail functions - replies, forwarding, resending,
mailing lists, must continue to work in the presence of any mixture of
IMA-capable software and legacy software.

6. Users need to be able to transcribe addresses that they receive in
email, whether in message headers or in a message body, and whether or
not their software supports the extensions that enable IMAs, and
whether or not the recipient knows the language and script in which the
sender normally writes his name and email address. This is separate
from the need to reply to messages or store received addresses in an
address book.

(Note: #6 directly implies the need to support multiple versions of a
sender address, in order to provide a recipient with an address that he
can transcribe.)

Keith

John Cowan

2003-11-11 21:15:06 UTC

Keith Moore scripsit:

[much Good Stuff snipped]

Post by Keith Moore
6. Users need to be able to transcribe addresses that they receive in
email, whether in message headers or in a message body, and whether or
not their software supports the extensions that enable IMAs, and
whether or not the recipient knows the language and script in which the
sender normally writes his name and email address. This is separate
from the need to reply to messages or store received addresses in an
address book.
(Note: #6 directly implies the need to support multiple versions of a
sender address, in order to provide a recipient with an address that he
can transcribe.)

This requirement strikes me as unreasonably broad: it demands not merely
multiple versions, but as many multiple versions as there are scripts.
I do not think that senders need to provide email addresses in any scripts
except those used to write languages they know, plus Basic Latin (ASCII).

--
John Cowan ***@reutershealth.com www.reutershealth.com ccil.org/~cowan
Dievas dave dantis; Dievas duos duonos --Lithuanian proverb
Deus dedit dentes; deus dabit panem --Latin version thereof
Deity donated dentition;
deity'll donate doughnuts --English version by Muke Tever
God gave gums; God'll give granary --Version by Mat McVeagh

Keith Moore

2003-11-11 23:23:52 UTC

Post by John Cowan

This requirement strikes me as unreasonably broad: it demands not merely
multiple versions, but as many multiple versions as there are scripts.

not quite - all that is required is that there be one address that is
transcribable by any potential recipient. but for me it's an open
question as to whether a Latin alphabet / ASCII-encoded fallback is
sufficient, given the absence of those characters on many of the
world's keyboards.

John Cowan

2003-11-12 01:48:04 UTC

Post by Keith Moore
not quite - all that is required is that there be one address that is
transcribable by any potential recipient. but for me it's an open
question as to whether a Latin alphabet / ASCII-encoded fallback is
sufficient, given the absence of those characters on many of the
world's keyboards.

If you can't count on Basic Latin, I very much doubt that there is
anything you can count on. "Implement zero [not useful], one [Basic Latin]
or every possible way to do something."

--
Some people open all the Windows; John Cowan
wise wives welcome the spring ***@reutershealth.com
by moving the Unix. http://www.reutershealth.com
--ad for Unix Book Units (U.K.) http://www.ccil.org/~cowan

Keith Moore

2003-11-12 02:01:19 UTC

Post by John Cowan

If you can't count on Basic Latin, I very much doubt that there is
anything you can count on.

digit strings. no, I don't like that idea, and I'm rather hoping that
a Latin fallback is sufficient. but I can easily imagine situations
where it might not be sufficient.

Keith

J-F C. (Jefsey) Morfin

2003-11-12 03:29:12 UTC

Post by John Cowan

If you can't count on Basic Latin, I very much doubt that there is
anything you can count on.

Post by Keith Moore
digit strings. no, I don't like that idea, and I'm rather hoping that a
Latin fallback is sufficient. but I can easily imagine situations where
it might not be sufficient.

May I suggest that the difficulty comes here too from wanting to do too
many things at the same layer. Please consider that there are two _different_
and _independent_ layers.

1. the information used by the network protocols
2. the information used by the people

Up to now the character set used by the protocol was the same as the
character set used by the people. So we did not need a presentation tool.
IDNA was the first one. Now we want to make the presentation more beautiful
for those wanting/needing it. OK, let's do it. This is at the people layer;
it does not change anything at the network layer level, unless you want to
bundle them.

I will take an example we all know. Numeric telephone keypads also carry
letters, so you can choose a number for the way it spells. If typing a
company slogan on your telephone pad is good for their marketing, this does
not change anything for the telephone network. It still uses digits.

IMHO Vint proposed the right approach: to consider that the Internet uses
numerics only, ranging from 0 to Z, and that cute American people managed
to get themselves a language which lets them memorize some of these
numbers easily as names. (what are the numbers in Chinese for "ali-baba"?)

Keith's demands for the users are missing many requirements because users
first want to live with their system, to be polite, to be legal, to
organize themselves, to get aliases, etc. etc. This is what I call vernacular.
But it also spells out a few very good things, including that I must be able to
understand what the other person says on the phone and to translate it into
a mail name. This is the way the world works: we could not fly if pilots
the world over were not spelling names from "zero to niner and from alpha
to zulu". These are not Latin characters, these are universal figures from
zero to 36 that we also used in scott and morse. Add the dash and the dot if you
want, this is OK, everyone knows them. You will note that the world over, what
people noticed when they started using mail names was that there was a new
"figure": @. They made it the symbol of the net.

So there are 40 figures available: 0-9, A-Z, - . and @. Let us accept and
document it, and let us help people do what they want with them. This is far
more an OPES matter than an e-mail or a DNS matter.

For example, they want to enter multilingual domain names with TLD in
Chinese. The solution is not to go lat.root, to create new TLDs, or to
build and disseminate plug-ins. It is to get an OPES at the proper place to
replace ".COM" in Chinese by ".com" in "network scripting". Whatever the way
"COM" may spell in the 5000 languages of the world, the only problem is to
make sure that "NET" in Japanese does not conflict with "COM" in Polish.

This will give time to totally rethink an e-mail system where spam will not
fit. What is the real use of working on a system people are starting to quit?

jfc

Nathaniel Borenstein

2003-11-12 16:02:31 UTC

I think that it might help to frame the problem of email addresses in
the larger cultural context.

Globalization is a very real force that is bringing enormous changes --
some good, some bad -- to most human societies. Many people from
non-English-speaking (and particularly non-Western) cultures are afraid
that their entire language and culture will be swept away by a
techno-cultural wave of which the Internet is an irresistible symbol.
Kids in the developing world think of English as the language of the
Net, and the Internet architecture itself is widely perceived as saying
"speak English or live in a second-class ghetto." And then there's
email addresses, which add the ultimate insult to this injury, telling
a majority of the world's people that, in essence, they have to change
their names -- or at least the way they write them -- if they want to
play the Internet game.

I restate this perception in the belief that an analysis of the
underlying dynamic sheds light on the appropriate goals. We need to
mute the cultural hegemony implied by making people essentially rename
themselves for email in their own language. But that doesn't mean we
have to pretend we're unaware of the unique and still growing role
played by English in the world, nor of the important technical role
ASCII plays as the minimal and universal character set supporting the
primary language of international communication.

In that light, I see no reason to expand the IMA goal to some of the
extremes I have heard mentioned. If you have person A who speaks only
Korean and person B who speaks only Hindi, I really don't think we need
to worry much about how they enter/input *each other's* email
addresses, because they have nothing to say to each other anyway,
lacking a common language. Making them refer to each other by
latinized email addresses is simply no big deal if they don't speak a
common language to begin with. On the other hand, when a
Korean-speaker and a Hindi-speaker *do* communicate, the odds are
overwhelming -- and growing -- that they will do so in English, and
will write down each other's names in latin characters, which they
learned when they learned English.

I think, therefore, that we have to recognize the larger realities of
global culture by restating the IMA goal as two distinct goals:

1. People need to be able to use names in their own script when
communicating with others who share their language.

2. People need to be able to map (or at least alias) their personal
names into the evolving shared language of global communication --
English-in-ASCII -- for communication across linguistic boundaries.

In particular, I think we can put aside considerations of whether the
poor Mongolian peasant needs to learn ASCII -- the answer should be no
if he only wants to communicate in Mongolian, but probably yes
otherwise. This is a non-technical reality, for the most part, which
we can rely on in the standards. We don't want to require ASCII for
communication between two people who share a language, but it is
completely legitimate to require it as a lingua franca for any
cross-language communications. (And yes, that means that Americans get
lucky, but how surprising is that?)

Framing the problem this way, I think, makes the problem a tad more
practical but has some interesting implications. The person at the
Korean keyboard is going to need to be able to use ASCII characters for
communicating with non-Koreans, and can be expected (long term) to have
access to the ASCII characters if he needs them. However, this doesn't
mean that he can even be expected to be able to type a name like Jose
or Faelstroem correctly (as I just didn't), and his UI doesn't need to
support typing email addresses in Farsi, either. The real polyglots
may end up with special multilingual software that helps them compose
email addresses in multiple scripts, but most people will be completely
satisfied with software that supports composing email addresses in two
forms -- their own language, and ASCII.

My apologies if some of the above seems a tad obvious, but I don't
think we've been clear about the goal. It's silly to go from an
ASCII-only namespace to one where all languages are treated completely
equally, because that doesn't reflect the underlying realities about
the evolution of English as an international language. All languages
deserve our respect and technical support, including the ability to use
that language entirely for email within that linguistic community. But
English/ASCII will retain a vital role for communication across
linguistic communities, and this reality can and should be reflected in
our architectural considerations. -- Nathaniel

Keith Moore

2003-11-12 16:23:19 UTC

Consider the problem of a student from a country that uses an
ideographic written language who is attending university in a country
that uses an alphabetic language that is compatible with ASCII (English
is one such language, not quite the only one).

The student cannot write to his relatives at home because their email
addresses require him to use characters that are not supported by the
ASCII-based MTAs at his university.

The relatives cannot write to the student because they can only write
in their native language and script, and therefore they cannot type the
student's email address.

And I'm not at all sure it's reasonable to assume that people will
either communicate in their native language or in English. I've seen
too many examples to the contrary.

Keith

John Cowan

2003-11-12 20:57:03 UTC

Post by Keith Moore
Consider the problem of a student from a country that uses an
ideographic written language who is attending university in a country
that uses an alphabetic language that is compatible with ASCII (English
is one such language, not quite the only one).
The student cannot write to his relatives at home because their email
addresses require him to use characters that are not supported by the
ASCII-based MTAs at his university.

This problem is solved if everyone who has a non-ASCII address also has
an equivalent ASCII one (without prejudice to the question of whether
non-ASCII text appears in the underlying protocol).

Post by Keith Moore
The relatives cannot write to the student because they can only write
in their native language and script, and therefore they cannot type the
student's email address.

This problem is a genuine one, but I don't know that it's reasonable to
expect a solution to it. It is not possible to snail-mail me if you can't
write Latin letters, for example; but the reverse is not true, because
postal systems have agreed to accept mail addressed in Latin letters.

Indeed, by Universal Postal Union rules, Latin letters and European
digits MUST be used in addresses in international mail; the script
customary in the destination country SHOULD be used as well.

If anyone is curious, the following rules also apply to international mail:

1) The means of specifying the destination country MUST be that
required by the source country.

2) The format of the rest of the address MUST be as specified by the
destination country.

3) The name of the destination country SHOULD be specified in a language
used in the source country; the name of the country in any widely used
language MAY be added in order to assist in transport.

(Source: UPU Letter Post Regulations, Article RE 204, section 3.3)

--
John Cowan www.ccil.org/~cowan www.reutershealth.com ***@reutershealth.com
In might the Feanorians / that swore the unforgotten oath
brought war into Arvernien / with burning and with broken troth.
and Elwing from her fastness dim / then cast her in the waters wide,
but like a mew was swiftly borne, / uplifted o'er the roaring tide.

J-F C. (Jefsey) Morfin

2003-11-12 23:51:52 UTC

Post by John Cowan
This problem is a genuine one, but I don't know that it's reasonable to
expect a solution to it. It is not possible to snail-mail me if you can't
write Latin letters, for example; but the reverse is not true, because
postal systems have agreed to accept mail addressed in Latin letters.

I am afraid you forget that an assistant can write it for you. Please refer
to the protocol/users layers I quoted. The fact that the user cannot
_type_ the characters used by the protocol layer does not mean that he cannot
use the protocol. A very common solution is a menu where the user clicks on
entries in his own script and they are transcoded in the protocol's appropriate
way. Menu servers are a very old solution. But I also documented that an OPES
database can do the job. Typing a mailbox name in Chinese script
does not prevent this mailbox from being translated into an ASCII name.

A very simple and old solution is a Host.txt-like solution. Please
consider a currently very common case: in some countries, sites are removed
from the name servers, yet people continue to access them using Host.txt.
jfc

Adam M. Costello

2003-11-12 02:43:13 UTC

Keith Moore <***@cs.utk.edu> wrote:

[A good case for the ACE approach in mail addresses.]

[Reasons why a change-the-message-format approach is not as simple and
obvious as it might seem, but needs careful study.]

At some point the message format needs to change, but we can't really
evaluate whether it's worth it at any particular point without doing
the impact analysis.

And even if the result of that analysis is that the message format
should be changed, each non-ASCII address will need an ASCII counterpart
during the transition period, as a fallback for when a message bumps
up against an old piece of the infrastructure. The ASCII counterpart
can be either an ACE or something prettier that gets looked up
online. The ACE approach is much easier because it doesn't require any
infrastructure support. With the online lookup approach, a user who
creates a non-ASCII address needs to also create an ASCII address and
keep them pointed at the same mailbox and make sure that some server
somewhere is providing the mapping between the two addresses (it would
probably have to be a DNS server, because people expect to be able to
send/receive mail from behind a firewall that blocks everything and
provides only a DNS server and an SMTP relay). If we can dismiss the
online-lookup idea as too burdensome, then we need an ACE-based approach
regardless of the outcome of the new-message-format analysis, so we
might as well push ahead on both in parallel.
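
[Editorial illustration, not part of the original message.] The ACE
fallback described above can be sketched for the domain part of an
address, since that part of the encoding was already settled by IDNA.
This is a minimal sketch that assumes Python's built-in "idna" codec; the
function names are hypothetical, and the local part is deliberately left
untouched because the thread does not say how it would be encoded.

```python
def domain_to_ace(domain: str) -> str:
    """Return the ASCII-compatible (ACE) form of a Unicode domain name."""
    # Python's built-in "idna" codec implements the IDNA rules of RFC 3490.
    return domain.encode("idna").decode("ascii")


def ace_to_domain(ace: str) -> str:
    """Return the Unicode form of an ACE-encoded domain name."""
    return ace.encode("ascii").decode("idna")


def address_with_ace_domain(address: str) -> str:
    """Rewrite only the domain part of an address into ACE form.

    Hypothetical helper: the local part is copied through unchanged,
    because its encoding is not specified in this discussion.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("expected an address of the form local@domain")
    return f"{local}@{domain_to_ace(domain)}"


if __name__ == "__main__":
    print(address_with_ace_domain("info@b\u00fccher.example"))
```

No server has to be consulted for this conversion, which is the property
the message relies on when it calls the ACE approach easier than an
online lookup.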

6. Users need to be able to transcribe addresses that they receive in
email, whether in message headers or in a message body, and whether
or not their software supports the extensions that enable IMAs, and
whether or not the recipient knows the language and script in which
the sender normally writes his name and email address.
(Note: #6 directly implies the need to support multiple versions of a
sender address...
all that is required is that there be one address that is
transcribable by any potential recipient. but for me it's an open
question as to whether a Latin alphabet / ASCII-encoded fallback is
sufficient, given the absence of those characters on many of the
world's keyboards.

#6 is going to be pretty hard to satisfy. Maybe if the ACE form
contained only digits, that would do it, but the ACE form of the domain
part of the address has already been settled, and it contains ASCII
letters.

If you want to add a third form (or more forms), I don't see how without
creating and maintaining a bunch of aliases (which would involve
configuring both DNS servers and mail servers).

AMC

Martin Duerst

2003-11-13 09:05:36 UTC

Hello Keith,

Very good start. Some comments.

Post by Keith Moore
Here I attempt to define the problems that IMAA needs to solve from a
1. Users need to have email addresses that are easy to remember and can be
reliably transcribed from memory. That's the reason we like to use
people's names in email addresses - because it makes them easy to remember.

Yes. Add to that:
- easy to recognize (whose email was that? passive memory rather than
active memory as above)
- easy to guess (works less for email addresses than for domain names,
but is still sometimes useful)
- easy to create (when somebody gets a new address)
- easy to identify with

[these things are listed in the IRI draft]

Post by Keith Moore
ASCII addresses are not adequate because the ASCII repertoire is not
sufficient to express people's names in most of the world's languages.
2. Users need to have email addresses that can easily be transcribed from
written or printed form, so they can be copied from business cards,
handwriting on paper, etc.
ASCII addresses do not always suffice because these characters may be
difficult to generate on the keyboards that are commonly in use in some
parts of the world,

I do not really know any examples of keyboards that would make generating
ASCII characters actually difficult. The difficulties lie much more
at the following points:
- People may rarely use ASCII characters, and therefore may not be familiar
with how to input them (even if it's actually very easy).
- People are not (very) familiar with the characters themselves, which
makes all operations much more difficult and error-prone. The best
example for this is to imagine that you would have to use Greek
characters. Although many scientists and engineers,... basically
know the Greek alphabet, and inputting Greek on most modern computers
isn't really difficult (assuming correct setup), it would still be
a big pain for you/us.

Post by Keith Moore
and because people unused to generating Latin characters on their
keyboards can easily confuse Latin characters with similar characters in
Greek, Cyrillic, or other alphabets.

I'm not exactly sure how this plays in here. It seems to me it could
be used as an argument both ways.

Post by Keith Moore
3. Users need to have email addresses that can be transcribed from sounds
- e.g. read over a telephone. This is harder in some languages than in
others, but even in most languages where this works, ASCII is not adequate
because people may not know or recognize the names of Latin or special
characters (even if they could type them).
So, given a suitable input device, a user who is skilled in a language and
a. With a high probability of success, correctly transcribe an email
address that uses a person's name and a well-known domain string from that
language and writing system,
b. With a high probability of success, correctly transcribe an email
address from that language and writing system that is written or printed
on paper,
c. With a high probability of success, correctly transcribe an email
address from sounds spoken over the telephone by another user with
adequate skills in that language, approximately to the extent that those
users could successfully transcribe the same address over the telephone
onto paper

- a. seems to include both b. and c. Is it needed?
- I like your wording for c. It works very well even for not at all
phonetic writing systems (e.g. Japanese)

Post by Keith Moore
4. IMAs must have a low probability of causing operational failures with
existing mail software - MTAs, MUAs, mailing lists, automatic responders,
etc. - and other software which uses email addresses (directories, address
books, security protocols, etc.).
5. All ordinary mail functions - replies, forwarding, resending, mailing
lists, must continue to work in the presence of any mixture of IMA-capable
software and legacy software.
6. Users need to be able to transcribe addresses that they receive in
email, whether in message headers or in a message body, and whether or not
their software supports the extensions that enable IMAs, and whether or
not the recipient knows the language and script in which the sender
normally writes his name and email address. This is separate from the
need to reply to messages or store received addresses in an address book.
(Note: #6 directly implies the need to support multiple versions of a
sender address, in order to provide a recipient with an address that he
can transcribe.)

I think it implies multiple versions of a sender address, or multiple
sender addresses.

At this point, this may be a detail, but I think that MUAs will develop
ways to handle the problem of 'which address to use for which recipient(s)'.
This will be based on:
- User settings (use this address for this counterpart, or this domain,...)
- Detection on incoming mail (replies to mails with new addresses can use
new addresses)
- Detection of mail content (e.g. at some point, a mail containing
Japanese may be sent with a Japanese address, but a mail containing
only ASCII would be sent with an ASCII address).
- ...
I think there is enough potential here that we can leave the implementations
to each MUA.
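
[Editorial illustration, not part of the original message.] The three
signals listed above (user settings, detection on incoming mail, detection
of mail content) could be combined roughly as follows. This is only a
sketch under stated assumptions: the SenderProfile type, its field names,
and the order in which the signals are consulted are all hypothetical, and
a real MUA might weigh them differently.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SenderProfile:
    """Hypothetical per-user state an MUA might keep."""
    ascii_address: str
    native_address: str
    # User settings: recipient address or recipient domain -> address to use.
    overrides: Dict[str, str] = field(default_factory=dict)
    # Correspondents seen using new-style addresses in incoming mail.
    ima_capable_peers: Set[str] = field(default_factory=set)


def choose_from_address(profile: SenderProfile, recipient: str, body: str) -> str:
    """Pick which of the sender's own addresses to put on an outgoing message."""
    # 1. Explicit user setting, first for the exact recipient, then its domain.
    if recipient in profile.overrides:
        return profile.overrides[recipient]
    domain = recipient.rpartition("@")[2]
    if domain in profile.overrides:
        return profile.overrides[domain]
    # 2. The recipient has already used a new-style address with us.
    if recipient in profile.ima_capable_peers:
        return profile.native_address
    # 3. Content detection: a body with non-ASCII text suggests a shared script.
    if not body.isascii():
        return profile.native_address
    # Default: the ASCII address works with legacy software.
    return profile.ascii_address
```

The ASCII address is the default here so that a message with no positive
signal still reaches recipients whose software predates IMAs.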

Regards, Martin.

Keith Moore

2003-11-13 16:13:30 UTC

Post by Martin Duerst

Post by Keith Moore
So, given a suitable input device, a user who is skilled in a
a. With a high probability of success, correctly transcribe an email
address that uses a person's name and a well-known domain string from
that language and writing system,
b. With a high probability of success, correctly transcribe an email
address from that language and writing system that is written or
printed on paper,
c. With a high probability of success, correctly transcribe an email
address from sounds spoken over the telephone by another user with
adequate skills in that language, approximately to the extent that
those users could successfully transcribe the same address over the
telephone onto paper

- a. seems to include both b. and c. Is it needed?

I left out part of a. It was supposed to be "correctly transcribe an
email address from memory..." (as distinguished from writing on paper
or spoken sounds)

Post by Martin Duerst
- I like your wording for c. It works very well even for not at all
phonetic writing systems (e.g. Japanese)

thanks.

Post by Martin Duerst
At this point, this may be a detail, but I think that MUAs will develop
ways to handle the problem of 'which address to use for which
recipient(s)'.

I'm not sure that this is sufficient, as it doesn't handle the cases of
forwarding, mailing lists, etc. I suspect it makes more sense to have
senders' MUAs include 1 or more addresses for each sender or recipient
(including a required "worst case" one in ASCII), and let recipients'
MUAs decide which address to display to their users.
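
[Editorial illustration, not part of the original message.] The idea of
carrying several addresses per party, with a required ASCII one, and
letting the recipient's MUA choose what to display can be sketched as
below. This is not the proposal mentioned later in the thread, whose
details are not given; the AddressVariant type, the script labels, and the
selection rule are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class AddressVariant:
    """One form of a party's address, tagged with a hypothetical script label."""
    address: str
    script: str  # e.g. "Hang", "Deva", "Latn"


def pick_display_address(variants: Sequence[AddressVariant],
                         readable_scripts: Iterable[str]) -> str:
    """Choose the variant a recipient's MUA should show to its user.

    Variants are assumed to be listed in the sender's order of preference.
    At least one pure-ASCII variant is required as the worst-case fallback.
    """
    fallback = next((v for v in variants if v.address.isascii()), None)
    if fallback is None:
        raise ValueError("no ASCII fallback address supplied")
    readable = set(readable_scripts)
    for variant in variants:
        if variant.script in readable:
            return variant.address
    # The user reads none of the offered scripts: show the ASCII form.
    return fallback.address
```

Because the choice is made on the receiving side, it still works when the
message arrives through forwarding or a mailing list, which is the case
the sender-side heuristics do not cover.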

Keith

Arnt Gulbrandsen

2003-11-13 17:07:51 UTC

I suspect it makes more sense to have senders' MUAs include 1 or more
addresses for each sender or recipient (including a required "worst
case" one in ASCII),

And force the sending user to type in multiple addresses per recipient?

--Arnt

Keith Moore

2003-11-13 17:18:41 UTC

Post by Arnt Gulbrandsen

I suspect it makes more sense to have senders' MUAs include 1 or more
addresses for each sender or recipient (including a required "worst
case" one in ASCII),

And force the sending user to type in multiple addresses per recipient?

no, of course that would never fly. I'm working on a proposal, but I'd
like to get the pieces put together before presenting it to the group,
lest the individual pieces get shot down for lack of a complete picture.

Keith