Discussion:
Further thoughts on preferred printable representations...
Roy Badami
2003-06-12 20:22:34 UTC
[This is a followup to my previous post entitled "On humanly-readable
(printable) e-mail addresses"]

I meant to post this some time ago, but never found the time.

The preferred printable representation[1] of an e-mail address is a
vital concept, as I argue in my previous message. It is what I print
on my business card. It's what appears on advertising hoardings and
trailers for movies.

If I give you my business card, you can type the address into your
e-mail client, and send a message to me.

When I send you an e-mail message, it's what appears on your display.
You can write this down on a piece of paper, give it to someone else,
and they can type it into their e-mail client to send me a message.

I can include one in the body of a message, and you can cut and paste
it into your e-mail client.

And you can do all this without any knowledge of Internet protocols or
the structure of Internet e-mail addresses; you can simply regard an
e-mail address (in its preferred printable representation) as an
opaque character string that you have to copy verbatim.

What struck me after I wrote my previous message is that it seems to
me that the primary goal of internationalized e-mail addresses is to
internationalize something that is not currently formally defined,
namely the preferred printable representation.

That is, the primary goal of internationalization is to enable the
same kinds of interactions (typing an e-mail address from a business
card, writing an e-mail address down on paper) to work for users of
non-Roman scripts. Abstract representations and protocol
representations are just by-products of this that a user will never
see.

Far from being just a user-interface issue, for the (non-technical)
end-user, the opaque sequence of printable characters that they have
to copy verbatim is the _only_ thing that matters.

-roy


[1] Yes, it's not a very accurate term. Something like 'canonical
character-sequence serialization' would be better.
D. J. Bernstein
2003-06-12 23:52:42 UTC
Post by Roy Badami
The preferred printable representation[1] of an e-mail address is a
vital concept, as I argue in my previous message.
One of the big flaws in IDNA is its failure to recognize this basic fact.
As I wrote in http://cr.yp.to/djbdns/idn.html:

The value of a character-set expansion comes entirely from the
visibility of the additional characters to users. There is no point
in merely expanding the set of bytes allowed inside the computer; the
internationalized domain name {alpha}{beta}{gamma}.com must be
displayed with Greek letters on a typical user's screen.

The same issue arises for any piece of text. There's nothing special
about domain names, or mailbox names, or blah.html, or chat usernames,
or whatever tomorrow's textual identifiers will be.
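
The gap between the bytes inside the computer and the characters a user
sees can be illustrated with a small sketch. It assumes Python's
built-in "idna" codec (which implements the 2003 IDNA rules) and uses
the Greek example name from the quoted text; the function name
wire_and_display_forms is hypothetical and chosen only for this
illustration.

```python
def wire_and_display_forms(name: str) -> tuple:
    """Return (ASCII-compatible wire form, user-visible display form)."""
    # What software stores and sends: an ASCII-only encoding of the name.
    wire = name.encode("idna")
    # What the user is supposed to see: the original Unicode characters.
    display = wire.decode("idna")
    return wire, display


# alpha-beta-gamma.com written with Greek letters (U+03B1..U+03B3)
wire, display = wire_and_display_forms("\u03b1\u03b2\u03b3.com")
print(wire)     # ASCII bytes beginning with the "xn--" prefix
print(display)  # the name with Greek letters
```

Only the second value is of any use on a screen or in print; the first
is an internal detail.
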
Post by Roy Badami
It is what I print on my business card.
There's a difference between what should appear on a screen for users
to see, and what should appear in print for users to type. As I wrote
on the idn mailing list two and a half years ago:

Operating systems are all going to support direct input of Unicode
characters by number. The ISO standard method is Shift-Ctrl-222E for
character 222E.

People who put hard-to-recognize characters onto their business cards
are going to add a line giving the Unicode numbers, perhaps in boxes:

***@S.cr.yp.to (with a contour-integral sign)

+----+
postmaster@|222E|.cr.yp.to (in smaller type)
+----+

The second line won't distract the people in the intended audience who
recognize the contour-integral sign and have, say, Alt-S configured to
produce it.
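
A minimal sketch of how such a second line could be generated, assuming
the rule is simply "replace every non-ASCII character by its Unicode
number in a box". The function name boxed_address and the inline |XXXX|
box style are illustrative assumptions, not part of ISO 14755 or of any
existing business-card program.

```python
def boxed_address(address: str) -> str:
    """Replace each non-ASCII character with its Unicode number in a box."""
    parts = []
    for ch in address:
        if ord(ch) < 128:
            parts.append(ch)                  # plain ASCII is kept as-is
        else:
            parts.append(f"|{ord(ch):04X}|")  # e.g. U+222E becomes |222E|
    return "".join(parts)


# U+222E is the contour-integral sign used in the example above.
print(boxed_address("postmaster@\u222e.cr.yp.to"))
# prints: postmaster@|222E|.cr.yp.to
```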

End of problem. Notice that this doesn't require any extra DNS names.
The conversion of Shift-Ctrl-222E to \342\210\256 is isolated inside the
keyboard interface; other software doesn't have to worry about it.

The ISO standard mentioned is ISO 14755. Notice that it works the same
way for all text; it isn't something that has to be reinvented for
domain names, mailbox names, etc.
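
The conversion mentioned above can be checked with a short sketch,
assuming the keyboard interface emits UTF-8. The function name
hex_to_utf8_octal is hypothetical; it only mimics the step from a typed
hexadecimal number to the resulting bytes, written as octal escapes.

```python
def hex_to_utf8_octal(hex_number: str) -> str:
    """Turn a Unicode number such as '222E' into UTF-8 octal escapes."""
    ch = chr(int(hex_number, 16))         # the character typed by number
    encoded = ch.encode("utf-8")          # the bytes other software sees
    return "".join(f"\\{byte:03o}" for byte in encoded)


print(hex_to_utf8_octal("222E"))
# prints: \342\210\256
```

Nothing in the sketch depends on whether the character ends up in a
domain name, a mailbox name, or any other text.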

---D. J. Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago
Roy Badami
2003-06-29 14:43:41 UTC
Apologies for the very delayed response.
Post by D. J. Bernstein
Operating systems are all going to support direct input of Unicode
characters by number. The ISO standard method is Shift-Ctrl-222E for
character 222E.
Thanks for the reference; I wasn't aware of this.
Post by D. J. Bernstein
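People who put hard-to-recognize characters onto their business cards
are going to add a line giving the Unicode numbers, perhaps in boxes:
+----+
postmaster@|222E|.cr.yp.to (in smaller type)
+----+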
It's an interesting suggestion, and it might certainly be relevant in
some contexts, but I don't see this being widely adopted for e-mail
addresses. The internationalized e-mail address will clearly be
chosen to be unambiguous to the target audience; if the person
producing the business card cares about people outside their target
audience (i.e. people unfamiliar with the relevant script) then most
likely they will simply include an additional ASCII-only address...

-roy
D. J. Bernstein
2003-06-30 01:19:44 UTC
most likely they will simply include an additional ASCII-only address...
Let me get this straight. You think that, instead of using a generic
``International style'' business-card-printing program, I should go set
up a new domain name, new mailbox name, new URL, etc., and print those?

Sounds like a lot of extra effort. What exactly is the benefit?

---D. J. Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago
Arnt Gulbrandsen
2003-06-30 09:24:29 UTC
Post by D. J. Bernstein
most likely they will simply include an additional ASCII-only address...
Let me get this straight. You think that, instead of using a generic
``International style'' business-card-printing program, I should go
set up a new domain name, new mailbox name, new URL, etc., and print
those?
"Should do" is one thing, "will do" quite another.

Both in my native Norway and now in Germany, I've seen a lot of
two-sided business cards: One side in Norwegian/German, the other in
English and devoid of æøåäëöß.

This might not be something people should do. But since a lot of people
do print their names both properly and in ASCII, it sounds reasonable
to assume that they will do the same for their email addresses.

--Arnt
Arnt Gulbrandsen
2003-06-30 09:25:54 UTC
Post by Arnt Gulbrandsen
Both in my native Norway and now in Germany, I've seen a lot of
two-sided business cards: One side in Norwegian/German, the other in
English and devoid of æøåäëöß.
(Oops. ü, not ë, of course. Sorry.)

--Arnt
John C Klensin
2003-06-30 11:21:08 UTC
--On Monday, 30 June, 2003 11:24 +0200 Arnt Gulbrandsen
Post by Arnt Gulbrandsen
"Should do" is one thing, "will do" quite another.
Both in my native Norway and now in Germany, I've seen a lot
of two-sided business cards: One side in Norwegian/German, the
other in English and devoid of æøåäëöß.
This might not be something people should do. But since a lot
of people do print their names both properly and in ASCII, it
sounds reasonable to assume that they will do the same for
their email addresses.
You may want to distinguish "properly and in ASCII" from
"properly and appropriately in the country in which you are
using the card" -- both occur, and they aren't always the same.
I've seen a fair number of "German-Chinese" and
"French-Japanese" cards, different cards used by the same person
in Japan and China, etc. For business cards, you presumably
always know where you come from (or where your organization
does) and where you are, which gives only two (or sometimes
three) cases. For anything handled on the Internet, the issue
rapidly turns into an N-way problem.

This issue was, fwiw, extensively discussed on the IDN WG list
back when there was still an effort to get a "requirements"
document together.

john
Martin Duerst
2003-06-30 15:07:09 UTC
You may want to distinguish between "properly and in ASCII" from "properly
and appropriately in the country in which you are using the card" -- both
occur, and they aren't always the same. I've seen a fair number of
"German-Chinese" and "French-Japanese" cards, different cards used by the
same person in Japan and China, etc. For business cards, you presumably
always know where you come from (or where your organization does) and
where you are, which gives only two (or sometimes three) cases. For
anything handled on the Internet, the issue rapidly turns into a N-way problem.
I think good email clients will find clever ways to know which address
they should be sending to which people. The Internet not only connects
us with many people around the globe, computers also can help sort
things out for us.


Regards, Martin.
Martin Duerst
2003-06-30 15:12:39 UTC
Post by D. J. Bernstein
most likely they will simply include an additional ASCII-only address...
Let me get this straight. You think that, instead of using a generic
``International style'' business-card-printing program, I should go set
up a new domain name, new mailbox name, new URL, etc., and print those?
Sounds like a lot of extra effort. What exactly is the benefit?
Most people hand out their business cards because they want to
be contacted. Making it easy for the people you give the business
card to contact you is in your own interest. Otherwise, why
hand out the business card in the first place?

Also, helping your counterparts to associate your name with your
email address, rather than having to think about you as a number,
will help you get contacted, and get contacted in a decent
way.

So ultimately, if you are doing it, you are doing it in your own
interest. If you don't want to do it, you don't have to.

Regards, Martin.
D. J. Bernstein
2003-07-01 03:34:51 UTC
Post by Martin Duerst
Making it easy for the people you give the business
card to contact you is in your own interest.
But why should that ease be achieved through the pain of separate domain
names, separate mailbox names, etc., rather than through the ISO 14755
keyboard interface?

Analogy: If you want people from Russia to be able to call you and send
you postal mail, which of the following options will you take?

(1) Set up an office in Russia, and print a business card with that
office's location.

(2) Simply print an ``international style'' business card, with your
country name below your postal address, and an extra line showing
your country-code-plus-phone-number.

Yes, I realize that huge corporations will have offices in Russia. But
they aren't the typical users setting business-card conventions.

---D. J. Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago
J-F C. (Jefsey) Morfin
2003-06-13 08:59:17 UTC
There are three ways of considering the problem that we have to keep in
mind, and that this thread underlines:
1. the proposition by a developers' team (in here) - let's say the IETF
proposition
2. a consensus by operators (what they are now looking for through tests) -
let's say ITU standardization
3. the real needs of the users, which will actually filter the above into
what will develop.

I would say this translates into:
- internationalized names: the introduction of the Unicode character set
into the Legacy name space
- multilingual names: the consideration of one name space per language (and
country due to the way ccTLDs are organized)
- vernacular names: the way the users' networks, application developers,
states, etc. need them and will use them.

IDNs do not do the whole job, but they are what we have at hand, and it is
urgent that we start with them.

MDNs are to be organized in the quickest way so that the network keeps its
stability (national and lingual roots). Obviously ICANN cannot manage it,
because this concerns the entire world beyond the Legacy borders. But it
should share in it. However, the past focus on "alt.roots" in the wrong way
has not prepared ICANN and IETF, and made them overlook the ICANN ICP-3
writer's call for experimentation and innovation. So we are late and
unprepared for a dialog with ITU and others (ITU Marrakech Resolution). But
whichever way we do it, our "I-Sector" must conduct this, not the T-Sector
and UN.

E-mail addresses are a free area for users, where they can indicate how
they imagine Internet vernaculars. So I think we should have an iterative
process matching what James/Paul propose, as well as Roy and Daniel. We
are only doing it the other way around because WG-IDN was not permitted to
carry out its charter and to investigate the users and the DNS managers
first: that is what we must do now that the standard is imposed.

- IDNA is here. It is raw material. There is a need. Let's patch our
imagined picture of the user's behavior: we will correct it later on. Some
propose variants (a priori prevention); I propose context control (a
posteriori correction). Probably both are good and urgent. Let's learn and
proceed.

- Let's help experience and real-life operations. The best way is to develop
the virtual zone concept (each domain name has a "context" parameter
attached, i.e. the language of its registration contract). Let's see what
comes from that and what we build on top (IMHO an architectural revolution,
but without discontinuity). This way operators will relax, be able to
operate, and consider the WTO Services Round implications (which IMHO
might be major).

- Let's be innovative in the users' usage area too. We are talking of
business cards. Question: is it reasonable to still use XVIIIth-century
business cards and to force ourselves to adapt to them? When you go to buy
potatoes, the package will carry a chip with far more information than you
have on your business card. Should we not consider smart business cards, or
simple cards with a magnetic strip? And see what people actually do when
the e-mail name becomes mainly something to see and not to type.

In the same way, is it not time to slightly upgrade the mail system and
adopt some very simple method similar to SSH: the first time I relate with
another e-mail agent, my agent asks it for some information and I may add
my own. Simply look at the lack of presentation services in today's major
e-mail agents. If someone developed an Outlook+Eudora enhancement
including serious e-mail database management, most probably the things
discussed in here would be very different or nonexistent.

Please tell me what I have gained in 26 years of e-mail usage. When I
started I could sort my mails without reading them, encrypt them, store
them in my database, etc. OK, I was with the leading public network company
and a major timesharing service with computer power ... but we were 8 bits
at that time and a core memory of 1 Meg was very big.

Would this vernacular need not be a good occasion/alibi for a general
review of the mail system? I tend to feel it could be easier than DNS - if
we add to it rather than replace it. And a way to lead to a DNS revamp that
everyone will accept and even want.
jfc