Discussion:
[bidi] BIDI in mail addresses
Jony Rosenne
2003-09-30 06:41:57 UTC
I don't agree with the assumption made in http://www.gnomon.org.uk/bidi-ambiguities.txt:

"This paper assumes a standard display model that has been proposed
for the display of identifiers such as IDNs, Internationalized Resource
Identifiers (IRIs) and Internationalized Mail Addresses (IMAs).

The standard display model is that an Internet identifier is rendered
according to the Unicode bidirectional algorithm, in a left-to-right
context."

In my view, a right to left UI will use a right-to-left context, at least for right-to-left identifiers.

In my view, this is the whole point. Our aim should be for a Hebrew or Arabic user to be able to use the internet in his own language, without the need to learn English or the Latin script.

Tricky use of mixed RTL/LTR identifiers is of secondary importance and I support any restriction required to allow them in both RTL and LTR contexts.

Jony
-----Original Message-----
Sent: Monday, September 29, 2003 3:44 PM
Subject: [bidi] BIDI in mail addresses
There is a discussion of bidi in mail addresses which could
really use some feedback from people on this group. I'll try
to summarize:
1. There are current restrictions on the contents of a domain
name, which are intended to prevent ambiguous rendering
(where the same visual appearance could have two different
actual representations, such as 123ABC.org, which could
either be "CBA123.org" or "123CBA.org" internally.)
2. However, there are still cases where this doesn't work,
which have been discussed on the list. And the restrictions
may or may not be felt to be reasonable (which could use
feedback from here).
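
To make the ambiguity in point 1 concrete, here is a minimal sketch that lists the Unicode Bidi_Class of each character in the two candidate internal strings. The Hebrew letters are only hypothetical stand-ins for the "CBA" placeholder; nothing here comes from the IDN drafts themselves.

```python
import unicodedata

# Hypothetical stand-in for the "CBA" placeholder: gimel, bet, alef.
RTL_LABEL = "\u05d2\u05d1\u05d0"

# The two internal (logical-order) strings named in the summary.
candidates = {
    "CBA123.org": RTL_LABEL + "123.org",
    "123CBA.org": "123" + RTL_LABEL + ".org",
}

def bidi_classes(text):
    """Return the Unicode Bidi_Class of each character, in logical order."""
    return [unicodedata.bidirectional(ch) for ch in text]

for name, logical in candidates.items():
    print(name, bidi_classes(logical))
```

The digits are weak (EN) rather than strong, so their position relative to the RTL letters is decided by the bidi algorithm rather than by the characters themselves, which is how two different internal strings can end up looking the same.
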
A paper has been published on this, at
http://www.gnomon.org.uk/bidi-ambiguities.txt.
The archive is at
http://www.imc.org/ietf-imaa/mail-archive/threads.html. Look
at the items titled ...Bidi Issues...
http://www.imc.org/ietf-imaa/
Mark
__________________________________
http://www.macchiato.com
► शिष्यादिच्छेत्पराजयम् ◄
John Cowan
2003-09-30 10:52:54 UTC
Post by Jony Rosenne
In my view, a right to left UI will use a right-to-left context,
at least for right-to-left identifiers.
In my view, this is the whole point. Our aim should be for a Hebrew
or Arabic user to be able to use the internet in his own language,
without the need to learn English or the Latin script.
Unfortunately, non-Latin identifiers have to interoperate with the
existing infrastructure, where top-level names are encoded in Latin script.
Replacing the ISO 3166 codes at the top of the domain hierarchy is not
contemplated, so every host in the "il" domain will bear a name that is
either purely LTR (as today) or bidirectional.
--
My confusion is rapidly waxing John Cowan
For XML Schema's too taxing: ***@reutershealth.com
I'd use DTDs http://www.reutershealth.com
If they had local trees -- http://www.ccil.org/~cowan
I think I best switch to RELAX NG.
Jony Rosenne
2003-09-30 12:46:39 UTC
That's the next thing on my list. Why not?

Jony
Roy Badami
2003-09-30 12:06:40 UTC
Note that the proposed display model (as proposed, for instance, in
the IRI draft) will display RTL labels in RTL order. The base
embedding level for the identifier acts as a kind of default, for when
the bidi algorithm can't determine the directionality of a character.

-roy
Mark Davis
2003-09-30 13:28:35 UTC
What Jony is saying, I believe, is that people would like to see, in an Arabic
or Hebrew context, something like the following:

...http://ABC.DEF/GHI.htm...
displayed as:
...htm.IHG/FED.CBA//:http

Mark
__________________________________
http://www.macchiato.com
► शिष्यादिच्छेत्पराजयम् ◄

Roy Badami
2003-09-30 13:32:35 UTC
Post by Mark Davis
What Jony is saying, I believe, is that people would like to see, in an Arabic
or Hebrew context, something like the following:
...http://ABC.DEF/GHI.htm...
displayed as:
...htm.IHG/FED.CBA//:http
What will the browser display in the address bar, though? Will there
be two different display modes (using an LTR or RTL embedding)
depending on the user's locale?

-roy
Jony Rosenne
2003-09-30 14:51:16 UTC
There are today. The text "Address" should be localized too. Hebrew and
Arabic Windows have the whole screen layout RTL. What is missing can be
fixed.

Jony
Roy Badami
2003-09-30 14:01:19 UTC
Post by Jony Rosenne
There are today. The text "Address" should be localized too. Hebrew and
Arabic Windows have the whole screen layout RTL. What is missing can be
fixed.
And in an RTL embedding, the URL

http://www.foo.com/123/ (logical order)

will display as

/123/http://www.foo.com (display order)

unless there is some more complex display model than the one that I've
seen proposed. Given that Hebrew/Arabic users are going to want to be
able to access ASCII URLs for the foreseeable future, if not
indefinitely, this looks like it would be horribly confusing...

-roy
Roy Badami
2003-09-30 14:07:33 UTC
Post by Roy Badami
And in an RTL embedding, the URL
http://www.foo.com/123/ (logical order)
will display as
/123/http://www.foo.com (display order)
Oops, sorry, it will display as

/http://www.foo.com/123 (display order)

(I think)
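
The corrected ordering can be checked against a toy model of the reordering. This is only a sketch, not an implementation of the full Unicode bidirectional algorithm: it assumes the thread's convention that lowercase letters are LTR and uppercase letters stand for RTL, treats ASCII digits as European numbers, treats everything else as neutral, and ignores explicit embeddings, Arabic numbers and separator rules. The function name `toy_display` is made up for illustration.

```python
def toy_display(logical, base_rtl):
    """Reorder a string with a toy subset of the bidi rules (see assumptions above)."""
    base = "R" if base_rtl else "L"

    # Step 1: classify. A digit that follows an LTR letter behaves like LTR;
    # otherwise it stays a European number (EN).
    types, last_strong = [], base
    for ch in logical:
        if ch.islower():
            t = last_strong = "L"
        elif ch.isupper():
            t = last_strong = "R"
        elif ch.isdigit():
            t = "L" if last_strong == "L" else "EN"
        else:
            t = "N"
        types.append(t)

    def side(t):
        # For resolving neutrals, numbers count as right-to-left.
        return "R" if t in ("R", "EN") else "L"

    # Step 2: a run of neutrals takes the direction of its neighbours if they
    # agree, and the base direction otherwise.
    n, i = len(types), 0
    while i < n:
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < n and types[j] == "N":
            j += 1
        before = side(types[i - 1]) if i > 0 else base
        after = side(types[j]) if j < n else base
        types[i:j] = [before if before == after else base] * (j - i)
        i = j

    # Step 3: assign embedding levels relative to the base level.
    level_of = {"L": 2, "R": 1, "EN": 2} if base_rtl else {"L": 0, "R": 1, "EN": 2}
    levels = [level_of[t] for t in types]

    # Step 4: from the highest level down to 1, reverse each maximal run of
    # characters at that level or higher.
    chars = list(logical)
    for k in range(max(levels, default=0), 0, -1):
        i = 0
        while i < n:
            if levels[i] < k:
                i += 1
                continue
            j = i
            while j < n and levels[j] >= k:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]
            i = j
    return "".join(chars)


print(toy_display("http://www.foo.com/123/", base_rtl=True))  # /http://www.foo.com/123
print(toy_display("http://ABC.DEF/GHI.htm", base_rtl=True))   # htm.IHG/FED.CBA//:http
```

Under these assumptions only the trailing slash moves in the ASCII URL, because it is the one neutral that sits between an LTR character and the RTL end of the paragraph. The same toy model maps both "CBA123.org" and "123CBA.org" to "123ABC.org" when `base_rtl=False`, which is the domain-name ambiguity mentioned at the start of the thread.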

-roy
Jony Rosenne
2003-09-30 15:17:32 UTC
The browser can make the context LTR if there is no RTL character in the
text.
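
A minimal sketch of that rule, assuming "RTL character" means any character whose Unicode Bidi_Class is R or AL. The function name is hypothetical and this is not taken from any browser.

```python
import unicodedata

def pick_base_direction(identifier):
    """Return "rtl" if any character is strongly right-to-left, else "ltr"."""
    for ch in identifier:
        if unicodedata.bidirectional(ch) in ("R", "AL"):
            return "rtl"
    return "ltr"

print(pick_base_direction("http://www.foo.com/123/"))
print(pick_base_direction("http://\u05d0\u05d1\u05d2.com/"))
```

With this choice a pure ASCII URL keeps its familiar left-to-right layout even in a Hebrew or Arabic UI, while any identifier containing an RTL letter gets an RTL context.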

We should first think of our destination, and only after that about
intermediate steps, otherwise we won't get very far.

Jony
Israel Gidali
2003-09-30 14:02:19 UTC
If this is indeed what Jony says, then this is Jony's personal view and
does not necessarily reflect what other people in our area will want to
see!
What if anybody using Arabic or Hebrew or Farsi or Urdu wants to access
one of the URLs which are in English? Will he see the URL entirely
reversed?




Shalom, Salam, Peace,
Israel Gidali ישראל גידלי
Globalization Manager, IBM Israel , GCoC - Complex Text Languages
**NEW**: Ph: +972 3 918 8604 Mob: +972 52 554 604 Fax:
+972 3 918 8883



Paul Hoffman / IMC
2003-09-30 16:00:05 UTC
Post by John Cowan
Unfortunately, non-Latin identifiers have to interoperate with the
existing infrastructure, where top-level names are encoded in Latin script.
True.
Post by John Cowan
Replacing the ISO 3166 codes at the top of the domain hierarchy is not
contemplated,
False: it is being actively contemplated.
Post by John Cowan
so every host in the "il" domain will bear a name that is
either purely LTR (as today) or bidirectional.
Every domain name will contain periods. Every URL will contain
periods in the domain name, as well as slashes, colons, and ASCII
characters in the scheme names.

--Paul Hoffman, Director
--Internet Mail Consortium
Arnt Gulbrandsen
2003-09-30 16:11:05 UTC
Post by Paul Hoffman / IMC
Post by John Cowan
Replacing the ISO 3166 codes at the top of the domain hierarchy is
not contemplated,
False: it is being actively contemplated.
Please do tell. (Or post an URL, an i-d name or equivalent.)

--Arnt
John C Klensin
2003-09-30 16:14:22 UTC
Post by Paul Hoffman / IMC
Post by John Cowan
Unfortunately, non-Latin identifiers have to interoperate
with the existing infrastructure, where top-level names are
encoded in Latin script.
True.
Post by John Cowan
Replacing the ISO 3166 codes at the top of the domain
hierarchy is not contemplated,
False: it is being actively contemplated.
Paul, the word John used was "replacing", not "adding to". I'm
not aware of replacement of the 3166-based codes being
contemplated, actively or otherwise, by anyone with the ability
to make such a change. For my edification, could you fill me in?
Post by Paul Hoffman / IMC
Post by John Cowan
so every host in the "il" domain will bear a name that is
either purely LTR (as today) or bidirectional.
Every domain name will contain periods. Every URL will contain
periods in the domain name, as well as slashes, colons, and
ASCII characters in the scheme names.
Yes, absolutely, although I can more easily imagine non-ASCII
scheme names than I can getting rid of 3166-based TLDs.

john
Paul Hoffman / IMC
2003-09-30 16:30:25 UTC
Post by John C Klensin
Post by Paul Hoffman / IMC
Post by John Cowan
Replacing the ISO 3166 codes at the top of the domain
hierarchy is not contemplated,
False: it is being actively contemplated.
Paul, the word John used was "replacing", not "adding to".
Whoops, good point; I mis-read the sentence. John is right that John
is right: many people are actively working on *adding to* the current
ccTLDs with IDNs. The two will co-exist.

--Paul Hoffman, Director
--Internet Mail Consortium
John Cowan
2003-09-30 16:58:24 UTC
Post by Paul Hoffman / IMC
Whoops, good point; I mis-read the sentence. John is right that John
is right: many people are actively working on *adding to* the current
ccTLDs with IDNs. The two will co-exist.
*whew*
--
They do not preach John Cowan
that their God will rouse them ***@reutershealth.com
A little before the nuts work loose. http://www.ccil.org/~cowan
They do not teach http://www.reutershealth.com
that His Pity allows them --Rudyard Kipling,
to drop their job when they damn-well choose. "The Sons of Martha"
Adam M. Costello
2003-09-30 23:57:03 UTC
Post by John Cowan
Replacing the ISO 3166 codes at the top of the domain hierarchy is not
contemplated, so every host in the "il" domain will bear a name that
is either purely LTR (as today) or bidirectional.
Certainly, every subdomain of "il" is either purely LTR or
bidirectional, but there is hope that in the future there could be pure
RTL domains under a TLD belonging to Israel. The idea is that the root
could delegate new TLDs to a country in addition to its ccTLD.

I have no idea how likely this is to happen, or how soon.

AMC

Adam M. Costello
2003-09-30 23:43:41 UTC
I'd like to encourage the people coming from the bidi mailing list to
help us with the main question that IMAA is now facing: how to apply the
Stringprep bidi check.

The current draft applies it to the entire local part, which creates two
annoyances: FOO2 is not allowed (because it's right-to-left but does not
end with a strong RTL character), and owner-FOO is not allowed (because
it mixes strong LTR and strong RTL characters).
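
For readers coming from the bidi list, the check in question can be sketched with Python's standard `stringprep` module, which exposes the RFC 3454 tables (D.1 for R/AL characters, D.2 for L characters, C.8 for prohibited display-control characters). The Hebrew letters below are hypothetical stand-ins for the uppercase placeholders FOO and BAR, and the function name is made up for illustration.

```python
import stringprep

def passes_bidi_check(text):
    """Apply the three bidi requirements of RFC 3454 section 6 to a string."""
    # 1. Characters from table C.8 are prohibited outright.
    if any(stringprep.in_table_c8(ch) for ch in text):
        return False
    # Strings with no R/AL character are not constrained further.
    if not any(stringprep.in_table_d1(ch) for ch in text):
        return True
    # 2. An R/AL string must not contain any strong L character.
    if any(stringprep.in_table_d2(ch) for ch in text):
        return False
    # 3. An R/AL string must start and end with an R/AL character.
    return stringprep.in_table_d1(text[0]) and stringprep.in_table_d1(text[-1])

FOO = "\u05d0\u05d1\u05d2"  # hypothetical stand-in for "FOO"
BAR = "\u05d3\u05d4\u05d5"  # hypothetical stand-in for "BAR"

print(passes_bidi_check(FOO + "2"))        # False: ends with a digit
print(passes_bidi_check("owner-" + FOO))   # False: mixes L and R/AL
print(passes_bidi_check(FOO + "2-" + BAR)) # True: no L, R/AL at both ends
```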

One alternative is to apply the bidi check to individual segments.
(IMAA already divides the local part into segments for other reasons.
Segments are delimited by non-alphanumeric ASCII characters.) This
would allow owner-FOO, but would continue to disallow FOO2, and would
also disallow FOO2-BAR, which is allowed in the current draft.

Another alternative is to not apply the bidi check at all. This would
allow all the cases mentioned, and would also allow things that the bidi
check was intended to protect us against, like 2FOOfooBARbar3.
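
The three options can be compared side by side. This is a rough sketch only: segments are split as described above (on non-alphanumeric ASCII characters), which may not match the IMAA draft in every detail, the C.8 prohibition is left out for brevity, and the Hebrew letters are hypothetical stand-ins for the uppercase placeholders.

```python
import stringprep

def bidi_ok(text):
    """Bidi requirements 2 and 3 of RFC 3454 section 6 (C.8 check omitted)."""
    rtl = [stringprep.in_table_d1(ch) for ch in text]
    if not any(rtl):
        return True
    if any(stringprep.in_table_d2(ch) for ch in text):
        return False
    return rtl[0] and rtl[-1]

def segments(local_part):
    """Split on non-alphanumeric ASCII characters, dropping empty pieces."""
    out, current = [], []
    for ch in local_part:
        if ord(ch) < 128 and not ch.isalnum():
            if current:
                out.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        out.append("".join(current))
    return out

def verdicts(local_part):
    """Return (whole-string check, per-segment check, no check at all)."""
    whole = bidi_ok(local_part)
    per_segment = all(bidi_ok(s) for s in segments(local_part))
    return (whole, per_segment, True)

FOO = "\u05d0\u05d1\u05d2"  # hypothetical stand-in for "FOO"
BAR = "\u05d3\u05d4\u05d5"  # hypothetical stand-in for "BAR"

examples = {
    "owner-FOO": "owner-" + FOO,
    "FOO2": FOO + "2",
    "FOO2-BAR": FOO + "2-" + BAR,
    "2FOOfooBARbar3": "2" + FOO + "foo" + BAR + "bar3",
}
for name, value in examples.items():
    print(name, verdicts(value))
```

The per-segment variant is the only one that accepts owner-FOO while still rejecting 2FOOfooBARbar3, at the cost of also rejecting FOO2-BAR.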

We think that disallowing owner-FOO is sufficiently onerous that the
current draft needs to be changed, but it's not clear to us which of the
other two ideas has a better risk/reward tradeoff, or if they're both so
bad that we ought to try to think of something else.

AMC