Discussion:
Global and national e-mail address
Dan Oscarsson
2003-12-08 12:27:14 UTC
This topic has come up before, but as it may affect the proposals for
handling alternative addresses, I think we could discuss it a
little more.

There has been some talk about having an address people can remember,
speak over the phone, and put on paper or a business card. I think some of you
see the old ASCII e-mail address as the suitable "global" address that
everybody should have as a common address everybody can use, with some
national version that only some will use.

Unfortunately the 26 letters of ASCII are not sufficient to represent all
names in the world well enough. Not all sounds can be represented
well. But the easiest thing for everybody to use is an alphabet, so ideographic
characters are out. Phonetic letters would work well for writing names, but
are difficult for most people to use.
The best choice I can see for "global" e-mail addresses is the Latin
alphabet. Many people today know how the glyphs look and can
recognise them, and many can type at least a subset of the letters
on their keyboards.
The Latin alphabet is not ASCII: the Latin letters include several letters
more than the 26 in ASCII, for example åäöæþ (for those who can see ISO
8859-1), as well as several accents that can be used together
with the letters. Still, the number of letters is not that large, and neither
is the number of accents, so everybody should be able to learn to
recognise and use them.
Using the full Latin alphabet instead of the ASCII subset, I think most
(if not all) names can be written quite well.
It is also good for everybody to gain a little broader understanding
of cultures other than their own.

As people will want to use a name written with national letters, they
will need one or more national versions in addition to their global address.

From the above, my choice for representing the Global e-mail address
(the address everybody will be able to use) is to use the full Latin alphabet.

This means that we should move the current infrastructure for e-mail
so that everybody will, as a common subset, support all Latin letters in
email addresses. This subset can be downgraded into ASCII for use in
legacy systems more easily than the complete UCS, and can be made
easier to read in encoded form.
At the same time it will allow most people to write their name/e-mail address
in a way that closely represents their name as written using national characters.
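
To make the kind of downgrading I have in mind concrete, here is a rough
Python sketch (the folding table and the address are only illustrations,
not a proposal for the exact rules):

    # Fold a Latin-letter address down to ASCII for legacy systems.
    # The table is illustrative only; real rules would be chosen per language
    # (Swedish users might prefer aa/ae/oe for å/ä/ö, for example).
    import unicodedata

    FOLD = {'æ': 'ae', 'Æ': 'AE', 'ø': 'o', 'Ø': 'O',
            'þ': 'th', 'Þ': 'Th', 'ð': 'd', 'Ð': 'D', 'ß': 'ss'}

    def downgrade_to_ascii(address):
        out = []
        for ch in address:
            # Decompose so accents become separate combining marks, then drop them.
            for c in unicodedata.normalize('NFD', FOLD.get(ch, ch)):
                if not unicodedata.combining(c):
                    out.append(c)
        return ''.join(out).encode('ascii', 'ignore').decode('ascii')

    print(downgrade_to_ascii('dan.öberg@exempel.se'))   # dan.oberg@exempel.se (made-up address)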

What do you think?

Dan
John C Klensin
2003-12-08 15:47:54 UTC
Dan,

I don't think so, for at least three reasons, however
unfortunate they may be...

(i) Absent tagging, we only get one shot at characters with the
eighth bit set. To say "ok, let's use 8859-1" is likely to
turn out to be equivalent to "can't use UTF-8, or any other
plausible Unicode encoding in email, even within a country".
That would be a very bad outcome, bad enough to encourage local,
non-interoperable, conventions.
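
To make that concrete: without tagging, the very same octets decode to
different characters depending on which interpretation the receiver
guesses. A tiny Python sketch, byte values picked only for illustration:

    raw = b'\xc3\xa5'                 # a UTF-8 sender means the single letter "å"
    print(raw.decode('utf-8'))        # å
    print(raw.decode('iso-8859-1'))   # Ã¥ -- two unrelated Latin-1 characters

Nothing in the message tells the receiver which of the two was intended.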

(ii) It is not even clear what characters are part of "the full
Latin (sic) alphabet". Note first that Latin itself uses a
_subset_ of the characters that appear in ASCII. Conversely,
8859-1 contains most Western European "Latin" characters, but
leaves out several that are needed to write Eastern European
languages that use "Latin" scripts. If you put all of those
"Latin" characters together, there isn't room in a single 8-bit
set, and then we start facing all of those "other" problems...
might as well just give it up and use Unicode, rather than
figuring out who picks what is, and is not, important enough.

(iii) "Latin", perhaps more than most other scripts (but that is
getting past the limits of my knowledge; I hope Michael Everson
will comment), has a long history of interesting and artistic
fonts, some of them barely readable without contextual clues to
those who use the underlying scripts every day. As soon as one
moves away from those basic 36 characters (or maybe a bit
fewer), understanding whether the sundry dots, hooks, bars,
etc., added in assorted orientations are font decorations or different
characters actually requires a good deal of training and
script-familiarity. To take an example from a different
alphabetic script, many Thai characters, as conventionally
written, contain small loops. Can those loops be omitted
without changing the character? I don't know and presume you
don't either. I'm sure any Thai schoolchild does. If you think
there are firm rules that apply across scripts and make
recognition reasonably easy for people who are not trained in,
and regular users of, the script in question, please try
explaining just how long the descender on a "j" must be
--independent of font stylizations-- in order to turn it into an
"i". And that example, of course, doesn't even require going
beyond ASCII. As I have noted many times before, our colleagues
at ISO and ITU continue to select _extremely_ limited character
sets when they want international interchange. The reasons are
no longer primarily a shortage of bits.

I continue to believe that, realistically, we are destined to a
world of email address aliases if we want interoperability to be
preserved. For better or worse, we know how to do that and have
been doing it for years -- as far as either 821 or 822 or their
successors are concerned, "DAN" and "Dan" and "dan" in your
address are aliases, not automagic case mappings... treating
them as case mappings is merely an implementation decision. In
that context, I note that, if your system is case-sensitive for
addresses (without aliases) and I assume otherwise and try to
send mail to ***@kiconsulting.se, the mail will
bounce. We have been living with this for many years; few
people are surprised by it very often.
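
If it helps, here is the distinction as a toy Python sketch -- an explicit
alias table that the receiving system chooses to configure, rather than an
automatic case-folding rule; the names are, of course, made up:

    # Aliases are per-mailbox equivalences the receiver configures; nothing
    # in 821/822 makes "DAN", "Dan" and "dan" refer to the same mailbox.
    ALIASES = {'dan': 'dan', 'Dan': 'dan', 'DAN': 'dan'}

    def resolve(local_part):
        # Spellings not listed here bounce; they are not case-folded.
        return ALIASES.get(local_part)   # None means "no such mailbox"

A site that wants case-insensitivity simply adds more alias entries.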

What does that mean in practice? It means that my hypothetical
friend Ug, who lives in Lower Slobbovia and wants to communicate
with his Lower Slobbovian friends in their native script, is
likely to want an email address in Lower Slobbovic. I would
hope that he would first hold a discussion with UTC and wait
until the script appears in Unicode 4.99, but I'd predict, given
a sense of urgency and/or being left out and the traditional
behavior of Slobbovians toward standards, that they would use
private-use Unicode space and adopt their own standards/model.
If he wants to hear from you or me (or most of the rest of the
world, including Upper Slobbovians, who have different case
conventions at least), he is going to need an ASCII address
(just as you need lower-case alias capability) -- the absolutely
worst threat to interoperability involves our receiving a
Unicode-based address that contains private-use characters (and,
for a receiving Unicode 3.2 system, characters that are first
defined in 4.0 may not be much better)... even cut and paste may
not work, depending on the operating system environment.
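
A receiving system can at least detect the private-use case mechanically.
A minimal Python sketch, relying only on Unicode's general category "Co"
for private-use code points:

    import unicodedata

    def uses_private_area(address):
        # True if any code point lies in a Private Use Area, i.e. its meaning
        # is a purely local convention the rest of the world cannot interpret.
        return any(unicodedata.category(ch) == 'Co' for ch in address)

What to do once that is detected -- bounce, warn, guess -- is the hard part.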

As a final observation, please note that the "send UTF-8 (or
something) in transport" approach and the "IMAA-like codings so transport
doesn't need changes" approach differ only trivially under the scenario
I've outlined above. Under an ideal set of conditions and
assumptions (even though the ideal conditions are slightly
different), things will work relatively smoothly with either.
It is the edge cases that make the difference. My Slobbovian
examples are always about edge cases, but, even for the less
extreme ones, the questions are about whether we can have
localization where it is important _and_ global
interoperability, or whether we need to start making "that
script and the conventions of that language --and whether or not
people can spell their names correctly-- are more important than
that one" decisions. I'm not prepared to either volunteer to
be one of the human sacrifices the latter course is likely to
require, or to volunteer anyone else. My search for "do it
right, even if it isn't quite as fast" solutions is based on the
belief that we shouldn't sign IETF up for that role either.
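
In caricature, the two approaches look like the Python sketch below: ship
raw UTF-8 octets, or ship an ASCII-compatible encoding the way IDNA does
for domain labels. Python's built-in "idna" codec implements the IDNA
(punycode) rules for the domain part only; the IMAA treatment of the local
part is still a draft and is not what this codec does. The domain is
invented.

    domain = 'exämple.se'
    print(domain.encode('utf-8'))   # b'ex\xc3\xa4mple.se' -- high-bit octets; transport must change
    print(domain.encode('idna'))    # b'xn--...' -- ASCII-only label; passes legacy paths unchanged

Either way, sender and receiver have to agree on which convention is in use.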

john


Arnt Gulbrandsen
2003-12-09 10:38:03 UTC
FYI, the only "full latin alphabet" I know about is the Minimum European
Subset, which contains around a thousand characters. Estimating the
long-term viability of that as an email address character set is left as
an exercise for the reader.

--Arnt
Adam M. Costello
2003-12-08 17:16:32 UTC
Dan Oscarsson <***@kiconsulting.se> wrote:

> The best choice I can see for "global" e-mail addresses is the
> Latin alphabet.

If someone wants a global address in addition to one or more local
addresses, I think ASCII is a better choice. ASCII is widely supported
in operating systems and keyboards, to a much greater extent than the
diacritics. Keyboards are likely to have the ASCII characters on the
key-caps (maybe not as the primary symbols, but at least as alternate
symbols), but not a full set of diacritics (mine has none). Many
computers/devices have fonts and input routines that support one local
charset and ASCII, but not the Latin diacritics.

Meanwhile, I don't see how the full Latin alphabet is much better
than ASCII at representing non-Latin-based names. A global address
is primarily for people not familiar with the language of the local
address. How is that audience going to benefit from the diacritics?
They still won't know the proper spelling, pronunciation, or meaning of
the name in the original language. What's the point of using diacritics
to trace fine distinctions of pronunciation that the intended audience
can neither hear nor speak, or to trace fine distinctions of spelling
in an original script that the intended audience can neither read nor
write?

AMC
Paul Hoffman / IMC
2003-12-08 20:00:58 UTC
At 1:27 PM +0100 12/8/03, Dan Oscarsson wrote:
> From the above, my choice for representing the Global e-mail address
>(the address everybody will be able to use) is to use the full Latin alphabet.

You didn't mention the encoding you suggest to be used. Nor how to
protect current mail software that is not expecting such an encoding
from failing miserably. This gets to the heart of the matter.

For many years (at least since early 1998), I have heard of people
using various repertoires in various encodings on the LHS (and the
RHS, of course), saying it works, but then saying "as long as
everyone is using Foo MUA and Bar MTA". The suggestion is obvious:
the problems associated with it are not.

If you are proposing using ISO 8859-1 as the encoding, that seems
like a no-brainer: it's just turning on the high bit. But some MTAs
will ignore the high bit, so they will deliver mail to the wrong
mailbox. Some MUAs will ignore the high bit on responding, sending
the mail to the wrong place.
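
For the curious, a small Python sketch of what bit-stripping does to an
ISO 8859-1 local part; the name is invented:

    local = 'öberg'.encode('iso-8859-1')        # b'\xf6berg'
    stripped = bytes(b & 0x7F for b in local)   # what a 7-bit path may do to it
    print(stripped)                             # b'vberg' -- now a different mailbox

The result still looks like a plausible address, so nothing flags the
corruption until the mail bounces or goes astray.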

Or did you mean using UTF-8, but limiting the repertoire to "Latin"?
This gets you the worst of the above problems, and adds in others
(such as about two billion people "cheating" and using the characters
that they actually use in their names).

This has been suggested many times before. The folks who tried it
gave up because it failed. If you have tried it in an open mail
environment and had it work, feel free to report your success.

--Paul Hoffman, Director
--Internet Mail Consortium
John C Klensin
2003-12-08 20:16:36 UTC
For the record, Paul and I are apparently in complete agreement
about this. And in particular, as I should have mentioned in
my longer response... that this is ultimately an old idea, that
it has been proposed and tried several times before and that,
except in isolated communities (which Paul refers to as
"everyone is using Foo MUA and Bar MTA"), the people who have
done so have given up on it because it doesn't work
interoperably.

john


Dan Oscarsson
2003-12-10 09:32:22 UTC
I will here respond to several comments from John C Klensin, Adam M Costello
and others.

With this topic I did not want to talk about replacing ASCII with ISO 8859-1
or about what character encoding to use. Instead I wanted to discuss
what names could be suitable to use as a global fallback name.

To be able to write a name you need, at least, to be able to have letters
so you can write all phonemes used. ASCII only contains 26 letters and they
cannot represent all phonemes very well. For example, Swedish has
three vowels in addition to the ones available in ASCII. They are
represented by the letters "åäö". These are letters, not an "a" or "o"
with an accent above. Without those three letters you cannot write all
Swedish names. Accents I can live without, but not the letters for our
additional phonemes.

To be able to write most names in the world I think you need to be able to
write about 60 phonemes. Not every phoneme needs its own letter (English has
about 45 phonemes but the 26 letters are enough). So I would expect that by adding
not that many more letters to the ones in ASCII we could get quite a fair
representation of all names in the world. That would be more acceptable
to have on a business card.

And just like Adam's, my Swedish keyboard does not have any accented or diacritic
letters. But I can still type quite a lot of them by using "compose" or
"alt graph".

ASCII will never be good enough for use as a "global" name for Swedish.
But with a few more letters added it would be possible. I expect the same
would work for most other languages. I would not be surprised if
35-45 letters would be enough to give quite a good representation of
the native names.

Dan
John C Klensin
2003-12-10 14:11:39 UTC
Dan,

Three observations (short this time)...

(i) Computer geeks and their possible preferences aside, people
tend to not like transliterations (writing of a name that would
normally be written in one character set in the characters of
another). Whether they like "better" transliterations more than
"worse" transliterations is a cultural issue.

(ii) To reasonably transliterate names and languages, one needs
not only a collection of the right phonemes, but an appropriate
and accurate notation for tones. You can't get those out of a
small extension to Latin letters. I'm told that one can get a
reasonable approximation of all of the relevant phonemes and
tones with IPA, but IPA not only uses some characters that are
distinctly non-Latin-based, but also uses a rather complex
collection of combining diacriticals. And, at least unless one
is a professional phonologist, learning IPA and how to use it
accurately is _hard_ (having had people attempt to teach it to
me twice, once when I was young enough to learn these things).
For some hints in a reference that is easily accessible to most
of us, see the discussion of IPA Characters in the Unicode
definition (3.0 or 4.0, take your pick).

(iii) If one wants even an approximation to accurate
transliteration, the symbol-overloading in Latin scripts is bad
news. E.g., the sound of "ö" (o with diaeresis, U+00F6) is
different in, e.g., Swedish and German. I.e., they are
different characters, even if they look the same and even if
Unicode "unified" them. If one is trying to transliterate,
e.g., Arabic into Roman characters, does one pick a character on
the basis of the Swedish phoneme or the German ones? (Hint, as
soon as you start down that path, you end up sliding toward IPA.)

It appears to me that you are proposing a very Euro-centric view
of things, and it won't work all that well even for Europe.

regards,
john


Keld Jørn Simonsen
2003-12-10 15:16:39 UTC
On Wed, Dec 10, 2003 at 09:11:39AM -0500, John C Klensin wrote:
>
> (iii) If one wants even an approximation to accurate
> transliteration, the symbol-overloading in Latin scripts is bad
> news. E.g., the sound of "ö" (o with diaeresis, U+00F6) is
> different in, e.g., Swedish and German. I.e., they are
> different characters, even if they look the same and even if
> Unicode "unified" them.

Actually ö is pronounced the same in German and Swedish, and they
come out of the same typographical tradition of combining an o and an
e. But they are regarded differently: in German it is considered an
"o umlaut" while in Swedish it is considered a genuine letter.
Both Germans and Swedes have a specific sound for the character
when they spell words, and incidentally it is the same sound (more or
less; dialects may vary). If you had said ö as used in French or in
Dutch (where the ö is less frequent) then you would be right. There ö is
considered an o with diaeresis, where the diaeresis accent is placed to
indicate that the o-sound is pronounced individually.

Another example of letters that are pronounced differently is "i" and
"e", which are pronounced very differently in standard English vs. standard
German, French, and the Scandinavian languages.

But I think this does not matter for email addresses: as long as you can
write the correct name, it does not matter how it is pronounced.
E.g. my last name "Simonsen" is pronounced differently in Danish and by
uninitiated English-speaking persons. But it goes perfectly into one of
my email addresses ***@dkuug.dk .

Furthermore, it was not Unicode that unified the different "ö"s; that
was already done in ISO 8859-1 and earlier charsets.

> If one is trying to transliterate,
> e.g., Arabic into Roman characters, does one pick a character on
> the basis of the Swedish phoneme or the German ones? (Hint, as
> soon as you start down that path, you end up sliding toward IPA.)

Yes, that is another ballpark with its own problems.

> It appears to me that you are proposing a very Euro-centric view
> of things, and it won't work all that well even for Europe.

I agree completely. I think for common email addresses we can only
use what we have today in the current mail RFCs, namely a subset of
US-ASCII. And then for our internationalized email addresses, we need
to have all names of persons in the world spelled correctly
in whatever script is available in ISO 10646.

Best regards
Keld
Mark Davis
2003-12-10 15:33:08 UTC
It is easy to fall into a trap of being Eurocentric: to overestimate the ease of
Latin and underestimate the difficulty of transliteration. For many languages
there are no good transliteration standards; or rather, there are many
conflicting ones. And many of these are transcriptions, not
transliterations. (The difference is that transliteration to Latin is
reversible; one can recover precisely the original text; transcription is not
reversible -- but is more pronounceable.)

For example, here are some sample transliterations (from
http://oss.software.ibm.com/cgi-bin/icu/tr). If you asked an average user of
each of these languages to do a transliteration, even of those who know Latin,
the odds of their coming up with exactly these results are very small.

По своей природе компьютеры могут работать лишь с числами. И для того, чтобы они
могли хранить в памяти буквы или другие символы, каждому такому символу должно
быть поставлено в соответствие число.
=>
Po svoej prirode kompʹûtery mogut rabotatʹ lišʹ s čislami. I dlâ togo, čtoby oni
mogli hranitʹ v pamâti bukvy ili drugie simvoly, každomu takomu simvolu dolžno
bytʹ postavleno v sootvetstvie čislo. (ISO 9)

Οι ηλεκτρονικοί υπολογιστές, σε τελική ανάλυση, χειρίζονται απλώς αριθμούς.
Αποθηκεύουν γράμματα και άλλους χαρακτήρες αντιστοιχώντας στο καθένα τους από
έναν αριθμό (ονομάζουμε μία τέτοια αντιστοιχία κωδικοσελίδα).
=>
Oi i̱lektronikoí ypologistés, se telikí̱ análysi̱, cheirízontai apló̱s
arithmoús. Apothi̱kév̱oun grámmata kai állous charaktí̱res antistoichó̱ntas sto
kathéna tous apó énan arithmó (onomázoume mía tétoia antistoichía
ko̱dikoselída). (ISO 843)

기본적으로 컴퓨터는 숫자만 처리합니다. 글자나 다른 문자에도 숫자를 지정하여 저장합니다.
=>
gibonjeog'euro keompyuteoneun susjaman ceorihabnida. geuljana dareun munja'edo
susjareul jijeonghayeo jeojanghabnida.
(Korean Ministry of Culture & Tourism Transliteration regulations with the
clause 8 variant)

कम्प्यूटर, मूल रूप से, नंबरों से सम्बंध रखते हैं। ये प्रत्येक अक्षर और वर्ण के
लिए एक नंबर निर्धारित करके अक्षर और वर्ण संग्रहित करते हैं।
=>
kampyūṭara, mūla rūpa sē, nambarōṁ sē sambandha rakhatē haiṁ. yē pratyēka akṣara
aura varṇa kē li'ē ēka nambara nirdhārita karakē akṣara aura varṇa saṅgrahita
karatē haiṁ. (ISO 15919)

And there is no known transliteration method for Chinese or Japanese -- there
are (many) transcription standards, but none that preserve the original
characters.

John's point about IPA is particularly apt. Think about the example: if an
English user had to use IPA for web addresses, how would he do it? For
"bort.com" would he use "bɔrt.kɑm", "bɔːt.kɒm", "bɔrʔ.kɑm", ... It would be very
difficult to predict exactly what the spelling in IPA would be, even if he were
conversant with IPA. The same problem is faced by someone using a language
normally written in a non-Latin script when trying to transliterate into Latin.
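
For what it is worth, the ICU engine behind the demo URL above can also be
driven from, e.g., Python via the PyICU binding. A sketch, assuming PyICU is
installed; its output will generally not match what a native writer would
produce by hand, which is exactly the problem:

    from icu import Transliterator

    to_latin = Transliterator.createInstance('Any-Latin')
    to_ascii = Transliterator.createInstance('Any-Latin; Latin-ASCII')

    text = 'По своей природе'
    print(to_latin.transliterate(text))   # Latin letters, with diacritics
    print(to_ascii.transliterate(text))   # folded further to plain ASCII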

Mark
__________________________________
http://www.macchiato.com
► शिष्यादिच्छेत्पराजयम् ◄

Dan Oscarsson
2003-12-12 07:20:59 UTC
John C Klensin wrote:

I agree that using Latin as a common/global address is not easy.

>
>It appears to me that you are proposing a very Euro-centric view
>of things, and it won't work all that well even for Europe.

Actually Latin is also used in North/South America and many more places.
I tried to find something that could work for a lot more people
than ASCII.

Looking at IETF/W3C you find the standards from them very much
USA-centric. Both American English spelling as well as preserving
ASCII at all cost. Backward compatibility is only ASCII, but many of us
have been using more characters than ASCII for a very long time.
No backward compatibility for us. And in, for example, Unicode
the ASCII code points get preferential handling that other code points
do not.

While I do not think ASCII is enough for use as a global fallback, an alphabet
with more letters could do.
Alternatively we could use just digits.

Dan
Martin Duerst
2003-12-15 14:35:10 UTC
At 08:20 03/12/12 +0100, Dan Oscarsson wrote:

>Looking at IETF/W3C you find the standards from them very much
>USA-centric. Both American English spelling as well as preserving
>ASCII at all cost.

This may be true for the IETF, but is not correct for W3C. For
an example, please see http://www.w3.org/TR/2003/REC-SVG11-20030114/.


Regards, Martin.