Celeste encoding (was: Duplicate messages in Celeste)
Richard A. O'Keefe
ok at atlas.otago.ac.nz
Tue Mar 14 00:54:51 UTC 2000
Stefan Matthias Aust <sma at 3plus4.de> wrote:
> Make it Latin-15 (so I get my beloved (euro sign) ;-) or WinAnsi (CP1252)
> which is a superset of Latin-15 that contains a few important characters
> and would also ease the conversion to MacRoman, and I strongly second this.
Ahem. There isn't any Latin-15. The ISO 8859-15 character set standard
is called ISO Latin 9 (nine, not fifteen) or, jocularly, Latin 0.
ISO Latin 9 uses the Euro instead of the international currency symbol
(just like recent versions of the Macintosh character sets, although I note
that the fonts installed on my MacOS 8.6 machine are a nasty mix of Apple
fonts that _do_ have the Euro and originally Windows fonts that don't, like
Arial.) It also kicks out the standalone accents (diaeresis, acute, cedilla),
the broken bar, and the three fractions in favour of (S,s,Z,z) with caron,
Y with diaeresis, and the OE, oe ligatures.
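
For anyone who wants to see exactly which positions changed, here is a minimal Python sketch. It assumes only the standard "iso8859-1" and "iso8859-15" codecs that ship with Python; the function name is made up for illustration.

```python
def latin9_differences():
    """Return {byte: (latin1_char, latin9_char)} for every byte that differs."""
    diffs = {}
    # Only the upper half (0xA0..0xFF) holds printable non-ASCII characters.
    for byte in range(0xA0, 0x100):
        raw = bytes([byte])
        in_latin1 = raw.decode("iso8859-1")
        in_latin9 = raw.decode("iso8859-15")
        if in_latin1 != in_latin9:
            diffs[byte] = (in_latin1, in_latin9)
    return diffs


if __name__ == "__main__":
    for byte, (old, new) in sorted(latin9_differences().items()):
        print(f"0x{byte:02X}: U+{ord(old):04X} -> U+{ord(new):04X}")
```

Run as a script, this lists the handful of code positions where the two sets disagree; everything else in the upper half is identical.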
The suggestion to use Windows CP 1252 as a base instead of Latin 1 or Latin 9
is a good one; it will make the transition from MacRoman _much_ less painful
as it includes a lot of the worthwhile MacRoman characters, like the
6...9 and 66...99 quotation marks. UNIX users _can_ get their electronic hands
on compatible fonts for the X Window system, so not only would the switch
support all the characters UNIX users are used to, it _could_ support the
others as well.
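
A rough way to check the repertoire claim, again as a Python sketch using only the standard "iso8859-15" and "cp1252" codecs (the names below are hypothetical). Note that this tests the set of characters, not the byte values: the Euro, for instance, sits at 0xA4 in Latin 9 but at 0x80 in CP 1252, so the two are not byte-compatible at the eight positions where Latin 9 departs from Latin 1.

```python
def latin9_chars_missing_from_cp1252():
    """List printable Latin 9 characters that CP 1252 cannot represent."""
    missing = []
    for byte in range(0xA0, 0x100):
        ch = bytes([byte]).decode("iso8859-15")
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# The 6, 9, 66 and 99 shaped quotation marks live in the 0x80..0x9F range,
# which Latin 1 and Latin 9 reserve for control codes.
CURLY_QUOTES = bytes([0x91, 0x92, 0x93, 0x94]).decode("cp1252")

if __name__ == "__main__":
    print("missing from CP 1252:", latin9_chars_missing_from_cp1252())
    print("curly quotes:", CURLY_QUOTES)
```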
Someone else asked:
> Well, isn't Latin-1 the de facto standard for the Internet?
In a word, NO. HTML 3.2 is defined in terms of Latin 1, but a lot of the
Web pages I _have_ to deal with are actually CP 1252. HTML 4.0 is
defined in terms of ISO 10646, but it's still not really practical to
actually _rely_ on that to any marked extent.
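
One way to cope with pages that claim Latin 1 but are really CP 1252 is to decode leniently. The sketch below is only an illustration under that assumption, not a description of what Celeste or any browser does; decode_lenient is a made-up name, and it falls back to Latin 1 for the few bytes CP 1252 leaves undefined.

```python
def decode_lenient(raw: bytes) -> str:
    """Decode bytes as CP 1252, falling back to Latin 1 byte by byte."""
    out = []
    for value in raw:
        one = bytes([value])
        try:
            out.append(one.decode("cp1252"))
        except UnicodeDecodeError:
            # A few bytes (e.g. 0x81) are unassigned in CP 1252; Latin 1 maps
            # every byte to the code point of the same value, so it never fails.
            out.append(one.decode("iso8859-1"))
    return "".join(out)


if __name__ == "__main__":
    sample = b"\x93caf\xe9\x94"
    print(repr(sample.decode("iso8859-1")))  # quotes come out as control codes
    print(repr(decode_lenient(sample)))      # quotes come out as curly quotes
```

Decoding byte by byte is slow but keeps the fallback logic obvious; for text that contains no unassigned bytes the result is the same as a plain CP 1252 decode.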
> So if we are just going to have one single 8-bit encoding, Latin-1 would
> seem the easiest to use.
CP 1252 would give us more compatibility with more machines, including UNIX
and MacOS. I don't like it, but them's the breaks.