Translation problem

Richard A. O'Keefe ok at atlas.otago.ac.nz
Tue Dec 12 03:14:55 UTC 2000


	I'm wondering if the problem is 1) a bad choice of a file format
	from my email, 2) the font utilised, 3) a different mapping of
	the characters using 129 - 256 ascii code between MacOS and NT
	4.0, 4) I don't know what else....
	
It's (3).
There is a family of international standards for 8-bit coded character
sets, ISO 8859.  ASCII covers only codes 0-127; codes 128-255 are by
definition not ASCII.
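
A quick way to see that boundary is to test the bytes directly. This is
only a sketch in Python; the helper name is_ascii and the sample strings
are made up for illustration.

```python
def is_ascii(data: bytes) -> bool:
    """Return True if every byte lies in the ASCII range 0-127."""
    return all(b < 128 for b in data)


plain = b"resume"
# In Latin-1, each e-acute is the single byte 0xE9, which is above 127.
accented = "r\u00e9sum\u00e9".encode("latin-1")

print(is_ascii(plain))     # True
print(is_ascii(accented))  # False
```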

Many UNIX systems use ISO Latin 1 (ISO 8859-1).
Macintoshes use an older character set known as MacRoman.
Latin-1 and MacRoman don't just assign different numbers to the
characters; they disagree about which characters exist.
There are MacRoman characters that cannot be mapped to Latin-1
at all, and vice versa.
Microsoft of course have their own character sets;
most Windows systems use Windows 1252, which is *almost* the same
as ISO Latin 1, except it adds a bunch of characters right where
UNIX expects the C1 controls to be.
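
The disagreement between MacRoman and Latin-1 is easy to reproduce with
Python's standard codecs (named "mac_roman" and "latin-1" there).  A
minimal sketch; can_encode is a hypothetical helper, and the byte and
characters are just examples.

```python
raw = bytes([0x8E])  # one byte, as it might arrive in a file from a Mac

as_mac = raw.decode("mac_roman")   # e-acute, U+00E9
as_latin1 = raw.decode("latin-1")  # U+008E, a C1 control character


def can_encode(ch: str, codec: str) -> bool:
    """Return True if the codec has a byte for this character."""
    try:
        ch.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


PI = "\u03c0"   # GREEK SMALL LETTER PI: in MacRoman, not in Latin-1
ETH = "\u00f0"  # LATIN SMALL LETTER ETH: in Latin-1, not in MacRoman

print(can_encode(PI, "mac_roman"), can_encode(PI, "latin-1"))    # True False
print(can_encode(ETH, "mac_roman"), can_encode(ETH, "latin-1"))  # False True
```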

I'd hate Microsoft for this, if it weren't for the fact that some of
the added characters are necessary for writing English, and make
Windows 1252 a better match to MacRoman.
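
To see exactly where Windows 1252 departs from Latin-1, one can decode
every byte value both ways and list the ones that differ.  Again a
sketch using Python's "cp1252" and "latin-1" codecs; the variable names
are arbitrary.

```python
differing = []
for value in range(256):
    raw = bytes([value])
    as_latin1 = raw.decode("latin-1")
    # A few bytes (such as 0x81) are unassigned in Python's cp1252 codec;
    # errors="replace" turns them into U+FFFD instead of raising.
    as_cp1252 = raw.decode("cp1252", errors="replace")
    if as_latin1 != as_cp1252:
        differing.append(value)

print(f"{min(differing):#04x}-{max(differing):#04x}, {len(differing)} bytes")
# prints: 0x80-0x9f, 32 bytes

# Two of the added characters: the 66-99 quotation marks.
quotes = bytes([0x93, 0x94]).decode("cp1252")  # U+201C and U+201D
```

The differing bytes are exactly the C1 control range; everything else
decodes identically.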

	In other words, I know that a program in assembler must be rewritten
	for each different machine, but I didn't know it was the same for a
	text written in a language other than English...

Correctly written English uses several characters:
    OE and oe ligatures,
    vowels with diaeresis,
    en dash and em dash,
    left and right 6-9 quotation marks,
    left and right 66-99 quotation marks,

none of which are in ASCII, and only the vowels with diaeresis are in
Latin-1.  Given the history of the computer industry, I suppose we should
count ourselves lucky to have lower-case letters (FIELDATA and ASCII 63
didn't).
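
One way to check which character sets can represent the characters
listed above is to try encoding each of them.  A sketch only: the
SAMPLES string, the CODECS tuple and the supported_by helper are
illustrative choices, and i-diaeresis stands in for the vowels with
diaeresis.

```python
import unicodedata

# OE and oe ligatures, i with diaeresis, en dash, em dash,
# 6-9 quotation marks, 66-99 quotation marks.
SAMPLES = "\u0152\u0153\u00ef\u2013\u2014\u2018\u2019\u201c\u201d"
CODECS = ("ascii", "latin-1", "cp1252", "mac_roman")


def supported_by(ch: str) -> list:
    """Return the names of the codecs in CODECS that can encode ch."""
    result = []
    for codec in CODECS:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            continue
        result.append(codec)
    return result


for ch in SAMPLES:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {supported_by(ch)}")
```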

It's high time the FTP protocol was revised to have a "transmit as UTF-8"
mode, and that FTP programs were revised to use it.
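
Without such a mode, the conversion has to be done by hand at each end.
The sketch below is not part of any FTP program; transcode is a
hypothetical helper, and it assumes the sender knows the file is
MacRoman and the receiver wants Windows 1252.

```python
def transcode(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Reinterpret bytes from one character set in another.

    Raises UnicodeEncodeError if a character has no place in the target.
    """
    return data.decode(source).encode(target)


mac_bytes = "caf\u00e9".encode("mac_roman")     # b"caf\x8e" on the Mac
utf8_bytes = transcode(mac_bytes, "mac_roman")  # what goes over the wire
win_bytes = transcode(utf8_bytes, "utf-8", "cp1252")  # b"caf\xe9" on NT

print(mac_bytes, utf8_bytes, win_bytes)
```

Transferring the original bytes in binary mode would leave 0x8E
untouched, and the Windows side would then display it as a different
character.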





More information about the Squeak-dev mailing list