File names (was Re: Warning: Large Babel translation)

Hannes Hirzel hannes.hirzel.squeaklist at bluewin.ch
Sun Nov 16 22:38:06 UTC 2003


Yoshiki Ohshima wrote:

>   Well, most of the non-English, non-Unicode encodings are more or
> less compatible with ASCII^^;

Hoping the following clarifies a point...


UTF-8 is compatible with ASCII not only at the encoding level (the 
assignment of code numbers) but also at the physical level (the sequence 
of bytes). Every ASCII string is already a valid UTF-8 encoded string.

This is not the case for UTF-16, for example. The code numbers of an 
English-only text are the same as the ASCII codes, but the sequence of 
bytes is not: UTF-16 stores each of these characters in two bytes 
instead of one.
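
As a rough illustration of the difference between the two levels, here 
is a minimal Python sketch (assuming Python 3.8 or later; the sample 
string and the choice of big-endian UTF-16 without a byte order mark are 
arbitrary assumptions made for the example):

```python
# Sample English-only text; any pure-ASCII string behaves the same way.
text = "Squeak"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
# "utf-16-be" = big-endian UTF-16 without a byte order mark (assumption
# made here so the byte layout is easy to read).
utf16_bytes = text.encode("utf-16-be")

# Encoding level: the code numbers are identical in all three cases.
code_numbers = [ord(ch) for ch in text]
print(code_numbers)                # [83, 113, 117, 101, 97, 107]

# Physical level: UTF-8 yields exactly the ASCII bytes, UTF-16 does not.
print(ascii_bytes == utf8_bytes)   # True
print(ascii_bytes == utf16_bytes)  # False
print(utf8_bytes.hex(" "))         # 53 71 75 65 61 6b
print(utf16_bytes.hex(" "))        # 00 53 00 71 00 75 00 65 00 61 00 6b
```

A program that only understands ASCII can read the UTF-8 bytes 
unchanged, while the UTF-16 bytes contain an extra zero byte for every 
character.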

The physical level matters, I think, when we talk about VMs and 
compatibility across platforms. From a general user's point of view, 
ASCII is the only encoding for data exchange that has worked universally 
over the last 40 years. UTF-8 might become the "ASCII" of the 21st 
century.

Hannes


Links:
http://en2.wikipedia.org/wiki/UTF-8
http://en2.wikipedia.org/wiki/UTF-16





