Standard Squeak Font Encodings

Yoshiki Ohshima ohshima at is.titech.ac.jp
Wed Feb 2 11:57:52 UTC 2000


  Hi,

> > Codes 128..159 are only "sort of" free.  The ISO 6429 standard defines a
> > whole lot of so-called "C1 controls", and guess where they go?
> 
> That's why I double quoted "free". I meant "there are no printable
> characters defined in that range". So for font purposes we can use that
> range.

  I completely agree that it would be nice for Squeak to
use ISO Latin-1 as its character encoding.

  However, from my point of view, several code points in
[128..159] are not suitable for arbitrary use.  For example,
some multi-byte text encoding schemes use SS2 and SS3 to
denote that the succeeding bytes represent a
multi-byte-encoded character.  Even worse, the (widely used)
Microsoft kanji encoding scheme uses all of that range.
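
  To make this concrete, here is a small Python sketch (not
Squeak code) that lists which bytes of an encoded string land
in 128..159.  The helper name c1_bytes is made up for this
illustration, and the sample strings are arbitrary; the codec
names "shift_jis" and "euc_jp" are the ones shipped with
Python's standard library.

```python
# Bytes 128..159 (0x80..0x9F) are the C1 control range.
C1_RANGE = range(0x80, 0xA0)

# Single-shift controls as defined in C1.
SS2 = 0x8E
SS3 = 0x8F


def c1_bytes(text, encoding):
    """Return the encoded bytes of `text` that fall in 128..159.

    Hypothetical helper, only for illustration.
    """
    return [b for b in text.encode(encoding) if b in C1_RANGE]


# "\u6f22\u5b57" is the two-character word "kanji".
kanji_sjis = c1_bytes("\u6f22\u5b57", "shift_jis")

# "\uff71" is a half-width katakana A; EUC-JP prefixes it with SS2.
kana_euc = c1_bytes("\uff71", "euc_jp")

print([hex(b) for b in kanji_sjis])
print([hex(b) for b in kana_euc])
```

  Both lists come back non-empty, so a font that puts glyphs
at those positions would collide with bytes that such text
already uses for other purposes.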

  There are *many* programs that infer the encoding scheme
of a given text from the (byte|bit)-pattern in the text.
Those programs will get confused if Squeak uses the C1
code points carelessly.
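
  As a rough idea of what such a program does, here is a toy
heuristic in Python.  It is only a sketch under my own
assumptions: the function name guess_encoding and the three
labels it returns are invented here, and real detectors look
at far more than this.

```python
def guess_encoding(data: bytes) -> str:
    """Toy guess based only on which byte ranges appear.

    Hypothetical heuristic for illustration, not a real detector.
    """
    # Nothing above 127: plain ASCII is enough.
    if all(b < 0x80 for b in data):
        return "ascii"
    # Any byte in 128..159 is a C1 code point in Latin-1, which
    # ordinary Latin-1 text does not contain, so assume a
    # multi-byte scheme instead.
    if any(0x80 <= b <= 0x9F for b in data):
        return "multi-byte?"
    # Only bytes in 160..255 beyond ASCII: plausible Latin-1.
    return "latin-1?"


print(guess_encoding(b"hello"))
print(guess_encoding(b"caf\xe9"))
print(guess_encoding(bytes([0x41, 0x85])))
```

  With a rule like this, Latin-1 text that carries extra
glyphs in 128..159 would be reported as something other than
Latin-1.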

  On the other hand, there are several characters in C0 that
are considered "obsolete (only meaningful for serial data
transmission)".

  If the problem is just that we need a few characters
beyond Latin-1, I think it makes sense to use those code
points in C0.

  Just my two ichi-yen-dama.

  -- Yoshiki




