Unicode support

Peter William Lount peter at smalltalk.org
Wed Sep 15 21:50:24 UTC 1999


Hi John,

I agree that a UNICODE implementation must meet the specification.
Unfortunately UNICODE does not currently represent all the languages of the
world. By using 32-bit character object instances and separate
encoders/decoders for each "character set standard" you gain tremendous
flexibility in supporting all languages as they get an encoding.
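
To make the idea concrete, here is a rough sketch of the arrangement I have
in mind. It is written in Python rather than Smalltalk purely for
illustration, and the names Char, Codec, CODECS and transcode are
hypothetical, not existing Squeak classes. It also assumes the 32-bit values
are Unicode code points and leans on Python's built-in codecs, so it only
shows the shape of the design, not a real implementation.

```python
# Sketch only: Char, Codec, CODECS and transcode are hypothetical names.
from dataclasses import dataclass


@dataclass(frozen=True)
class Char:
    """A character object holding a 32-bit value, independent of any encoding."""
    value: int

    def __post_init__(self):
        if not 0 <= self.value <= 0xFFFFFFFF:
            raise ValueError("character value must fit in 32 bits")


class Codec:
    """Converts between bytes in one "character set standard" and Char objects."""

    def __init__(self, name):
        self.name = name  # a Python codec name such as "latin-1"

    def decode(self, data):
        # Assumption: each decoded character maps to its Unicode code point.
        return [Char(ord(c)) for c in data.decode(self.name)]

    def encode(self, chars):
        # Values above 0x10FFFF have no Python codec, so chr() raises ValueError.
        return "".join(chr(c.value) for c in chars).encode(self.name)


# One encoder/decoder per standard; new standards are added here as they appear.
CODECS = {name: Codec(name) for name in ("ascii", "latin-1", "utf-8", "utf-16-be")}


def transcode(data, source, target):
    """Re-encode bytes from one standard to another via Char objects."""
    return CODECS[target].encode(CODECS[source].decode(data))
```

The point of the split is that the Char objects never change when a new
encoding arrives; only another entry in the table of encoders/decoders is
needed.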

Furthermore, there can be many other notions associated with "character
objects" beyond their "space efficient" encodings.

All the best,

Peter William Lount
peter at smalltalk.org
http://www.smalltalk.org


>I'm wondering about the space and time efficiency of this prospect. It
>intuitively seems much more reasonable for 256 characters than for
>57,709 16-bit values (some characters, some not) as defined in Unicode
>3.0. I, of course, could be stuck in alarmist mode, so a good set of
>reasons would justify it to me.

>ASCII is such a messy thing, there is nothing wrong with doing it one
>way or another so that it seems intuitive. But for Unicode, there are
>many special circumstances that are spelled out distinctly by the
>standard and many of them are required. Look at the document on
>collation, for an example. Doing Unicode is a non-trivial thing, but
>it will be beautiful if it is done exactly to spec, and all of Squeak
>is transformed to use it.

>-John
