Unicode support

John Duncan jddst19+ at pitt.edu
Fri Sep 24 20:01:44 UTC 1999


Hi-

Getting into the habit of rearranging emails:)

  Thank you for your patience with my English skill:-).

Your English is perfectly fine. :) It is my Japanese that is no
good. :)

>  Roughly speaking, the character representation in my
> implementation is somewhat similar to
> SmallInteger/LargePositiveInteger integration.  The
> ISO-8859-1 characters are represented in the same way as
> the current Character, and the others are represented as an
> object with a 30-bit value field.  Currently, there is no
> assistance from the VM, so you can test it with a vanilla VM.
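
To make sure I am reading the quoted design correctly, here is a rough
sketch of how I understand it, written in Python rather than Squeak
just to keep it short. Everything in it is an assumption on my part:
the names Char, char_for and _LATIN1_TABLE are made up, and I am
guessing that "the same way as current Character" means one shared,
preallocated object per ISO-8859-1 code point. It is not the actual
implementation.

```python
# Hypothetical sketch; names are illustrative, not Squeak's classes.
LATIN1_LIMIT = 0x100      # ISO-8859-1 covers code points 0..255
WIDE_LIMIT = 1 << 30      # the "30 bit value field" from the quote


class Char:
    """A character object that just holds its code point."""
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value


# Assumption: one preallocated, shared object per ISO-8859-1 character.
_LATIN1_TABLE = [Char(v) for v in range(LATIN1_LIMIT)]


def char_for(value):
    """Return the shared object for ISO-8859-1, a fresh one otherwise."""
    if not 0 <= value < WIDE_LIMIT:
        raise ValueError("code point does not fit in 30 bits")
    if value < LATIN1_LIMIT:
        return _LATIN1_TABLE[value]
    return Char(value)
```

With this, char_for(65) always hands back the same object, while
every call for a character outside ISO-8859-1 allocates a new one.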

I think that's grand, but I wonder whether better efficiency and
flexibility could be gained by using UCS-2 as the base character
representation and breaking out to UCS-4, which can represent
everything, only when necessary. Or, since it is improbable that
planes 16384 and above will ever be used, by breaking out to a 30-bit
representation like the one you use. UCS-2 would make character
objects for all of the modern languages in common use standard in
every version of Squeak. With ISO-8859-1 as the base representation,
we would be more likely to end up in the strange situation MULE is in:
able to represent all languages but not able to actually use them. I
don't think the characters would be all that costly: one universal
font costs only about 10 MB, Unicode characters can be transmitted
directly from any operating system that supports Unicode, and the
characters can be cached. I also think Unicode fonts are much more
likely to exist on users' machines than other fonts.
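
A rough sketch of the break-out idea, again in Python only for
brevity: store text in 16-bit units while every character fits in
UCS-2, and switch to 32-bit units only when one does not. The
function names unit_width and pack are made up for illustration, and
the 30-bit ceiling is borrowed from your representation; none of this
is existing Squeak code.

```python
# Hypothetical sketch; function names are illustrative only.
UCS2_LIMIT = 0x10000      # UCS-2 covers code points 0..0xFFFF
WIDE_LIMIT = 1 << 30      # ceiling of the 30-bit break-out form


def unit_width(code_points):
    """Return bytes per character: 2 for UCS-2, 4 for the break-out."""
    for cp in code_points:
        if not 0 <= cp < WIDE_LIMIT:
            raise ValueError("code point does not fit in 30 bits")
    widest = max(code_points, default=0)
    return 2 if widest < UCS2_LIMIT else 4


def pack(code_points):
    """Pack code points big-endian at the narrowest width that fits."""
    code_points = list(code_points)
    width = unit_width(code_points)
    data = b"".join(cp.to_bytes(width, "big") for cp in code_points)
    return width, data
```

Text that stays inside UCS-2 costs two bytes per character here; a
single character beyond it doubles the width for that whole string.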

I don't know if users really want 10 MB of one font in the image. I'm
not sure how your implementation handles the "vast amount of
information" problem.

-John





More information about the Squeak-dev mailing list