Unicode support

ohshima at is.titech.ac.jp
Wed Sep 15 19:03:22 UTC 1999


  Hi Duncan,

> So I don't think it'd be good for someone to go through the hassle of
> implementing a UTF-8 set of string methods. I like the idea of
> bringing Unicode into Squeak. But there's a lot more involved than
> just adding 2-byte arrays.

  I completely agree with you.  I'd like to add a few other
issues:

* Strictly speaking, UCS-2 is *not* indexable.
  The standard specifies character composition and
  surrogate pairs, so one character may span more than one
  16-bit element.  One cannot get the n-th character from a
  2-octet array whose elements are UCS-2 encoded data in O(1)
  time.

* The XML 1.0 standard requires a 21-bit-wide character space,
  which is a defined part of ISO-10646.
  So one shouldn't start writing any XML-related program
  with a 16-bit fixed representation.

* The glyphs of a text will vary from platform to platform.
  To display UCS-2 encoded text, one has to choose a
  font for the text.  However, fonts for UCS-2 are something
  like "Japanese font for UCS-2" or "Simplified Chinese
  font for UCS-2", and the glyph for a code point in one
  font tends to be VERY different from the glyph in the
  other.  So for a system such as Squeak, which needs to
  control the final (displayed) representation of a text,
  a 16-bit fixed format for a character won't work.
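
To illustrate the first two points above, here is a minimal sketch in
Python (not Squeak code; the function name nth_code_point and the
sample data are made up for this example).  It handles only surrogate
pairs and ignores character composition, which would make indexing
harder still.

```python
def nth_code_point(units, n):
    """Return the n-th code point (0-based) from a list of 16-bit code units.

    Hypothetical helper for illustration: it has to scan from the start,
    because a surrogate pair occupies two array elements.
    """
    i = 0
    count = 0
    while i < len(units):
        u = units[i]
        # A high surrogate followed by a low surrogate encodes one code point.
        is_pair = (
            0xD800 <= u <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        if is_pair:
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            step = 2
        else:
            cp = u
            step = 1
        if count == n:
            return cp
        count += 1
        i += step
    raise IndexError("code point index out of range")


# 'A', U+1D11E (a character outside the 16-bit range), 'B'
units = [0x0041, 0xD834, 0xDD1E, 0x0042]

third_by_scan = nth_code_point(units, 2)   # linear scan over the array
third_by_index = units[2]                  # naive O(1) array access

# Number of bits needed for the largest code point XML 1.0 allows.
bits_needed = (0x10FFFF).bit_length()
```

Here the scan finds 'B' (0x42) as the third character, while plain
array indexing lands in the middle of the surrogate pair.  The last
line shows why 16 bits are not enough for the XML character range.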

If you don't mind, please see the "Multilingual Support" page
on the wiki and leave a comment.

  Thank you.

                                             OHSHIMA Yoshiki
                Dept. of Mathematical and Computing Sciences
                               Tokyo Institute of Technology 





More information about the Squeak-dev mailing list