Unicode support
ohshima at is.titech.ac.jp
Wed Sep 15 19:03:22 UTC 1999
Hi Duncan,
> So I don't think it'd be good for someone to go through the hassle of
> implementing a UTF-8 set of string methods. I like the idea of
> bringing Unicode into Squeak. But there's a lot more involved than
> just adding 2-byte arrays.
I completely agree with you. I'd like to add some other
issues:
* Strictly speaking, UCS-2 is *not* indexable.
The standard specifies character composition and
surrogate pairs. One cannot get the n-th character from a 2-octet
array whose elements are UCS-2 encoded data in O(1) time.
* The XML 1.0 standard requires a 21-bit-wide character space,
which is a defined part of ISO-10646.
So one shouldn't start writing any XML-related program
with a 16-bit fixed representation.
* The glyphs of a text will vary from platform to platform.
To display a UCS-2 encoded text, one has to choose a
font for the text. However, fonts for UCS-2 are something
like "Japanese font for UCS-2" or "Simplified Chinese
font for UCS-2", and the glyph for a code
point in one font tends to be VERY different from that in the
other. So for a system such as Squeak, which needs to control
the final (displayed) representation of a text, the 16-bit
fixed format for a character won't work.
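
To make the first two points concrete, here is a minimal sketch in Python (not Squeak code; the helper names utf16_units and nth_code_point are made up for illustration). It assumes the 16-bit array holds UTF-16 style code units with surrogate pairs, and it ignores combining characters, which would make indexing by "character" harder still.

```python
import struct

def utf16_units(text):
    """Return the 16-bit code units of text encoded as big-endian UTF-16."""
    data = text.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))

def nth_code_point(units, n):
    """Return the n-th code point (0-based) by scanning from the start.

    The scan is linear: a surrogate pair occupies two array slots, so the
    array index of the n-th code point is not known without looking at
    everything before it.
    """
    i = 0
    count = 0
    while i < len(units):
        u = units[i]
        is_pair = (
            0xD800 <= u <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        if is_pair:
            # Combine high and low surrogate into one code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            step = 2
        else:
            cp = u
            step = 1
        if count == n:
            return cp
        count += 1
        i += step
    raise IndexError("code point index out of range")

# U+20000 lies outside the 16-bit range and needs two code units.
units = utf16_units("a\U00020000b")
```

With this input, units[2] is the low half of a surrogate pair rather than "b", while nth_code_point(units, 2) returns the code point of "b" after walking the array. The largest code point allowed by XML 1.0, 0x10FFFF, needs 21 bits, so it cannot fit in a single 16-bit element.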
If you don't mind, please see the "Multilingual Support" page
on the wiki and leave a comment.
Thank you.
OHSHIMA Yoshiki
Dept. of Mathematical and Computing Sciences
Tokyo Institute of Technology