UTF8 Squeak

Yoshiki Ohshima yoshiki at squeakland.org
Mon Jun 11 18:27:46 UTC 2007


> It seems that this was already a Yoshiki idea with WideString, so I'm 
> just extending that idea with a TwoByteString to cover 16 bits too.
> Yoshiki, am I right?

  For storing the bare Unicode code points, I think so.  I'm not
convinced that adding a 16-bit variation solves any real problems, but
there may be something to it.

  My first few questions are:

  - While the vast majority of strings for, say, Japanese can be
    represented with the characters in the BMP, you would use
    FourByteString for Chinese/Japanese/Korean and some others.  Does
    this mean that you would *always* use FourByteString for these
    "languages" (and not scripts)?

  - Suppose you would like to use different line wrapping algorithms
    for different languages.  How would you keep that information?
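
  To make the first question concrete, here is a minimal sketch in
Python (not Squeak code; the function name bytes_per_char is
hypothetical and only for illustration).  It assumes the width is
chosen from the string's contents, by its highest code point, rather
than from its language: 1 byte up to U+00FF, 2 bytes up to U+FFFF
(the BMP), and 4 bytes otherwise.

```python
def bytes_per_char(text: str) -> int:
    """Return the narrowest fixed width (1, 2, or 4 bytes) that can
    hold every code point in text.  Illustrative helper only."""
    # An empty string has no code points; treat it as fitting in 1 byte.
    highest = max(map(ord, text), default=0)
    if highest <= 0xFF:
        return 1   # fits an 8-bit string
    if highest <= 0xFFFF:
        return 2   # fits in the BMP, so 16 bits per character suffice
    return 4       # at least one character lies outside the BMP


print(bytes_per_char("Squeak"))              # plain ASCII
print(bytes_per_char("\u65e5\u672c\u8a9e"))  # Japanese text inside the BMP
print(bytes_per_char("\U00020BB7"))          # a CJK character outside the BMP
```

  Under this assumption the width follows the characters actually
present, so a Japanese string only needs 4 bytes per character when it
contains something outside the BMP.  It says nothing about the second
question: the language of a string cannot be recovered from its code
points alone.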

-- Yoshiki

More information about the Squeak-dev mailing list