UTF8 Squeak
Yoshiki Ohshima
yoshiki at squeakland.org
Mon Jun 11 18:27:46 UTC 2007
Janko,
> It seems that this was already a Yoshiki idea with WideString, so I'm
> just extending that idea with a TwoByteString to cover 16 bits too.
>
> Yoshiki, am I right?
For storing the bare Unicode code points, I think so. I'm not
convinced that adding a 16-bit variation solves any real problems,
but there may be something.
My first few questions are:
- While the vast majority of strings in, say, Japanese can be
represented with characters in the BMP, you would use
FourByteString for Chinese/Japanese/Korean and some others. Does
this mean that you would *always* use FourByteString for these
"languages" (and not scripts)?
- Suppose you would like to use different line-wrapping algorithms
for different languages. How would you keep that information?
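
To make the first question concrete, here is a minimal sketch, in
Python rather than Squeak, of choosing a storage width from the code
points alone. The function name narrowest_width and the 1/2/4 return
values are only assumptions for illustration; they are not taken from
any existing Squeak class.

```python
def narrowest_width(text: str) -> int:
    """Return the bytes per character needed to hold every code point in text.

    Hypothetical helper: the 1/2/4 split mirrors a byte string, a
    two-byte string, and a four-byte string as discussed in this thread.
    """
    highest = max((ord(ch) for ch in text), default=0)
    if highest <= 0xFF:
        return 1  # every code point fits in one byte
    if highest <= 0xFFFF:
        return 2  # every code point lies in the BMP
    return 4      # at least one code point lies beyond the BMP


print(narrowest_width("Squeak"))              # 1
print(narrowest_width("\u65e5\u672c\u8a9e"))  # 2 (three BMP kanji)
print(narrowest_width("\U00020BB7"))          # 4 (a CJK Extension B character)
```

A choice made this way depends only on the characters actually present
and knows nothing about the language, which is why the second question
about line wrapping is not answered by the storage width.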
-- Yoshiki
More information about the Squeak-dev
mailing list