UTF8 Squeak

Michael Rueger michael at impara.de
Mon Jun 11 18:36:06 UTC 2007


Yoshiki Ohshima wrote:
>   Janko,
> 
>> It seems that this was already a Yoshiki idea with WideString, so I'm 
>> just extending that idea with a TwoByteString to cover 16 bits too.
>>
>> Yoshiki, am I right?
> 
>   For storing the bare Unicode code points, I think so.  I'm not
> convinced that adding 16-bit variation solves any real problems.  But
> there may be something.

A lot of text is basically 8-bit, *except* for the occasional wide dash 
etc., and that one character blows the whole text up to 32 bits per 
character although 16 bits would be more than enough.
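
To make the size argument concrete, here is a minimal sketch in Python 
(not Squeak code). The helper names bytes_per_char and storage_size are 
made up for this illustration, and the sketch assumes a string is stored 
with one fixed width per character, chosen by its highest code point.

```python
def bytes_per_char(text: str) -> int:
    """Return the narrowest fixed width (1, 2 or 4 bytes) that fits every code point."""
    highest = max(map(ord, text), default=0)
    if highest <= 0xFF:
        return 1  # fits an 8-bit string
    if highest <= 0xFFFF:
        return 2  # fits a hypothetical 16-bit string
    return 4      # needs a full 32-bit string


def storage_size(text: str, width: int) -> int:
    """Bytes needed to store the text with a fixed width per character."""
    return len(text) * width


# One en dash (U+2013) among otherwise 8-bit characters.
sample = "Squeak \u2013 mostly 8-bit text"

print(bytes_per_char(sample))
print(storage_size(sample, 2), storage_size(sample, 4))
```

With only 8-bit and 32-bit representations available, the single dash 
forces the 4-byte width; a 16-bit variant would halve that.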

>   - Suppose you would like to use different line wrapping algorithms
>     for different languages, how would you keep that information?

The question is which, if any, language-dependent (text layout?!) 
attributes should be encoded into the String rather than kept as text 
attributes.
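
As a rough illustration of the "kept as text attributes" option, here is 
a small Python sketch (again not Squeak code). AttributedText and its 
methods are hypothetical names; the assumption is that the string holds 
only bare code points while language tags live in separate runs over 
index ranges.

```python
from dataclasses import dataclass, field


@dataclass
class AttributedText:
    """A plain string plus attribute runs stored outside the string."""
    string: str
    runs: list = field(default_factory=list)  # entries: (start, stop, attributes)

    def add_attribute(self, start: int, stop: int, **attributes) -> None:
        """Attach attributes to the half-open index range [start, stop)."""
        self.runs.append((start, stop, attributes))

    def attribute_at(self, index: int, name: str, default=None):
        """Look up an attribute at an index; later runs override earlier ones."""
        value = default
        for start, stop, attributes in self.runs:
            if start <= index < stop and name in attributes:
                value = attributes[name]
        return value


text = AttributedText("Hello こんにちは")
text.add_attribute(0, 6, language="en")
text.add_attribute(6, 11, language="ja")

# A line wrapper could pick its algorithm from the tag at the break position.
print(text.attribute_at(7, "language"))
```

The string itself is unchanged by tagging, so its character width does 
not depend on how much language information is attached.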

Michael



More information about the Squeak-dev mailing list