UTF8 Squeak

subbukk subbukk at gmail.com
Fri Jun 8 12:52:24 UTC 2007


On Friday 08 June 2007 11:28 am, Yoshiki Ohshima wrote:
> > Of course, for some encodings (such as UTF-8) there would probably be a
> > performance penalty for accessing characters at an arbitrary index
> > ("aString at: n.") But there may be good ways to mitigate that, using
> > clever implementation tricks (caveat: I haven't actually tried it.) 
> > However, with my proposal, one is free to use UTF-16 for all Strings, or
> > UTF-32 for all Strings, or ASCII for all Strings--based on one's space
> > and performance constraints, and based on the character repertoire one
> > needs for one's user base.  And the conversion to UTF-16 or UTF-32 (or
> > whatever) can be done when the String is read from an external Stream
> > (using the VW stream decorator approach, for example.)
>
>   I *do* see some upsides of this approach, actually, but the
> downsides are overwhelmingly bigger if you think of Smalltalk as a
> self-contained system.  Handling keyboard input alone would make the
> system really complex.
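
The indexing cost raised in the quoted message ("aString at: n." on a UTF-8 buffer) and one possible mitigation can be sketched in Python rather than Squeak. The class `Utf8String`, its `stride` parameter and the checkpoint scheme are hypothetical illustrations, not the quoted author's proposal. A naive lookup has to scan every byte before character n. Recording the byte offset of every stride-th character bounds the scan to fewer than `stride` characters, at the cost of some extra memory.

```python
class Utf8String:
    """Hypothetical UTF-8 backed string with Smalltalk-style 1-based at:.

    Assumption for illustration: the byte offset of every `stride`-th
    character is cached, so a lookup skips at most stride - 1 characters
    instead of scanning the whole prefix.
    """

    def __init__(self, text: str, stride: int = 64):
        self.data = text.encode("utf-8")
        self.stride = stride
        self.checkpoints = []  # byte offsets of characters 0, stride, 2*stride, ...
        count = 0
        for offset, byte in enumerate(self.data):
            if byte & 0xC0 != 0x80:  # not a continuation byte: a character starts here
                if count % stride == 0:
                    self.checkpoints.append(offset)
                count += 1
        self.size = count

    def at(self, n: int) -> str:
        """Return character n (1-based), like Smalltalk's aString at: n."""
        if not 1 <= n <= self.size:
            raise IndexError(n)
        index = n - 1
        offset = self.checkpoints[index // self.stride]
        # Walk forward from the nearest checkpoint, one character at a time.
        for _ in range(index % self.stride):
            offset += 1
            while self.data[offset] & 0xC0 == 0x80:  # skip continuation bytes
                offset += 1
        # Find where this character's bytes end.
        end = offset + 1
        while end < len(self.data) and self.data[end] & 0xC0 == 0x80:
            end += 1
        return self.data[offset:end].decode("utf-8")
```

For example, `Utf8String("héllo", stride=2).at(2)` returns `'é'`. A smaller stride makes lookups cheaper and the checkpoint table larger; a stride of 1 amounts to storing a full offset table.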
I am not sure Squeak needs multiple transformation formats for Unicode code
points. A Unicode code point needs up to 21 bits (U+0000 to U+10FFFF; only the
Basic Multilingual Plane fits in 16 bits), and UTF-8 encodes each one in 8 to
32 bits (1 to 4 bytes). Is there any sound case for other UTFs now (outside of
VMs)? The Wikipedia entry below has a good summary of the pros and cons:
    http://en.wikipedia.org/wiki/UTF-8
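
To make those bit counts concrete, here is a small Python sketch. The helper name `utf8_length` is ours, not from the thread. It computes the UTF-8 length of a code point from the byte-length ranges defined in RFC 3629 and checks the result against Python's built-in encoder.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for one Unicode code point (RFC 3629 ranges)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not encodable in UTF-8")
    if code_point < 0x80:
        return 1   # ASCII: 7 payload bits
    if code_point < 0x800:
        return 2   # 11 payload bits
    if code_point < 0x10000:
        return 3   # 16 payload bits: the rest of the BMP
    return 4       # 21 payload bits: supplementary planes


for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    encoded = chr(cp).encode("utf-8")
    assert len(encoded) == utf8_length(cp)
    print(f"U+{cp:04X}: {len(encoded)} byte(s), {encoded.hex(' ')}")

print((0x10FFFF).bit_length())  # the largest code point needs 21 bits
```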

Rob Pike's note:
  http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt
is a very good reality check on the situation.

For children who will be working in a multilingual environment, Squeak will be
spending most of its time waiting for a button or key press anyway :-).

Regards .. Subbu



More information about the Squeak-dev mailing list