UTF8 Squeak
subbukk
subbukk at gmail.com
Fri Jun 8 12:52:24 UTC 2007
On Friday 08 June 2007 11:28 am, Yoshiki Ohshima wrote:
> > Of course, for some encodings (such as UTF-8) there would probably be a
> > performance penalty for accessing characters at an arbitrary index
> > ("aString at: n.") But there may be good ways to mitigate that, using
> > clever implementation tricks (caveat: I haven't actually tried it.)
> > However, with my proposal, one is free to use UTF-16 for all Strings, or
> > UTF-32 for all Strings, or ASCII for all Strings--based on one's space
> > and performance constraints, and based on the character repertoire one
> > needs for one's user base. And the conversion to UTF-16 or UTF-32 (or
> > whatever) can be done when the String is read from an external Stream
> > (using the VW stream decorator approach, for example.)
>
> I *do* see some upsides of this approach, actually, but the
> downsides are overwhelmingly bigger, if you think that Smalltalk is a
> self-contained system. Handling keyboard input alone would make the
> system really complex.
I am not sure if Squeak needs multiple transformation formats for Unicode code
points. A Unicode code point needs at most 21 bits (U+0000 to U+10FFFF), and
UTF-8 encodes it in 8 to 32 bits. Is there any sound case for other UTFs now
(outside of VMs)? The Wikipedia entry below has a good summary of pros and cons:
http://en.wikipedia.org/wiki/UTF-8
Rob Pike's note:
http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt
is a very good reality check on the situation.
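
To make the variable-width point concrete, here is a rough sketch in Python
(not Squeak code; the helper names utf8_width and char_at are made up for
illustration). It shows how many bytes UTF-8 spends on a code point, and why
"aString at: n" on raw UTF-8 bytes means scanning from the start instead of
jumping straight to an offset:

```python
def utf8_width(code_point: int) -> int:
    """Number of bytes UTF-8 uses for one code point (hypothetical helper)."""
    return len(chr(code_point).encode("utf-8"))


def char_at(data: bytes, n: int) -> str:
    """Return the n-th character (0-based) of UTF-8 encoded bytes.

    Hypothetical helper: it walks the bytes from the beginning, counting
    lead bytes, so the cost grows with n.
    """
    count = -1
    start = None
    for i, b in enumerate(data):
        # Continuation bytes look like 10xxxxxx; anything else starts a character.
        if b & 0xC0 != 0x80:
            count += 1
            if count == n:
                start = i
            elif count == n + 1:
                return data[start:i].decode("utf-8")
    if start is not None:
        # The requested character was the last one in the buffer.
        return data[start:].decode("utf-8")
    raise IndexError(n)


sample = "a\u00e9\u20ac\U0001F600".encode("utf-8")  # 1 + 2 + 3 + 4 bytes
widths = [utf8_width(cp) for cp in (0x41, 0xE9, 0x20AC, 0x1F600)]
third = char_at(sample, 2)  # the euro sign, found only after scanning
```

A fixed-width internal form (for example 32 bits per character) would turn
char_at into a plain offset calculation, at the cost of more space per String.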
For children who will be working in a multilingual environment, Squeak will be
spending most of its time waiting for a button/key push anyway :-).
Regards .. Subbu