UTC-8 (was Re: Celeste encoding (was: Duplicate messages in Celeste))

Lex Spoon lex at cc.gatech.edu
Thu Mar 16 13:08:08 UTC 2000


Bijan Parsia <bparsia at email.unc.edu> wrote:

> 
> Is there a plan for dealing with Unicode, at least externally? I'm really
> just talking about enough to satisfy a parser...I'm happy to punt on
> display and font issues. Though I would like ASCIIish text to be
> intelligible ;)
> 

It's been batted around before, and I thought a project group had
started up to discuss it further.

In any case, most people would like it, but it's not trivial.  You don't
want to just use a 16-bit encoding all the time, because the increase in
the number of bytes in Squeak 2.7 would be:

>>	String allInstances inject: 0 into: [ :sum :string | sum + string size ] -->  1044616


So doing the simplistic thing would cost about an extra meg.  You almost
certainly need to use some compressed scheme internally.
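
To make the "compressed scheme" idea concrete, here is a rough sketch
(in Python rather than Squeak, purely for illustration) of one possible
approach: keep one byte per character while every character fits, and
only widen the storage for strings that need it.  The class name
CompactString and its methods are made up for this example; this is not
how Squeak or any particular system does it.

```python
from array import array


class CompactString:
    """Hypothetical string that picks the narrowest storage that fits."""

    def __init__(self, text):
        # Largest code point decides the width; an empty string stays narrow.
        widest = max(map(ord, text), default=0)
        if widest < 0x100:
            typecode = "B"   # 1 byte per character
        elif widest < 0x10000:
            typecode = "H"   # 2 bytes per character on common platforms
        else:
            typecode = "I"   # 4 bytes per character on common platforms
        self.units = array(typecode, map(ord, text))

    def __len__(self):
        return len(self.units)

    def __str__(self):
        return "".join(map(chr, self.units))

    def byte_size(self):
        # Bytes used by the character data only (no object overhead).
        return len(self.units) * self.units.itemsize
```

With something like this, the existing all-ASCII strings in an image
would cost no more than they do now, and only strings that really
contain wide characters would pay for them.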

Furthermore, there is the issue of displaying your nice Unicode stuff.
I suppose people in practice will load into an image fonts for the
portions of Unicode they see a lot, and other characters will show up as
little squares or something.

Then there is string comparison.  Probably the letter A appears in
oodles of places throughout Unicode, and all of them should be
considered equal in a case-insensitive comparison.
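
As a rough illustration of the comparison problem (again in Python, not
Squeak), the sketch below folds compatibility variants and case before
comparing, using the standard unicodedata module.  The function names
fold_key and same_ignoring_case are made up for this example, and this
is a simplification of full Unicode caseless matching, not a complete
implementation of it.

```python
import unicodedata


def fold_key(text):
    """Return a key that ignores case and compatibility variants."""
    # NFKC maps compatibility forms (e.g. fullwidth letters) to plain ones.
    compat = unicodedata.normalize("NFKC", text)
    # casefold() is a more aggressive lower(); renormalize afterwards
    # because folding can produce sequences that are not normalized.
    return unicodedata.normalize("NFKC", compat.casefold())


def same_ignoring_case(left, right):
    """Compare two strings by their folded keys."""
    return fold_key(left) == fold_key(right)
```

Under this scheme, same_ignoring_case("A", "\uff41") is true, since
U+FF41 is a fullwidth small "a".  A Cyrillic capital A (U+0410) still
compares unequal to the Latin one, because it is a different letter
that merely looks the same.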


I dunno, these are just some of the issues that spring to mind.  It
would be awesome to do the switch while Squeak is still relatively
young, but it ain't trivial.


Lex




