UTF-8 (was Re: Celeste encoding (was: Duplicate messages in Celeste))

Dan Ingalls Dan.Ingalls at disney.com
Thu Mar 16 22:12:28 UTC 2000

AGREE at CarltonFields.com wrote...
>Of course it ain't trivial, but perhaps there's an interim, if not ad hoc solution that serves every relevant purpose?  It seems to me that the Number hierarchy is proof positive that widely disparate, differently sized and incomparable models with similar features can be resolved into a seamless whole.
>In a sense, isn't a pure ASCII string just a subset of UTF-8?  Can't a hierarchy with built-in coercion be used to preserve ALL of the efficiencies of the status quo, while still permitting (or at least paving the way toward) the full generality of UTF-8 and Unicode?
>Why can't the ASCII string be the SmallInteger of a new STRINGTHING hierarchy, where operations within the string world are seamless?  Every time I have raised this point, there were countless objections about things Squeak so configured could not do (the biggest deal was auto-reversing Hebrew/Anglo-Numeric text), but it seems that we could still accommodate many of the advantages of Unicode and integrate the whole into Squeak, while preserving ALL of the efficiencies of the present ASCII world for unmixed ASCII and Character stuff.

I agree with this approach entirely.  It's a great Squeak Samurai project (I would do it tonight, but I've got a hot date ;-).  Just put StringThing between ArrayedCollection and String, move all of String's methods up a level, leaving only those that have to do with String's primitive behavior.  It shouldn't take more than an hour, and everything should still work.
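The shape of that refactoring can be sketched roughly as follows. This is an illustration in Python, not Squeak code: the method names (code_at, size, index_of, reversed_codes) are made up for the example, and the one-byte-per-character storage is an assumption about what "primitive behavior" would cover.

```python
from abc import ABC, abstractmethod


class StringThing(ABC):
    """Hypothetical abstract parent: everything that does not care about storage."""

    # The only two operations a concrete subclass has to supply.
    @abstractmethod
    def code_at(self, index):
        ...

    @abstractmethod
    def size(self):
        ...

    # Shared behavior "moved up a level", written only in terms of the
    # two primitives above, so every subclass inherits it unchanged.
    def codes(self):
        return [self.code_at(i) for i in range(self.size())]

    def index_of(self, code):
        codes = self.codes()
        return codes.index(code) if code in codes else -1

    def reversed_codes(self):
        return self.codes()[::-1]


class String(StringThing):
    """Stands in for the existing String: one byte per character (assumed)."""

    def __init__(self, text):
        # latin-1 maps each character 0..255 to exactly one byte
        self._bytes = text.encode("latin-1")

    def code_at(self, index):
        return self._bytes[index]

    def size(self):
        return len(self._bytes)
```

A wider string class would then only need its own code_at and size; index_of and the rest come along for free.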

Then... define, say, String16 (*) that uses 16 bits and produces characters with codes up to 65535.  Make one up like 'Squ<999>eak', and see if it prints.  Then see if it displays.  Etc.  Lots of things will break, but that's half the fun.  You'll find out if text display handles characters that are not in the font, and you'll have to decide whether all characters will still be unique, but this is what life on the frontier is all about.
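As a rough picture of that experiment, here is a small sketch, again in Python rather than Squeak. The class body, the FONT_MAX_CODE limit, and the choice to print out-of-font characters as <code> are all assumptions made for the example; what Squeak's text display really does with such characters is exactly what the experiment is meant to find out.

```python
from array import array

# Assumption for this sketch: the font only has glyphs for codes 0..255.
FONT_MAX_CODE = 255


class String16:
    """Hypothetical string that stores each character code in 16 bits."""

    def __init__(self, codes):
        codes = list(codes)
        for code in codes:
            if not 0 <= code <= 0xFFFF:
                raise ValueError("code %r does not fit in 16 bits" % (code,))
        # Typecode 'H' is an unsigned short (at least 16 bits wide).
        self._codes = array("H", codes)

    def size(self):
        return len(self._codes)

    def code_at(self, index):
        return self._codes[index]

    def print_string(self):
        # Codes the font can show print as themselves; anything else is
        # shown as <code> so that it stays visible instead of vanishing.
        return "".join(
            chr(code) if code <= FONT_MAX_CODE else "<%d>" % code
            for code in self._codes
        )


def squeak_with_999():
    """Build the 'Squ<999>eak' example: six ASCII codes around code 999."""
    codes = [ord(ch) for ch in "Squ"] + [999] + [ord(ch) for ch in "eak"]
    return String16(codes)
```

Calling squeak_with_999().print_string() gives back the 'Squ<999>eak' form, which is the "see if it prints" step; whether it displays is a separate question about the font and the text renderer.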

When in doubt, try it out.

	- Dan

(*) It's probably worth starting with the most general expansion first.  Then from there on, it's only optimization and engineering to do the others -- the interfaces will have all been worked out.

PS:  I'm not saying SqC will embrace Unicode, I'm just saying that it may only take a couple of days to understand most of what is involved.
