UTF8 Squeak
Colin Putney
cputney at wiresong.ca
Tue Jun 12 06:29:24 UTC 2007
On Jun 11, 2007, at 1:31 PM, Janko Mivšek wrote:
>> Is that a fair characterization of your position?
> Yes, or just a bit better said: my position is a separation of
> internal string representation from encodings. Internal strings
> should be in pure Unicode while conversions to other encodings
> should be done separately, probably best with already existing
> TextEncoders. Those text encoders can be extended to meet wider
> requirements, but strings shall stay strings - they shall contain
> characters only.
Well, this is progress, of a sort. What you write above would imply
that Strings should be arrays of pointers to Character objects. Your
proposal is actually to have strings encoded as ISO 8859-1, UCS-2 or
UCS-4. That's a reasonable optimization to save space, so long as the
semantics of strings are preserved - other objects can't tell what
the internal representation is, because all they see are characters.
But if encapsulation works for fixed-length encodings, why not for
variable-length encodings such as UTF-8 or UTF-16?
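
To make the encapsulation point concrete, here is a minimal sketch in
Python rather than Squeak. It is an illustration only, not code from
Squeak, VisualWorks or Aida. The class names FixedWidthString and
Utf8String are hypothetical. One class stores four bytes per character
(UCS-4 style), the other stores UTF-8, and both answer only characters
to their clients.

```python
class FixedWidthString:
    """Hypothetical string that stores 4 bytes per character (UCS-4 style)."""

    def __init__(self, text):
        self._data = text.encode("utf-32-le")

    def __len__(self):
        return len(self._data) // 4

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Constant-time lookup: every character occupies exactly 4 bytes.
        return self._data[4 * index:4 * index + 4].decode("utf-32-le")

    def __iter__(self):
        return iter(self._data.decode("utf-32-le"))


class Utf8String:
    """Hypothetical string that stores UTF-8 (1 to 4 bytes per character)."""

    def __init__(self, text):
        self._data = text.encode("utf-8")

    def __len__(self):
        # Count every byte that is not a continuation byte (10xxxxxx).
        return sum(1 for b in self._data if b & 0xC0 != 0x80)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Linear scan: find the byte where character number `index` starts,
        # then stop at the start of the following character.
        start = -1
        count = -1
        for pos, b in enumerate(self._data):
            if b & 0xC0 != 0x80:
                count += 1
                if count == index:
                    start = pos
                elif count == index + 1:
                    return self._data[start:pos].decode("utf-8")
        return self._data[start:].decode("utf-8")

    def __iter__(self):
        return iter(self._data.decode("utf-8"))


def characters(string):
    """Client code: sees characters only, never the internal bytes."""
    return [string[i] for i in range(len(string))]
```

Both classes give the same answer to characters(), so a client cannot
tell which representation is in use. The visible difference is cost:
indexing is constant time in the fixed-width class and a linear scan in
the UTF-8 class, which matters less for code that reads sequentially
through streams.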
> By the way, I'm a web developer too and porting Aida to Squeak
> actually started my interest on Unicode support here :)
Yeah, I was wondering about that. Does Aida do a whole lot of work on
string buffers or something? Doesn't it use streams? Why are you so
dead set against variable-length encodings?
One other thing: you seem to be advocating that Squeak just adopt the
same design that VisualWorks uses. VisualWorks is great, but it does
have immediate Characters, which Squeak does not. That changes the
design constraints a bit.
Colin