UTF8 Squeak

Janko Mivšek janko.mivsek at eranova.si
Mon Jun 11 20:31:42 UTC 2007

Hi Colin,

Colin Putney wrote:
> On Jun 11, 2007, at 4:04 AM, Janko Mivšek wrote:
>> Anyone can definitely keep UTF8-encoded strings in a plain
>> ByteString, or subclass it as a UTF8String themselves. But I don't see
>> why we need UTF8String as part of the string framework. Just for its
>> meaning? Then we would also need to introduce an ASCIIString :)
>> I think that preserving simplicity is also an important goal. We need
>> to find a general yet simple solution for Unicode strings that is
>> good enough for most uses, as is the case with numbers, for instance.
>> We deal with the more special cases separately. I claim that pure
>> Unicode strings in Byte, TwoByte or FourByteString provide such general
>> support. UTF8String is already a specific case.
> Ok, so what you're saying is this: ByteString, TwoByteString and 
> FourByteString are good enough for the most purposes. Web developers and 
> anyone else that needs to work with other encodings should roll their 
> own solutions, so as not to burden the rest of the community with 
> clutter caused by support for other encodings, or even hooks to make 
> such things easy to integrate with the base string code.
> Is that a fair characterization of your position?
Yes, or, put a bit better: my position is to separate the internal
string representation from encodings. Internal strings should hold pure
Unicode, while conversions to other encodings should be done separately,
probably best with the existing TextEncoders. Those text encoders can be
extended to meet wider requirements, but strings should stay strings:
they should contain characters only.
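A minimal sketch of that separation, written in Python rather than Squeak purely for illustration. Strings hold only code points; the storage width is picked from the widest character, loosely mirroring the Byte/TwoByte/FourByteString split proposed above; and encoding happens only at the I/O boundary. The function names and the "utf-8" default are hypothetical; in the proposal above, TextEncoders would fill the boundary role.

```python
# Hypothetical illustration, not Squeak code: internal strings are pure
# Unicode (code points only), and bytes appear only at the edges.

def storage_class(text: str) -> str:
    """Name the narrowest fixed-width string class that can hold text."""
    widest = max((ord(ch) for ch in text), default=0)
    if widest <= 0xFF:
        return "ByteString"
    if widest <= 0xFFFF:
        return "TwoByteString"
    return "FourByteString"


def read_request_body(raw: bytes, encoding: str = "utf-8") -> str:
    # Boundary in: decode incoming bytes into characters once.
    return raw.decode(encoding)


def write_response_body(text: str, encoding: str = "utf-8") -> bytes:
    # Boundary out: encode characters into bytes only when leaving.
    return text.encode(encoding)


name = read_request_body("Mivšek".encode("utf-8"))
print(storage_class(name))  # š is U+0161, above 0xFF, so TwoByteString
print(write_response_body(name, "iso-8859-2").decode("iso-8859-2") == name)
```

Because the storage class depends only on the characters, not on how they arrived, a string read as UTF-8 can be written back out as ISO-8859-2 without any change to its internal form.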

By the way, I'm a web developer too, and porting Aida to Squeak is
actually what started my interest in Unicode support here :)

Best regards

Janko Mivšek
Smalltalk Web Application Server
