UTF8 Squeak

Janko Mivšek janko.mivsek at eranova.si
Thu Jun 7 20:42:32 UTC 2007

Hi Yoshiki,

Yoshiki Ohshima wrote:
>   It is so true that I should've looked at the class names in VW
> before doing everything...
>> 1. internally everything is in 16bit Unicode, without any additional
>>     encoding info attached to strings
>   If they use 16 bits per char, how do they deal with surrogate pairs?

I looked once again and there is actually a FourByteString too. This
probably answers your question. VW also supports the Japanese locale well.
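
To make the surrogate pair point concrete, here is a small sketch in Python
(not Smalltalk, so it is only an analogy; the helper name code_units is made
up for this illustration). It shows that a character outside the Basic
Multilingual Plane needs two 16-bit units but only one 32-bit unit, which is
presumably the case a four-byte string class covers.

```python
def code_units(text, width_bytes):
    """Return the code units of `text` when stored as 2-byte (UTF-16)
    or 4-byte (UTF-32) units. Hypothetical helper for illustration only."""
    codec = {2: "utf-16-be", 4: "utf-32-be"}[width_bytes]
    raw = text.encode(codec)
    # Slice the encoded bytes into fixed-width big-endian integers.
    return [int.from_bytes(raw[i:i + width_bytes], "big")
            for i in range(0, len(raw), width_bytes)]

clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP
units16 = code_units(clef, 2)
units32 = code_units(clef, 4)
print([hex(u) for u in units16])  # ['0xd834', '0xdd1e'] (a surrogate pair)
print([hex(u) for u in units32])  # ['0x1d11e'] (a single unit)
```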

Best regards

>> 2. there is a class ByteString for pure ASCII(1) and TwoByteString for
>>     Unicode strings. Conversion from Byte to TwoByteString is automatic
>>     when you concatenate two mixed-width strings.
>   This is what Squeak does with ByteString and WideString.
>> 3. streams: external streams(2) are always dealing with
>>     encodings, internal streams never
>   In Squeak, to do conversion from/to a file, use MultiByteFileStream.
> For memory based strings, use MultiByteBinaryOrTextStream.  Or, you can
> manually create an instance of TextConverter and write some logic to
> pass chars from/to streams.
>> (1) Strings have actually subclasses for 8 bit encodings like
>>      ISO8859L1String etc. but these do not seem to be used much recently
>   So, as in Squeak, having only ByteString and WideString (with a
> common abstract superclass) is better^^;
>> (2) with help of an EncodedStream as a wrapper of original stream. And
>>      it is helped by StreamEncoders, which actually do en/decoding.
>>      There is quite a number of them, from Base64StreamEncoder to the,
>>      for us, more interesting UTF8StreamEncoder.
>   As I wrote, you can write these variations of Streams yourself
> quite easily.  I admit that there is no framework for it.
>> I find the VW approach very simple and elegant, and I think Squeak could
>> handle Unicode easily by following VW's example to some extent.
>   Thank you for summarizing it!
> -- Yoshiki
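
The split discussed above (external streams always deal with encodings,
internal streams never) can be sketched roughly as follows. This is Python
rather than Smalltalk, so it only mirrors the idea of an encoding wrapper
around a byte stream; it is not the VW EncodedStream or the Squeak
TextConverter API.

```python
import io

# "External" stream: raw bytes underneath, wrapped by an encoding layer.
raw = io.BytesIO()
encoded = io.TextIOWrapper(raw, encoding="utf-8", newline="")
encoded.write("Mivšek")
encoded.flush()                      # push the encoded bytes down to `raw`
external_bytes = raw.getvalue()      # UTF-8 bytes as they would go to a file

# "Internal" stream: holds characters only, no encoding involved.
internal = io.StringIO()
internal.write("Mivšek")
internal_text = internal.getvalue()

# Reading the external bytes back through the same kind of wrapper.
decoded = io.TextIOWrapper(io.BytesIO(external_bytes), encoding="utf-8").read()

print(len(internal_text), len(external_bytes))  # 6 characters, 7 bytes
```

The non-ASCII character takes two bytes on the external side, while the
internal stream only ever sees six characters.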

Janko Mivšek
Smalltalk Web Application Server
