UTF8 Squeak
Janko Mivšek
janko.mivsek at eranova.si
Thu Jun 7 20:42:32 UTC 2007
Hi Yoshiki,
Yoshiki Ohshima wrote:
> It is so true that I should've looked at the class names in VW
> before doing everything...
>
>> 1. internally everything is in 16-bit Unicode, without any additional
>> encoding info attached to strings
>
> If they use 16 bits per char, how do they deal with surrogate pairs?
I looked once again and there is actually a FourByteString too. This
probably answers your question. VW also supports the Japanese locale well.
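
To make the surrogate-pair point concrete, here is a small sketch in
Python (not VW or Squeak code; the helper names utf16_units and
utf32_units are made up for illustration). It shows that a character
outside the Basic Multilingual Plane needs two 16-bit units but only one
32-bit unit, which is presumably the case a four-byte string class covers.

```python
def utf16_units(text):
    """Return the 16-bit code units of text as a list of ints."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def utf32_units(text):
    """Return the 32-bit code units of text as a list of ints."""
    data = text.encode("utf-32-be")
    return [int.from_bytes(data[i:i + 4], "big") for i in range(0, len(data), 4)]


bmp_char = "\u3042"          # HIRAGANA LETTER A, inside the BMP
astral_char = "\U00020000"   # a CJK ideograph outside the BMP

print([hex(u) for u in utf16_units(bmp_char)])     # one 16-bit unit
print([hex(u) for u in utf16_units(astral_char)])  # two units: a surrogate pair
print([hex(u) for u in utf32_units(astral_char)])  # one 32-bit unit
```
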
Best regards
Janko
>
>> 2. there is a class ByteString for pure ASCII(1) and TwoByteString for
>> Unicode strings. Conversion from Byte to TwoByteString is automatic
>> when you concatenate two mixed-width strings.
>
> This is what Squeak does with ByteString and WideString.
>
>> 3. streams: external streams(2) are always dealing with
>> encodings, internal streams never
>
> In Squeak, to convert from/to a file, use MultiByteFileStream. For
> memory-based strings, use MultiByteBinaryOrTextStream. Or, you can
> manually create an instance of TextConverter and write some logic to
> pass chars from/to streams.
>
>> (1) String actually has subclasses for 8-bit encodings, like
>> ISO8859L1String etc., but these do not seem to be used much recently
>
> So, as in Squeak, having only ByteString and WideString (with a
> common abstract superclass) is better^^;
>
>> (2) with the help of an EncodedStream as a wrapper around the original
>> stream. It is in turn helped by StreamEncoders, which do the actual
>> en/decoding. There is quite a number of them, from Base64StreamEncoder
>> to the, for us, more interesting UTF8StreamEncoder.
>
> As I wrote, you can write these variations of Streams yourself
> quite easily. I admit that there is no framework for it.
>
>> I find the VW approach very simple and elegant, and I think Squeak can
>> solve Unicode easily by following VW's example a bit.
>
> Thank you for summarizing it!
>
> -- Yoshiki
>
>
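
The split described in point 3 (encodings only on external streams, never
on internal ones) can be sketched roughly as follows. This is Python's io
module standing in for the idea, not the VW EncodedStream or the Squeak
classes; the function names are hypothetical.

```python
import io


def write_external(text, encoding="utf-8"):
    """Push text through an encoding wrapper and return the raw bytes."""
    raw = io.BytesIO()  # stands in for a file or socket
    wrapper = io.TextIOWrapper(raw, encoding=encoding, newline="")
    wrapper.write(text)
    wrapper.flush()     # make sure the encoded bytes reach the raw stream
    return raw.getvalue()


def read_external(data, encoding="utf-8"):
    """Decode raw bytes through the same kind of wrapper."""
    wrapper = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return wrapper.read()


def write_internal(text):
    """An in-memory text stream: characters in, characters out, no encoding."""
    stream = io.StringIO()
    stream.write(text)
    return stream.getvalue()


name = "Mivšek"
encoded = write_external(name)
print(encoded)                 # bytes, with "š" taking two bytes in UTF-8
print(read_external(encoded))  # back to the original characters
print(write_internal(name))    # never touched an encoder
```

The encoder lives only in the wrapper, so the same string code works
unchanged whichever external encoding is chosen.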
--
Janko Mivšek
AIDA/Web
Smalltalk Web Application Server
http://www.aidaweb.si