janko.mivsek at eranova.si
Thu Jun 7 20:42:32 UTC 2007
Yoshiki Ohshima wrote:
> It is so true that I should've looked at the class names in VW
> before doing everything...
>> 1. internally everything is in 16-bit Unicode, without any additional
>> encoding info attached to strings
> If they use 16-bit per char, how do they deal with surrogate pairs?
I looked once again and there is actually a FourByteString too. This
probably answers your question. VW also supports the Japanese locale well.
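
To make the surrogate pair question concrete, here is a minimal sketch in Python (not Smalltalk, and not VW code). It only illustrates why a 16-bit representation needs two code units for a character outside the Basic Multilingual Plane, while a four-byte representation needs one. The example character U+1D11E is my own choice.

```python
# A character outside the Basic Multilingual Plane (illustrative choice).
ch = "\U0001D11E"  # MUSICAL SYMBOL G CLEF

# In UTF-16 it takes two 16-bit code units: a surrogate pair.
utf16 = ch.encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]

# With four bytes per character it fits in a single unit.
utf32 = ch.encode("utf-32-be")
code_point = int.from_bytes(utf32, "big")

print([hex(u) for u in units])  # two code units
print(hex(code_point))          # one code point
```

A string class with four bytes per character can store such characters directly, so no surrogate handling is needed at the string level.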
>> 2. there is a class ByteString for pure ASCII(1) and TwoByteString for
>> Unicode strings. Conversion from Byte to TwoByteString is automatic
>> when you concatenate two mixed-width strings.
> This is what Squeak does with ByteString and WideString.
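
The widening rule in point 2 can be sketched as follows. This is Python, not the VW or Squeak implementation; the function names and the 0xFF and 0xFFFF thresholds are assumptions made for illustration only.

```python
def bytes_per_char(s: str) -> int:
    """Narrowest fixed width (1, 2 or 4 bytes) that can hold every character.

    The thresholds are an assumption for this sketch, not VW's actual rule.
    """
    top = max(map(ord, s), default=0)
    if top <= 0xFF:
        return 1
    if top <= 0xFFFF:
        return 2
    return 4


def concat_width(a: str, b: str) -> int:
    """Concatenating mixed-width strings yields the wider of the two widths."""
    return max(bytes_per_char(a), bytes_per_char(b))


print(concat_width("abc", "\u017e"))      # narrow + wide
print(concat_width("abc", "\U0001D11E"))  # narrow + outside the BMP
```

The point is only that the result takes the width of its widest operand, so the caller never has to convert by hand.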
>> 3. streams: external streams(2) are always dealing with
>> encodings, internal streams never
> In Squeak, to do conversion from/to a file, use MultiByteFileStream. For
> memory based strings, use MultiByteBinaryOrTextStream. Or, you can
> manually create an instance of TextConverter and write some logic to
> pass chars from/to streams.
>> (1) Strings actually have subclasses for 8-bit encodings, like
>> ISO8859L1String etc., but these do not seem to be used much recently
> So, as in Squeak, having only ByteString and WideString (with a
> common abstract superclass) is better^^;
>> (2) with help of an EncodedStream as a wrapper of original stream. And
>> it is helped by StreamEncoders, which actually do en/decoding.
>> There are quite a number of them, from Base64StreamEncoder to the,
>> for us more interesting, UTF8StreamEncoder.
> As I wrote, you can write these variations of Streams by yourself
> quite easily. I admit that there is no framework for it.
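
The split between external and internal streams can be shown with a small sketch. It uses Python's standard io module as a stand-in, not the VW EncodedStream or the Squeak classes: the byte stream plays the external stream, the text wrapper plays the encoding wrapper, and the in-memory text stream has no encoding at all. The sample text is my own.

```python
import io

text = "Miv\u0161ek"  # sample text with one non-ASCII character

# External side: a byte stream (standing in for a file or socket),
# wrapped by a text layer that does the encoding on the way out.
raw = io.BytesIO()
writer = io.TextIOWrapper(raw, encoding="utf-8", newline="")
writer.write(text)
writer.flush()
encoded = raw.getvalue()  # UTF-8 bytes as they would appear externally

# Reading back goes through the same kind of wrapper, which decodes.
reader = io.TextIOWrapper(io.BytesIO(encoded), encoding="utf-8", newline="")
decoded = reader.read()

# Internal side: an in-memory text stream, no encoding involved.
internal = io.StringIO()
internal.write(text)

print(encoded)
print(decoded == internal.getvalue())
```

Changing the encoding then means changing only the wrapper, while the code that reads and writes characters stays the same.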
>> I find the VW approach very simple and elegant, and I think Squeak can
>> solve Unicode easily by following VW's example a bit.
> Thank you for summarizing it!
> -- Yoshiki
Smalltalk Web Application Server