[squeak-dev] Xtreams files

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Wed Oct 13 06:13:08 UTC 2010


Martin Kobetic fixed that in the VW repository. Be patient :)
Here is his message about the changes:


The new version in Store includes the following:

* changed default UTF16 encoder setup to be always big-endian,
regardless of current platform; apparently Unicode says that's the
default assumption without BOM and that's also what Squeak does.
Ultimately we probably want our own portable encoder.
* consequently changed the UTF16 tests to use big-endian unconditionally as well
* a bunch of tests neglected to close the transforming write streams they
created, leaving collectable but lingering processes behind.
* simplified Encoder class initialization (the registerEncodingsIn:
setup was overkill)
* Encoder class>>for: (and therefore also #encoding:) now accepts either
a Symbol or anything that understands #streamingAsEncoder. This allows
passing in a preconfigured Encoder instance, for example.

Nicolas

2010/10/13 Levente Uzonyi <leves at elte.hu>:
> On Tue, 12 Oct 2010, Yoshiki Ohshima wrote:
>
>> At Tue, 12 Oct 2010 23:35:36 +0200,
>> Sven Van Caekenberghe wrote:
>>>
>>> Levente,
>>>
>>> On 12 Oct 2010, at 23:08, Levente Uzonyi wrote:
>>>
>>>> Oh. I "fixed" those yesterday. The problem is that UTF16TextConverters
>>>> are not initialized to the platform's endianness, but the test expects that.
>>>
>>> OK, so UTF16TextConverters>>#useLittleEndian: should be called with
>>> Smalltalk isLittleEndian as argument, yes?
>>> By whom? The client, XTSqueakEncoder>>#encoding:, could do it, but not very
>>> elegantly. Or would it be better done in an initialize (which is not there)?
>>
>>  Hmm, doesn't it sound like the test is wrong?  The endianness in
>> UTF16 means the order of the two octets within each 16-bit code unit.  The
>> external data comes as Byte(Array|String) and internal is UTF-32-ish data,
>> so the platform endianness should not matter.
>
> According to RFC 2781, the test is wrong and Squeak's implementation is right:
>
> "4.3 Interpreting text labelled as UTF-16
>
>   Text labelled with the "UTF-16" charset might be serialized in either
>   big-endian or little-endian order. If the first two octets of the
>   text is 0xFE followed by 0xFF, then the text can be interpreted as
>   being big-endian. If the first two octets of the text is 0xFF
>   followed by 0xFE, then the text can be interpreted as being little-
>   endian. If the first two octets of the text is not 0xFE followed by
>   0xFF, and is not 0xFF followed by 0xFE, then the text SHOULD be
>   interpreted as being big-endian."
>
>
> Levente
>
>>
>> -- Yoshiki
>>
>>
>
>
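
For reference, the rule quoted from RFC 2781 section 4.3 can be sketched
outside Smalltalk. The Python below is only an illustration, not the Xtreams
or Squeak code; the function name decode_utf16_rfc2781 is made up for this
example. It shows that the byte order is decided by the BOM, or defaults to
big-endian when there is none, and never by the host platform.

```python
import codecs


def decode_utf16_rfc2781(data: bytes) -> str:
    """Decode bytes labelled "UTF-16" as described in RFC 2781, section 4.3.

    Hypothetical helper for illustration only.
    """
    if data.startswith(codecs.BOM_UTF16_BE):   # 0xFE 0xFF: big-endian
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):   # 0xFF 0xFE: little-endian
        return data[2:].decode("utf-16-le")
    # No BOM: SHOULD be interpreted as big-endian, whatever the host CPU is.
    return data.decode("utf-16-be")


# The same two characters serialized in each byte order.
big = "hi".encode("utf-16-be")       # octets: 00 68 00 69
little = "hi".encode("utf-16-le")    # octets: 68 00 69 00

plain = decode_utf16_rfc2781(big)
with_be_bom = decode_utf16_rfc2781(codecs.BOM_UTF16_BE + big)
with_le_bom = decode_utf16_rfc2781(codecs.BOM_UTF16_LE + little)
```

All three calls should yield the same string, since only the BOM (or its
absence) selects the byte order. A test written this way does not depend on
the endianness of the machine running it.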
