[squeak-dev] Re: [Pharo-dev] MC should really write snaphsot/source.st in UTF8

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Thu May 23 10:52:55 UTC 2013


The snapshot/source.st does not contain a mix of ByteString and WideString,
because a single String is written during the process: all code is written
into a String new writeStream, which makes the whole String wide at the first
wide Character. So it should work.
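
To illustrate the point, here is a rough sketch in Python (not Squeak code;
the names serialize and decode are made up for this example, and big-endian
UTF-32 is an assumption, since the real byte order follows the saving
platform). It mimics a single buffer that holds one byte per character until
a wide character shows up, at which point the entire buffer becomes four
bytes per character, so the output is never a mix of the two:

```python
def serialize(chunks):
    """Mimic writing all chunks into one string-backed stream.

    The result is either entirely Latin-1 (1 byte per char) or entirely
    UTF-32 (4 bytes per char), never a mix of both.
    """
    text = "".join(chunks)
    if all(ord(ch) < 256 for ch in text):
        return text.encode("latin-1")
    # One wide character widens the whole buffer.
    # Big-endian is an assumption made for this sketch.
    return text.encode("utf-32-be")


def decode(data):
    """Read heuristic: zero bytes suggest 32-bit characters."""
    if b"\x00" in data:
        return data.decode("utf-32-be")
    return data.decode("latin-1")


narrow = serialize(["foo", "\u00e9"])   # Latin-1 only
wide = serialize(["foo", "\u03bb"])     # contains a wide character
print(len(narrow), len(wide))
print(decode(narrow) == "foo\u00e9", decode(wide) == "foo\u03bb")
```

Under these assumptions, checking for zero bytes is enough to tell the two
cases apart when reading such a file back.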


2013/5/23 Henrik Sperre Johansen <henrik.s.johansen at veloxit.no>

> On 23.05.2013 00:06, Nicolas Cellier wrote:
>
>> That sounds good. We could even try to fall back to UTF-32 if we encounter
>> zeros (but this should be very rare...).
>>
>> For writing, ZipArchive is unaware of any encoding; it uses latin1.
>> In Squeak, I could place some squeakToUTF8 sends in MCMczWriter, and the
>> equivalent UTF8TextConverter in Pharo #serializeDefinitions:. Maybe this is
>> needed in some other serialize* too (version, dependencies, who knows...)
>>
> That won't work if the file contained sources for both widestring- and
> bytestring-sourced methods.
> In that case the file would contain code stored BOTH as latin1 bytes and
> as UTF32 (same endianness as the platform it was saved from).
> Which means you'd have to detect and handle jumps back and forth in
> encoding when reading...
> IMHO, just consider those files lost beyond hope.
>
> Cheers,
> Henry
>
>