Nicolas Cellier uploaded a new version of Monticello to project The Trunk: http://source.squeak.org/trunk/Monticello-nice.546.mcz
==================== Summary ====================
Name: Monticello-nice.546
Author: nice
Time: 31 May 2013, 12:47:59.453 am
UUID: ff2975ee-c296-4d65-8b8a-9b512607a2be
Ancestors: Monticello-kb.545
Let's encode .mcz/snapshot/source.st in UTF-8, and let's decode UTF-8 in MCStReader.
Previously, it was encoded in latin-1 (iso-8859-1) for ByteString and UTF-32BE for WideString, and it was always decoded as latin-1 by MCStReader.
Note that compatibility with legacy code is handled by catching the InvalidUTF8 exception.
An alternative would be to use a BOM in new snapshots. However, since snapshot/source.st is not used by Squeak tools, it's not really worth it.
=============== Diff against Monticello-kb.545 ===============
Item was changed:
  ----- Method: MCMczWriter>>serializeDefinitions: (in category 'serializing') -----
  serializeDefinitions: aCollection
+ 	^(String streamContents: [:aStream |
+ 		| writer |
+ 		writer := self snapshotWriterClass on: aStream.
+ 		writer writeDefinitions: aCollection])
+ 			squeakToUtf8!
- 	| writer s |
- 	s := RWBinaryOrTextStream on: String new.
- 	writer := self snapshotWriterClass on: s.
- 	writer writeDefinitions: aCollection.
- 	^ s contents!
Item was changed:
  ----- Method: MCStReader>>readStream (in category 'evaluating') -----
  readStream
+ 	| contents |
+ 	contents := stream contents.
+ 	contents := [contents utf8ToSqueak] on: InvalidUTF8 do: [:exc |
+ 		"Case of legacy encoding, presumably it is latin-1 and we do not need to do anything
+ 		But if contents starts with a null character, it might be a case of WideString encoded in UTF-32BE"
+ 		exc return: (((contents beginsWith: Character null asString) and: [ contents size \\ 4 = 0 ])
+ 			ifTrue: [WideString fromByteArray: contents asByteArray]
+ 			ifFalse: [contents])].
  	^ ('!!!!
+ ', contents) readStream!
- ', stream contents) readStream!
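
For readers outside Squeak, the decoding order in MCStReader>>readStream can be sketched in Python. This is only an illustration under stated assumptions, not the Monticello code: the function name decode_snapshot_source is hypothetical, and Python's UnicodeDecodeError stands in for the InvalidUTF8 exception. It tries UTF-8 first, then treats input that starts with a null byte and has a length divisible by 4 as UTF-32BE, and otherwise falls back to latin-1.

```python
def decode_snapshot_source(raw: bytes) -> str:
    """Decode snapshot/source.st bytes (hypothetical helper, not Monticello API).

    Mirrors the fallback order of the diff above:
    UTF-8 first, then the two legacy encodings.
    """
    try:
        # New snapshots are written as UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Legacy snapshot. A leading null byte plus a length that is a
        # multiple of 4 suggests a WideString stored as UTF-32BE.
        if raw[:1] == b"\x00" and len(raw) % 4 == 0:
            return raw.decode("utf-32-be")
        # Otherwise assume a ByteString stored as latin-1.
        return raw.decode("latin-1")
```

As in the original method, the legacy branch is only reached when the bytes are not valid UTF-8, so a legacy snapshot whose bytes happen to form valid UTF-8 is decoded as UTF-8.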
packages@lists.squeakfoundation.org