Hi Eliot,
I made a snippet to read the source of the method as an array of integers from the .mcz in the package cache:

    (MCMczReader versionFromFile: 'package-cache/Multilingual-ul.210.mcz') snapshot definitions
        detect: [ :each |
            each isMethodDefinition and: [
                each className = #JapaneseEnvironment and: [
                    each selector = #flapTabTextFor:in: ] ] ]
        ifFound: [ :definition |
            Array streamContents: [ :stream |
                | source |
                source := definition source.
                1 to: source size do: [ :index |
                    stream nextPut: (source basicAt: index) ] ] ]
        ifNone: [ self error ]

In MCMczReader >> #loadDefinitions, if you change this line

    [:m | [^definitions := (DataStream on: m contentStream) next definitions]

to this

    [:m | [ self error. ^definitions := (DataStream on: m contentStream) next definitions]

then the definition will be read from the sources instead of the binary snapshot, and you'll get the correct source code.
First I disabled all the ZipPlugin primitives to see whether they were responsible for this issue, but they turned out to be okay. Then I dug into DataStream, and I came to the conclusion that the issue is in BitBlt. The mangled characters appear when PositionableStream >> #nextWordsInto: applies some BitBlt magic to convert the bytes it has read into a WideString.
Here's a snippet triggering the error:

    | wideString source pos blt expectedWideString |
    source := #[1 64 255 14 1 64 48 251].
    expectedWideString := WideString fromByteArray: source.
    wideString := WideString new: source size // 4.
    pos := 0.
    blt := (BitBlt toForm: (Form new hackBits: wideString))
        sourceForm: (Form new hackBits: source).
    blt
        combinationRule: Form over;
        sourceX: 0;
        sourceY: pos // 4;
        height: wideString byteSize // 4;
        width: 4;
        destX: 0;
        destY: 0;
        copyBits.
    wideString restoreEndianness.
    self assert: wideString = expectedWideString

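
For comparison, the two 32-bit words I would expect from those eight bytes can be computed without BitBlt. This is only a minimal sketch: it assumes the bytes are meant to be read in big-endian order, and that ByteArray >> #unsignedLongAt:bigEndian: is available in your image. The variable names are just for illustration.

```smalltalk
| source words |
source := #[1 64 255 14 1 64 48 251].
"Assumption: each character is stored as four bytes, most significant byte first."
words := (1 to: source size by: 4) collect: [ :index |
	source unsignedLongAt: index bigEndian: true ].
"Bytes 01 40 FF 0E and 01 40 30 FB should give 16r0140FF0E and 16r014030FB."
words
```

If the BitBlt path works correctly, I would expect the values returned by #basicAt: on the resulting WideString to match these two words.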
Levente