Hi Eliot,
I started looking into this. So far I have not managed to reproduce it locally, neither with a fresh trunk image nor with a trunk image from May that I then updated. It looks like a mixture of a double-encoding and a wrong-decoding issue. The character sequence 'ÃƒÂ¤' further down (in Volker Bäcker) would be 'Ã¤' when interpreted as UTF-8, which in turn, when interpreted as UTF-8, is 'ä', the character expected in the string. Getting to 'ÃƒÂ¤', though, would require interpreting the UTF-8 bytes of 'ä' as CP1252, encoding the result in UTF-8 again, and decoding it once more as CP1252.
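To make the chain above concrete, here is a rough sketch in Python (not Squeak). The helper names `misdecode` and `repair` are made up for illustration, and the sketch only shows the byte arithmetic; it does not claim to be what the image actually did.

```python
def misdecode(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as CP1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Undo one misdecode step: back to bytes via CP1252, then read as UTF-8."""
    return text.encode("cp1252").decode("utf-8")


expected = "\u00e4"              # 'ä', the character that should be in the string
once = misdecode(expected)       # bytes C3 A4 read as CP1252
twice = misdecode(once)          # bytes C3 83 C2 A4 read as CP1252

print(repr(once))
print(repr(twice))

# Two repair steps bring the doubly garbled sequence back to the original.
print(repair(repair(twice)) == expected)
```

Each `misdecode` call is one "encode in UTF-8, decode as CP1252" round, so the four-character sequence corresponds to two such rounds applied to a single 'ä'.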
Sanity check before I continue: Does the source code in the method look right in that image?
(I hope all these weird characters will come through to you :) )
Bests Patrick
________________________________
From: Squeak-dev <squeak-dev-bounces@lists.squeakfoundation.org> on behalf of Eliot Miranda <eliot.miranda@gmail.com>
Sent: Wednesday, July 12, 2017 18:51
To: The general-purpose Squeak developers list
Subject: [squeak-dev] Parsing privateAuthorsRaw for a changes browser
Hi All,
I had reason to condense changes and then was curious to look for older versions. But when I came to open a changes browser on the newly condensed changes file the UTF-8 decoder failed to parse the source for SystemNavigation class>>privateAuthorsRaw. Something breaks the string at the e acute in Stéphane, and then the decoder gets hopelessly confused.
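The symptom can be imitated outside the image. Below is a minimal sketch in Python, assuming the chunk handed to the decoder begins in the middle of the two-byte UTF-8 sequence for é (the debug log's byte1: 169 is 0xA9, the second byte of that sequence). The helper `try_decode` is hypothetical, and the sketch says nothing about where or why Squeak cuts the string.

```python
encoded = "St\u00e9phane Rollandin".encode("utf-8")

# Cut between the two bytes of 'é' (C3 A9), so the tail starts with 0xA9.
split_at = encoded.index(b"\xc3") + 1
head, tail = encoded[:split_at], encoded[split_at:]


def try_decode(data: bytes) -> str:
    """Decode as UTF-8, reporting the offending byte instead of raising."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        return f"invalid UTF-8 at byte {err.start}: 0x{data[err.start]:02x}"


print(tail[0])                    # first byte of the broken tail
print(try_decode(tail))           # a lone continuation byte cannot start a character
print(tail.decode("latin-1")[:7]) # how the tail looks when shown byte-for-byte
print(try_decode(encoded))        # the unsplit bytes decode without trouble
```

A strict decoder has no valid way to start a character at a continuation byte, which is consistent with the error being raised at the very first byte of the string.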
To reproduce: In a trunk 6.x image do Smalltalk condenseChanges then open a file list, select the changes file, and then click the recent changes button.
Here's the SqueakDebug.log:
InvalidUTF8: Invalid utf8: ©phane Rollandin#spfa!Stephane Schitter#stefs!Stephanie Hamburg#MUTTLYSTEPHANIE!Stephen Smith#sst!Stephen Travis Pope#stp!Stephen Vincent Pair#svp!Steve Davies#sld!Steve Elkins#sge!Steve Fuller#snf!Steve Gilbert#slg!Steve Hunter#skh!Steve Knight#knighty!Steve Mccusker#smcc!Steve Messamore#slm!Steve Sanderson#sms!Steve Wart#swart!Steve Wessels#!Steven Darcy#SMD!Steven Greenberg#greenbes!Steven Rodriguez#optionshiftk!Steven Swerling#sps!Sudheendra Hangal#hangal!Sungjin Chun#chunsj!Suzuki Tetsuya#tetsuya!Syed Abid#taxman!Syed Masoodahmad#masden56!Sylvia Sharma#sharma!Symon Chalk#symonc!Takashi Yamamiya#tak!Tansel Ersavas#mte#MTE!Tarek Demiati#TD!Ted Bracht#TB#TB1!Ted Kaehler#tk!Terry Jenkins#TCJ!Thierry Reignier#TREG!Thijs Janssen#TJ!Thomas Bernitt#tber!Thomas Fröb#thf!Thomas Hemme#Namamazu!Thomas J Keller#TJK!Thomas Kowark#tk!Thomas M. Breuel#tmb!Thomas Mahler#ThMa!Thomas Stambaugh#tms!Thomas Zimmermann#TZ!Tim Cuthbertson#tec!Tim Felgentreff#tfel!Tim Lewis#TimLewis!Tim Olson#tao!Tim Rowledge#TPR#tpr!Timm Knape#tik!Timothy Falconer#teefal!Timothy M#tty!Timothy Retz#tgr!Tobias Isenberg#ti!Tobias Pape#topa!Todd Blanchard#tb!Tom Counsell#tamc!Tom Dailey#td!Tom Koenig#tlk!Tom Plick#tap!Tom Rushworth#tbr!Tommy Thorn#tt!Tomohiro Oda#TO!Tony Garnock-Jones#tonyg!Tony Zampogna#zamp!Torge Husfeldt#th!Torsten Bergmann#tbn#TBN!Torsten Sadowski#ts!Travis Kay#tkay#tlk!Trygve Reenskaug#TRee!Tyler Coumbes#mtc!Tzaddi Beltaine#tsb!Udo Schneider#udos!Vaidotas Didžbalis#vd!Vassili Bykov#vb!Vernon Marsden#vmars!Vijay Mathew Pandyalakal#vmp!Vladimir Janousek#vj!Volker Bäcker#volker!Wally Cash#wac!Walter Wilhelm#ww!Ward Cunningham#ward!Wayne Braun#wb!Wayne D. 
Elias#wdelias!Webb Mcdonald#wxm!Wilkes Joiner#dwj!Willem van Asperen#wva!William Hess#WFH!William Hidden#whidden!Wolfgang Eder#edw!Wolfgang Helbig#whg!Woon Yeo#!Wuilmer Olaya Bardales#wob!Yagendra Dutt Tripathi#yd!Yang Ha Nguyen#yhm!Yann Monclair#YM!Yanni Chiu#yj!Yasuji Nakayama#yasuji!Yoshiki Ohshima#yo!Yuji Ichikawa#ich!Yunhee Lee#yhl!Yutaka Kamite#yk!Zdenek Novy#Zdenye#ZN!Zeljko Nesic#Poparasan!Zeynep Besen#zeyno' 12 July 2017 9:42:40.918319 am
VM: Mac OS - Smalltalk
Image: Squeak6.0alpha [latest update: #17347]

SecurityManager state:
Restricted: false
FileAccess: true
SocketAccess: true
Working Dir /Users/eliot/Squeak/Squeak5.1
Trusted Dir /foobar/tooBar/forSqueak/bogus/
Untrusted Dir /Users/eliot/Library/Preferences/Squeak/Internet/My Squeak/
UTF8TextConverter class>>errorMalformedInput: Receiver: UTF8TextConverter Arguments and temporary variables: aString: '©phane Rollandin#spfa!Stephane Schitter#stefs!Stephanie Hamburg#MUTTL...etc... Receiver's instance variables: superclass: TextConverter methodDict: a MethodDictionary(#backFromStream:->(UTF8TextConverter>>#backFromS...etc... format: 65538 instanceVariables: nil organization: ('conversion' backFromStream: decodeString: encodeString: errorMalformedInput:...etc... subclasses: nil name: #UTF8TextConverter classPool: a Dictionary(#StrictUtf8Conversions->nil ) sharedPools: nil environment: Smalltalk category: #'Multilingual-TextConversion' latin1Map: #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...etc... latin1Encodings: #(nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil ...etc...
UTF8TextConverter class>>decodeByteString: Receiver: UTF8TextConverter Arguments and temporary variables: aByteString: '©phane Rollandin#spfa!Stephane Schitter#stefs!Stephanie Hamburg#M...etc... outStream: a WriteStream lastIndex: 1 nextIndex: 1 byte1: 169 byte2: nil byte3: nil byte4: nil unicode: nil Receiver's instance variables: superclass: TextConverter methodDict: a MethodDictionary(#backFromStream:->(UTF8TextConverter>>#backFromS...etc... format: 65538 instanceVariables: nil organization: ('conversion' backFromStream: decodeString: encodeString: errorMalformedInput:...etc... subclasses: nil name: #UTF8TextConverter classPool: a Dictionary(#StrictUtf8Conversions->nil ) sharedPools: nil environment: Smalltalk category: #'Multilingual-TextConversion' latin1Map: #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...etc... latin1Encodings: #(nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil ...etc...
UTF8TextConverter>>decodeString: Receiver: an UTF8TextConverter Arguments and temporary variables: aString: '©phane Rollandin#spfa!Stephane Schitter#stefs!Stephanie Hamburg#MUTTL...etc... result: nil Receiver's instance variables: latin1Map: #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...etc... latin1Encodings: #(nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil ...etc...
UTF8TextConverter>>nextChunkFromStream: Receiver: an UTF8TextConverter Arguments and temporary variables: input: MultiByteFileStream: '/Users/eliot/Squeak/Squeak5.1/trunk6projectLoad.ch...etc... Receiver's instance variables: latin1Map: #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...etc... latin1Encodings: #(nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil ...etc...
MultiByteFileStream>>nextChunk Receiver: MultiByteFileStream: '/Users/eliot/Squeak/Squeak5.1/trunk6projectLoad.changes' Arguments and temporary variables:
Receiver's instance variables:
ChangeList class>>browseRecentLogOn: Receiver: ChangeList Arguments and temporary variables: origChangesFile: MultiByteFileStream: '/Users/eliot/Squeak/Squeak5.1/trunk6proj...etc... end: 13286751 done: false block: 7195999 pos: 7198297 changesFile: MultiByteFileStream: '/Users/eliot/Squeak/Squeak5.1/trunk6projectL...etc... position: nil prevBlock: 7197023 chunk: #('privateAuthorsRaw
^ ''Aaron Reichow#ajr!Abigail Sanchez#as!Adam Eng...etc... Receiver's instance variables: superclass: CodeHolder methodDict: a MethodDictionary(#acceptFrom:->(ChangeList>>#acceptFrom: "a CompiledMethod...etc... format: 65548 instanceVariables: #('changeList' 'list' 'listIndex' 'listSelections' 'file' 'l...etc... organization: ('accessing' changeList changes:file: currentChange file listHasSingleEntry...etc... subclasses: {ChangeListForProjects . VersionsBrowser} name: #ChangeList classPool: nil sharedPools: nil environment: nil category: #'Tools-Changes'
ChangeList class>>browseRecentLogOnPath: Receiver: ChangeList Arguments and temporary variables: fullName: '/Users/eliot/Squeak/Squeak5.1/trunk6projectLoad.changes' Receiver's instance variables: superclass: CodeHolder methodDict: a MethodDictionary(#acceptFrom:->(ChangeList>>#acceptFrom: "a CompiledMethod...etc... format: 65548 instanceVariables: #('changeList' 'list' 'listIndex' 'listSelections' 'file' 'l...etc... organization: ('accessing' changeList changes:file: currentChange file listHasSingleEntry...etc... subclasses: {ChangeListForProjects . VersionsBrowser} name: #ChangeList classPool: nil sharedPools: nil environment: nil category: #'Tools-Changes'

_,,,^..^,,,_
best, Eliot