Hi All,
I had reason to condense changes and then was curious to look for older versions. But when I came to open a changes browser on the newly condensed changes file the UTF-8 decoder failed to parse the source for SystemNavigation class>>privateAuthorsRaw. Something breaks the string at the e acute in Stéphane, and then the decoder gets hopelessly confused.
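For readers without a Squeak image at hand, here is a minimal Python sketch of one way this symptom can arise. It assumes, purely for illustration, that the byte stream is cut right after the lead byte of the UTF-8 encoded 'é'; the thread does not establish that this is what actually happens. The leftover text then starts with byte 169, which matches the `byte1: 169` and the leading '©' in the log.

```python
# Illustration only: cutting UTF-8 bytes inside a multi-byte character.
name = "Stéphane Rollandin"
raw = name.encode("utf-8")            # 'é' becomes the two bytes 0xC3 0xA9

# Hypothetical split point: right after the lead byte 0xC3 of 'é'.
cut = raw.index(0xC3) + 1
head, tail = raw[:cut], raw[cut:]

first_byte = tail[0]                  # a UTF-8 continuation byte
tail_as_latin1 = tail.decode("latin-1")  # one character per byte, as a log would print it

# A strict UTF-8 decoder cannot start on a continuation byte.
try:
    tail.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(first_byte, tail_as_latin1, decode_failed)
```

The point of the sketch is only that a decoder handed text beginning at a continuation byte has no valid character to start from, so it rejects the whole remainder.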
To reproduce: in a trunk 6.x image, evaluate Smalltalk condenseChanges, then open a file list, select the changes file, and click the recent changes button.
Here's the SqueakDebug.log:
InvalidUTF8: Invalid utf8: ©phane Rollandin#spfa!Stephane Schitter#stefs!Stephanie Hamburg#MUTTLYSTEPHANIE!Stephen Smith#sst!Stephen Travis Pope#stp!Stephen Vincent Pair#svp!Steve Davies#sld!Steve Elkins#sge!Steve Fuller#snf!Steve Gilbert#slg!Steve Hunter#skh!Steve Knight#knighty!Steve Mccusker#smcc!Steve Messamore#slm!Steve Sanderson#sms!Steve Wart#swart!Steve Wessels#!Steven Darcy#SMD!Steven Greenberg#greenbes!Steven Rodriguez#optionshiftk!Steven Swerling#sps!Sudheendra Hangal#hangal!Sungjin Chun#chunsj!Suzuki Tetsuya#tetsuya!Syed Abid#taxman!Syed Masoodahmad#masden56!Sylvia Sharma#sharma!Symon Chalk#symonc!Takashi Yamamiya#tak!Tansel Ersavas#mte#MTE!Tarek Demiati#TD!Ted Bracht#TB#TB1!Ted Kaehler#tk!Terry Jenkins#TCJ!Thierry Reignier#TREG!Thijs Janssen#TJ!Thomas Bernitt#tber!Thomas Fröb#thf!Thomas Hemme#Namamazu!Thomas J Keller#TJK!Thomas Kowark#tk!Thomas M. Breuel#tmb!Thomas Mahler#ThMa!Thomas Stambaugh#tms!Thomas Zimmermann#TZ!Tim Cuthbertson#tec!Tim Felgentreff#tfel!Tim Lewis#TimLewis!Tim Olson#tao!Tim Rowledge#TPR#tpr!Timm Knape#tik!Timothy Falconer#teefal!Timothy M#tty!Timothy Retz#tgr!Tobias Isenberg#ti!Tobias Pape#topa!Todd Blanchard#tb!Tom Counsell#tamc!Tom Dailey#td!Tom Koenig#tlk!Tom Plick#tap!Tom Rushworth#tbr!Tommy Thorn#tt!Tomohiro Oda#TO!Tony Garnock-Jones#tonyg!Tony Zampogna#zamp!Torge Husfeldt#th!Torsten Bergmann#tbn#TBN!Torsten Sadowski#ts!Travis Kay#tkay#tlk!Trygve Reenskaug#TRee!Tyler Coumbes#mtc!Tzaddi Beltaine#tsb!Udo Schneider#udos!Vaidotas Didžbalis#vd!Vassili Bykov#vb!Vernon Marsden#vmars!Vijay Mathew Pandyalakal#vmp!Vladimir Janousek#vj!Volker Bäcker#volker!Wally Cash#wac!Walter Wilhelm#ww!Ward Cunningham#ward!Wayne Braun#wb!Wayne D. 
Elias#wdelias!Webb Mcdonald#wxm!Wilkes Joiner#dwj!Willem van Asperen#wva!William Hess#WFH!William Hidden#whidden!Wolfgang Eder#edw!Wolfgang Helbig#whg!Woon Yeo#!Wuilmer Olaya Bardales#wob!Yagendra Dutt Tripathi#yd!Yang Ha Nguyen#yhm!Yann Monclair#YM!Yanni Chiu#yj!Yasuji Nakayama#yasuji!Yoshiki Ohshima#yo!Yuji Ichikawa#ich!Yunhee Lee#yhl!Yutaka Kamite#yk!Zdenek Novy#Zdenye#ZN!Zeljko Nesic#Poparasan!Zeynep Besen#zeyno' 12 July 2017 9:42:40.918319 am
VM: Mac OS - Smalltalk Image: Squeak6.0alpha [latest update: #17347]
SecurityManager state: Restricted: false FileAccess: true SocketAccess: true Working Dir /Users/eliot/Squeak/Squeak5.1 Trusted Dir /foobar/tooBar/forSqueak/bogus/ Untrusted Dir /Users/eliot/Library/Preferences/Squeak/Internet/My Squeak/
UTF8TextConverter class>>errorMalformedInput: Receiver: UTF8TextConverter Arguments and temporary variables: aString: '©phane Rollandin#spfa!Stephane Schitter#stefs!Stephanie Hamburg#MUTTL...etc... Receiver's instance variables: superclass: TextConverter methodDict: a MethodDictionary(#backFromStream:->(UTF8TextConverter>>#backFromS...etc... format: 65538 instanceVariables: nil organization: ('conversion' backFromStream: decodeString: encodeString: errorMalformedInput:...etc... subclasses: nil name: #UTF8TextConverter classPool: a Dictionary(#StrictUtf8Conversions->nil ) sharedPools: nil environment: Smalltalk category: #'Multilingual-TextConversion' latin1Map: #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...etc... latin1Encodings: #(nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil ...etc...
UTF8TextConverter class>>decodeByteString: Receiver: UTF8TextConverter Arguments and temporary variables: aByteString: '©phane Rollandin#spfa!Stephane Schitter#stefs!Stephanie Hamburg#M...etc... outStream: a WriteStream lastIndex: 1 nextIndex: 1 byte1: 169 byte2: nil byte3: nil byte4: nil unicode: nil Receiver's instance variables: superclass: TextConverter methodDict: a MethodDictionary(#backFromStream:->(UTF8TextConverter>>#backFromS...etc... format: 65538 instanceVariables: nil organization: ('conversion' backFromStream: decodeString: encodeString: errorMalformedInput:...etc... subclasses: nil name: #UTF8TextConverter classPool: a Dictionary(#StrictUtf8Conversions->nil ) sharedPools: nil environment: Smalltalk category: #'Multilingual-TextConversion' latin1Map: #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...etc... latin1Encodings: #(nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil ...etc...
UTF8TextConverter>>decodeString: Receiver: an UTF8TextConverter Arguments and temporary variables: aString: '©phane Rollandin#spfa!Stephane Schitter#stefs!Stephanie Hamburg#MUTTL...etc... result: nil Receiver's instance variables: latin1Map: #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...etc... latin1Encodings: #(nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil ...etc...
UTF8TextConverter>>nextChunkFromStream: Receiver: an UTF8TextConverter Arguments and temporary variables: input: MultiByteFileStream: '/Users/eliot/Squeak/Squeak5.1/trunk6projectLoad.ch...etc... Receiver's instance variables: latin1Map: #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...etc... latin1Encodings: #(nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil ...etc...
MultiByteFileStream>>nextChunk Receiver: MultiByteFileStream: '/Users/eliot/Squeak/Squeak5.1/trunk6projectLoad.changes' Arguments and temporary variables:
Receiver's instance variables:
ChangeList class>>browseRecentLogOn: Receiver: ChangeList Arguments and temporary variables: origChangesFile: MultiByteFileStream: '/Users/eliot/Squeak/Squeak5.1/trunk6proj...etc... end: 13286751 done: false block: 7195999 pos: 7198297 changesFile: MultiByteFileStream: '/Users/eliot/Squeak/Squeak5.1/trunk6projectL...etc... position: nil prevBlock: 7197023 chunk: #('privateAuthorsRaw
^ ''Aaron Reichow#ajr!Abigail Sanchez#as!Adam Eng...etc... Receiver's instance variables: superclass: CodeHolder methodDict: a MethodDictionary(#acceptFrom:->(ChangeList>>#acceptFrom: "a CompiledMethod...etc... format: 65548 instanceVariables: #('changeList' 'list' 'listIndex' 'listSelections' 'file' 'l...etc... organization: ('accessing' changeList changes:file: currentChange file listHasSingleEntry...etc... subclasses: {ChangeListForProjects . VersionsBrowser} name: #ChangeList classPool: nil sharedPools: nil environment: nil category: #'Tools-Changes'
ChangeList class>>browseRecentLogOnPath: Receiver: ChangeList Arguments and temporary variables: fullName: '/Users/eliot/Squeak/Squeak5.1/trunk6projectLoad.changes' Receiver's instance variables: superclass: CodeHolder methodDict: a MethodDictionary(#acceptFrom:->(ChangeList>>#acceptFrom: "a CompiledMethod...etc... format: 65548 instanceVariables: #('changeList' 'list' 'listIndex' 'listSelections' 'file' 'l...etc... organization: ('accessing' changeList changes:file: currentChange file listHasSingleEntry...etc... subclasses: {ChangeListForProjects . VersionsBrowser} name: #ChangeList classPool: nil sharedPools: nil environment: nil category: #'Tools-Changes'
_,,,^..^,,,_
best, Eliot
Hi Eliot,
I started looking into this. So far I have not managed to reproduce it locally, neither with a new trunk image nor with a trunk image from May that I then updated. It looks like a mixture of double encoding and wrong decoding. The character sequence 'ÃƒÂ¤' further down (in Volker BÃƒÂ¤cker) would be 'Ã¤' when interpreted as UTF-8, which in turn, when interpreted as UTF-8 again, is 'ä', which is what we would expect in the string. Getting to 'ÃƒÂ¤', though, requires interpreting the UTF-8 bytes of 'ä' as CP1252, encoding the result as UTF-8 again, and decoding it once more as CP1252.
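The chain described here can be replayed outside Squeak. The following Python sketch is only an illustration: `garble` and `repair` are hypothetical helper names, and it assumes the misreading step is CP1252, as suggested above.

```python
def garble(text: str, rounds: int = 2) -> str:
    """Encode as UTF-8, then misread the bytes as CP1252, `rounds` times."""
    for _ in range(rounds):
        text = text.encode("utf-8").decode("cp1252")
    return text


def repair(text: str, rounds: int = 2) -> str:
    """Undo garble(): take the CP1252 bytes back and decode them as UTF-8."""
    for _ in range(rounds):
        text = text.encode("cp1252").decode("utf-8")
    return text


once = garble("ä", rounds=1)    # the two-character form
twice = garble("ä", rounds=2)   # the four-character form seen in the log
restored = repair(twice)

print(repr(once), repr(twice), repr(restored))
print(garble("Volker Bäcker"))
```

Two rounds of `repair` are needed to get back to 'ä', which is the sense in which the text is double encoded.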
Sanity check before I continue: Does the source code in the method look right in that image?
(I hope all these weird characters will come through to you :) )
Bests Patrick
________________________________
From: Squeak-dev <squeak-dev-bounces@lists.squeakfoundation.org> on behalf of Eliot Miranda <eliot.miranda@gmail.com>
Sent: Wednesday, July 12, 2017 18:51
To: The general-purpose Squeak developers list
Subject: [squeak-dev] Parsing privateAuthorsRaw for a changes browser
[quoted message and SqueakDebug.log trimmed; identical to the original post above]
Well, as feared, it did not come through. Let me try this again: the string 'ÃƒÂ¤' would be 'Ã¤'
when interpreted as bytes which encode UTF-8. In turn, 'Ã¤' as bytes encoding UTF-8 is 'ä', which
is what we actually want. The rest is as described below.
---
Hi Eliot, I started looking into this. So far I could not manage to reproduce this locally using a new trunk image and using a trunk image from May and updating it. So far this looks like a mixture of a double encoding and a wrong decoding issue. The character sequence 'ÃfÆ'Ã'¤' further down (in Volker BÃfÆ'Ã'¤cker) would be Ãf¤ when interpreted as UTF-8 which in turn when interpreted as UTF-8 is ä, which would be expected in the string. To get to 'ÃfÆ'Ã'¤' though would require to interpret the ä in UTF-8 as CP1252 and then encode it again in UTF-8 and decode it once again using CP1252. Sanity check before I continue: Does the source code in the method look right in that image? (I hope all these weird characters will come through to you :) ) Bests Patrick
On Wed, Jul 19, 2017 at 2:22 PM, Rein, Patrick Patrick.Rein@hpi.de wrote:
Well as feared it did not come through. Let me try this again: The string ' ä' would be 'Ã'
when interpreted as bytes which encode UTF-8. In turn 'Ã' as bytes encoding UTF-8 is 'ä' which
is what we actually want. The rest is as described below.
In my image (updated from some trunk version) the method looks fine. As for the weird encodings, I think you mean:
'ä' squeakToUtf8 => 'Ã¤'
'ä' squeakToUtf8 asByteArray => #[195 164]
'Ã¤' utf8ToSqueak => 'ä'
#[195 164] asString utf8ToSqueak => 'ä'
I assume this is a copy-paste error? E.g. I cannot copy+paste the result of
'ä' squeakToUtf8 squeakToUtf8
- Bert -
I meant that this:
'ä' squeakToUtf8 squeakToUtf8 asByteArray => #[195 131 194 164]
are the characters which are printed instead of 'ä' in the debug output.
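The byte values quoted here can be checked with a small Python sketch. The two helpers are hypothetical stand-ins, not Squeak's implementation: they assume that squeakToUtf8 returns a byte string holding one character per UTF-8 byte, which is consistent with the #[195 164] result quoted earlier in the thread.

```python
def squeak_to_utf8_like(s: str) -> str:
    """Stand-in for squeakToUtf8: one result character per UTF-8 byte."""
    return s.encode("utf-8").decode("latin-1")


def utf8_to_squeak_like(s: str) -> str:
    """Stand-in for utf8ToSqueak: read the characters as UTF-8 bytes."""
    return s.encode("latin-1").decode("utf-8")


single = squeak_to_utf8_like("ä")
double = squeak_to_utf8_like(single)

single_bytes = list(single.encode("latin-1"))
double_bytes = list(double.encode("latin-1"))

# Decoding twice undoes the double encoding.
round_trip = utf8_to_squeak_like(utf8_to_squeak_like(double))

print(single_bytes, double_bytes, repr(round_trip))
```

Under that assumption, the four bytes in the debug output are exactly what a second UTF-8 encoding pass produces from the two bytes of a correctly encoded 'ä'.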
I will look into this tomorrow again. I have not yet investigated the concrete trace to the ChangeList coming from the FileList (so far I have directly opened a ChangeList).
________________________________
From: Squeak-dev <squeak-dev-bounces@lists.squeakfoundation.org> on behalf of Bert Freudenberg <bert@freudenbergs.de>
Sent: Wednesday, July 19, 2017 15:15
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] Parsing privateAuthorsRaw for a changes browser
[quoted text trimmed; identical to Bert's message above]
I noticed that #setConverterForCode still relies on a BOM, but my current .changes file does not have one... Note that there are not many senders of #writeBOMOn:, mainly those that file out a class, method, etc. That explains why I do not have a BOM...
Yet (SourceFiles at: 2) has a UTF8TextConverter... Why? That could come from a direct send of #converter:, but I rather think that UTF8 is the default converter when we open the file. So things work only because we don't send #setConverterForCode to the .changes or .sources files... except that the path you used does...
IMO it's not related to condenseChanges; it should fail equally if you pretend to be an author named Stéphane, modify a method, and browse recent changes from the file list...
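To make the BOM argument concrete, here is a hedged Python sketch of BOM-based converter selection. `pick_encoding` is a hypothetical function, not Squeak's #setConverterForCode, and the `mac_roman` fallback is an assumption chosen for illustration; the thread does not say which converter is used when no BOM is found.

```python
import codecs


def pick_encoding(data: bytes, fallback: str = "mac_roman") -> str:
    """Choose UTF-8 only when the data starts with a UTF-8 BOM.

    Hypothetical sketch; the fallback encoding is an assumption.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"   # decodes as UTF-8 and drops the BOM
    return fallback


payload = "Stéphane".encode("utf-8")
with_bom = codecs.BOM_UTF8 + payload
without_bom = payload

decoded_with = with_bom.decode(pick_encoding(with_bom))
decoded_without = without_bom.decode(pick_encoding(without_bom))

print(repr(decoded_with), repr(decoded_without))
```

The same UTF-8 bytes decode correctly with the BOM and come out garbled without it, which is why a code path that re-sniffs the converter can break a file that otherwise reads fine with a default UTF-8 converter.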
2017-07-19 18:55 GMT+02:00 Rein, Patrick <Patrick.Rein@hpi.de>:
[quoted text trimmed; identical to Patrick's and Bert's messages above]