[squeak-dev] [BUG] Timestamps don't work for classes with special character names

Tobias Pape Das.Linux at gmx.de
Sat Dec 21 19:47:50 UTC 2019


> On 21.12.2019, at 20:23, Thiede, Christoph <Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
> 
> Hi Tobias, thanks for the pointers!
> 
> > (CTTéstClass compiledMethodAt: #foo) preamble
> 
> Like you said:
> 
> 
> I made the following change:
> 
> This seems to fix the conversion issues.
> 
> Outputs are:
> 
> 
> The next problem is the trailing ! for the CTTéstClass preamble.
> Here, the integer returned by expandedSourceFileArray >> #filePositionFromSourcePointer: is too large by one.
> I have no idea where these constants come from, but as this is a constant method, I don't see how this calculation could be wrong.

Because of UTF-8. The position counts raw bytes, but what you get back is counted in Unicode code points. The é takes two bytes in UTF-8, hence the + 1...
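
To make the off-by-one concrete outside Squeak, here is a minimal Python sketch. It assumes the preamble text from the .changes excerpt quoted further down and assumes the file is UTF-8 on disk; the variable names are only for illustration and nothing here is Squeak's actual code.

```python
# Preamble text as it appears in the .changes excerpt quoted in this thread.
# \u00e9 is the accented e in the class name.
preamble = "CTT\u00e9stClass methodsFor: 'no messages' stamp: 'ct 12/21/2019 17:18'"

chars = len(preamble)                # length in Unicode code points
raw = len(preamble.encode("utf-8"))  # length in bytes, assuming UTF-8 on disk

# The accented e is one code point but two UTF-8 bytes,
# so the byte count is exactly one larger than the character count.
print(raw - chars)
```

Any offset computed in bytes and then applied to decoded characters drifts by one for each such two-byte character in front of it.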

> 
> I also tried the following:
> 
> yielding correctly:

Seems lucky..

> 
> But that seems hacky again.
> 
> Looking forward to your reply!


Best regards
	-Tobias

PS: Maybe paste the code as text instead of images? It's easier to see things then, for me at least :)

> 
> Best,
> Christoph
> From: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on behalf of Tobias Pape <Das.Linux at gmx.de>
> Sent: Saturday, December 21, 2019 19:22:38
> To: The general-purpose Squeak developers list
> Subject: Re: [squeak-dev] [BUG] Timestamps don't work for classes with special character names
>  
> 
> > On 21.12.2019, at 19:11, Tobias Pape <Das.Linux at gmx.de> wrote:
> > 
> >> 
> >> On 21.12.2019, at 17:36, Thiede, Christoph <Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
> >> 
> >> Hi Tobias,
> >> 
> >> what do you mean in detail?
> >> 
> >> If I create the class via System Browser and add the method, my change file ends with:
> >> 
> >> Object subclass: #CTTéstClass
> >> instanceVariableNames: ''
> >> classVariableNames: ''
> >> poolDictionaries: ''
> >> category: 'CT-Experiments'!
> >> !CTTéstClass methodsFor: 'no messages' stamp: 'ct 12/21/2019 17:18'!
> >> foo! !
> > 
> > 
> > Good. That was what I thought was important.
> > 
> > 
> >> 
> >> However, CompiledMethod >> #timeStamp returns ''.
> > 
> > What is the result of the following?
> > 
> >        (CTTéstClass compiledMethodAt: #foo) preamble
> > 
> > 
> >> 
> >> Here is a snapshot of the #timeStamp stackframe:
> >> 
> >> 
> >> 
> >> Please note that "tokens at: tokenCount" returns the correct timestamp, but stamp is nil. What is this???
> > 
> > 
> > I see what the problem is. The .changes file is apparently written UTF-8 encoded, but read Latin-1 encoded.
> > This is BAD.
> 
> Oh, and we were warned:
> 
> CompiledMethod
> getPreambleFrom: aFileStream at: endPosition
>         "This method is an ugly hack. This method assumes that source files have ASCII-compatible encoding and that preambles contain no non-ASCII characters."
> 
>         | chunkSize chunk |
>         chunkSize := 160 min: endPosition.
>         [
>                 | index |
>                 chunk := aFileStream
>                         position: (endPosition - chunkSize + 1 max: 0);
>                         basicNext: chunkSize.
>                 (index := chunk lastIndexOf: $! startingAt: chunk size) ~= 0 ifTrue: [
>                         ^chunk copyFrom: index + 1 to: chunk size ].
>                 chunkSize := chunkSize * 2.
>                 chunkSize <= endPosition ] whileTrue.
>         ^chunk
> 
> 
> I have the feeling that the problematic send is #basicNext: in line 10 or so. This seems to circumvent the conversion done by MultiByteFileStream.
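
The difference between a raw read and a converting read can be sketched in Python. This is only an analogy under stated assumptions: `io.BytesIO` stands in for reading without conversion and `io.TextIOWrapper` stands in for a stream that decodes UTF-8 while reading; neither is Squeak's MultiByteFileStream, and the sample text is shortened from the .changes excerpt.

```python
import io

# Sample chunk, stored as UTF-8 bytes (assumption: the file is UTF-8 on disk).
data = "!CTT\u00e9stClass methodsFor: 'no messages'!".encode("utf-8")

# Raw read: the bytes come back untouched, with no decoding step.
raw_chunk = io.BytesIO(data).read()

# Converting read: the wrapper decodes UTF-8 while reading.
text_chunk = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8").read()

# The raw chunk still contains the two bytes C3 A9 for the accented e;
# the decoded chunk contains a single character for it.
print(len(raw_chunk) - len(text_chunk))
```

Code that later treats the raw chunk as if each byte were one character sees two characters where the class name has one.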
> 
> Best regards    
>         -Tobias
> 
> > 
> > You end up with 7 tokens, because you have three for the class name instead of one. This is because the Latin-1 copyright symbol is classified as a binary selector, and thus separates the first part of the class name from the second part. This happens only because of the UTF-8 vs. Latin-1 mismatch.
> > 
> > But the code path for 7-element tokens is different, and it looks for the #stamp: at a different position.
> > 
> > Hence stamp is nil.
> > 
> > A wrong but easy fix would be to call #utf8ToSqueak on the preamble.
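
The misreading and the re-decoding idea can be reproduced in a few lines of Python. This is a hedged sketch, not Squeak code: the `re.split` call is a crude stand-in for a tokenizer that treats the copyright sign as a separator (it is not Squeak's Scanner), and the encode/decode round trip only mirrors the idea of re-decoding the preamble.

```python
import re

name = "CTT\u00e9stClass"

# Written as UTF-8, read back as Latin-1: the two bytes of the accented e
# (C3 A9) become two separate characters, A-tilde and the copyright sign.
misread = name.encode("utf-8").decode("latin-1")

# Crude stand-in for a tokenizer: split around the copyright sign (U+00A9),
# keeping the separator itself as its own piece.
parts = [p for p in re.split("(\u00a9)", misread) if p]

# Undo the misreading: recover the original bytes, then decode them as UTF-8.
repaired = misread.encode("latin-1").decode("utf-8")

print(parts)
print(repaired == name)
```

The class name falls apart into three pieces. Together with the four remaining tokens of the preamble from the .changes excerpt (methodsFor:, the category, stamp:, the stamp string), that gives seven tokens instead of five.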
> > 
> > Best regards
> >        -Tobias
> > 
> > 
> >> 
> >> I'm not sure if I understand you correctly, but if you told me to search the hex of my change file for a "zero word", the only occurrence I could find is:
> >> 
> >> Which led me to this:
> >> 
> >> Does not seem related, but still looks somehow wrong ^^
> >> 
> >> Best,
> >> Christoph
> >> 
> >> From: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on behalf of Tobias Pape <Das.Linux at gmx.de>
> >> Sent: Saturday, December 21, 2019 15:44
> >> To: The general-purpose Squeak developers list
> >> Subject: Re: [squeak-dev] [BUG] Timestamps don't work for classes with special character names
> >> 
> >> 
> >>> On 21.12.2019, at 15:16, Thiede, Christoph <Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
> >>> 
> >>> Hi all, I just found another bug. If you get tired of them, just tell me :-)
> >>> 
> >>> Steps to reproduce:
> >>> Print it:
> >>> class := Object subclass: #CTTèstClass "sic (with accent in name)!"
> >>> instanceVariableNames: ''
> >>> classVariableNames: ''
> >>> poolDictionaries: ''
> >>> category: 'CT-Experiments'.
> >>> class compile: 'foo ^ #foo'.
> >>> (class >> #foo) timeStamp
> >>> 
> >>> Expected output:
> >>> Something like 'ct 12/21/2019 15:13'.
> >>> 
> >>> Actual output:
> >>> ''.
> >>> 
> >>> Please note that everything would have worked fine if we had named the class #CTTestClass (without accent) instead.
> >>> 
> >>> Do we want to support special class names in general? If yes, this is a bug in my opinion. If no, we should raise an error in the first statement.
> >>> 
> >>> Cause of infection not yet investigated.
> >> 
> >> Please check whether \00 bytes appear at some point in your .changes file.
> >> 
> >> Best regards
> >>        -Tobias
> 
> 
> 
> 



