Hi Tobias, thanks for the pointers!


(CTTéstClass compiledMethodAt: #foo) preamble


Like you said:


I made the following change:

This seems to fix the conversion issues.

Outputs are:


The next problem is the trailing ! for the CTTéstClass preamble.
Here, the integer returned by expandedSourceFileArray >> #filePositionFromSourcePointer: is too large by one.
I have no idea where these constants come from, but as this is a constant method, I don't see how this calculation could be wrong.

I also tried the following:
yielding correctly:

But that seems hacky again.


Looking forward to your reply!


Best,

Christoph


From: Squeak-dev <squeak-dev-bounces@lists.squeakfoundation.org> on behalf of Tobias Pape <Das.Linux@gmx.de>
Sent: Saturday, 21 December 2019 19:22:38
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] [BUG] Timestamps don't work for classes with special character names
 

> On 21.12.2019, at 19:11, Tobias Pape <Das.Linux@gmx.de> wrote:
>
>>
>> On 21.12.2019, at 17:36, Thiede, Christoph <Christoph.Thiede@student.hpi.uni-potsdam.de> wrote:
>>
>> Hi Tobias,
>>
>> what do you mean in detail?
>>
>> If I create the class via System Browser and add the method, my change file ends with:
>>
>> Object subclass: #CTTéstClass
>> instanceVariableNames: ''
>> classVariableNames: ''
>> poolDictionaries: ''
>> category: 'CT-Experiments'!
>> !CTTéstClass methodsFor: 'no messages' stamp: 'ct 12/21/2019 17:18'!
>> foo! !
>
>
> Good. That was what I thought was important.
>
>
>>
>> However, CompiledMethod >> #timeStamp returns ''.
>
> What is the result of the following?
>
>        (CTTéstClass compiledMethodAt: #foo) preamble
>
>
>>
>> Here is a snapshot of the #timeStamp stackframe:
>>
>>
>>
>> Please note that "tokens at: tokenCount" returns the correct timestamp, but stamp is nil. What is this???
>
>
> I see what the problem is. The .changes file is apparently written UTF-8 coded, but read Latin-1 coded.
> This is BAD.

Oh, and we were warned:

CompiledMethod
getPreambleFrom: aFileStream at: endPosition
        "This method is an ugly hack. This method assumes that source files have ASCII-compatible encoding and that preambles contain no non-ASCII characters."

        | chunkSize chunk |
        chunkSize := 160 min: endPosition.
        [
                | index |
                chunk := aFileStream
                        position: (endPosition - chunkSize + 1 max: 0);
                        basicNext: chunkSize.
                (index := chunk lastIndexOf: $! startingAt: chunk size) ~= 0 ifTrue: [
                        ^chunk copyFrom: index + 1 to: chunk size ].
                chunkSize := chunkSize * 2.
                chunkSize <= endPosition ] whileTrue.
        ^chunk


I have the feeling that the problematic send is #basicNext: in line 10 or so. This seems to circumvent the conversion done by MultiByteFileStream.
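
To illustrate what a byte-wise read would see, here is a minimal workspace sketch. It assumes that #squeakToUtf8 (the counterpart of #utf8ToSqueak) is available in the image, and it only mimics the raw bytes that end up in the file; it does not touch the .changes file or MultiByteFileStream itself.

```smalltalk
"Sketch only: model the UTF-8 bytes of the class name as they would sit in the file."
| name rawBytes |
name := 'CTTéstClass'.
rawBytes := name squeakToUtf8.      "one character per UTF-8 byte, i.e. what an undecoded read yields"
rawBytes size.                      "12 instead of 11, because é takes two bytes in UTF-8"
(rawBytes at: 4) asInteger.         "195 (16rC3), first byte of é"
(rawBytes at: 5) asInteger.         "169 (16rA9), which is the copyright sign when read as Latin-1"
rawBytes utf8ToSqueak = name.       "true once the bytes are decoded as UTF-8 again"
```

If the preamble is taken from such undecoded bytes, the class name no longer scans as a single token, which matches the symptom described in the quoted message.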

Best regards   
        -Tobias

>
> You end up with 7 tokens, because you have three for the class name instead of one. This is because the Latin-1 copyright symbol is classified as a binary selector character, and thus separates the first part of the class name from the second part. This happens only because of the UTF-8 vs. Latin-1 mismatch.
>
> But the code path for 7-element tokens is different, and it looks for the #stamp: at a different position.
>
> Hence stamp is nil.
>
> A wrong but easy fix would be to call #utf8ToSqueak on the preamble.
>
> Best regards
>        -Tobias
>
>
>>
>> I'm not sure if I understand you correctly, but if you told me to search the hex of my change file for a "zero word", the only occurrence I could find is:
>>
>> Which led me to this:
>>
>> Does not seem related, but still looks somehow wrong ^^
>>
>> Best,
>> Christoph
>>
>> From: Squeak-dev <squeak-dev-bounces@lists.squeakfoundation.org> on behalf of Tobias Pape <Das.Linux@gmx.de>
>> Sent: Saturday, 21 December 2019 15:44
>> To: The general-purpose Squeak developers list
>> Subject: Re: [squeak-dev] [BUG] Timestamps don't work for classes with special character names
>>
>>
>>> On 21.12.2019, at 15:16, Thiede, Christoph <Christoph.Thiede@student.hpi.uni-potsdam.de> wrote:
>>>
>>> Hi all, I just found another bug. If you get tired of them, just tell me :-)
>>>
>>> Steps to reproduce:
>>> Print it:
>>> class := Object subclass: #CTTèstClass "sic (with accent in name)!"
>>> instanceVariableNames: ''
>>> classVariableNames: ''
>>> poolDictionaries: ''
>>> category: 'CT-Experiments'.
>>> class compile: 'foo ^ #foo'.
>>> (class >> #foo) timeStamp
>>>
>>> Expected output:
>>> Something like 'ct 12/21/2019 15:13'.
>>>
>>> Actual output:
>>> ''.
>>>
>>> Please note that everything would have worked fine if we had named the class #CTTestClass (without accent) instead.
>>>
>>> Do we want to support special class names in general? If yes, this is a bug in my opinion. If no, we should raise an error in the first statement.
>>>
>>> Cause of infection not yet investigated.
>>
>> Please check whether \00 bytes appear at some point in your .changes file.
>>
>> Best regards
>>        -Tobias