[squeak-dev] [BUG] Timestamps don't work for classes with special character names

Tobias Pape Das.Linux at gmx.de
Sat Dec 21 18:22:38 UTC 2019


> On 21.12.2019, at 19:11, Tobias Pape <Das.Linux at gmx.de> wrote:
> 
>> 
>> On 21.12.2019, at 17:36, Thiede, Christoph <Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>> 
>> Hi Tobias,
>> 
>> what do you mean in detail?
>> 
>> If I create the class via System Browser and add the method, my change file ends with:
>> 
>> Object subclass: #CTTéstClass
>> instanceVariableNames: ''
>> classVariableNames: ''
>> poolDictionaries: ''
>> category: 'CT-Experiments'!
>> !CTTéstClass methodsFor: 'no messages' stamp: 'ct 12/21/2019 17:18'!
>> foo! !
> 
> 
> Good. That was what I thought was important.
> 
> 
>> 
>> However, CompiledMethod >> #timeStamp returns ''.
> 
> What is the result of the following?
> 
> 	(CTTéstClass compiledMethodAt: #foo) preamble
> 
> 
>> 
>> Here is a snapshot of the #timeStamp stackframe:
>> 
>> 
>> 
>> Please note that "tokens at: tokenCount" returns the correct timestamp; nevertheless, stamp is nil. What is going on here?
> 
> 
> I see what the problem is. The .changes file is apparently written UTF-8 encoded, but read back as Latin-1.
> This is BAD.

Oh, and we were warned:

CompiledMethod
getPreambleFrom: aFileStream at: endPosition
	"This method is an ugly hack. This method assumes that source files have ASCII-compatible encoding and that preambles contain no non-ASCII characters."

	| chunkSize chunk |
	chunkSize := 160 min: endPosition.
	[
		| index |
		chunk := aFileStream
			position: (endPosition - chunkSize + 1 max: 0);
			basicNext: chunkSize.
		(index := chunk lastIndexOf: $! startingAt: chunk size) ~= 0 ifTrue: [
			^chunk copyFrom: index + 1 to: chunk size ].
		chunkSize := chunkSize * 2.
		chunkSize <= endPosition ] whileTrue.
	^chunk


I suspect that the problematic send is #basicNext:, around line 10 of the method. It seems to bypass the conversion that MultiByteFileStream would otherwise perform.
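
For what it is worth, the mismatch is easy to reproduce outside the image. The following is only a sketch in Python, not Squeak code: split_identifier_like is a made-up stand-in for the Scanner (it assumes that letters and digits form identifiers and that every other non-space character stands alone), and the class name is the one from Christoph's changes file. It shows the UTF-8 bytes being read as Latin-1, the resulting three-way split of the class name described in the quoted message below, and the round trip that a #utf8ToSqueak-style repair would perform.

```python
def split_identifier_like(text):
    """Hypothetical stand-in for the Scanner: runs of letters/digits form
    one piece; every other non-space character becomes a piece of its own."""
    pieces, current = [], ""
    for ch in text:
        if ch.isalnum():
            current += ch
            continue
        if current:
            pieces.append(current)
            current = ""
        if not ch.isspace():
            pieces.append(ch)
    if current:
        pieces.append(current)
    return pieces


name = "CTT\u00e9stClass"  # the class name with an accented e

# Written to the .changes file as UTF-8: the accented e becomes two bytes.
raw = name.encode("utf-8")

# Read back byte by byte, each byte taken as one Latin-1 character.
misread = raw.decode("latin-1")

# The second byte (0xA9) is the copyright sign, which is not a letter,
# so the class name falls apart into three pieces instead of one.
pieces = split_identifier_like(misread)

# Re-interpreting the misread characters as UTF-8 bytes restores the name.
repaired = misread.encode("latin-1").decode("utf-8")

print(len(raw) - len(name))  # 1 (one extra byte for the accented e)
print(len(pieces))           # 3
print(repaired == name)      # True
```

With the plain name CTTestClass the helper returns a single piece, which matches the observation that the unaccented class name works.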

Best regards	
	-Tobias

> 
> You end up with 7 tokens because the class name yields three tokens instead of one. The é is stored as two UTF-8 bytes; read as Latin-1, the second byte is the copyright symbol (©), which is classified as a binary selector character and thus separates the first part of the class name from the second part. This happens only because of the UTF-8 vs. Latin-1 mismatch.
> 
> But the code path for 7-element tokens is different, and it looks for the #stamp: at a different position.
> 
> Hence stamp is nil.
> 
> A wrong but easy fix would be to call #utf8ToSqueak on the preamble.
> 
> Best regards
> 	-Tobias
> 
> 
>> 
>> I'm not sure if I understand you correctly, but if you told me to search the hex of my change file for a "zero word", the only occurrence I could find is:
>> 
>> Which led me to this:
>> 
>> Does not seem related, but still looks somehow wrong ^^
>> 
>> Best,
>> Christoph
>> 
>> From: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on behalf of Tobias Pape <Das.Linux at gmx.de>
>> Sent: Saturday, 21 December 2019 15:44
>> To: The general-purpose Squeak developers list
>> Subject: Re: [squeak-dev] [BUG] Timestamps don't work for classes with special character names
>> 
>> 
>>> On 21.12.2019, at 15:16, Thiede, Christoph <Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>>> 
>>> Hi all, I just found another bug. If you get tired of them, just tell me :-)
>>> 
>>> Steps to reproduce:
>>> Print it:
>>> class := Object subclass: #CTTèstClass "sic (with accent in name)!"
>>> instanceVariableNames: ''
>>> classVariableNames: ''
>>> poolDictionaries: ''
>>> category: 'CT-Experiments'.
>>> class compile: 'foo ^ #foo'.
>>> (class >> #foo) timeStamp
>>> 
>>> Expected output:
>>> Something like 'ct 12/21/2019 15:13'.
>>> 
>>> Actual output:
>>> ''.
>>> 
>>> Please note that everything would have worked fine if we had named the class #CTTestClass (without accent) instead.
>>> 
>>> Do we want to support special class names in general? If yes, this is a bug in my opinion. If no, we should raise an error in the first statement.
>>> 
>>> Cause of infection not yet investigated.
>> 
>> Please check whether \00 bytes appear at some point in your .changes file.
>> 
>> Best regards
>>        -Tobias



