[squeak-dev] Encoding issue with CompressedSourceFiles

David T. Lewis lewis at mail.msen.com
Thu May 5 21:55:55 UTC 2022

Hi Christoph,

On Wed, May 04, 2022 at 07:47:01PM +0000, Thiede, Christoph wrote:
> Hi all,
> while updating one of my images today, in which I have enabled the "cache source files" preference, I stumbled upon a scanner error from CompiledMethod>>#timeStamp because a method had an incomplete preamble like:
> HTTPClient class methodsFor: 'utilities' stamp: 'mir 2/2/2001
> Starting at some point midway through the update, this error occurs reproducibly (I discarded and re-applied the update a few times) for different methods, so somewhere there must be a wrong offset in the cached source files. If I turn off the preference before installing the updates, no errors are raised.
> Unfortunately, I do not really have an idea where to start searching for the cause of this error.
> But Marcel mentioned a potential hint recently [1]: CompressedSourceFiles does not have a converter. Could this lead to this error? Dave, Marcel, do you have any ideas where to start? :-)

I do not know, but I can mention one thing that I noticed.

Take a look at this method, CompressedSourceStream>>nextChunk:

	self flag: #workAround. 	"all accessors should decode utf8"
	^super nextChunk utf8ToSqueak

Dan Ingalls wrote CompressedSourceStream nearly 20 years ago, and
amazingly it still works today. But back then, Squeak did not have
multibyte strings or character converters.

Vanessa added #nextChunk in 2010, and documented it as some sort
of workaround that was needed until we could get Squeak to use
utf8 more consistently.

When I was first experimenting with using CompressedSourceStream
to internalize the sources file, I noticed that I sometimes saw
source code comments or time stamps incorrectly rendered in a
versions browser. In particular there were methods with Göran Hultgren's
name rendered incorrectly.

Unfortunately, I have not been able to reproduce that problem (sorry,
quite frustrating), so I never figured out the root cause. But
what I did notice at that time is that I could remove the workaround
CompressedSourceStream>>nextChunk completely, and the versions
browser started displaying the old source code versions correctly.

I'm really sorry I can't provide a repeatable scenario, but maybe
this will give you some ideas. I think it likely that the method
CompressedSourceStream>>nextChunk can (and should) be removed
entirely, and that some other updates may be needed to handle
utf8 properly for sources in a CompressedSourceStream.
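
As a rough workspace sketch of why an extra or missing decoding step
matters, the expressions below round-trip a name with a non-ASCII
character through UTF-8. The string literal and the variable names are
only illustrative assumptions; they are not taken from the sources file.
The sketch says nothing about where CompressedSourceStream itself goes
wrong. It only shows that byte counts and character counts differ as
soon as such a character is present.

```smalltalk
| original encoded decodedOnce |
original := 'Göran'.
encoded := original squeakToUtf8.      "UTF-8 bytes held in a byte string"
decodedOnce := encoded utf8ToSqueak.   "decoded back to characters"

original size.            "5 characters"
encoded size.             "6 bytes, because ö takes two bytes in UTF-8"
decodedOnce = original.   "should answer true after exactly one decode"
```

If one layer counts bytes and another counts decoded characters, or if
the text is decoded more often than it was encoded, positions and
rendering can drift apart in this way.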

Hmm... now that I have written this down, I want to go back and
look at it again. I was testing the internalized sources in several
images. Maybe I saw the error in old method versions that were still
present in SqueakV46.sources but no longer in SqueakV50.sources.
That might account for why the problem seemed to go away for no
reason. I'll report back if I can find anything.

Meanwhile, if you (Christoph) have a repeatable error, try deleting
the CompressedSourceStream>>nextChunk method and see if it makes
a difference.
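
One way to try that from a workspace, as a sketch that assumes your
image still defines CompressedSourceStream>>nextChunk. Print the first
expression and keep the result somewhere, so the method can be pasted
back in afterwards:

```smalltalk
"Print this first and save the text, so the workaround can be restored."
(CompressedSourceStream sourceCodeAt: #nextChunk) asString.

"Then remove the workaround and repeat the update with cached sources on."
CompressedSourceStream removeSelector: #nextChunk.
```

It is probably best to do this in a copy of the image rather than in
one you care about.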

