[squeak-dev] Why not just get rid of the sources file entirely (was: The Inbox: System-dtl.1277.mcz)

David T. Lewis lewis at mail.msen.com
Mon Jan 31 00:17:48 UTC 2022


Hi Christoph,

On Sun, Jan 30, 2022 at 08:33:53PM +0100, christoph.thiede at student.hpi.uni-potsdam.de wrote:
> > In the mean time, if you move the FileStream>>setConverterForCode method up the hierarchy to ReadWriteStream>>setConverterForCode, it will work around the problem for now.
> 
> Thank you, now it works!
> 
> What exact accesses to the source files are changed by your patch? When I try to access the comment of a class, there is still an I/O attempt.
> 

When you bring the sources file into the image by activating the
"Cache sources file" preference, you are avoiding everything that
requires access to the actual SqueakV50.sources file on your file
system (or the equivalent .stc compressed sources file).

This will include all open and close operations on the file, as well
as all references to sources that are stored in that file. It also
avoids the search through a list of directories to locate a sources
file to open.

The source code for many methods (especially anything that has been
updated recently either in trunk or in your own work) is stored in
the changes file. Access to that source code is not affected by the
"Cache sources file" preference, and neither are the open and close
operations on your changes file.

Every compiled method has a source pointer that encodes the information
needed to retrieve its source code from SourceFiles. Here is how you
can find compiled methods with source code that is currently stored in
the sources file, as opposed to the changes file:

  CompiledMethod allInstances select: [:e |
      1 = (SourceFiles fileIndexFromSourcePointer: e sourcePointer)].

So these would be all of the methods that require reading from the
sources file in order to obtain their source code.
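
As a rough sketch along the same lines (not tested against any particular
image, and assuming that file index 2 in SourceFiles refers to the changes
file), the following counts how many compiled methods fall on each side.
Evaluate it in a workspace and print or inspect the result:

  | methods inSources inChanges |
  methods := CompiledMethod allInstances.
  "Assumption: file index 1 is the sources file, 2 is the changes file"
  inSources := methods count: [:m |
      1 = (SourceFiles fileIndexFromSourcePointer: m sourcePointer)].
  inChanges := methods count: [:m |
      2 = (SourceFiles fileIndexFromSourcePointer: m sourcePointer)].
  "Answer the two counts, plus the methods that point at neither file"
  { inSources. inChanges. methods size - inSources - inChanges }

Only the first of those counts should be affected by the preference.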

> > Still I am quite interested to know if hiding the sources file in the image this way will cause your virus scanner to start being nicer to you :-)
> 
> I still have no evidence that this is a problem with antivirus software. The only way to find out is to monitor this for some time, as the slowdowns occur only sporadically. If you declare your patch sufficiently robust, I could load it in my production image for some weeks. :-)
> 

If it is a production image, then you should *not* apply any of my
patches. They are intended only as proof of concept, and I do not
expect to recommend anything for trunk until after the next release.

However, if you have an image that is flexible for experimentation,
and if you do not mind reverting the experimental changes later, then
yes for sure you should try running it with the "Cache sources file"
preference activated and see if it has a positive effect.

Given that we have only two entries in the SourceFiles array, I would
expect about a 50% chance of seeing a significant improvement if virus
scanning over one of those two files is the source of the problem.
And I expect a 100% chance that virus scanning is indeed the root
of the problem. This gives you a very good overall probability of
making things better ;-)

Dave


> Best,
> Christoph
> 
> ---
> Sent from Squeak Inbox Talk
> 
> On 2022-01-29T19:35:35-05:00, lewis at mail.msen.com wrote:
> 
> > Hi Christoph,
> > 
> > Confirmed, I see the same thing when I test in a freshly downloaded
> > image so I must have missed something. I cannot explain why I did
> > not run into this in previous testing, so I will not clutter the
> > inbox with a fix until I can take a better look at it.
> > 
> > In the mean time, if you move the FileStream>>setConverterForCode
> > method up the hierarchy to ReadWriteStream>>setConverterForCode, it
> > will work around the problem for now.
> > 
> > Still I am quite interested to know if hiding the sources file in
> > the image this way will cause your virus scanner to start being
> > nicer to you :-)
> > 
> > Dave
> > 
> > On Sun, Jan 30, 2022 at 12:09:46AM +0100, christoph.thiede at student.hpi.uni-potsdam.de wrote:
> > > Hi Dave,
> > > 
> > > > If anyone has tried loading DTL-internal-sources-dtl.4.mcz from the inbox, I would be interested to hear what you think of the idea after trying it in your image. But please respond only if you have actually tried it :-)
> > > 
> > > I wanted to try it, but you already moved DTL-internal-sources-dtl.4.mcz to the treated inbox, so I loaded DTL-internal-sources-dtl.11 instead and enabled the new preference. After loading for some time, it said: MessageNotUnderstood: CompressedSources>>setConverterForCode
> > > 
> > > 30 January 2022 12:07:22.939085 am
> > > 
> > > VM: Win32 - Smalltalk
> > > Image: Squeak6.0alpha [latest update: #21121]
> > > 
> > > SecurityManager state:
> > > Restricted: false
> > > FileAccess: true
> > > SocketAccess: true
> > > Working Dir C:\Users\Christoph\OneDrive\Dokumente\Squeak
> > > Trusted Dir C:\Users\Christoph\OneDrive\Dokumente\Squeak\Christoph
> > > Untrusted Dir C:\Users\Christoph\OneDrive\Dokumente\My Squeak
> > > 
> > > CompressedSources(Object)>>doesNotUnderstand: #setConverterForCode
> > >     Receiver: a CompressedSources
> > >     Arguments and temporary variables: 
> > >         t1: setConverterForCode
> > >         t2: MessageNotUnderstood: CompressedSources>>setConverterForCode
> > >         t3: nil
> > >     Receiver's instance variables: 
> > >         collection: '''From Squeak5.0 of 20 July 2015 [latest update: #15110] on 20 July 2015 at 4:13:52 pm'...etc...
> > >         position: 0
> > >         readLimit: 65536
> > >         writeLimit: 65536
> > >         initialPositionOrNil: nil
> > >         segmentFile: a ReadWriteStream
> > >         segmentSize: 65536
> > >         nSegments: 538
> > >         segmentTable: #(2168 13371 25000 37251 49586 62109 73771 87155 99139 109212 120...etc...
> > >         segmentIndex: 1
> > >         dirty: false
> > >         endOfFile: 35184983
> > > 
> > > FileDirectory class>>openSources:andChanges:forImage:
> > >     Receiver: FileDirectory
> > >     Arguments and temporary variables: 
> > >         t1: 'C:\Program Files (x86)\Squeak\SqueakV50.sources'
> > >         t2: 'C:\Users\Christoph\OneDrive\Dokumente\Squeak\FreshTrunk.changes'
> > >         t3: 'C:\Users\Christoph\OneDrive\Dokumente\Squeak\FreshTrunk.image'
> > >         t4: a CompressedSources
> > >         t5: nil
> > >         t6: 'Squeak cannot locate &fileRef.
> > > 
> > > Please check that the file is named properly and is in the
> > > same directory as this image....etc...
> > >         t7: 'Squeak cannot write to &fileRef.
> > > 
> > > Please check that you have write permission for this file.
> > > 
> > > You won'...etc...
> > >     Receiver's instance variables: 
> > >         superclass: Object
> > >         methodDict: a MethodDictionary(size 125)
> > >         format: 65537
> > >         instanceVariables: #('pathName')
> > >         organization: ('enumeration' containingDirectory directoryEntries directoryEntry...etc...
> > >         subclasses: {UnixFileDirectory . AcornFileDirectory . MacFileDirectory . DosFileDirectory...etc...
> > >         name: #FileDirectory
> > >         classPool: a Dictionary(#DefaultDirectory->DosFileDirectory on 'C:\Users\Christoph\OneDrive\Dokumente\Squeak...etc...
> > >         sharedPools: nil
> > >         environment: Smalltalk
> > >         category: #'Files-Directories'
> > > 
> > > SmalltalkImage>>openSourceFiles
> > >     Receiver: Smalltalk
> > >     Arguments and temporary variables: 
> > > 
> > >     Receiver's instance variables: 
> > >         globals: Smalltalk
> > > 
> > > CompressedSources class>>internalizeSources:
> > >     Receiver: CompressedSources
> > >     Arguments and temporary variables: 
> > >         t1: true
> > >         t2: 'No external SqueakV50 sources file found'
> > >     Receiver's instance variables: 
> > >         superclass: CompressedSourceStream
> > >         methodDict: a MethodDictionary(#asCompressedSources->(CompressedSources>>#asCom...etc...
> > >         format: 65548
> > >         instanceVariables: nil
> > >         organization: ('converting' asCompressedSources)
> > > ('file open/close' readOnlyCopy...etc...
> > >         subclasses: nil
> > >         name: #CompressedSources
> > >         classPool: a Dictionary(#CachedSources->a CompressedSources )
> > >         sharedPools: nil
> > >         environment: Smalltalk
> > >         category: #'DTL-internal-sources'
> > > <clipped>
> > > 
> > > 
> > > --- The full stack ---
> > > CompressedSources(Object)>>doesNotUnderstand: #setConverterForCode
> > > FileDirectory class>>openSources:andChanges:forImage:
> > > SmalltalkImage>>openSourceFiles
> > > CompressedSources class>>internalizeSources:
> > > [] in PragmaPreference>>rawValue:
> > > <clipped>
> > > 
> > > Did I load the right version, and do you have any idea how to solve this? :-)
> > > 
> > > Best,
> > > Christoph
> > > 
> > > ---
> > > Sent from Squeak Inbox Talk
> > > 
> > > On 2022-01-20T21:50:45-05:00, lewis at mail.msen.com wrote:
> > > 
> > > > On Thu, Jan 20, 2022 at 11:43:08AM -0800, Eliot Miranda wrote:
> > > > > Hi All,
> > > > > 
> > > > > > On Jan 19, 2022, at 12:38 AM, Marcel Taeumel <marcel.taeumel at hpi.de> wrote:
> > > > > > 
> > > > > > ???
> > > > > > Hi Dave --
> > > > > > 
> > > > > > > [...] which point the sources file no longer lives on rotating media
> > > > > > 
> > > > > > Quick comment on this one. :-) Many computers use SSD/Flash storage these days and I am pretty sure that modern operating systems have their tricks with caching bigger files even further without the application ever noticing. However, considering external file scanners scanning for viruses, yes, it can be beneficial to avoid an extra use of some OS file API.
> > > > > > 
> > > > > 
> > > > > Again our performance issues accessing sources on windows are much more likely to be rooted in the absurdity of opening a file for every file access, using the ugly CurrentReadOnlySources nonsense to compensate.  If we were to maintain the in-image source files correctly our performance would improve markedly.
> > > > > 
> > > > > > The issue with source files is fundamentally to do with providing a way for the debugger to access source through different files while debugging source file access.  A substituteSourceFilesDuring: aBlock protocol would work, and would be infinitely preferable to cacheDuring:.  Why are we still unable to do something constructive here?
> > > > 
> > > > Hi Eliot,
> > > > 
> > > > I opened this thread to discuss something else. If anyone has
> > > > tried loading DTL-internal-sources-dtl.4.mcz from the inbox, I
> > > > would be interested to hear what you think of the idea after
> > > > trying it in your image. But please respond only if you have
> > > > actually tried it :-)
> > > > 
> > > > To the question "Why are we still unable to do something
> > > > constructive here?", I believe you are referring to the discussion
> > > > that was taking place in the "Proper use of SourceFiles and
> > > > "CurrentReadOnlySourceFiles cacheDuring:"" thread. Levente
> > > > advocated using process local variables, and you advocated
> > > > the protocol that you mention above.
> > > > 
> > > > Levente: http://lists.squeakfoundation.org/pipermail/squeak-dev/2022-January/218110.html
> > > > 
> > > > Eliot: http://lists.squeakfoundation.org/pipermail/squeak-dev/2022-January/218111.html
> > > > 
> > > > I don't know which approach would be better. As far as I know
> > > > neither proposal has been implemented.
> > > > 
> > > > Dave
> > > > 
> > > > 
> > >
> > 
> > 
> 


