[Pharo-dev] [squeak-dev] The .changes file should be bound to a single image

Eliot Miranda eliot.miranda at gmail.com
Fri Jul 1 07:33:56 UTC 2016


Ben,

> On Jun 29, 2016, at 9:48 PM, Ben Coman <btc at openinworld.com> wrote:
> 
>> On Thu, Jun 30, 2016 at 7:07 AM, David T. Lewis <lewis at mail.msen.com> wrote:
>>> On Wed, Jun 29, 2016 at 02:00:19PM -0400, David T. Lewis wrote:
>>> Let's not solve the wrong problem folks. I only looked at this for 10
>>> minutes this morning, and I think (but I am not sure) that the issue
>>> affects the case of saving the image, and that the normal writing of
>>> changes is fine.
>> 
>> I am wrong.
>> 
>> I spent some more time with this, and it is clear that two images saving
>> chunks to the same changes file will result in corrupted change records
>> in the changes file. It is not just an issue related to the image save
>> as I suggested above.
>> 
>> In practice, this is not an issue that either Chris or I have noticed,
>> probably because we are not doing software development (saving method
>> changes) at the same time that we are running RemoteTask and similar.
>> But I can certainly see how it might be a problem if, for example, I
>> had a bunch of students running the same image from a network shared
>> folder.
> 
> Maybe it's time to consider a fundamental change in how method-sources
> are referred to.

The changes file is not merely the repository for the sources of newly minted methods.  It is also a log file, a crash recovery mechanism.  It is simple.  It works.  You propose something horribly complex to solve a problem that a) doesn't affect very many people, b) is easy to work around and c) is feasible to fix with a well-known approach.  It doesn't wash for me.

> Taking inspiration from git... A content addressable key-value file
> store might solve concurrent access.  Each CompiledMethod gets written
> to a file named for the hash of its contents, which is the only
> reference the Image gets to a method's source.  Each such file would
> *only* need be written once and thereafter could be read
> simultaneously by multiple Images.  Anyone on the network wanting
> to store the same source would see the file already exists and have
> nothing to do.
> Perhaps having many individual files implies abysmal performance,
> 
> Or maybe something similar to Mercurial's revlog format [1] could be
> used, one file per class.
> 
> The thing about the Image *only* referring to a method's source by its
> content hash would seem to give great flexibility in backends to
> locate/store that source.  Possibly...
> * stored as individual files as above
> * bundled in a zip file in random order
> * a school could configure a database server in Image provided to students
> * hashes could be thrown at a service on the Internet
> * cached locally with a key-value database like LMDB [2]
> * remote replication to multiple internet backup locations
> * in an emergency you could throw a bundle of hashes as a query to the
> mail list and get an ad hoc response of individual files.
> * Inter-Smalltalk image communication
> 
> Pharo has a stated goal to get rid of the changes file.  Changing to
> content-hash-addressable method-source seems a logical step along
> that road. Even if the Squeak community doesn't want to go so far as
> eliminating the .changes file, can they see value in changing method
> source references to be content-hashes rather than indexes into a
> particular file?
> 
> [1] http://blog.prasoonshukla.com/mercurial-vs-git-scaling
> [2] https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database
> 
> 
> Just having a poke at this, it seems a new form of
> CompiledMethodTrailer may need to be defined, being invoked from
> CompiledMethod>>sourceCode.  CompiledMethodTrailer>>sourceCode would
> find the source code based on a content-hash held by the
> CompiledMethod.  If found, the call to #getSourceFromFile that
> accesses the .changes file will be bypassed, and could remain as a
> backup.
> 
> cheers -ben
> 
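For concreteness, a minimal sketch of the one-file-per-hash store described above. It is Python rather than Smalltalk, and everything in it is an assumption for illustration: the SourceStore class, its put/get methods, and the choice of SHA-256 as the content hash are hypothetical, not anything that exists in Squeak or Pharo.

```python
import hashlib
import os
import tempfile
from pathlib import Path


class SourceStore:
    """Hypothetical content-addressed store: one file per distinct method source."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, source: str) -> str:
        """Store source text and return its key (assumed here to be SHA-256 hex)."""
        data = source.encode("utf-8")
        key = hashlib.sha256(data).hexdigest()
        target = self.root / key
        if target.exists():
            # Someone already stored identical source: nothing to do.
            return key
        # Write to a temporary file in the same directory, then rename it into
        # place, so readers never see a half-written file under the final name.
        fd, tmp = tempfile.mkstemp(dir=self.root)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, target)
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise
        return key

    def get(self, key: str) -> str:
        """Return the source text stored under key."""
        # Read bytes and decode by hand so CR line ends survive unchanged.
        return (self.root / key).read_bytes().decode("utf-8")
```

Two writers storing the same source produce byte-identical files, so whichever rename lands last changes nothing. The sketch does nothing about the logging and crash-recovery roles of the changes file.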
>> 
>> Dave
>> 
>> 
>>> 
>>> Max was running on Pharo, which may or may not be handling changes the
>>> same way. I think he may be seeing a different problem from the one I
>>> confirmed.
>>> 
>>> So a bit more testing and verification would be in order. I can't look at
>>> it now though.
>>> 
>>> Dave
>>> 
>>>> 
>>>>> On 29-06-2016, at 10:35 AM, Eliot Miranda <eliot.miranda at gmail.com>
>>>>> wrote:
>>>> {snip much rant}
>>>> 
>>>>> The most obvious place where this is an issue is where two images are
>>>>> using the same changes file and think they're appending. Image A seeks
>>>>> to the end of the file, "writes" stuff. Image B near-simultaneously
>>>>> does the same. Eventually each process gets around to pushing data to
>>>>> hardware. Oops! And let's not dwell too much on the problems possible
>>>>> if either process causes a truncation of the file. Oh, wait, I think we
>>>>> actually had a problem with that some years ago.
>>>>> 
>>>>> The thing is that this problem bites even if we have a unitary primitive
>>>>> that both positions and writes if that primitive is written above a
>>>>> substrate that, as unix and stdio streams do, separates positioning from
>>>>> writing.  The primitive is neat but it simply drives the problem further
>>>>> underground.
>>>> 
>>>> 
>>>> Oh absolutely - we only have real control over a small part of it. It
>>>> would probably be worth making use of that where we can.
>>>> 
>>>>> 
>>>>> A more robust solution might be to position, write, reposition, read,
>>>>> and compare, shortening on corruption, and retrying, using exponential
>>>>> back-off like ethernet packet transmission.  Most of the time this adds
>>>>> only the overhead of reading what's written.
>>>> 
>>>> Yes, for anything we want reliable that's probably a good way. A limit
>>>> on the number of retries would probably be smart to stop infinite
>>>> recursion. Imagine the fun of an error causing infinite retries of writing
>>>> an error log about an infinite recursion. On an infinitely large Beowulf
>>>> cluster!
>>>> 
>>>> It's all yet another example of where software meeting reality leads to
>>>> nightmares.
>>>> 
>>>> 
>>>> tim
>>>> --
>>>> tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
>>>> If it was easy, the hardware people would take care of it.
> 
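The position, write, reposition, read, compare and retry scheme quoted above, with the retry limit tim suggests, might look roughly like this. It is a sketch in Python, not VM or image code; append_verified, max_retries and base_delay are hypothetical names, and the jittered delay is just one common way to do exponential back-off.

```python
import os
import random
import time


def append_verified(path, chunk: bytes, max_retries=5, base_delay=0.01):
    """Append chunk to path, reread it, and retry with back-off on mismatch.

    Returns the file offset at which chunk was finally written.
    """
    # Make sure the file exists so it can be opened for read and write.
    open(path, "ab").close()
    for attempt in range(max_retries):
        with open(path, "r+b") as f:
            f.seek(0, os.SEEK_END)      # position
            start = f.tell()
            f.write(chunk)              # write
            f.flush()
            os.fsync(f.fileno())
            f.seek(start)               # reposition
            if f.read(len(chunk)) == chunk:   # read and compare
                return start
            # Shorten on corruption. Caveat: this can also discard bytes that
            # another writer appended after start; the sketch does no locking.
            f.truncate(start)
        # Exponential back-off with random jitter before the next attempt.
        time.sleep(base_delay * (2 ** attempt) * random.random())
    raise OSError("gave up appending after %d attempts" % max_retries)
```

In the common uncontended case the only extra cost is rereading what was just written. The bounded loop is what keeps a persistent failure from turning into endless retries.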

