[Pharo-dev] [squeak-dev] The .changes file should be bound to a
single image
Eliot Miranda
eliot.miranda at gmail.com
Fri Jul 1 07:33:56 UTC 2016
Ben,
> On Jun 29, 2016, at 9:48 PM, Ben Coman <btc at openinworld.com> wrote:
>
>> On Thu, Jun 30, 2016 at 7:07 AM, David T. Lewis <lewis at mail.msen.com> wrote:
>>> On Wed, Jun 29, 2016 at 02:00:19PM -0400, David T. Lewis wrote:
>>> Let's not solve the wrong problem folks. I only looked at this for 10
>>> minutes this morning, and I think (but I am not sure) that the issue
>>> affects the case of saving the image, and that the normal writing of
>>> changes is fine.
>>
>> I am wrong.
>>
>> I spent some more time with this, and it is clear that two images saving
>> chunks to the same changes file will result in corrupted change records
>> in the changes file. It is not just an issue related to the image save
>> as I suggested above.
>>
>> In practice, this is not an issue that either Chris or I have noticed,
>> probably because we are not doing software development (saving method
>> changes) at the same time that we are running RemoteTask and similar.
>> But I can certainly see how it might be a problem if, for example, I
>> had a bunch of students running the same image from a network shared
>> folder.
>
> Maybe it's time to consider a fundamental change in how method-sources
> are referred to.
The changes file is not merely the repository for the sources of newly minted methods. It is also a log file, a crash recovery mechanism. It is simple. It works. You propose something horribly complex to solve a problem that a) doesn't affect very many people, b) is easy to work around and c) is feasible to fix with a well-known approach. It doesn't wash for me.
> Taking inspiration from git... A content addressable key-value file
> store might solve concurrent access. Each CompiledMethod gets written
> to a file named for the hash of its contents, which is the only
> reference the Image gets to a method's source. Each such file would
> *only* need be written once and thereafter could be read
> simultaneously by multiple Images. Anyone on the network wanting to
> store the same source would see the file already exists and have
> nothing to do.
> Perhaps having many individual files implies abysmal performance,
>
> Or maybe something similar to Mercurial's revlog format [1] could be
> used, one file per class.
>
> The thing about the Image *only* referring to a method's source by its
> content hash would seem to give great flexibility in backends to
> locate/store that source. Possibly...
> * stored as individual files as above
> * bundled in a zip file in random order
> * a school could configure a database server in the Image provided to students
> * hashes could be thrown at a service on the Internet
> * cached locally with a key-value database like LMDB [2]
> * remote replication to multiple internet backup locations
> * in an emergency you could throw a bundle of hashes as a query to the
> mail list and get an ad hoc response of individual files.
> * Inter-Smalltalk image communication
>
> Pharo has a stated goal to get rid of the changes file. Changing to
> content-hash-addressable method-source seems a logical step along
> that road. Even if the Squeak community doesn't want to go so far as
> eliminating the .changes file, can they see value in changing method
> source references to be content-hashes rather than indexes into a
> particular file?
>
> [1] http://blog.prasoonshukla.com/mercurial-vs-git-scaling
> [2] https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database
>
>
> Just having a poke at this, it seems a new form of
> CompiledMethodTrailer may need to be defined, being invoked from
> CompiledMethod>>sourceCode. CompiledMethodTrailer>>sourceCode would
> find the source code based on a content-hash held by the
> CompiledMethod. If found, the call to #getSourceFromFile that
> accesses the .changes file will be bypassed, and could remain as a
> backup.
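
A minimal Python sketch of the content-hash idea quoted above, not a Squeak or Pharo implementation. The names `store_source` and `fetch_source`, the choice of SHA-256, and the one-file-per-hash directory layout are all assumptions made for illustration; the `fallback` callable stands in for the existing lookup in the .changes file.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def store_source(root: Path, source: str) -> str:
    """Store source under the hex SHA-256 of its bytes; return that key.

    Hypothetical helper: the file name is the content hash, so a given
    source only ever needs to be written once.
    """
    data = source.encode("utf-8")
    key = hashlib.sha256(data).hexdigest()
    target = root / key
    if target.exists():
        return key  # identical content is already stored: nothing to do
    # Write to a private temp file, then rename it into place, so that
    # readers never observe a partially written file.
    fd, tmp_name = tempfile.mkstemp(dir=root)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    os.replace(tmp_name, target)
    return key


def fetch_source(root: Path, key: str, fallback=None):
    """Return the source stored under key, else the result of fallback(key).

    The fallback models keeping the .changes file lookup as a backup.
    """
    target = root / key
    if target.exists():
        return target.read_bytes().decode("utf-8")
    return fallback(key) if fallback is not None else None
```

Because the key is derived from the content, two writers storing the same method source race only to create an identical file, which is the property the proposal relies on.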
>
> cheers -ben
>
>>
>> Dave
>>
>>
>>>
>>> Max was running on Pharo, which may or may not be handling changes the
>>> same way. I think he may be seeing a different problem from the one I
>>> confirmed.
>>>
>>> So a bit more testing and verification would be in order. I can't look at
>>> it now though.
>>>
>>> Dave
>>>
>>>>
>>>>> On 29-06-2016, at 10:35 AM, Eliot Miranda <eliot.miranda at gmail.com>
>>>>> wrote:
>>>> {snip much rant}
>>>>
>>>>> The most obvious place where this is an issue is where two images are
>>>>> using the same changes file and think they're appending. Image A seeks
>>>>> to the end of the file, 'writes' stuff. Image B near-simultaneously
>>>>> does the same. Eventually each process gets around to pushing data to
>>>>> hardware. Oops! And let's not dwell too much on the problems possible
>>>>> if either process causes a truncation of the file. Oh, wait, I think we
>>>>> actually had a problem with that some years ago.
>>>>>
>>>>> The thing is that this problem bites even if we have a unitary primitive
>>>>> that both positions and writes if that primitive is written above a
>>>>> substrate that, as unix and stdio streams do, separates positioning from
>>>>> writing. The primitive is neat but it simply drives the problem further
>>>>> underground.
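
The interleaving described above can be reproduced in a small Python sketch. It is only an illustration: two unbuffered file handles in one process stand in for the two images, and the function name `interleaved_appends` and the chunk contents are made up for the example.

```python
import os


def interleaved_appends(path: str) -> bytes:
    """Simulate two images that both seek to end-of-file before either writes."""
    with open(path, "wb") as f:
        f.write(b"HEADER\n")
    a = open(path, "r+b", buffering=0)  # stands in for image A
    b = open(path, "r+b", buffering=0)  # stands in for image B
    try:
        a.seek(0, os.SEEK_END)  # A positions at the end of the file
        b.seek(0, os.SEEK_END)  # B positions at the very same offset
        a.write(b"chunk from A!\n")
        b.write(b"chunk from B\n")  # lands on top of A's chunk
    finally:
        a.close()
        b.close()
    with open(path, "rb") as f:
        return f.read()
```

Each handle keeps its own file position, so the second write starts at the offset both handles computed and overwrites most of the first chunk rather than following it.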
>>>>
>>>>
>>>> Oh absolutely - we only have real control over a small part of it. It
>>>> would probably be worth making use of that where we can.
>>>>
>>>>>
>>>>> A more robust solution might be to position, write, reposition, read,
>>>>> and compare, shortening on corruption, and retrying, using exponential
>>>>> back-off like ethernet packet transmission. Most of the time this adds
>>>>> only the overhead of reading what's written.
>>>>
>>>> Yes, for anything we want reliable that's probably a good way. A limit
>>>> on the number of retries would probably be smart to stop infinite
>>>> recursion. Imagine the fun of an error causing infinite retries of writing
>>>> an error log about an infinite recursion. On an infinitely large Beowulf
>>>> cluster!
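
A rough Python sketch of the position, write, reposition, read, compare loop discussed above, with the retry limit suggested in the reply. The function name `append_verified`, the default retry count, and the delay values are assumptions for illustration. It assumes the file already exists, and it does not by itself make concurrent appends safe: truncating after a failed compare could still discard another writer's data.

```python
import os
import random
import time


def append_verified(path: str, chunk: bytes,
                    max_retries: int = 5, base_delay: float = 0.01) -> int:
    """Append chunk, read it back, and retry with backoff if it was clobbered.

    Returns the offset at which the chunk was written and verified.
    Raises OSError once max_retries attempts have failed.
    """
    for attempt in range(max_retries):
        with open(path, "r+b", buffering=0) as f:
            offset = f.seek(0, os.SEEK_END)  # position
            f.write(chunk)                   # write
            os.fsync(f.fileno())             # push the data towards the disk
            f.seek(offset)                   # reposition
            if f.read(len(chunk)) == chunk:  # read and compare
                return offset
            f.truncate(offset)               # shorten on corruption
        # Randomised exponential back-off before the next attempt.
        time.sleep(base_delay * (2 ** attempt) * random.random())
    raise OSError(f"could not verify append after {max_retries} attempts")
```

The bounded `for` loop is what keeps a persistent failure from turning into endless retries; in the common case the only extra cost is reading back what was just written.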
>>>>
>>>> It's all yet another example of where software meeting reality leads to
>>>> nightmares.
>>>>
>>>>
>>>> tim
>>>> --
>>>> tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
>>>> If it was easy, the hardware people would take care of it.
>