Re: [Pharo-dev] [squeak-dev] The .changes file should be bound to a single image

30 Jun 2016


On Wed, Jun 29, 2016 at 02:00:19PM -0400, David T. Lewis wrote:
...
Let's not solve the wrong problem folks. I only looked at this for 10
minutes this morning, and I think (but I am not sure) that the issue
affects the case of saving the image, and that the normal writing of
changes is fine.
I was wrong.
I spent some more time with this, and it is clear that two images saving
chunks to the same changes file will result in corrupted change records
in the changes file. It is not just an issue related to the image save
as I suggested above.
In practice, this is not an issue that either Chris or I have noticed,
probably because we are not doing software development (saving method
changes) at the same time that we are running RemoteTask and similar.
But I can certainly see how it might be a problem if, for example, I
had a bunch of students running the same image from a network shared
folder.
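The subject line suggests binding the .changes file to a single image. One way to do that, not something proposed in this thread, is for an image to take an exclusive advisory lock on its changes file at startup and refuse to write if another image already holds it. Below is a minimal Python sketch for Unix, assuming `fcntl.flock` semantics. `open_changes_exclusively` and `ChangesFileInUse` are hypothetical names.

```python
import fcntl
import os
import tempfile


class ChangesFileInUse(Exception):
    """Hypothetical error: another image already holds the changes file."""


def open_changes_exclusively(path):
    """Open `path` and take a non-blocking exclusive flock on it.

    Returns the file descriptor. The lock is held until that descriptor
    is closed (or the process exits)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB: fail immediately instead of waiting for the other image.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        raise ChangesFileInUse(path) from None
    return fd


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "Squeak.changes")
        first = open_changes_exclusively(path)   # image 1 starts up
        try:
            open_changes_exclusively(path)       # image 2 tries the same file
        except ChangesFileInUse:
            print("second image refused")
        os.close(first)
```

flock locks are advisory, so only images that also take the lock are kept out. Lock behaviour on network shares (NFS, SMB) can differ from local disks, so the shared-folder case would need its own testing.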
Dave
...
Max was running on Pharo, which may or may not be handling changes the
same way. I think he may be seeing a different problem from the one I
confirmed.
So a bit more testing and verification would be in order. I can't look at
it now though.
Dave
...
...
On 29-06-2016, at 10:35 AM, Eliot Miranda <eliot.miranda@gmail.com> wrote:
{snip much rant}
...
The most obvious place where this is an issue is where two images are
using the same changes file and think they're appending. Image A seeks
to the end of the file, "writes" stuff. Image B near-simultaneously
does the same. Eventually each process gets around to pushing data to
hardware. Oops! And let's not dwell too much on the problems possible
if either process causes a truncation of the file. Oh, wait, I think we
actually had a problem with that some years ago.
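The seek-to-end-then-write race described above can be reproduced deterministically in one process. Each "image" gets its own descriptor on the same file, and the steps are interleaved by hand. The file name and chunk contents below are made up for illustration.

```python
import os
import tempfile


def racy_append_demo(path):
    """Simulate two images that each 'seek to end, then write' on separate
    descriptors. Both find the same end offset before either writes, so
    the second write lands on top of the first."""
    fd_a = os.open(path, os.O_RDWR)
    fd_b = os.open(path, os.O_RDWR)
    try:
        os.lseek(fd_a, 0, os.SEEK_END)  # image A finds the end
        os.lseek(fd_b, 0, os.SEEK_END)  # image B finds the same end
        os.write(fd_a, b"AAAA!")        # A appends its chunk
        os.write(fd_b, b"BB!")          # B writes over the start of A's chunk
    finally:
        os.close(fd_a)
        os.close(fd_b)
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "shared.changes")
    with open(p, "wb") as f:
        f.write(b"HEADER!")
    print(racy_append_demo(p))  # b'HEADER!BB!A!'
```

Neither chunk survives intact, which is the kind of corrupted change record reported earlier in the thread.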
The thing is that this problem bites even if we have a unitary primitive
that both positions and writes if that primitive is written above a
substrate that, as Unix and stdio streams do, separates positioning from
writing. The primitive is neat but it simply drives the problem further
underground.
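At the kernel level, POSIX already provides a combined position-and-write. On a descriptor opened with O_APPEND, the file offset is set to end of file before each write, with no intervening modification. With that flag, the same two-writer interleaving keeps both chunks. As the thread points out, this only helps if nothing above it (for example a buffered stream that computes the position itself) splits the two steps again. The Linux open(2) manual also warns that O_APPEND can still corrupt files on NFS, which is exactly the network-shared-folder case. Below is a Python sketch with made-up file contents.

```python
import os
import tempfile


def append_only_demo(path):
    """Two writers on O_APPEND descriptors: each write goes to the current
    end of file, so neither chunk overwrites the other (local POSIX files)."""
    flags = os.O_WRONLY | os.O_APPEND
    fd_a = os.open(path, flags)
    fd_b = os.open(path, flags)
    try:
        # B "finds the end" first, but with O_APPEND that stale offset is
        # ignored: the kernel repositions to EOF as part of each write.
        os.lseek(fd_b, 0, os.SEEK_END)
        os.write(fd_a, b"AAAA!")
        os.write(fd_b, b"BB!")
    finally:
        os.close(fd_a)
        os.close(fd_b)
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "shared.changes")
    with open(p, "wb") as f:
        f.write(b"HEADER!")
    print(append_only_demo(p))  # b'HEADER!AAAA!BB!'
```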
Oh absolutely - we only have real control over a small part of it. It
would probably be worth making use of that where we can.
...
A more robust solution might be to position, write, reposition, read,
and compare, shortening on corruption, and retrying, using exponential
back-off like ethernet packet transmission.  Most of the time this adds
only the overhead of reading what's written.
Yes, for anything we want reliable that's probably a good way. A limit
on the number of retries would probably be smart to stop infinite
recursion. Imagine the fun of an error causing infinite retries of writing
an error log about an infinite recursion. On an infinitely large Beowulf
cluster!
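The scheme discussed above (position, write, reposition, read, compare, then retry with back-off, capped at a fixed number of attempts) might look roughly like the following Python sketch. It rests on several assumptions. The names and defaults (`verified_append`, `AppendFailed`, `max_retries`, `base_delay`) are hypothetical. The back-off is a randomized exponential delay in the spirit of Ethernet's, not an exact copy of it. The sketch also leaves out the "shortening on corruption" step, since truncating a shared file is racy in its own right.

```python
import os
import random
import tempfile
import time


class AppendFailed(Exception):
    """Hypothetical error: the chunk could not be written and verified."""


def verified_append(path, chunk, max_retries=5, base_delay=0.01):
    """Append `chunk` to `path`, read it back and compare; on a mismatch,
    wait a random, exponentially growing time and try again at the new end.

    Returns the offset where the chunk was verified. Does NOT truncate
    corrupted bytes, and a match only proves the bytes were intact at the
    moment they were read back."""
    for attempt in range(max_retries):
        fd = os.open(path, os.O_RDWR)
        try:
            offset = os.lseek(fd, 0, os.SEEK_END)   # position
            written = os.write(fd, chunk)           # write
            os.lseek(fd, offset, os.SEEK_SET)       # reposition
            readback = os.read(fd, len(chunk))      # read
            if written == len(chunk) and readback == chunk:  # compare
                return offset
        finally:
            os.close(fd)
        # Mismatch, probably another writer interleaved: back off for a
        # random time in [0, base_delay * 2**attempt] before retrying.
        time.sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise AppendFailed(f"gave up after {max_retries} attempts")


with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "shared.changes")
    with open(p, "wb") as f:
        f.write(b"HEADER!")
    print(verified_append(p, b"AAAA!"))  # 7
```

The retry cap turns a persistent conflict into an error rather than an endless loop. The sketch leaves open what the caller does with AppendFailed, including how it avoids logging that failure through the same contended file.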
It's all yet another example of where software meeting reality leads to
nightmares.
tim
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim
If it was easy, the hardware people would take care of it.