[Pharo-dev] [squeak-dev] The .changes file should be bound to a single image

Fri Jul 1 06:06:11 UTC 2016

On Fri, Jul 1, 2016 at 4:10 AM, Chris Muller <asqueaker at gmail.com> wrote:
>>> In practice, this is not an issue that either Chris or I have noticed,
>>> probably because we are not doing software development (saving method
>>> changes) at the same time that we are running RemoteTask and similar.
>>> But I can certainly see how it might be a problem if, for example, I
>>> had a bunch of students running the same image from a network shared
>>> folder.
>>
>> Maybe it's time to consider a fundamental change in how method-sources
>> are referred to.
>> Taking inspiration from git... A content addressable key-value file
>> store might solve concurrent access.  Each CompiledMethod gets written
>> to a file named for the hash of its contents, which is the only
>> reference the Image gets to a method's source.  Each such file would
>
> It sounds like a lot of files.. so how would I move an image to
> another computer?  I gotta know which files go with which image?

Yes, that would be a sticking point. You couldn't just grab any saved
Image file off disk. The image would first need to generate an archive
transfer file.  Except if these methods were automatically pushed
through to a private web service; then, presuming pervasive web
access, that sleeping Image would pull down its sources wherever it
boots back up (which, even if that would be cool, is not the problem
of the original post).

>
> Plus, it doesn't really solve the fundamental problem of two images
> writing to the same file.  Multiple images could still change the same
> method to the same contents at the same time.

The hash-named-file would never be written to twice.  It's a fixed
point in space-time ;)
A second image with the same hash would write the *same* contents, so
there is no need to write.
If the hash-named-file exists, do nothing.  To handle any race
condition between checking file existence and writing to it, the first
image could take an exclusive write lock.
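
A minimal sketch of that write-once rule, in Python rather than
Smalltalk.  Everything named here is an assumption for illustration:
the function name store_source, the choice of SHA-256, and a flat
directory as the store.  Instead of a separate lock it uses an
exclusive create (O_EXCL), which folds "check existence" and "create"
into one step, so only one image can win the race.

```python
import hashlib
import os


def store_source(store_dir: str, source: str) -> str:
    """Store method source under its content hash and return that hash."""
    data = source.encode("utf-8")
    key = hashlib.sha256(data).hexdigest()  # assumption: SHA-256 as the hash
    path = os.path.join(store_dir, key)
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so the
        # existence check and the creation cannot be interleaved.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        # Another image (or an earlier save) already stored this source.
        return key
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return key
```

A reader could still see a half-written file between the create and
the end of the write, and the guarantee may be weaker on some network
file systems, so a real implementation would need more care than this.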

> You may have made the
> problem less-likely, except for when you have your first
> hash-collision of *different* sources (it COULD happen),

Some equivalent things...

* Pick a random atom from the volume of the moon, then another random
pick gets the same atom.
  http://stackoverflow.com/a/23253149

* Win the national lottery 11 times in a row
  http://stackoverflow.com/a/29146396

* Your chances of winning the Powerball lottery are far better than
finding a hash collision. After all, lotteries often have actual
winners. The probability of a hash collision is more like a lottery
that has been running since prehistoric times and has never had a
winner and will probably not have a winner for billions of years.
  http://ericsink.com/vcbe/html/cryptographic_hashes.html
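
For a rough number behind these comparisons, the usual birthday-bound
approximation can be computed directly.  This is only a sketch: the
function name is made up, and the 160-bit width in the usage note is
an assumption borrowed from git's SHA-1, since no particular hash has
been chosen in this thread.

```python
def collision_probability(num_items: int, hash_bits: int) -> float:
    """Approximate chance that any two of num_items hashes collide.

    Uses the birthday-bound estimate n*(n-1) / (2 * 2**bits), which is
    only meaningful while the result is much smaller than 1.
    """
    return num_items * (num_items - 1) / (2 * 2 ** hash_bits)
```

Calling collision_probability(10**9, 160), i.e. a billion distinct
method sources under a 160-bit hash, gives a value below 1e-30.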

> in which case it wouldn't even require the changes to occur at the same time.

When the second Image finds the hash-named-file already exists,
it could check the contents and flag an error if they don't match,
so at least it's not a silent error. The same when integrating
different repositories.
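
The check itself is small.  A sketch, assuming the caller has already
found that the hash-named file exists; HashCollisionError and
check_existing are hypothetical names, not anything in Squeak or
Pharo.

```python
class HashCollisionError(Exception):
    """The stored file's contents differ from the source being saved."""


def check_existing(path: str, new_data: bytes) -> None:
    """Raise if the file at path does not hold exactly new_data."""
    with open(path, "rb") as f:
        existing = f.read()
    if existing != new_data:
        # Same hash name, different contents: report it loudly.
        raise HashCollisionError(path)
```

Note that a file another image is still in the middle of writing
would also fail this comparison, so the error means "collision or
incomplete write" unless writes are made atomic some other way.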

>
> I guess it would also lose the order-sequence of the change log too...
> unless you were to try to use the underlying filesystem's timestamps
> on each file but...  it wouldn't work after I've copied all the files
> via scp and because they all get new timestamps...

Good point.  This would complicate changes-replay for a crashed image.
Although this case is only important "now" and could be handled by
"/tmp/${username}.${last-image-save-checkpoint-id}" file that records
the order of commits for a session, that would be checked for on Image
startup - which is similar to what you already suggested...
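
A sketch of such a per-session order file, under stated assumptions:
one hash per line, the system temp directory standing in for /tmp,
and made-up function names.

```python
import getpass
import os
import tempfile


def journal_path(checkpoint_id: str) -> str:
    """Build <tmpdir>/<username>.<checkpoint-id> for this session."""
    name = f"{getpass.getuser()}.{checkpoint_id}"
    return os.path.join(tempfile.gettempdir(), name)


def record_commit(path: str, key: str) -> None:
    """Append one method-source hash; line order is commit order."""
    with open(path, "a", encoding="ascii") as f:
        f.write(key + "\n")
        f.flush()
        os.fsync(f.fileno())  # push to disk so a crash loses little


def commits_to_replay(path: str) -> list[str]:
    """On startup, return the recorded hashes in order (empty if none)."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="ascii") as f:
        return [line.strip() for line in f if line.strip()]
```

On a clean Image save the file would be deleted and a new one started
under the new checkpoint id, so a leftover file at startup signals a
crashed session.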

> Upon launching of the image, start a temporary changes file,
> [image-name]-[some UUID].changes.
>
> Upon image save, append the temp changes file to the main changes
> file, but in an atomic way (first do the append as a new unique
> filename, then rename it to the original changes file name).
>

Good idea.  This would eliminate the need for my idea here.  You'd
need some way to match the UUID with the Image being opened, so I
guess the UUID would need to be stored in the saved Image and be constant
for the session, and be updated each save of the Image.  The temporary
changes filename could include username to distinguish between users.
If the same user opens an Image twice, there would be two files and
upon recovering from a crash the user would be presented a choice
between the two files.
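
Roughly, the naming plus the append-then-rename step could look like
the sketch below.  The function names and the exact filename layout
are assumptions for illustration only.

```python
import os
import shutil
import uuid


def temp_changes_name(image_name: str, user: str, session_id: str) -> str:
    """E.g. myimage-ben-<uuid>.changes; one file per user and session."""
    return f"{image_name}-{user}-{session_id}.changes"


def merge_on_save(main_changes: str, temp_changes: str) -> None:
    """Append temp_changes to main_changes via a new file plus rename."""
    staging = f"{main_changes}.{uuid.uuid4().hex}.tmp"
    with open(staging, "wb") as out:
        if os.path.exists(main_changes):
            with open(main_changes, "rb") as src:
                shutil.copyfileobj(src, out)
        with open(temp_changes, "rb") as src:
            shutil.copyfileobj(src, out)
        out.flush()
        os.fsync(out.fileno())
    # The rename is the atomic step (on POSIX, within one file system):
    # readers see either the old changes file or the complete new one.
    os.replace(staging, main_changes)
    os.remove(temp_changes)
```

This protects readers from a half-appended changes file, but two
images merging at the same moment could still overwrite each other's
result; the sketch does not address that.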

>
> Might be better to teach the class, who are learning about Smalltalk
> anyway, about the nature of the changes file..?

This seems more of a classroom system administration issue.  Actually,
in that case, maybe the network startup script could just copy both
the image and changes file to the user's personal area?
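
Something along these lines, as a sketch only; the directory layout,
the .image/.changes naming and the function name are assumptions
about a classroom setup nobody has described in detail here.

```python
import os
import shutil


def copy_to_personal_area(shared_dir: str, image_name: str,
                          personal_dir: str) -> list[str]:
    """Give the user private copies of the image and changes files."""
    os.makedirs(personal_dir, exist_ok=True)
    paths = []
    for ext in (".image", ".changes"):
        src = os.path.join(shared_dir, image_name + ext)
        dst = os.path.join(personal_dir, image_name + ext)
        if not os.path.exists(dst):
            # Copy only on first launch so later work is not overwritten.
            shutil.copy2(src, dst)
        paths.append(dst)
    return paths
```

The startup script would then launch the VM on the personal copy, so
each student writes to a private changes file.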

cheers -ben