On Fri, Jul 1, 2016 at 4:10 AM, Chris Muller asqueaker@gmail.com wrote:
In practice, this is not an issue that either Chris or I have noticed, probably because we are not doing software development (saving method changes) at the same time that we are running RemoteTask and similar. But I can certainly see how it might be a problem if, for example, I had a bunch of students running the same image from a network shared folder.
Maybe it's time to consider a fundamental change in how method sources are referred to. Taking inspiration from git... a content-addressable key-value file store might solve concurrent access. Each CompiledMethod gets written to a file named for the hash of its contents, and that hash is the only reference the Image keeps to a method's source. Each such file would
It sounds like a lot of files... so how would I move an image to another computer? Would I have to know which files go with which image?
Yes, that would be a sticking point. You couldn't just grab any saved Image file off disk; the image would first need to generate an archive transfer file. The exception would be if these methods were automatically pushed through to a private web service. Then, presuming pervasive web access, that sleeping Image could pull down its sources wherever it boots back up (which, cool as that would be, is not the problem of the original post).
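A minimal sketch of what such an archive transfer file could look like, written in Python purely for illustration. It assumes all hash-named source files live in one flat store directory and that the image can enumerate the hashes it references (passed in as `method_hashes`); `export_image_bundle` and the `sources/` layout inside the archive are hypothetical.

```python
import tarfile
from collections.abc import Iterable
from pathlib import Path


def export_image_bundle(image_path: Path, store_dir: Path,
                        method_hashes: Iterable[str], bundle_path: Path) -> None:
    """Pack the image plus every hash-named source file it references into
    one .tar.gz that can be copied to another computer.

    Hypothetical: assumes the caller supplies every hash the image uses.
    """
    with tarfile.open(bundle_path, "w:gz") as tar:
        # The saved image itself goes at the top level of the archive.
        tar.add(image_path, arcname=image_path.name)
        for h in sorted(set(method_hashes)):
            src = store_dir / h
            if not src.exists():
                # A missing source would leave the moved image without it.
                raise FileNotFoundError(f"missing source for hash {h}")
            tar.add(src, arcname=f"sources/{h}")
```

On the receiving machine, unpacking the archive and merging `sources/` into the local store would restore everything the image needs.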
Plus, it doesn't really solve the fundamental problem of two images writing to the same file. Multiple images could still change the same method to the same contents at the same time.
The hash-named file would never be written to twice. It's a fixed point in space-time ;) A second image with the same hash would write the *same* contents, so there is no need to write: if the hash-named file exists, do nothing. To handle any race condition between checking for the file's existence and writing to it, the first image could take an exclusive write lock.
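A minimal Python sketch of this write-once store, assuming SHA-1 as the hash (git's choice, although git also hashes a short header in front of the contents) and one flat store directory. Instead of a separate lock, it relies on exclusive file creation, so when two images race to store the same method, exactly one of them creates the file. `store_method_source` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def store_method_source(store_dir: Path, source: str) -> str:
    """Write a method's source to a file named for its hash and return the
    hash, which is the only reference the image would keep."""
    data = source.encode("utf-8")
    key = hashlib.sha1(data).hexdigest()  # assumption: SHA-1, hex-encoded
    path = store_dir / key
    try:
        # Mode "xb" means exclusive creation (O_CREAT | O_EXCL): it fails
        # if the file already exists, so check-and-create is one step.
        with open(path, "xb") as f:
            f.write(data)
    except FileExistsError:
        # Same hash, so (barring a collision) same contents: nothing to do.
        pass
    return key
```

One gap remains: another image could open the file while the winner is still writing it. Writing to a temporary name first and then renaming it into place would close that window.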
You may have made the problem less likely, except for when you have your first hash collision of *different* sources (it COULD happen),
Some equivalent things...
* Pick a random atom from the volume of the moon, then another random pick gets the same atom. http://stackoverflow.com/a/23253149
* Win the national lottery 11 times in a row http://stackoverflow.com/a/29146396
* Your chances of winning the Powerball lottery are far better than finding a hash collision. After all, lotteries often have actual winners. The probability of a hash collision is more like a lottery that has been running since prehistoric times and has never had a winner and will probably not have a winner for billions of years. http://ericsink.com/vcbe/html/cryptographic_hashes.html
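To put rough numbers on these comparisons, the birthday (union) bound says the chance that any two of n distinct sources share a b-bit hash is at most n(n-1)/2 divided by 2^b. The Python sketch below assumes a 160-bit hash (SHA-1's output size) and an arbitrary one million method versions; both figures are illustrative, not measurements of a real image.

```python
def collision_probability(n_items: int, hash_bits: int) -> float:
    """Upper bound on the chance that any two of n_items distinct inputs
    share a hash_bits-bit hash, assuming the hash behaves randomly:
    (number of pairs) / 2**hash_bits."""
    pairs = n_items * (n_items - 1) // 2
    return pairs / 2 ** hash_bits


p = collision_probability(10**6, 160)
print(f"{p:.2e}")
```

This comes out at roughly 3.4e-37. It treats the hash as an ideal random function, so it says nothing about deliberately crafted collisions.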
in which case it wouldn't even require the changes to occur at the same time.
When the second Image finds that the hash-named file already exists, it could check the contents and flag an error if they don't match, so at least it's not a silent error. The same applies when integrating different repositories.
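A minimal Python sketch of that check, extending a write-once store so that an existing file is compared byte-for-byte rather than trusted. SHA-1 and the names `store_or_verify` and `HashCollisionError` are assumptions for illustration.

```python
import hashlib
from pathlib import Path


class HashCollisionError(Exception):
    """Hypothetical error: a hash-named file holds different source."""


def store_or_verify(store_dir: Path, source: str) -> str:
    data = source.encode("utf-8")
    key = hashlib.sha1(data).hexdigest()  # assumption: SHA-1
    path = store_dir / key
    try:
        # Exclusive creation: only the first writer creates the file.
        with open(path, "xb") as f:
            f.write(data)
    except FileExistsError:
        # The file is already there: compare contents so a collision (or a
        # corrupted file) is reported instead of silently ignored.
        if path.read_bytes() != data:
            raise HashCollisionError(f"{key}: stored source differs")
    return key
```

The same comparison would work when merging two stores: for each incoming hash, either copy the file or verify the local copy matches.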
I guess it would also lose the ordering of the change log... unless you tried to use the underlying filesystem's timestamps on each file, but that wouldn't work after I've copied all the files via scp, because they all get new timestamps...
Good point. This would complicate changes replay for a crashed image. However, that case only matters "now" (for the current, unsaved session) and could be handled by a "/tmp/${username}.${last-image-save-checkpoint-id}" file that records the order of commits for the session and is checked for on Image startup, which is similar to what you already suggested...
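A minimal Python sketch of such a session journal, assuming one method hash per line, appended in commit order. The file name mirrors the "/tmp/${username}.${last-image-save-checkpoint-id}" suggestion; the directory is a parameter and the function names are hypothetical.

```python
import os
from pathlib import Path


def journal_path(tmp_dir: Path, username: str, checkpoint_id: str) -> Path:
    # Mirrors "/tmp/${username}.${last-image-save-checkpoint-id}".
    return tmp_dir / f"{username}.{checkpoint_id}"


def record_commit(journal: Path, method_hash: str) -> None:
    """Append one method hash per line, preserving commit order."""
    with open(journal, "a", encoding="utf-8") as f:
        f.write(method_hash + "\n")
        f.flush()
        os.fsync(f.fileno())  # make sure it survives a crash


def commits_to_replay(journal: Path) -> list[str]:
    """On startup: if a journal survives from a crashed session, return its
    hashes in commit order; otherwise there is nothing to replay."""
    if not journal.exists():
        return []
    return journal.read_text(encoding="utf-8").splitlines()
```

Because the order lives in the journal rather than in file timestamps, copying the store with scp would not disturb it.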
Upon launching the image, start a temporary changes file, [image-name]-[some UUID].changes.
Upon image save, append the temp changes file to the main changes file, but atomically (first write the appended result to a new, uniquely named file, then rename it to the original changes file name).
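A minimal Python sketch of these two steps, assuming the temporary and main changes files sit in the same directory (so the rename stays on one filesystem). The function names are hypothetical.

```python
import os
import shutil
import tempfile
import uuid
from pathlib import Path


def start_session_changes(image_dir: Path, image_name: str) -> Path:
    """On launch: create [image-name]-[some UUID].changes for this session."""
    path = image_dir / f"{image_name}-{uuid.uuid4()}.changes"
    path.touch(exist_ok=False)
    return path


def merge_on_save(main_changes: Path, session_changes: Path) -> None:
    """On save: write main + session contents under a fresh unique name in
    the same directory, then rename it over the main changes file."""
    fd, tmp_name = tempfile.mkstemp(dir=main_changes.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as out:
            for part in (main_changes, session_changes):
                with open(part, "rb") as src:
                    shutil.copyfileobj(src, out)
            out.flush()
            os.fsync(out.fileno())
        # os.replace is atomic on the same filesystem: readers see either
        # the old changes file or the complete new one, never a torn file.
        os.replace(tmp_name, main_changes)
    except BaseException:
        os.unlink(tmp_name)  # don't leave the half-built file behind
        raise
```

The rename prevents torn files but not lost updates: if two images save at the same moment, the later rename can discard the other's appended changes, so concurrent saves would still need to be serialized somehow. What to do with the session file after a save (truncate it, or start a new one) is left open here.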
Good idea. This would eliminate the need for my idea here. You'd need some way to match the UUID with the Image being opened, so I guess the UUID would need to be stored in the saved Image, stay constant for the session, and be updated on each save of the Image. The temporary changes filename could include the username to distinguish between users. If the same user opens an Image twice, there would be two files, and upon recovering from a crash the user would be presented with a choice between them.
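A minimal Python sketch of the recovery lookup. The naming scheme is an assumption: image name, username, the UUID stored in the image at its last save, plus a hypothetical per-session id so that two sessions of the same user get two distinct files. After a crash, every file matching the first three parts is a candidate; more than one means the user must choose.

```python
from pathlib import Path


def session_changes_name(image_name: str, username: str,
                         save_uuid: str, session_id: str) -> str:
    # Hypothetical scheme: [image]-[user]-[UUID saved in image]-[session].
    return f"{image_name}-{username}-{save_uuid}-{session_id}.changes"


def recovery_candidates(image_dir: Path, image_name: str,
                        username: str, save_uuid: str) -> list[Path]:
    """Temp changes files left behind by this user's sessions of this image
    since its last save. Two or more means the image was open twice."""
    pattern = f"{image_name}-{username}-{save_uuid}-*.changes"
    return sorted(image_dir.glob(pattern))
```

Files from other users, or from before the last save (a different UUID), are ignored automatically because they don't match the pattern.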
Might it be better to teach the class, who are learning about Smalltalk anyway, about the nature of the changes file?
This seemed more like a classroom system-administration issue. Actually, in that case, maybe the startup script for the network executable could just copy both the image and changes file to each user's personal area?
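A minimal sketch of that startup step, in Python for illustration; the shared and per-user directories are parameters, and `provision_user_copy` is a hypothetical name. Copying only when the user's copy is missing keeps each student's own changes across sessions.

```python
import shutil
from pathlib import Path


def provision_user_copy(shared_dir: Path, image_name: str,
                        user_dir: Path) -> Path:
    """Copy [image_name].image and [image_name].changes from the shared
    folder into the user's own area (first launch only), so every student
    writes to a private changes file. Returns the image to launch."""
    user_dir.mkdir(parents=True, exist_ok=True)
    for suffix in (".image", ".changes"):
        src = shared_dir / f"{image_name}{suffix}"
        dst = user_dir / src.name
        if not dst.exists():
            shutil.copy2(src, dst)  # copy2 also keeps timestamps
    return user_dir / f"{image_name}.image"
```

The script would then start the VM on the returned path instead of the shared image.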
cheers -ben