Hi,
Opening the same image twice works fine as long as no writes to the .changes file occur. When both images write to the .changes file, however, it becomes corrupted for both, because the offsets for the changes are wrong. This can lead to lost data and, predominantly, to invalid method source code, which is a pain with Monticello.
I suggest that we implement a kind of lock mechanism to ensure that only one image (the first one opened) can write to the .changes file.
I’ve opened an issue for Pharo here: https://pharo.fogbugz.com/f/cases/18635/The-changes-file-should-be-bound-to-...
Cheers, Max
Hi Max,
If I have a running VM on Windows 10, I cannot open a second one on the same image and also get write access to it; a warning appears. So it never happens that two images write into the same .changes file.
Best, Marcel
-- View this message in context: http://forum.world.st/The-changes-file-should-be-bound-to-a-single-image-tp4... Sent from the Squeak - Dev mailing list archive at Nabble.com.
On 28.06.2016, at 14:23, marcel.taeumel Marcel.Taeumel@hpi.de wrote:
Hi Max,
if I have a running VM in Windows 10, I cannot open a second one on the same image and also get write access to it. A warning appears. So, it never happens that two images write into the same changes file.
On Mac and Linux this is a problem, however.
Best -Tobias
Now don’t get me started… I’ve ranted about this off and on since ’96! Be happy that I have to leave now to take my Subaru in for service…
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Useful random insult:- Out there where the buses don't run.
I have several applications which launch multiple copies of the same image for multicore processing. The images do their work, commit it to a database, then exit themselves without saving. It's a great feature.
I know OSProcess, when combined with CommandShell, has a RemoteTask which allows efficient forking of the image (via Linux copy-on-write memory sharing), so a Windows-style exclusive-open solution is not really a good fit.
Instead of putting a pop-up in front of the user, perhaps one way to solve the problem would be, upon image save, to simply go through all the changes since the last save and re-flush them to the .changes file.
That way, if someone does save the same image on top of itself, at least whichever image saved last "wins"...
On Tue, Jun 28, 2016 at 5:04 AM, Max Leske maxleske@gmail.com wrote:
On Tue, Jun 28, 2016 at 04:47:00PM -0500, Chris Muller wrote:
On Tue, Jun 28, 2016 at 5:04 AM, Max Leske maxleske@gmail.com wrote:
Hi,
Opening the same image twice works fine as long as no writes to the .changes file occur. When both images write to the .changes file however it will be broken for both because the offsets for the changes are wrong. This can lead to lost data and predominantly to invalid method source code, which is a pain with Monticello.
I suggest that we implement a kind of lock mechanism to ensure that only one image (the first one opened) can write to the .changes file.
If the offsets are wrong in this scenario, it's a bug in the image. The image is supposed to seek to the end of the changes file before writing the next chunk. While this sounds horrible in theory, in practice it works remarkably well, and I have been happily surprised at how reliable it is after many years of using and abusing the feature. That is a very good thing.
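Dave's point about seeking to the end before each write can be made concrete with a small Python sketch (the file names are made up; this is not Squeak code). A writer that caches the end-of-file position once, instead of re-seeking before every write, clobbers whatever another writer appended in between:

```python
import os
import tempfile

# Scratch file standing in for a shared .changes file (path is made up).
path = os.path.join(tempfile.mkdtemp(), "demo.changes")
with open(path, "w") as f:
    f.write("!original chunk!")          # 16 bytes of pre-existing history

# Two "images" open the file and note the end of file once, at open time.
image_a = open(path, "r+")
image_a.seek(0, os.SEEK_END)
image_b = open(path, "r+")
image_b.seek(0, os.SEEK_END)             # b's position is now a stale snapshot

image_a.write("!chunk from image A!")
image_a.flush()

# image_b still sits at the old end of file, so this write lands on top
# of A's chunk instead of after it.
image_b.write("!chunk from image B!")
image_b.flush()

image_a.close()
image_b.close()

data = open(path).read()                 # A's chunk is gone; B's survives
```

Re-seeking to SEEK_END immediately before every write closes most of this window; the remaining hazard is user-space buffering, since a chunk that has not been flushed is invisible to the other process.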
Adding a lock to prevent the scenario would be bad, because it would surely break a number of other legitimate use cases.
I’ve opened an issue for Pharo here: https://pharo.fogbugz.com/f/cases/18635/The-changes-file-should-be-bound-to-...
I have several applications which launch multiple copies of the same image for multicore processing. The images do their work, commit it to database, then exit themselves without saving. Its a great feature.
That is consistent with my experience. I remember expecting horrible things to happen if I had two images sharing a changes file, but nothing bad ever happened. It just works.
I know OSProcess, when combined with CommandShell, has a RemoteTask which allows efficient forking of the image (via Linux copy-on-write memory sharing) and so a solution like what happens in Windows is not really good.
My assumption with RemoteTask was that someone doing complex or long-running jobs would more or less know what they were doing, and would have the good sense to stop writing to the changes file from a bunch of forked images. But in actual practice, I have never seen a problem related to this. It just works.
Instead of putting a pop-up in front of the user, perhaps one way to solve the problem would be to, upon image save, simply goes through all the changes since the last save and re-flushes them to the .changes file.
That way, if someone does want to save the same image on top of themself, at least it would be whichever saved last "wins"....
There must be a problem somewhere, otherwise Max would not be raising the issue. So whatever combination of operating system and image is having a problem, I would be inclined to fix that.
Windows cannot be a problem, because the operating system will not permit you to open the changes file twice. The Unix/Linux systems that I have used all work fine.
Max, which operating system/VM/image are you using? Is this on a Mac?
Dave
On 29 Jun 2016, at 02:06, David T. Lewis lewis@mail.msen.com wrote:
If the offsets are wrong in this scenario, it's a bug in the image. The image is supposed to seek to the end of the changes file before writing the next chunk. While this sounds horrible in theory, in practice it works remarkably well, and I have been happily surprised at how reliable it is after many years of using and abusing the feature. That is a very good thing.
Adding a lock to prevent the scenario would be bad, because it would surely break a number of other legitimate use cases.
I’ve opened an issue for Pharo here: https://pharo.fogbugz.com/f/cases/18635/The-changes-file-should-be-bound-to-...
I have several applications which launch multiple copies of the same image for multicore processing. The images do their work, commit it to database, then exit themselves without saving. Its a great feature.
Doing work is not the problem. Modifying source code is the problem.
That is consistent with my experience. I remember expecting horrible things to happen if I had two images sharing a changes file, but nothing bad ever happened. It just works.
I know OSProcess, when combined with CommandShell, has a RemoteTask which allows efficient forking of the image (via Linux copy-on-write memory sharing) and so a solution like what happens in Windows is not really good.
My assumption with RemoteTask was that someone doing complex or long-running jobs would more or less know what they were doing, and would have the good sense to stop writing to the changes file from a bunch of forked images. But in actual practice, I have never seen a problem related to this. It just works.
Instead of putting a pop-up in front of the user, perhaps one way to solve the problem would be to, upon image save, simply goes through all the changes since the last save and re-flushes them to the .changes file.
That way, if someone does want to save the same image on top of themself, at least it would be whichever saved last "wins"....
There must be a problem somewhere, otherwise Max would not be raising the issue. So whatever combination of operating system and image is having a problem, I would be inclined fix that.
:) Thanks Dave!
Windows cannot be a problem, because the operating system will not permit you to open the changes file twice. The Unix/Linux systems that I have used all work fine.
Max, which operating system/VM/image are you using? Is this on a Mac?
Mac OS X 10.11.5, Pharo 6 (60086)
Dave
I actually didn’t open the issue for myself but because of a student who ran into it. I’ve been in the same situation before, but I’m an experienced user, while students at the research group sometimes spend just a couple of weeks with Pharo, and then such things are a real problem.
Interestingly, the issue is, as you already suggested, pretty hard to reproduce; i.e., modifying arbitrary methods in both images did not show the symptoms I was looking for.
Here’s a reproducible case (at least on my machine):
1. Create a new method in both images:

foo
	^ nil

2. Modify it in one image:

foo
	"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
	^ nil + 1

3. Modify it in the other image:

foo
	^ nil - 1 isEmpty ifTrue: [ "blah" nil ]
In my case saving in step three produces a syntax error when the source is loaded from file again. I don’t really have a clue as to what the underlying issue is, but I suspect it may have to do with comments and a particular situation in which the position is not being correctly updated before or after writing.
I agree with Chris that locks may be problematic; it just seemed like the simplest obvious solution (although of course it gets complicated when an image crashes and doesn’t clean up the lock…).
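As an aside on the crash concern: OS-level advisory locks do not have the stale-lock problem, because the kernel releases them automatically when the owning process exits, cleanly or not. A hypothetical Unix-only sketch in Python (the function name is illustrative, not from any Squeak or Pharo code):

```python
import fcntl

def try_acquire_changes_lock(path):
    """Try to take an exclusive advisory lock on the .changes file.

    Returns the open file (which must be kept open to hold the lock),
    or None if another image already holds it.  Because flock locks
    live in the kernel, a crashed image can never leave a stale lock
    behind: the lock dies with the process.
    """
    f = open(path, "a")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return f
    except OSError:
        f.close()
        return None
```

The second image that calls this would get None and could fall back to opening the file read-only, rather than popping up a warning.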
Cheers, Max
Max,
Confirming on Linux and Squeak. See below.
On Wed, Jun 29, 2016 at 08:53:12AM +0200, Max Leske wrote:
I actually didn’t open the issue for myself but because of a student who ran into this. I’ve been in the same situation before but I’m an experienced user while students at the research group sometimes just spend a couple of weeks with Pharo and then such things are a real problem.
Interestingly the issue is, as you already suggested, pretty hard to reproduce i.e., modifying arbitrary methods in both images did not show the symptoms I was looking for.
Here’s a reproducible case (at least on my machine):
- Create a new method in both images:

foo
	^ nil

- Modify it in one image:

foo
	"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
	^ nil + 1

- Modify it in the other image:

foo
	^ nil - 1 isEmpty ifTrue: [ "blah" nil ]
Confirmed on Linux + Squeak.
I did your test above using #forkSqueak so that I had two identical images sharing the same changes file. In each image, I saved the #foo method. At that point, the changes file contained exactly what I would expect.
I then did a save and exit from the child image, followed by a save and exit from the original image. I can see that the changes from the child image are now overwriting the changes from the original parent image. Since the parent image is the one that was saved last, its #foo method now has corrupted source.
This is not a scenario that I have ever encountered, but I can see how it might happen in a classroom setting.
I can't look into this further right now, but it seems possible that the problem happens only when saving the image, in which case we could force the changes file to seek to end of file before doing the save. But we'll need to do some more testing to make sure that this is the only scenario in which it happens.
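A related option, at least on POSIX systems, would be opening the changes file with O_APPEND (Python's "a" mode): the kernel then positions every write at the current end of file, so concurrent writers cannot overwrite each other's chunks (their remembered source pointers can still go stale, though). A small Python sketch, not Squeak code:

```python
import os
import tempfile

# Scratch file standing in for a shared .changes file (path is made up).
path = os.path.join(tempfile.mkdtemp(), "demo.changes")
with open(path, "w") as f:
    f.write("!base!")

# Two writers in append mode: the kernel moves each write to the real
# end of file at write time, regardless of what the other writer did.
a = open(path, "a")
b = open(path, "a")
a.write("!from A!")
a.flush()
b.write("!from B!")
b.flush()
a.close()
b.close()

data = open(path).read()    # both chunks survive, in flush order
```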
Dave
On 29 Jun 2016, at 14:45, David T. Lewis lewis@mail.msen.com wrote:
Confirmed on Linux + Squeak.
I did your test above using #forkSqueak so that I had two identical images sharing the same changes file. In each image, I saved the #foo method. At that point, the changes file contained exactly what I would expect.
I then did a save and exit from the child image, followed by a save and exit from the original image. I can see that the changes from the child image are now overwriting the changes from the original parent image. Since the parent image is the one that was saved last, its #foo method now has corrupted source.
This is not a scenario that I have ever encountered, but I can see how it might happen in a classroom setting.
I can't look into this further right now, but it seems possible that the problem happens only when saving the image, in which case we could force the changes file to seek to end of file before doing the save. But we'll need to do some more testing to make sure that this is the only scenario in which it happens.
Dave
Great, thanks.
In the scenario I described I did not save either image. Of course, without saving, the problem disappears as soon as you start the image anew (the old pointers are still valid and new content will be written to the end). The problem does exhibit itself without saving, though.
Since this is not anything critical, don’t put too much effort into it. I’ll have time in a couple of weeks to look at it in detail and then, once we understand the problem, we can discuss possible solutions.
Cheers, Max
This seems to be a missing #flush after changes are written to the file. Without #flush, both processes (on Unix) will maintain their own version of the file in memory.
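This is easy to demonstrate outside Smalltalk. In the following Python sketch (made-up path, standing in for the .changes file), a second reader cannot see a chunk until the writer flushes its user-space buffer:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.changes")

writer = open(path, "w")       # buffered user-space stream
writer.write("foo ^ nil!")     # sits in this process's buffer only

before = open(path).read()     # a second reader sees nothing yet

writer.flush()                 # push the buffer down to the OS

after = open(path).read()      # now the chunk is visible
writer.close()
```

Until #flush (or its equivalent) runs, the other image's idea of the file's length, and hence of where the next chunk should go, is already stale.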
Levente
Hi Levente,
Without having looked into this at all, I think you are on to something with the missing #flush, and maybe even a #close is needed, because jumping to the end of a file left unclosed in another process may not (and probably does not) reach the true end.
Lou
On Wed, 29 Jun 2016 16:08:50 +0200 (CEST), Levente Uzonyi leves@caesar.elte.hu wrote:
This seems to be a missing #flush after changes are written to the file. Without #flush both processes (unix) will maintain their own version of the file in memory.
Levente
On Wed, 29 Jun 2016, Max Leske wrote:
On 29 Jun 2016, at 14:45, David T. Lewis <lewis@mail.msen.com> wrote:
Max,
Confirming on Linux and Squeak. See below.
On Wed, Jun 29, 2016 at 08:53:12AM +0200, Max Leske wrote:
On 29 Jun 2016, at 02:06, David T. Lewis <lewis@mail.msen.com> wrote: On Tue, Jun 28, 2016 at 04:47:00PM -0500, Chris Muller wrote: On Tue, Jun 28, 2016 at 5:04 AM, Max Leske <maxleske@gmail.com> wrote: Hi, Opening the same image twice works fine as long as no writes to the .changes file occur. When both images write to the .changes file however it will be broken for both because the offsets for the changes are wrong. This can lead to lost data and predominantly to invalid method source code, which is a pain with Monticello. I suggest that we implement a kind of lock mechanism to ensure that only one image (the first one opened) can write to the .changes file. If the offsets are wrong in this scenario, it's a bug in the image. The image is supposed to seek to the end of the changes file before writing the next chunk. While this sounds horrible in theory, in practice it works remarkably well, and I have been happily surprised at how reliable it is after many years of using and abusing the feature. That is a very good thing. Adding a lock to prevent the scenario would be bad, because it would surely break a number of other legitimate use cases. I???ve opened an issue for Pharo here: https://pharo.fogbugz.com/f/cases/18635/The-changes-file-should-be-bound-to-a-single-image I have several applications which launch multiple copies of the same image for multicore processing. The images do their work, commit it to database, then exit themselves without saving. Its a great feature. Doing work is not the problem. Modifying source code is the problem. That is consistent with my experience. I remember expecting horrible things to happen if I had two images sharing a changes file, but nothing bad ever happened. It just works. I know OSProcess, when combined with CommandShell, has a RemoteTask which allows efficient forking of the image (via Linux copy-on-write memory sharing) and so a solution like what happens in Windows is not really good. 
My assumption with RemoteTask was that someone doing complex or long-running jobs would more or less know what they were doing, and would have the good sense to stop writing to the changes file from a bunch of forked images. But in actual practice, I have never seen a problem related to this. It just works.

Instead of putting a pop-up in front of the user, perhaps one way to solve the problem would be, upon image save, to simply go through all the changes since the last save and re-flush them to the .changes file. That way, if someone does want to save the same image on top of itself, at least whichever image saved last "wins"...

There must be a problem somewhere, otherwise Max would not be raising the issue. So whatever combination of operating system and image is having a problem, I would be inclined to fix that. :)

Thanks Dave!

Windows cannot be a problem, because the operating system will not permit you to open the changes file twice. The Unix/Linux systems that I have used all work fine. Max, which operating system/VM/image are you using? Is this on a Mac?

Mac OS X 10.11.5, Pharo 6 (60086)

Dave

I actually didn't open the issue for myself but because of a student who ran into this. I've been in the same situation before, but I'm an experienced user, while students at the research group sometimes just spend a couple of weeks with Pharo and then such things are a real problem. Interestingly the issue is, as you already suggested, pretty hard to reproduce, i.e. modifying arbitrary methods in both images did not show the symptoms I was looking for. Here's a reproducible case (at least on my machine):

1. Create a new method in both images:

foo
	^ nil

2. Modify it in one image:

foo
	"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
	^ nil + 1

3. Modify it in the other image:

foo
	^ nil - 1 isEmpty ifTrue: [ "blah" nil ]
Confirmed on Linux + Squeak.
I did your test above using #forkSqueak so that I had two identical images sharing the same changes file. In each image, I saved the #foo method. At that point, the changes file contained exactly what I would expect.
I then did a save and exit from the child image, followed by a save and exit from the original image. I can see that the changes from the child image are now overwriting the changes from the original parent image. Since the parent image is the one that was saved last, its #foo method now has corrupted source.
This is not a scenario that I have ever encountered, but I can see how it might happen in a classroom setting.
I can't look into this further right now, but it seems possible that the problem happens only when saving the image, in which case we could force the changes file to seek to end of file before doing the save. But we'll need to do some more testing to make sure that this is the only scenario in which it happens.
Dave
Great, thanks.
In the scenario I described I did not save either image. Of course, without saving the problem will not exist as soon as you start the image anew (the old pointers are still valid and new content will be written to the end). The problem does exhibit itself without saving though.
Since this is not anything critical, don't put too much effort into it. I'll have time in a couple of weeks to look at it in detail and then, once we understand the problem, we can discuss possible solutions.
Cheers, Max
In my case saving in step three produces a syntax error when the source is loaded from file again. I don't really have a clue as to what the underlying issue is, but I suspect it may have to do with comments and a particular situation in which the position is not being correctly updated before or after writing. I agree with Chris that locks may be problematic; it just seemed like the simplest obvious solution (although of course it gets complicated when an image crashes and doesn't clean up the lock...). Cheers, Max
On 29-06-2016, at 7:08 AM, Levente Uzonyi leves@caesar.elte.hu wrote:
This seems to be a missing #flush after changes are written to the file. Without #flush both processes (unix) will maintain their own version of the file in memory.
Pretty much exactly what I was about to type. We just had part of this discussion wrt Scratch project files on the Pi - adding flush/sync etc.
In many cases letting an OS buffer the buffering of the buffer’s buffer buffer is tolerable - though insane, and wasteful, and a symptom of the lack of careful analysis that seems to pervade the world of software these days - because nothing goes horribly wrong in most cases. Everything eventually gets pushed to actual hardware, the system doesn’t crash, evaporate, get zapped by Zargon DeathRay(™) emissions, the power doesn’t get ripped out etc. Evidently, on a Pi in a classroom we can’t assign quite such a low probability to the Zargon problem.
However, the changes file is supposed to be a transaction log and as such I claim the data ought to hit hardware as soon as possible and in a way that as near as dammit guarantees correct results. So the mega-layer buffering is An Issue so far as I’m concerned.
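Levente's missing-#flush diagnosis is easy to see at the operating-system level. A minimal sketch, in Python rather than Smalltalk since the buffering lives below the image (the file name is invented): a buffered write is invisible to a second reader until the writer flushes.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.changes")

w = open(path, "w")      # a buffered stream, much like stdio
w.write("foo ^ nil!")    # the chunk sits in this process's own buffer

r = open(path)           # a second reader (think: a second image)
print(repr(r.read()))    # '' -- nothing has reached the file yet

w.flush()                # push the buffer down to the OS
print(repr(r.read()))    # 'foo ^ nil!' -- now everyone can see it

w.close()
r.close()
```

Until the flush happens, each process is effectively maintaining its own private version of the tail of the file, exactly as Levente describes.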
We also, still, and for decades now, have the behaviour I consider stupid beyond all reason whereby a file write is done by a) tell the file pointer to move to a certain location b) think about it c) oh, finally write some text to the file. With the obvious possibility that the file pointer can be changed in b). Then if you can open-for-write a file multiple times, how much confusion can that actually cause? What about a forked process with nominally the same file objects? Are we at all sure any OS properly deals with it? Are we sure that what is purportedly ‘proper’ makes any sense for our requirements?
The most obvious place where this is an issue is where two images are using the same changes file and think they’re appending. Image A seeks to the end of the file, ‘writes’ stuff. Image B near-simultaneously does the same. Eventually each process gets around to pushing data to hardware. Oops! And let’s not dwell too much on the problems possible if either process causes a truncation of the file. Oh, wait, I think we actually had a problem with that some years ago.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim "Daddy, what does FORMATTING DRIVE C mean?"
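The seek-then-write race Tim describes can be reproduced deterministically. A Python sketch (assuming POSIX semantics, where each open file handle keeps its own offset): both "images" seek to the same end-of-file, so the second write lands on top of the first.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shared.changes")
with open(path, "w") as f:
    f.write("original")

# Two "images" open the same file; each handle has its own offset.
a = open(path, "r+")
b = open(path, "r+")
a.seek(0, os.SEEK_END)       # both seek to the same end-of-file position
b.seek(0, os.SEEK_END)

a.write("<chunk from A>")
a.flush()
b.write("<chunk from B>")    # lands at the *old* end, on top of A's chunk
b.flush()

with open(path) as f:
    print(f.read())          # original<chunk from B> -- A's chunk is lost
a.close()
b.close()
```

Opening with O_APPEND would make each individual write atomic at the current end of file, but the image's positioned-write style bypasses that protection.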
Hi Tim,
On Wed, Jun 29, 2016 at 10:24 AM, tim Rowledge tim@rowledge.org wrote:
The thing is that this problem bites even if we have a unitary primitive that both positions and writes if that primitive is written above a substrate that, as unix and stdio streams do, separates positioning from writing. The primitive is neat but it simply drives the problem further underground.
A more robust solution might be to position, write, reposition, read, and compare, shortening on corruption, and retrying, using exponential back-off like ethernet packet transmission. Most of the time this adds only the overhead of reading what's written.
_,,,^..^,,,_ best, Eliot
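Eliot's position-write-reposition-read-compare scheme might look roughly like this. A hedged Python sketch; the helper name, retry count, and back-off constants are invented, not a worked-out design:

```python
import os
import random
import time

def append_verified(path, data, max_retries=5):
    """Append `data` to the changes file, then read it back to verify.
    On a mismatch, truncate the corrupted tail and retry with
    exponential back-off, like ethernet packet transmission."""
    for attempt in range(max_retries):
        with open(path, "r+b") as f:
            f.seek(0, os.SEEK_END)
            start = f.tell()
            f.write(data)
            f.flush()
            os.fsync(f.fileno())     # push to the device, not just the OS cache
            f.seek(start)            # reposition and read back what we wrote
            if f.read(len(data)) == data:
                return start         # verified; usually we get here first try
            f.truncate(start)        # shorten on corruption
        time.sleep(random.random() * 0.01 * 2 ** attempt)
    raise IOError("could not append a verified chunk to " + path)
```

Most of the time this only adds the cost of re-reading what was just written; the bounded retry count also guards against the infinite-retry trap Tim mentions next.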
On 29-06-2016, at 10:35 AM, Eliot Miranda eliot.miranda@gmail.com wrote:
{snip much rant}
The most obvious place where this is an issue is where two images are using the same changes file and think they’re appending. Image A seeks to the end of the file, ‘writes’ stuff. Image B near-simultaneously does the same. Eventually each process gets around to pushing data to hardware. Oops! And let’s not dwell too much on the problems possible if either process causes a truncation of the file. Oh, wait, I think we actually had a problem with that some years ago.
The thing is that this problem bites even if we have a unitary primitive that both positions and writes if that primitive is written above a substrate that, as unix and stdio streams do, separates positioning from writing. The primitive is neat but it simply drives the problem further underground.
Oh absolutely - we only have real control over a small part of it. It would probably be worth making use of that where we can.
A more robust solution might be to position, write, reposition, read, and compare, shortening on corruption, and retrying, using exponential back-off like ethernet packet transmission. Most of the time this adds only the overhead of reading what's written.
Yes, for anything we want reliable that’s probably a good way. A limit on the number of retries would probably be smart to stop infinite recursion. Imagine the fun of an error causing infinite retries of writing an error log about an infinite recursion. On an infinitely large Beowulf cluster!
It’s all yet another example of where software meeting reality leads to nightmares.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim If it was easy, the hardware people would take care of it.
Let's not solve the wrong problem folks. I only looked at this for 10 minutes this morning, and I think (but I am not sure) that the issue affects the case of saving the image, and that the normal writing of changes is fine.
Max was running on Pharo, which may or may not be handling changes the same way. I think he may be seeing a different problem from the one I confirmed.
So a bit more testing and verification would be in order. I can't look at it now though.
Dave
On Wed, Jun 29, 2016 at 02:00:19PM -0400, David T. Lewis wrote:
Let's not solve the wrong problem folks. I only looked at this for 10 minutes this morning, and I think (but I am not sure) that the issue affects the case of saving the image, and that the normal writing of changes is fine.
I am wrong.
I spent some more time with this, and it is clear that two images saving chunks to the same changes file will result in corrupted change records in the changes file. It is not just an issue related to the image save as I suggested above.
In practice, this is not an issue that either Chris or I have noticed, probably because we are not doing software development (saving method changes) at the same time that we are running RemoteTask and similar. But I can certainly see how it might be a problem if, for example, I had a bunch of students running the same image from a network shared folder.
Dave
On Thu, Jun 30, 2016 at 7:07 AM, David T. Lewis lewis@mail.msen.com wrote:
On Wed, Jun 29, 2016 at 02:00:19PM -0400, David T. Lewis wrote:
Let's not solve the wrong problem folks. I only looked at this for 10 minutes this morning, and I think (but I am not sure) that the issue affects the case of saving the image, and that the normal writing of changes is fine.
I am wrong.
I spent some more time with this, and it is clear that two images saving chunks to the same changes file will result in corrupted change records in the changes file. It is not just an issue related to the image save as I suggested above.
In practice, this is not an issue that either Chris or I have noticed, probably because we are not doing software development (saving method changes) at the same time that we are running RemoteTask and similar. But I can certainly see how it might be a problem if, for example, I had a bunch of students running the same image from a network shared folder.
Maybe it's time to consider a fundamental change in how method sources are referred to. Taking inspiration from git... A content-addressable key-value file store might solve concurrent access. Each CompiledMethod gets written to a file named for the hash of its contents, which is the only reference the Image gets to a method's source. Each such file would *only* need to be written once and thereafter could be read simultaneously by multiple Images. Anyone on the network wanting to store the same source would see the file already exists and have nothing to do. Perhaps having many individual files implies abysmal performance,
Or maybe something similar to Mercurial's revlog format [1] could be used, one file per class.
The thing about the Image *only* referring to a method's source by its content hash would seem to give great flexibility in backends to locate/store that source. Possibly...
* stored as individual files as above
* bundled in a zip file in random order
* a school could configure a database server in the Image provided to students
* hashes could be thrown at a service on the Internet
* cached locally with a key-value database like LMDB [2]
* remote replication to multiple internet backup locations
* in an emergency you could throw a bundle of hashes as a query to the mailing list and get an ad-hoc response of individual files
* inter-Smalltalk image communication
Pharo has a stated goal to get rid of the changes file. Changing to content-hash-addressable method source seems a logical step along that road. Even if the Squeak community doesn't want to go so far as eliminating the .changes file, can they see value in changing method source references to be content hashes rather than indexes into a particular file?
[1] http://blog.prasoonshukla.com/mercurial-vs-git-scaling [2] https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database
Just having a poke at this, it seems a new form of CompiledMethodTrailer may need to be defined, being invoked from CompiledMethod>>sourceCode. CompiledMethodTrailer>>sourceCode would find the source code based on a content-hash held by the CompiledMethod. If found, the call to #getSourceFromFile that accesses the .changes file will be bypassed, and could remain as a backup.
cheers -ben
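Ben's content-addressable store could be sketched as follows. Python for illustration; the function names, hash choice (SHA-256), and flat directory layout are all assumptions, not a worked-out design:

```python
import hashlib
import os

def store_source(repo_dir, source):
    """Write method source to a file named for its content hash and
    return the hash -- the only reference the image needs to keep."""
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    path = os.path.join(repo_dir, key)
    try:
        # O_CREAT|O_EXCL: exactly one image creates the file; a second
        # image storing the same source finds it exists and does nothing.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return key
    with os.fdopen(fd, "w") as f:
        f.write(source)
    return key

def load_source(repo_dir, key):
    """Fetch a method's source by its content hash."""
    with open(os.path.join(repo_dir, key)) as f:
        return f.read()
```

Because a given hash-named file is only ever created once, any number of images can share the store concurrently; what this sketch does not preserve is the ordering of changes, one of the objections raised later in the thread.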
In practice, this is not an issue that either Chris or I have noticed, probably because we are not doing software development (saving method changes) at the same time that we are running RemoteTask and similar. But I can certainly see how it might be a problem if, for example, I had a bunch of students running the same image from a network shared folder.
Maybe it's time to consider a fundamental change in how method sources are referred to. Taking inspiration from git... A content-addressable key-value file store might solve concurrent access. Each CompiledMethod gets written to a file named for the hash of its contents, which is the only reference the Image gets to a method's source. Each such file would
It sounds like a lot of files.. so how would I move an image to another computer? I gotta know which files go with which image?
Plus, it doesn't really solve the fundamental problem of two images writing to the same file. Multiple images could still change the same method to the same contents at the same time. You may have made the problem less likely, except for when you have your first hash collision of *different* sources (it COULD happen), in which case it wouldn't even require the changes to occur at the same time.
I guess it would also lose the order-sequence of the change log too... unless you were to try to use the underlying filesystem's timestamps on each file, but that wouldn't work after I've copied all the files via scp, because they all get new timestamps...
Might be better to teach the class, who are learning about Smalltalk anyway, about the nature of the changes file..?
Another thought...
Upon launching of the image, start a temporary changes file, [image-name]-[some UUID].changes.
Upon image save, append the temp changes file to the main changes file, but in an atomic way (first do the append as a new unique filename, then rename it to the original changes file name).
Hmm, but then we would have to check two changes files when accessing sources..
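Chris's temp-changes-plus-atomic-rename idea might be sketched like this (Python; the names are invented, and the two-file lookup problem he notes is not addressed):

```python
import os
import shutil
import uuid

def merge_changes(main_path, temp_path):
    """On image save, append the session's temporary changes file to the
    main changes file atomically: build the combined file under a unique
    name, then rename it over the original (rename is atomic on POSIX)."""
    staging = "%s.%s.merging" % (main_path, uuid.uuid4().hex)
    with open(staging, "wb") as out:
        for src in (main_path, temp_path):
            if os.path.exists(src):
                with open(src, "rb") as f:
                    shutil.copyfileobj(f, out)
        out.flush()
        os.fsync(out.fileno())       # make sure the merged file is on disk
    os.replace(staging, main_path)   # atomic swap-in
    open(temp_path, "wb").close()    # start a fresh session log
```

Readers always see either the old main file or the fully merged one, never a half-appended state; the cost is that source lookups during a session must consult both files.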
On Thu, Jun 30, 2016 at 3:10 PM, Chris Muller asqueaker@gmail.com wrote:
Might be better to teach the class, who are learning about Smalltalk anyway, about the nature of the changes file..?
Sounds like a better idea to me, but I don't think it would solve the problem of multiple images almost simultaneously attempting to update themselves (as in a classroom)
On Fri, Jul 1, 2016 at 4:10 AM, Chris Muller asqueaker@gmail.com wrote:
In practice, this is not an issue that either Chris or I have noticed, probably because we are not doing software development (saving method changes) at the same time that we are running RemoteTask and similar. But I can certainly see how it might be a problem if, for example, I had a bunch of students running the same image from a network shared folder.
Maybe its time to consider a fundamental change in how method-sources are referred to. Taking inspiration from git... A content addressable key-value file store might solve concurrent access. Each CompiledMethod gets written to a file named for the hash of its contents, which is the only reference the Image gets to a method's source. Each such file would
It sounds like a lot of files.. so how would I move an image to another computer? I gotta know which files go with which image?
Yes, that would be a sticking point. You couldn't just grab any saved Image file off disk. The image would first need to generate an archive transfer file. Except if these methods were automatically pushed through to a private web service; then, presuming pervasive web access, that sleeping Image would pull down its sources wherever it boots back up (which, even if that would be cool, is not the problem of the original post).
Plus, it doesn't really solve the fundamental problem of two images writing to the same file. Multiple images could still change the same method to the same contents at the same time.
The hash-named file would never be written to twice. It's a fixed point in space-time ;) A second image with the same hash would write the *same* contents, so there is no need to write. If the hash-named file exists, do nothing. To handle any race condition between checking file existence and writing to it, the first image could take an exclusive write lock.
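The exclusive write lock Ben mentions can be had from the filesystem itself. A minimal Python sketch (the helper name is invented): exclusive-create closes the gap between the existence check and the write.

```python
import os

def try_lock(lock_path):
    """Best-effort lock: O_CREAT|O_EXCL guarantees that exactly one
    process can create the lock file; everyone else backs off."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False                 # someone else holds the lock
    os.close(fd)
    return True
```

The usual caveat applies, as Max noted earlier in the thread: a crashed image leaves the lock file behind, so some stale-lock recovery would be needed.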
You may have made the problem less-likely, except for when you have your first hash-collision of *different* sources (it COULD happen),
Some equivalent things...
* Pick a random atom from the volume of the moon, then another random pick gets the same atom. http://stackoverflow.com/a/23253149
* Win the national lottery 11 times in a row http://stackoverflow.com/a/29146396
* Your chances of winning the Powerball lottery are far better than finding a hash collision. After all, lotteries often have actual winners. The probability of a hash collision is more like a lottery that has been running since prehistoric times and has never had a winner and will probably not have a winner for billions of years. http://ericsink.com/vcbe/html/cryptographic_hashes.html
in which case it wouldn't even require the changes to occur at the same time.
When the second Image finds the hash-named-file already exists, it could check the contents and flag an error if they don't match, so at least its not a silent error. The same when integrating different repositories.
I guess it would also lose the order-sequence of the change log too... unless you were to try to use the underlying filesystem's timestamps on each file but... it wouldn't work after I've copied all the files via scp and because they all get new timestamps...
Good point. This would complicate changes-replay for a crashed image. Although this case is only important "now", it could be handled by a "/tmp/${username}.${last-image-save-checkpoint-id}" file that records the order of commits for a session and is checked for on Image startup - which is similar to what you already suggested...
Upon launching of the image, start a, temporary changes file, [image-name]-[some UUID].changes.
Upon image save, append the temp changes file to the main changes file, but in an atomic way (first do the append as a new unique filename, then rename it to the original changes file name).
Good idea. This would eliminate the need for my idea here. You'd need some way to match the UUID with the Image being opened, so I guess the UUID would need to be stored in the saved Image, be constant for the session, and be updated on each save of the Image. The temporary changes filename could include the username to distinguish between users. If the same user opens an Image twice, there would be two files, and upon recovering from a crash the user would be presented a choice between the two files.
Might be better to teach the class, who are learning about Smalltalk anyway, about the nature of the changes file..?
This seemed more of a classroom system administration issue. Actually in that case, maybe the network executable startup script just copied both image and changes file to the user's personal area?
cheers -ben
Sent from my iPad
This is the best idea of all...
Ben,
On Jun 29, 2016, at 9:48 PM, Ben Coman btc@openinworld.com wrote:
On Thu, Jun 30, 2016 at 7:07 AM, David T. Lewis lewis@mail.msen.com wrote:
On Wed, Jun 29, 2016 at 02:00:19PM -0400, David T. Lewis wrote:
Let's not solve the wrong problem, folks. I only looked at this for 10 minutes this morning, and I think (but I am not sure) that the issue affects the case of saving the image, and that the normal writing of changes is fine.
I am wrong.
I spent some more time with this, and it is clear that two images saving chunks to the same changes file will result in corrupted change records in the changes file. It is not just an issue related to the image save as I suggested above.
In practice, this is not an issue that either Chris or I have noticed, probably because we are not doing software development (saving method changes) at the same time that we are running RemoteTask and similar. But I can certainly see how it might be a problem if, for example, I had a bunch of students running the same image from a network shared folder.
Maybe it's time to consider a fundamental change in how method sources are referred to.
The changes file is not merely the repository for sources of newly minted methods. It is also a log file, a crash recovery mechanism. It is simple. It works. You propose something horribly complex to solve a problem that a) doesn't affect very many people, b) is easy to work around and c) is feasible to fix with a well-known approach. It doesn't wash for me.
Taking inspiration from git... A content-addressable key-value file store might solve concurrent access. Each CompiledMethod gets written to a file named for the hash of its contents, which is the only reference the Image gets to a method's source. Each such file would *only* need to be written once and thereafter could be read simultaneously by multiple Images. Anyone on the network wanting to store the same source would see the file already exists and have nothing to do. Perhaps having many individual files implies abysmal performance.
Or maybe something similar to Mercurial's revlog format [1] could be used, one file per class.
The thing about the Image *only* referring to a method's source by its content hash is that it would seem to give great flexibility in the backends used to locate/store that source. Possibly...
- stored as individual files as above
- bundled in a zip file in random order
- a school could configure a database server in the Image provided to students
- hashes could be thrown at a service on the Internet
- cached locally with a key-value database like LMDB [2]
- remote replication to multiple internet backup locations
- in an emergency you could throw a bundle of hashes as a query to the mailing list and get an ad hoc response of individual files
- Inter-Smalltalk image communication
Pharo has a stated goal to get rid of the changes file. Changing to content-hash-addressable method sources seems a logical step along that road. Even if the Squeak community doesn't want to go so far as eliminating the .changes file, can they see value in changing method source references to be content hashes rather than indexes into a particular file?
[1] http://blog.prasoonshukla.com/mercurial-vs-git-scaling [2] https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database
Just having a poke at this, it seems a new form of CompiledMethodTrailer may need to be defined, being invoked from CompiledMethod>>sourceCode. CompiledMethodTrailer>>sourceCode would find the source code based on a content hash held by the CompiledMethod. If found, the call to #getSourceFromFile that accesses the .changes file would be bypassed, and could remain as a backup.
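A minimal sketch of such a content-addressed source store (the class name and on-disk layout are invented for illustration; a mismatch on an existing entry is flagged rather than silently ignored, as suggested earlier in the thread):

```python
import hashlib
import os
import tempfile

class SourceStore:
    """Content-addressed store: each method source is written at most once,
    under a filename derived from the hash of its contents."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, source):
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        path = os.path.join(self.root, key)
        if os.path.exists(path):
            # entry already present: verify instead of trusting blindly
            with open(path, encoding="utf-8") as f:
                if f.read() != source:
                    raise ValueError("hash collision or corrupt entry: " + key)
            return key
        # write via temp file + rename so concurrent readers never see
        # a partially written entry
        fd, tmp = tempfile.mkstemp(dir=self.root)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(source)
        os.replace(tmp, path)
        return key

    def get(self, key):
        with open(os.path.join(self.root, key), encoding="utf-8") as f:
            return f.read()
```

Because a given source text always maps to the same key, any number of Images can write concurrently without coordination; the second writer either finds the entry already there or races harmlessly to an identical file.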
cheers -ben
Dave
Max was running on Pharo, which may or may not be handling changes the same way. I think he may be seeing a different problem from the one I confirmed.
So a bit more testing and verification would be in order. I can't look at it now though.
Dave
On 29-06-2016, at 10:35 AM, Eliot Miranda eliot.miranda@gmail.com wrote:
{snip much rant}
The most obvious place where this is an issue is where two images are using the same changes file and think they're appending. Image A seeks to the end of the file, 'writes' stuff. Image B near-simultaneously does the same. Eventually each process gets around to pushing data to hardware. Oops! And let's not dwell too much on the problems possible if either process causes a truncation of the file. Oh, wait, I think we actually had a problem with that some years ago.
The thing is that this problem bites even if we have a unitary primitive that both positions and writes if that primitive is written above a substrate that, as unix and stdio streams do, separates positioning from writing. The primitive is neat but it simply drives the problem further underground.
Oh absolutely - we only have real control over a small part of it. It would probably be worth making use of that where we can.
A more robust solution might be to position, write, reposition, read, and compare, shortening on corruption, and retrying, using exponential back-off like ethernet packet transmission. Most of the time this adds only the overhead of reading what's written.
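That position/write/re-read/compare loop with exponential back-off might look roughly like this (a Python sketch of the idea only, not the actual primitive):

```python
import os
import time

def verified_append(fd, data, max_retries=5):
    """Append data to fd, then re-read and compare. On a mismatch,
    truncate back to the old end (shortening past the corruption) and
    retry with exponential back-off. Returns the offset written at."""
    delay = 0.01
    for _ in range(max_retries):
        pos = os.lseek(fd, 0, os.SEEK_END)   # position at end of file
        os.write(fd, data)                   # write the record
        os.fsync(fd)                         # push it to the device
        os.lseek(fd, pos, os.SEEK_SET)       # reposition and read it back
        if os.read(fd, len(data)) == data:
            return pos                       # clean write: done
        os.ftruncate(fd, pos)                # corrupted: shorten and retry
        time.sleep(delay)
        delay *= 2                           # exponential back-off
    raise IOError("could not append cleanly after %d attempts" % max_retries)
```

In the common uncontended case this costs only one extra read of what was just written, which matches Eliot's point about the overhead being modest.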
Yes, for anything we want reliable that's probably a good way. A limit on the number of retries would probably be smart to stop infinite recursion. Imagine the fun of an error causing infinite retries of writing an error log about an infinite recursion. On an infinitely large Beowulf cluster!
It's all yet another example of where software meeting reality leads to nightmares.
tim
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim If it was easy, the hardware people would take care of it.
On Tue, Jun 28, 2016 at 6:04 PM, Max Leske maxleske@gmail.com wrote:
Hi,
Opening the same image twice works fine as long as no writes to the .changes file occur. When both images write to the .changes file however it will be broken for both because the offsets for the changes are wrong. This can lead to lost data and predominantly to invalid method source code, which is a pain with Monticello.
I suggest that we implement a kind of lock mechanism to ensure that only one image (the first one opened) can write to the .changes file.
I’ve opened an issue for Pharo here: https://pharo.fogbugz.com/f/cases/18635/The-changes-file-should-be-bound-to-...
Cheers, Max
I just learnt something quite surprising that is probably important to be aware of... "Locks given by fcntl are not associated with the file-descriptor or open-file table entries. Instead, they are bound to the process itself. For example, a process has multiple open file descriptors for a particular file and gets a read/write lock using any one of these descriptors. Now closing any of these file descriptors will release the lock the process holds on the file. The descriptor that was used to acquire the lock in the first place might still be open, but the process will lose its lock. So, it does not require an explicit unlock or a close ONLY on the descriptor that was used to acquire the lock in the fcntl call. Doing unlock or close on any of the open file descriptors will release the lock owned by the process on the particular file."
https://loonytek.com/2015/01/15/advisory-file-locking-differences-between-po...
cheers -ben
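That behaviour is easy to reproduce. A POSIX-only Python sketch (the helper function is hypothetical; it uses fcntl() record locks via Python's fcntl module, and forks a child to probe the lock, since a process never conflicts with its own locks):

```python
import fcntl
import os
import tempfile

def lock_survives_other_close():
    """Return True if a fcntl() lock survives closing a *different*
    descriptor for the same file. Per the quote above, it should not."""
    fd0, path = tempfile.mkstemp()
    os.close(fd0)                    # close mkstemp's own descriptor up front
    fd1 = os.open(path, os.O_RDWR)
    fd2 = os.open(path, os.O_RDWR)   # second descriptor, same file, same process
    fcntl.lockf(fd1, fcntl.LOCK_EX)  # exclusive lock taken via fd1
    os.close(fd2)                    # closing the *other* descriptor...
    pid = os.fork()
    if pid == 0:
        probe = os.open(path, os.O_RDWR)
        try:
            fcntl.lockf(probe, fcntl.LOCK_EX | fcntl.LOCK_NB)
            os._exit(0)              # child got the lock: parent lost it
        except OSError:
            os._exit(1)              # lock still held by parent
    status = os.waitpid(pid, 0)[1]
    os.close(fd1)
    os.remove(path)
    return os.WEXITSTATUS(status) != 0
```

Note that this is specific to fcntl()/lockf() record locks; BSD flock() locks are tied to the open file description instead and do not vanish when an unrelated descriptor is closed.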
On 30 Jun 2016, at 05:09, Ben Coman btc@openInWorld.com wrote:
Which would solve the problem of a crashed image not cleaning up its lock. Thanks for sharing Ben.
On Thu, Jun 30, 2016 at 09:59:37AM +0200, Max Leske wrote:
FYI, file locking for Unix/Linux/OS X is supported in OSProcess, see UnixProcessFileLockTestCase and the 'file locking' tests in UnixProcessAccessorTestCase.
Dave