Hi all,
some months ago, I corrupted my image by accidentally shutting down the host system while the image file was being saved (many of my images are > 500 MB, so this can take a few seconds even on an SSD). The same can happen due to various other I/O or connection issues, so here's an idea: Couldn't we always use overwrite-by-rename when saving the image file? I.e., first write the image into a new temporary file and, after saving has completed, replace the original file with that temp file (via mv)? This would ensure the image file's integrity.
A possible disadvantage, though, would be that some filesystems, such as NTFS, associate meta-information with the file identity, which changes when using the overwrite-by-rename approach. Also, technologies such as FileSystemWatcher would be confused for the same reason. However, afaik overwrite-by-rename is quite a common approach, primarily for big and sensitive files.
What are your opinions about this topic? :-)
Best,
Christoph
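A minimal sketch of the overwrite-by-rename idea from the message above, written in Python rather than in Squeak or the VM's C code. The function name `save_atomically` and the `.tmp` suffix are assumptions made for illustration only; the sketch says nothing about how the snapshot primitive actually writes the image.

```python
import os
import tempfile


def save_atomically(path: str, data: bytes) -> None:
    """Write data to a temp file next to path, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target,
    # otherwise the final rename cannot replace the file in one step.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # replaces an existing file if there is one
    except BaseException:
        # Leave the old file untouched and clean up the partial temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process dies before `os.replace`, the old file is still complete and only a stray temp file is left behind. The metadata concern raised above still applies, since the target ends up being a different file object than before.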
That sounds like a great idea.
On configurations where overwrite-by-rename is a problem, perhaps an alternate of "copy the existing image to a *.bak file" would work?
Perhaps the image save primitive could respond to a VM command-line switch (or in-image VM parameter?) selecting among three behaviours:
1. The current overwrite-in-place, risk-of-corruption behaviour
2. Overwrite-by-rename if possible
3. Make backup copy before overwrite-in-place
Regards, Tony
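The three behaviours listed above could be sketched as a single save routine with a mode switch. This is Python, not Squeak, and every name in it (`SaveMode`, `save`, the `.tmp` and `.bak` suffixes) is hypothetical; where the switch would live (VM flag, VM parameter, or preference) is left open, as in the message.

```python
import enum
import os
import shutil


class SaveMode(enum.Enum):
    IN_PLACE = 1           # current behaviour: overwrite the file directly
    RENAME = 2             # write a temp file, then rename it over the target
    BACKUP_THEN_WRITE = 3  # copy the old file to *.bak, then overwrite in place


def save(path: str, data: bytes, mode: SaveMode) -> None:
    if mode is SaveMode.RENAME:
        tmp_path = path + ".tmp"
        with open(tmp_path, "wb") as f:
            f.write(data)
        os.replace(tmp_path, path)
        return
    if mode is SaveMode.BACKUP_THEN_WRITE and os.path.exists(path):
        # Keep the previous contents around in case the write below fails.
        shutil.copy2(path, path + ".bak")
    with open(path, "wb") as f:
        f.write(data)
```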
On 1/29/20 6:00 PM, Thiede, Christoph wrote:
> Couldn't we always use overwrite-by-rename when saving the image file?
It's certainly do-able; we had a system like this at Interval. Basically write an image to a suitably chosen name, check if it was ok (which could involve a lot of work if you want to be paranoid) and if so rename (and check the rename worked!) and quit. Craig might possibly have the code around? I certainly don't.
On 2020-01-29, at 12:10 PM, Tony Garnock-Jones tonyg@leastfixedpoint.com wrote:
> That sounds like a great idea.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Oxymorons: Living dead
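The write, check, rename, check-again sequence described above might look roughly like this in Python. This is not the Interval code, which the message says is not at hand; `save_checked`, the `.new` suffix, and the choice of SHA-256 for the "paranoid" check are all assumptions.

```python
import hashlib
import os


def save_checked(path: str, data: bytes) -> None:
    tmp_path = path + ".new"
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # Paranoid check: read the file back and compare digests.
    with open(tmp_path, "rb") as f:
        written = hashlib.sha256(f.read()).digest()
    if written != hashlib.sha256(data).digest():
        os.remove(tmp_path)
        raise OSError("temporary image file does not match the data written")
    os.replace(tmp_path, path)
    # Check that the rename worked: target has the expected size, temp is gone.
    if os.path.exists(tmp_path) or os.path.getsize(path) != len(data):
        raise OSError("rename of temporary image file did not take effect")
```

Reading the file back may be served from the operating system's cache rather than from the disk, so this check is weaker than it looks. It also reads the whole file into memory, which a real implementation for images of several hundred megabytes would probably do in chunks.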
Oh, I see, so it's probably something that can be arranged entirely image-side, no VM support needed. Right?
... it'd be a Preference, I suppose?
Tony
On 1/29/20 9:19 PM, tim Rowledge wrote:
> It's certainly do-able; we had a system like this at Interval.
On 2020-01-29, at 12:24 PM, Tony Garnock-Jones tonyg@leastfixedpoint.com wrote:
> Oh, I see, so it's probably something that can be arranged entirely image-side, no VM support needed. Right?
Pretty sure it could be done without VM support, yes. One might even use the OSProcess forking trick to do it, I think.
> ... it'd be a Preference, I suppose?
Yet another ...
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim "How many Kdatlyno does it take to change a lightbulb?" "None. It sounds perfectly OK to them."
Hi all!
> On configurations where overwrite-by-rename is a problem, perhaps an alternate of "copy the existing image to a *.bak file" would work?
Compared to overwrite-by-rename, this proposal would double the storage effort. Provided that I understand you correctly, -1 :-)
What would you like to do with this backup file? Keep it permanently? As we are talking about file sizes of hundreds of megabytes, I think this could be quite storage-intensive ... Also, it clutters your image folder. We already have two files for each image: .image and .changes. No need for even more files, imho. But there may always be some special application areas, of course :)
+1 for making a preference for it :-) However, my personal preference would be to control this behavior via the Squeak.ini file (not sure what the equivalent is on other host platforms), so that the setting is stored independently of the image.
VM support: What would be the pros and cons of implementing this in the VM? First, I don't know whether we already support a way to read the Squeak.ini file from within the image (see above). Second, I *could* imagine (though this is hypothetical) that certain host systems provide convenient ways of implementing overwrite-by-rename. See my initial mail for my worries about a naive implementation. Again, wouldn't this be an argument for implementing this on the VM side?
Regarding validation: Sounds interesting! How much effort would that be? Could you do this from within the VM (it's also a question of performance, I guess)? Wouldn't this double the save time? Maybe it would be a good idea to have a second (VM) preference for toggling validation.
Best,
Christoph
From: Squeak-dev squeak-dev-bounces@lists.squeakfoundation.org on behalf of tim Rowledge tim@rowledge.org
Sent: Wednesday, 29 January 2020 22:05:07
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] Image damaged due to IO error while saving
> Pretty sure it could be done without VM support, yes. One might even use the OSProcess forking trick to do it, I think.
My suggestion is to just try some ideas in your own image and see if it's something you want to live with. The intentions are good but I have a feeling that this is the kind of thing where the unintended side effects are worse than the problem to be solved.
In my own experience, I have encountered an IO error while saving the image several times over the years. In every case, the cause has been a file system full condition. A solution that uses more disc space would not have been helpful.
Tim mentions using OSProcess, so you can try something based on "UnixProcess saveImageInBackground" if you want.
Dave
On Thu, Jan 30, 2020 at 12:13:27PM +0000, Thiede, Christoph wrote:
> VM support: What would be the pros and cons of implementing this in the VM?
Hi Dave,
On Jan 30, 2020, at 5:18 AM, David T. Lewis lewis@mail.msen.com wrote:
> My suggestion is to just try some ideas in your own image and see if it's something you want to live with. The intentions are good but I have a feeling that this is the kind of thing where the unintended side effects are worse than the problem to be solved.
+1
> In my own experience, I have encountered an IO error while saving the image several times over the years. In every case, the cause has been a file system full condition. A solution that uses more disc space would not have been helpful.
Good point. The snapshot primitive *could* make a conservative estimate of the file size needed (easy; it knows how big the heap is), create a file, write that many zeros (only way to actually commit the disc space), and then overwrite with the real data, but that’s twice the disc traffic.
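The zero-fill-then-overwrite scheme described above can be sketched as follows, in Python rather than in the VM's C. `save_with_reserved_space` and the chunk size are made up for the example, and the sketch assumes it is writing a fresh file, not the existing image.

```python
import os


def save_with_reserved_space(path: str, data: bytes, chunk: int = 1 << 20) -> None:
    zeros = bytes(chunk)
    with open(path, "wb") as f:
        # Pass 1: commit the disc space by writing len(data) zero bytes.
        remaining = len(data)
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())
        # Pass 2: overwrite the zeros with the real content.
        f.seek(0)
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
```

If the disc fills up, the error surfaces during the first pass, before any real data has been written. The cost is the doubled disc traffic already noted above, and the guarantee presumably only holds on filesystems that overwrite blocks in place.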
On Thu, Jan 30, 2020 at 06:20:01AM -0800, Eliot Miranda wrote:
> Good point. The snapshot primitive *could* make a conservative estimate of the file size needed (easy; it knows how big the heap is), create a file, write that many zeros (only way to actually commit the disc space), and then overwrite with the real data, but that's twice the disc traffic.
That's a good idea, and on a unix platform we can use statvfs() to check space availability without adding any disc traffic. To prove out the idea, I implemented it as a primitive in the unix OSProcess plugin so that you can test it like this:
    primSpaceFor: byteSize InDirectoryPath: dirPath
        <primitive: 'primitiveSpaceForByteSizeInDirectoryPath' module: 'UnixOSProcessPlugin'>
        ^ self primitiveFailed
If you want to give it a try, the primitive is now in the latest UnixOSProcessPlugin in www.squeaksource.com/OSProcessPlugin in VMConstruction-Plugins-OSProcessPlugin-dtl.47, and I merged it into VMConstruction-Plugins-OSProcessPlugin.oscog-dtl.67 for the Cog/Spur VMs.
I also added access from OSProcess in OSProcess-dtl.114.
I have not really looked into how best to put this into the VM proper, but we could consider either adding a primitive similar to the one in OSPP, or maybe add a check directly into the image write function (which is currently a macro that we could override).
I also have not looked into how to implement this on Windows. I'm sure there is a way but I have not yet checked. It's likely that statvfs() is available on Windows but I have not looked.
In any case, treat this as a proof of concept to illustrate a way to handle the file system full scenario.
Dave
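For readers without the plugin at hand, the same kind of free-space check can be illustrated in Python, whose `os.statvfs` wraps the statvfs() call mentioned above. This is only an illustration of the idea, not the plugin primitive; `has_space_for` is a hypothetical name, and `os.statvfs` is available on Unix-like systems only.

```python
import os


def has_space_for(byte_size: int, dir_path: str) -> bool:
    """Return True if dir_path's filesystem reports room for byte_size bytes."""
    st = os.statvfs(dir_path)
    # f_bavail: free blocks available to unprivileged users.
    # f_frsize: the block size those counts are expressed in.
    return st.f_bavail * st.f_frsize >= byte_size
```

A check like this is advisory: another process can use up the space between the check and the write, so the write itself still has to handle failure. In Python, `shutil.disk_usage` offers a comparable figure on Windows as well.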
On 2020-02-01, at 11:48 AM, David T. Lewis lewis@mail.msen.com wrote:
> That's a good idea, and on a unix platform we can use statvfs() to check space availability without adding any disc traffic.
The way we used to do this on RISC OS was (is, for the remaining users!) to allocate a file of the required size and it would be filled with 0. That way if you got a success return code you knew for certain (barring fire, flood, bomb or bear attack) that the file could be over-written with your real content. Do any other filing systems actually really definitely allocate space when you ask for it? No idea.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: FR: Flip Record
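On the question of whether other systems really allocate space on request: POSIX specifies posix_fallocate() for this, and Python exposes it as `os.posix_fallocate` on platforms that have it. The sketch below is an assumption about how one might use it, with a zero-filling fallback that mirrors the RISC OS behaviour described above; `reserve_file` is a hypothetical name.

```python
import os


def reserve_file(path: str, size: int) -> None:
    """Create path and ask the OS to allocate size bytes for it up front."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        if hasattr(os, "posix_fallocate"):
            # Raises OSError if the space cannot be reserved.
            os.posix_fallocate(fd, 0, size)
        else:
            # Fallback: fill the file with zeros, one chunk at a time.
            remaining = size
            while remaining:
                remaining -= os.write(fd, bytes(min(remaining, 1 << 20)))
    finally:
        os.close(fd)
```

How firm the resulting guarantee is depends on the filesystem; copy-on-write filesystems in particular may still need fresh space when the reserved range is overwritten.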
On 02/02/20 7:57 AM, tim Rowledge wrote:
> The way we used to do this on RISC OS was (is, for the remaining users!) to allocate a file of the required size and it would be filled with 0.
Writing twice into the same file will increase wear and tear in SSDs unnecessarily. An image is just an array of bytes, so one of the following techniques could be adopted:
* create dummy files in fixed size units (say 128MB) and write the image as usual. If the image write returns a disk full error, then delete one or more of these dummy files to complete the operation.
* create two file partitions of max size (say A and B). Alternate writing to these partitions and mark the latest successful write as the real McCoy.
* create a file with two segments each of max size. Alternate writing into these two as in the case above. The header will need a flag to identify which one is latest.
Regards .. Subbu
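The alternating-slot idea from the list above can be sketched with two files and a small marker file standing in for the header flag. This is a loose Python illustration, not a proposal for the image format; the `.A`, `.B` and `.latest` suffixes and both function names are invented for the example.

```python
import os


def save_alternating(base: str, data: bytes) -> str:
    """Write to the slot that is not current, then mark it as the latest."""
    marker = base + ".latest"
    current = "B"  # so that the very first save goes to slot A
    if os.path.exists(marker):
        with open(marker) as f:
            current = f.read().strip()
    target = "B" if current == "A" else "A"
    slot_path = f"{base}.{target}"
    with open(slot_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # Only after the slot is completely written do we flip the marker.
    tmp_marker = marker + ".tmp"
    with open(tmp_marker, "w") as f:
        f.write(target)
    os.replace(tmp_marker, marker)
    return slot_path


def latest_slot(base: str) -> str:
    """Return the path of the slot that holds the latest successful save."""
    with open(base + ".latest") as f:
        return f"{base}.{f.read().strip()}"
```

A failed write only ever damages the slot that was not current, so the previous save stays readable. The price is roughly twice the disc space per image.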
On 2020-02-02, at 6:03 AM, K K Subbu kksubbu.ml@gmail.com wrote:
> Writing twice into the same file will increase wear and tear in SSDs unnecessarily.
True, but in defence of RISC OS it does predate even the fantasy of SSDs by perhaps a couple of decades :-)
Obviously the key point is actually, definitely, allocating the required space, as opposed to optimistically claiming to have some room and then getting all upset when your application tries to use it.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim "How many Carlos Wus does it take to change a lightbulb?" "With an unlimited breeding licence, who needs lightbulbs?"
Just lost the sixth image within a few days due to crazy Windows errors. Squeak does not bear the blame for this, but maybe we could just come up with some simple validation? When saving the image, validate it afterward, and warn the user/do not close the image if the stored .image file is corrupt.
Best,
Christoph
Or is there maybe, just maybe some kind of automatic image repair tool? One that does not require in-depth knowledge of the file format internals of .image files? :-)
Best,
Christoph
From: Thiede, Christoph
Sent: Friday, 3 September 2021 22:56:57
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] Image damaged due to IO error while saving
> Just lost the sixth image within a few days due to crazy Windows errors.
On Fri, Sep 03, 2021 at 09:02:07PM +0000, Thiede, Christoph wrote:
> Or is there maybe, just maybe some kind of automatic image repair tool? One that does not require in-depth knowledge of the file format internals of .image files? :-)
No, if you truncate the image file, it's junk.
Dave
Truncate? It still has a perfectly normal file size ... But when I try to open it, the VM asks me for another image file, so apparently, it cannot read it. No idea what's going on here ...
Best,
Christoph
I don't know what the failure mode was on Windows (file system full? something else?). Regardless of the underlying failure, if you write the image file and the write does not successfully complete, then you almost certainly have a junk file that cannot be fixed.
But as to your other question, whether we could check for the failure and not exit the image: that's definitely worth looking into.
Take a look at SmalltalkImage>>snapshotEmbeddedPrimitive and SmalltalkImage>>snapshotPrimitive. If the method comments are right, then a nil result would indicate a write failure. I don't know if the actual primitives in the VM behave this way (if not they could be fixed). But if that is how the primitives behave, then it should be possible to add a check for nil to prevent exiting the image after a failed write.
I do not see any such check in SmalltalkImage>>snapshot:andQuit:withExitCode:embedded: so it would be worth looking at this to see if a nil check could be added to protect against the kind of failure you saw.
Dave
On Fri, Sep 03, 2021 at 10:04:18PM +0000, Thiede, Christoph wrote:
> But when I try to open it, the VM asks me for another image file, so apparently, it cannot read it.
On Fri, Sep 03, 2021 at 06:24:32PM -0400, David T. Lewis wrote:
> I do not see any such check in SmalltalkImage>>snapshot:andQuit:withExitCode:embedded: so it would be worth looking at this to see if a nil check could be added to protect against the kind of failure you saw.
Actually, I need to retract this. There actually *is* a nil check, and it works as advertised for the case of a Unix VM trying to save the image to a write protected file. So I think we would need to understand the actual write failure that happened on Windows, and see if the Windows VM is handling it in an appropriate way.
Dave
I just had a look at the broken image file. It is three hundred megabytes of zeros, indeed.
> I don't know what the failure mode was on Windows (file system full? something else?).
I don't know either. I can only tell you that my system was so buggy that I could not even start a new cmd window or even restart the system without using the physical reset button. ¯_(ツ)_/¯
As I said, this is nothing to blame our VM for, but yes, if we could detect an error code, this would be helpful, of course.
Best,
Christoph
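The specific failure reported above, a file of normal size that contains only zeros, is cheap to detect after a save. The Python sketch below checks for exactly that and nothing more; it is not a general image validator (that would need knowledge of the image header, which this thread does not describe), and `looks_zero_filled` is a hypothetical name.

```python
def looks_zero_filled(path: str, chunk: int = 1 << 20) -> bool:
    """Return True if the file is empty or contains only zero bytes."""
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                return True  # reached the end without seeing a non-zero byte
            if block.count(0) != len(block):
                return False
```

For an intact file the check normally stops at the first chunk, so it adds little to the save time; only a damaged file is read to the end.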
________________________________ Von: Squeak-dev squeak-dev-bounces@lists.squeakfoundation.org im Auftrag von David T. Lewis lewis@mail.msen.com Gesendet: Samstag, 4. September 2021 00:48:37 An: The general-purpose Squeak developers list Betreff: Re: [squeak-dev] Image damaged due to IO error while saving
On Fri, Sep 03, 2021 at 06:24:32PM -0400, David T. Lewis wrote:
On Fri, Sep 03, 2021 at 10:04:18PM +0000, Thiede, Christoph wrote:
Truncate? It has still a very usual file size ... But when I try to open it, the VM asks me for another image file, so apparently, it cannot read it. No idea what's going on here ...
Best,
Christoph
From: Squeak-dev squeak-dev-bounces@lists.squeakfoundation.org on behalf of David T. Lewis lewis@mail.msen.com Sent: Saturday, September 4, 2021 00:01:54 To: The general-purpose Squeak developers list Subject: Re: [squeak-dev] Image damaged due to IO error while saving
On Fri, Sep 03, 2021 at 09:02:07PM +0000, Thiede, Christoph wrote:
Or is there maybe, just maybe some kind of automatic image repair tool? One that does not require in-depth knowledge of the file format internals of .image files? :-)
No, if you truncate the image file, it's junk.
Dave
I don't know what the failure mode was on Windows (file system full? something else?). Regardless of the underlying failure, if you write the image file and the write does not successfully complete, then you almost certainly have a junk file that cannot be fixed.
But as to your other question (could we check for the failure and not exit the image?), that's definitely worth looking into.
Hi Christoph,
On Jan 30, 2020, at 4:13 AM, Thiede, Christoph Christoph.Thiede@student.hpi.uni-potsdam.de wrote:
Hi all!
On configurations where overwrite-by-rename is a problem, perhaps an alternate of "copy the existing image to a *.bak file" would work?
Compared to overwrite-by-rename, this proposal would double the storage effort. Provided that I understand you correctly, -1 :-)
On configurations where overwrite-by-rename is a problem, perhaps an alternate of "copy the existing image to a *.bak file" would work?
If you want a backup, even temporarily, then you can’t avoid needing twice the file storage while the new snapshot is being written. So careful what you wish for. All implementations have this as a consequence, by definition.
What would you like to do with this backup file? Keep it permanently? Since we are talking about file sizes of hundreds of megabytes, I think this could be quite storage-intensive ... Also, it clutters your image folder. We already have two files for each image: .image and .changes. No need for even more files, imho. But there may always be some special application areas, of course :)
Well, if it stays around then it gets replaced on every save. So one only has one copy per image. One presumably would never rename the backup to save it when creating the next backup. So in fact the operation is
- if saving to an existing file
  - delete the backup foo.imagebak if it exists
  - rename foo.image to foo.imagebak
- save the image
- optionally validate the new image
- optionally delete the backup
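The steps above can be sketched roughly as follows. This is a minimal Python illustration of the file operations only, not Squeak code: `save_with_backup`, `write_image`, `validate` and `keep_backup` are hypothetical names, and the rollback on failure is an assumption added for the sketch, not something proposed in this thread.

```python
import os
from typing import Callable, Optional


def save_with_backup(
    path: str,
    write_image: Callable[[str], None],
    validate: Optional[Callable[[str], bool]] = None,
    keep_backup: bool = False,
) -> None:
    """Rename the old image to a backup, write the new one, then clean up."""
    backup = path + "bak"  # foo.image -> foo.imagebak
    had_original = os.path.exists(path)
    if had_original:
        # os.replace overwrites an existing backup, so "delete the old
        # backup" and "rename to backup" collapse into a single call.
        os.replace(path, backup)
    try:
        write_image(path)
        if validate is not None and not validate(path):
            raise OSError("new image failed validation: " + path)
    except BaseException:
        # Assumption of this sketch: on failure, restore the last good image.
        if had_original:
            os.replace(backup, path)
        elif os.path.exists(path):
            os.remove(path)
        raise
    if had_original and not keep_backup:
        os.remove(backup)
```

Note that between the rename and the end of the write, foo.image does not exist (or is incomplete); only foo.imagebak holds a good copy during that window.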
+1 for making a preference for it :-) However, my personal preference would be to control this behavior via the Squeak.ini file (not sure what the equivalent is on other host platforms), so I would prefer to store this setting independently of the image.
VM support: What would be the pros and cons of implementing this in the VM? First, I don't know whether we already support a way to read the Squeak.ini file from within the image (see above)? Second, I *could* imagine (though this is purely hypothetical) that certain host systems might provide convenient ways of implementing overwrite-by-rename. See my initial mail for my worries about a naive implementation. Again, wouldn't this be an argument for implementing this on the VM side instead?
Good questions. I think implementing image side is better. The snapshot primitive is separate from the quit primitive, so if the snapshot primitive succeeds there is time for the image to eg run validation and/or delete the backup before quitting.
This seems to me related to the other snapshot bug, which is that we GC in the snapshot primitive. This is completely wrong because it elides finalization actions. Instead we should do a full GC in the image *before* doing the snapshot, allow any finalization actions to complete and then do the snapshot. VW does this correctly.
Re validation: Sounds interesting! How much effort would that be? Could you do this from within the VM (it's also a question of performance, I guess)? Wouldn't this double the save time? Maybe it would be a good idea to have a second (VM) preference for toggling validation.
Validation could (and IMO /should/) be done via the new image leak checker. This is a cut-down VM that only loads an image and applies the leak checker before quitting. It would have to be made runnable from the image, e.g. via OSProcess. That makes this an optional project because OSProcess is not in the base image.
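Running a separate checker process against the freshly written file could look roughly like this. The sketch is in Python rather than OSProcess, `image_passes_check` and `checker_cmd` are hypothetical names, and it assumes the checker signals success with exit code 0; the real command line and exit-code convention of the leak checker are not stated in this thread.

```python
import subprocess
import sys
from typing import Sequence


def image_passes_check(
    image_path: str,
    checker_cmd: Sequence[str],
    timeout_s: float = 300.0,
) -> bool:
    """Run an external checker on image_path; True only if it exits with 0."""
    try:
        result = subprocess.run(
            [*checker_cmd, image_path],  # image path is passed as last argument
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Checker missing, not executable, or hung: treat as "not validated".
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in checker for demonstration: a Python one-liner that always succeeds.
    fake_checker = [sys.executable, "-c", "import sys; sys.exit(0)"]
    print(image_passes_check("foo.image", fake_checker))
```

Because the check runs in its own process, a crash or hang in the checker cannot take the running image down with it, at the cost of reading the whole file a second time.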
Best, Christoph
From: Squeak-dev squeak-dev-bounces@lists.squeakfoundation.org on behalf of tim Rowledge tim@rowledge.org Sent: Wednesday, January 29, 2020 22:05:07 To: The general-purpose Squeak developers list Subject: Re: [squeak-dev] Image damaged due to IO error while saving
On 2020-01-29, at 12:24 PM, Tony Garnock-Jones tonyg@leastfixedpoint.com wrote:
Oh, I see, so it's probably something that can be arranged entirely image-side, no VM support needed. Right?
Pretty sure it could be done without VM support, yes. One might even use the OSProcess forking trick to do it, I think.
... it'd be a Preference, I suppose?
Yet another ...
tim
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim "How many Kdatlyno does it take to change a lightbulb?" "None. It sounds perfectly OK to them."
Hi Eliot,
If you want a backup, even temporarily, then you can’t avoid needing twice the file storage while the new snapshot is being written. So careful what you wish for. All implementations have this as a consequence, by definition.
Actually, I did not want to have a backup :) All I requested was overwrite-by-rename to ensure the atomicity of the snapshot operation. I think there are enough tools out there that provide clever backup mechanisms; we do not need to reinvent the wheel here. (Personally, I'm fine with OneDrive, which keeps old versions of all my images, captured every 15 minutes.)
if saving to an existing file, then
- rename existing file to some backup, eg foo.imagebak
- write new image file foo.image
- delete foo.imagebak
if not saving to an existing file, then
- write new image file foo.image
+1, sounds perfect. You could also consider the following instead:
if saving to an existing file, then
- write new image file ~foo.image
- rename ~foo.image to foo.image
Then foo.image will never be corrupted. Afaik this is the way Chromium or MS Office go, for example.
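As a rough illustration of this write-then-rename variant (in Python, not Squeak; `save_atomically` is a hypothetical name, and the explicit fsync before the rename is an assumption of the sketch rather than something discussed here):

```python
import os
import tempfile


def save_atomically(path: str, data: bytes) -> None:
    """Write data to a temp file next to path, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem),
    # otherwise the final rename cannot be a single atomic step.
    fd, tmp_path = tempfile.mkstemp(prefix="~", suffix=".tmp", dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        # Replaces an existing destination; atomic on POSIX filesystems.
        os.replace(tmp_path, path)
    except BaseException:
        # Writing failed: drop the partial temp file, leave `path` untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this ordering the old foo.image stays intact until the new file is completely written, so an interrupted save leaves at worst a stray temp file behind. As noted at the start of the thread, the renamed file is a new file as far as the filesystem is concerned, so metadata tied to file identity is not carried over.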
Best, Christoph
From: Squeak-dev squeak-dev-bounces@lists.squeakfoundation.org on behalf of Eliot Miranda eliot.miranda@gmail.com Sent: Thursday, January 30, 2020 15:14:30 To: The general-purpose Squeak developers list Subject: Re: [squeak-dev] Image damaged due to IO error while saving
Hi Christoph, Hi Tony,
On Jan 29, 2020, at 12:10 PM, Tony Garnock-Jones tonyg@leastfixedpoint.com wrote:
That sounds like a great idea.
+1
On configurations where overwrite-by-rename is a problem, perhaps an alternate of "copy the existing image to a *.bak file" would work?
+1. This is IMO a safer and easier implementation path. I would use "rename to a backup" though. So the snapshot file operation is
if saving to an existing file, then
- rename existing file to some backup, eg foo.imagebak
- write new image file foo.image
- delete foo.imagebak
if not saving to an existing file, then
- write new image file foo.image
Perhaps the image save primitive could respond to a VM command-line switch (or in-image VM parameter?) selecting among three behaviours:
- The current overwrite-in-place, risk-of-corruption behaviour
- Overwrite-by-rename if possible
- Make backup copy before overwrite-in-place
Why would the rename be possible and the save not? Ah, if the file is writable but the directory is not, the rename would fail but the write would not, right? But then both copy and rename would fail. So I think we only need to support rename, and the snapshot primitive should fail if the directory is not writable.
P.S. volunteers welcome to do the work...
Regards, Tony