[Vm-dev] Re: RFC: Unix 3.11.3-2116 VM
dfarber at numenor.com
Tue Sep 1 20:21:02 UTC 2009
On Sep 1, 2009, at 10:12 AM, Andreas Raab wrote:
> David Farber wrote:
>> On Sep 1, 2009, at 9:28 AM, Ian Piumarta wrote:
>>> Is enabling IMAGE_DUMP in sqUnixMain.c sufficient for what you
>>> need? We could also have an option that names an image dump
>>> file, disabling the dump if no name is given but still printing
>>> the stack.
>> I was looking at this code earlier this year. I couldn't convince
>> myself that the resulting image would actually be usable. If the
>> VM just dumps the image, then won't you (at the very least) lose
>> all your file handles including the handle to the changes file?
>> And if you've lost your handle to the changes file then the image
>> won't, for any practical purposes, be usable.
>> Am I missing something?
> The simulator. It can be used to inspect the contents of an image
> file regardless of whether you run it or not. Useful for post-mortem
So how would you use the simulator to do a post-mortem?
Here is what happened to me back at the end of March. I was running
a Seaside/Pier site headless on CentOS 5. For persistency, Pier
was supposed to snapshot the image after any relevant changes.
Somewhere, somehow an error arose in the snapshot codepath, so Pier
stopped snapshotting the image (even though the rest of the app ran
fine and was accumulating data). I tried to manually snapshot the
image (and save precious data) but, because the error was in the
snapshot codepath, all I managed to do was hang the web interface.
Out of the box, Seaside doesn't do any error logging and I wasn't
running a VM that had IMAGE_DUMP or stack-printing enabled. So I
was stuck with an image that was running (with unsaved data) but
I was able to core dump the running image and manually
reconstruct an image file (i.e. what I would have had if my VM had
IMAGE_DUMP enabled). I loaded the image into the simulator, but I
wasn't able to really do anything with it. Specifically, I didn't
see any way to debug why the image was failing to snapshot. I was
able to recover the data in the image. But I still don't know
what killed my image. How could the simulator help me figure out
what went wrong?
 I wasn't running the image under any kind of remote X setup,
which seems to be popular amongst Seaside deployers.
 I won't make the mistake of deploying a Seaside app without
error logging again.
 I won't make the mistake of deploying a Seaside app on a VM
without stack printing again.
 On Linux, gcore will give you a core dump without terminating
the process. A version for OS X is at
 I wrote code that will recover an object tree and write it to a
ReferenceStream. At some point I should package and release it.
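
As a rough illustration of the gcore note above, here is a minimal
Python sketch that builds and runs the gcore command against a running
VM process. The helper names and the "squeak-core" prefix are made up
for this example. It assumes gcore (which ships with gdb) is on the
PATH and that it writes the dump to <prefix>.<pid>.

```python
import shutil
import subprocess


def gcore_command(pid, prefix="squeak-core"):
    """Build the gcore invocation (hypothetical helper, not part of any VM tool).

    With -o, gcore names the dump <prefix>.<pid>.
    """
    return ["gcore", "-o", prefix, str(pid)]


def dump_running_vm(pid, prefix="squeak-core"):
    """Core dump a live process without terminating it; return the dump file name."""
    if shutil.which("gcore") is None:
        raise RuntimeError("gcore not found on PATH (it is installed with gdb)")
    # check=True raises CalledProcessError if gcore cannot attach to the process.
    subprocess.run(gcore_command(pid, prefix), check=True)
    return f"{prefix}.{pid}"
```

Calling dump_running_vm with the pid of the VM should leave the
process running and return the name of the core file, which is the
starting point for reconstructing an image file by hand.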