[Vm-dev] Re: RFC: Unix 3.11.3-2116 VM

David Farber dfarber at numenor.com
Tue Sep 1 20:21:02 UTC 2009

On Sep 1, 2009, at 10:12 AM, Andreas Raab wrote:

> David Farber wrote:
>> On Sep 1, 2009, at 9:28 AM, Ian Piumarta wrote:
>>> Is enabling IMAGE_DUMP in sqUnixMain.c sufficient for what you  
>>> need?  We could also have an option that names an image dump  
>>> file, disabling the dump if no name is given but still printing  
>>> the stack.
>> I was looking at this code earlier this year.  I couldn't convince  
>> myself that the resulting image would actually be usable.  If the  
>> VM just dumps the image, then won't you (at the very least) lose  
>> all your file handles including the handle to the changes file?   
>> And if you've lost your handle to the changes file then the image  
>> won't, for any practical purposes, be usable.
>> Am I missing something?
> The simulator. It can be used to inspect the contents of an image  
> file regardless of whether you run it or not. Useful for post-mortem  
> analysis.

So how would you use the simulator to do a post-mortem?

Here is what happened to me back at the end of March.  I was running  
a Seaside/Pier site headless [1] on CentOS 5.  For persistency, Pier  
was supposed to snapshot the image after any relevant changes.   
Somewhere, somehow an error arose in the snapshot codepath, so Pier  
stopped snapshotting the image (even though the rest of the app ran  
fine and was accumulating data).  I tried to manually snapshot the  
image (and save precious data) but, because the error was in the  
snapshot codepath, all I managed to do was hang the web interface.   
Out of the box, Seaside doesn't do any error logging [2] and I wasn't  
running a VM that had IMAGE_DUMP or stack-printing enabled[3].  So I  
was stuck with an image that was running (with unsaved data) but  
completely incommunicado.

I was able to core dump the running image [4] and manually  
reconstruct an image file (i.e. what I would have had if my VM had  
IMAGE_DUMP enabled).  I loaded the image into the simulator, but I  
wasn't able to really do anything with it.  Specifically, I didn't  
see any way to debug why the image was failing to snapshot.  I was  
able to recover the data in the image[5].  But I still don't know  
what killed my image.  How could the simulator help me figure out  
what went wrong?


[1]  I wasn't running the image under any kind of remote X setup,  
which seems to be popular amongst Seaside deployers.
[2]  I won't make the mistake of deploying a Seaside app without  
error logging again.
[3]  I won't make the mistake of deploying a Seaside app on a VM  
without stack printing again.
[4]  On Linux, gcore will give you a core dump without terminating  
the process.  A version for OS X is at
[5]  I wrote code that will recover an object tree and write it to a  
ReferenceStream.  At some point I should package and release it.
