[Vm-dev] Crashes on snapshot with the new compactor

Sat Mar 25 20:27:52 UTC 2017

Hi All,

    a number of people are being affected by crashes on snapshotting the
image, the worst possible time for a crash.  There is a bug in the new
compactor that unfortunately bites when saving.  The compactor is invoked
as part of a full garbage collect after the garbage collector has freed
unreachable objects.  Normally the new compactor makes only a single pass
through the heap, which may not move all the objects that are possible to
move.  (The number of objects that can be moved in a single pass is limited
by available free space.)  But on snapshot the compactor makes as many
passes as are necessary to slide all movable objects down as far as
possible.  Unfortunately there is a bug in this second pass.

Fixing this bug is now my priority.  I have an example image from Esteban
Lorenzano to test.  I am asking anyone else who can provide an image that
reliably crashes when trying to save it to make the image and changes
files available to me for testing if possible.

In the meantime one may be able to work around the problem by doing a full
garbage collect before snapshot.  This should do a GC with a single
compaction pass which should not fail, and then make it much more likely
that the GC during snapshot will do a single compaction pass, since fewer
objects should be mobile after the single pass compaction in the explicit
GC.

To do this in Pharo I would put a full GC here:

SessionManager>>snapshot: save andQuit: quit
    | isImageStarting snapshotResult |
    ChangesLog default logSnapshot: save andQuit: quit.

>>  SmalltalkImage current primitiveGarbageCollect.

    self currentSession stop: quit. "Image not usable from here until the session is restarted!"
    ...

In Squeak I would put a full GC here:

snapshot: save andQuit: quit withExitCode: exitCode embedded: embeddedFlag
    "Mark the changes file and close all files as part of #processShutdownList.
    If save is true, save the current state of this Smalltalk in the image file.
    If quit is true, then exit to the outer OS shell.
    If exitCode is not nil, then use it as exit code.
    The latter part of this method runs when resuming a previously saved image.
    This resume logic checks for a document file to process when starting up."

    | resuming msg |
    Object flushDependents.
    Object flushEvents.

    ...
    Smalltalk processShutDownList: quit.
>>  SmalltalkImage current primitiveGarbageCollect.
    Cursor write show.
    save
        ifTrue: [resuming := embeddedFlag
            ifTrue: [self snapshotEmbeddedPrimitive]
            ifFalse: [self snapshotPrimitive]]  "<-- PC frozen here on image file"
        ifFalse: [resuming := false].
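
For anyone who would rather not edit the snapshot method, the same idea can
probably be tried by hand from a workspace or playground.  This is only a
sketch of the workaround described above, not a tested fix: it assumes the
usual Smalltalk garbageCollect and Smalltalk snapshot:andQuit: entry points
are present in your image, and it gives no guarantee that the GC done during
the snapshot will need only a single compaction pass.

```smalltalk
"Assumption: evaluated as a do-it in a workspace/playground.
 First run an explicit full GC (a single compaction pass), then save the
 image without quitting."
Smalltalk garbageCollect.
Smalltalk snapshot: true andQuit: false.
```

Doing the explicit GC immediately before the save keeps the number of objects
allocated between the two collections small, which is the point of the
workaround.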

I do apologise for the bug.  I hope it will be fixed within a few days.

_,,,^..^,,,_
best, Eliot