Crashes on snapshot with the new compactor


Eliot Miranda

25 Mar 2017

9:27 p.m.

Hi All,

a number of people are being affected by crashes on snapshotting the image, the worst possible time for a crash. There is a bug in the new compactor that unfortunately bites when saving. The compactor is invoked as part of a full garbage collect after the garbage collector has freed unreachable objects. Normally the new compactor makes only a single pass through the heap, which may not move all the objects that are possible to move. (The number of objects that can be moved in a single pass is limited by available free space.) But on snapshot the compactor makes as many passes as are necessary to slide all movable objects down as far as possible. Unfortunately there is a bug in this second pass.

Fixing this bug is now my priority. I have an example image from Esteban Lorenzano to test. I am asking anyone else that can provide an image that reliably crashes when trying to save it to make the image and changes available to me for testing if possible.

In the meantime one may be able to work around the problem by doing a full garbage collect before snapshot. This should do a GC with a single compaction pass, which should not fail, and should then make it much more likely that the GC during snapshot will also do a single compaction pass, since fewer objects should be mobile after the single-pass compaction in the explicit GC.

To do this in Pharo I would put a full gc here:

SessionManager>>snapshot: save andQuit: quit
    | isImageStarting snapshotResult |
    ChangesLog default logSnapshot: save andQuit: quit.
    ...
    SmalltalkImage current primitiveGarbageCollect.
    self currentSession stop: quit. "Image not usable from here until the session is restarted!"
    ...

In Squeak I would put a full GC here:

snapshot: save andQuit: quit withExitCode: exitCode embedded: embeddedFlag
    "Mark the changes file and close all files as part of #processShutdownList.
     If save is true, save the current state of this Smalltalk in the image file.
     If quit is true, then exit to the outer OS shell.
     If exitCode is not nil, then use it as exit code.
     The latter part of this method runs when resuming a previously saved image.
     This resume logic checks for a document file to process when starting up."
    | resuming msg |
    Object flushDependents.
    Object flushEvents.
    ...
    Smalltalk processShutDownList: quit.
    ...
    SmalltalkImage current primitiveGarbageCollect.
    Cursor write show.
    save
        ifTrue: [resuming := embeddedFlag
                    ifTrue: [self snapshotEmbeddedPrimitive]
                    ifFalse: [self snapshotPrimitive]] "<-- PC frozen here on image file"
        ifFalse: [resuming := false].

I do apologise for the bug. I hope it will be fixed within a few days.

_,,,^..^,,,_ best, Eliot


Ben Coman

26 Mar

4:41 a.m.

On Sun, Mar 26, 2017 at 4:27 AM, Eliot Miranda eliot.miranda@gmail.com wrote:

> In the meantime one may be able to work around the problem by doing a full garbage collect before snapshot. This should do a GC with a single compaction pass which should not fail, and then make it much more likely that the GC during snapshot will do a single compaction pass, since fewer objects should be mobile after the single pass compaction in the explicit GC.

Rather than avoid the problem, in which case you'll get fewer samples, can we temporarily have the snapshot create a second file "my.image.beforeSnapshotGC", so that when it crashes we'll have a great sample for you?

I'm sure we are all keen (and grateful) to get a reliable compactor. The pain is not so much that it crashes, but that the image is corrupted. If it's possible/likely that "my.image.beforeSnapshotGC" might be renamed and successfully opened, I'm sure those of us following the bleeding edge are capable of and willing to operate like that, to help bring a faster resolution.

cheers -ben


Ben Coman

4:49 a.m.

On Sun, Mar 26, 2017 at 10:41 AM, Ben Coman btc@openinworld.com wrote:

> Rather than avoid the problem, in which case you'll get fewer samples, can we temporarily have the snapshot create a second file "my.image.beforeSnapshotGC", so that when it crashes we'll have a great sample for you?

Another thing (seeing Andrei's post about a crash during a big computation): what would be the performance hit of creating a file "my.image.beforeCompaction" prior to *every* compaction? The double benefit is:

* recoverable for the user
* a good ready-to-crash sample for you

This could be a good permanent feature enabled by command line or in-Image setting/preference.

cheers -ben


H. Hirzel

27 Mar

10:12 a.m.

On 3/26/17, Ben Coman btc@openinworld.com wrote:

> Another thing (seeing Andrei's post about a crash during a big computation): what would be the performance hit of creating a file "my.image.beforeCompaction" prior to *every* compaction? The double benefit is:
>
> * recoverable for the user
> * a good ready-to-crash sample for you
>
> This could be a good permanent feature enabled by command line or in-Image setting/preference.

+1. I also suggest that such an additional image save before compacting be added to the trunk.


Eliot Miranda

29 Mar

4:25 p.m.

Hi Ben,

On Mar 25, 2017, at 7:41 PM, Ben Coman btc@openinworld.com wrote:

> Rather than avoid the problem, in which case you'll get fewer samples, can we temporarily have the snapshot create a second file "my.image.beforeSnapshotGC", so that when it crashes we'll have a great sample for you?
>
> I'm sure we are all keen (and grateful) to get a reliable compactor. The pain is not so much that it crashes, but that the image is corrupted. If it's possible/likely that "my.image.beforeSnapshotGC" might be renamed and successfully opened, I'm sure those of us following the bleeding edge are capable of and willing to operate like that, to help bring a faster resolution.

This sounds like a good idea, but the machinations involved in loading an image make it non-trivial. I'd much rather implement lemming debugging in the real VM. In the simulator the VM is cloned on every GC; the GC is run in the clone and, if it succeeds, repeated in the original. In the real VM it would fork and execute the GC in the child, with the parent waiting for the exit status.

This approach allows a buggy GC to be repeated as many times as it takes to understand it. And it could be altered to snapshot too, also to a different name if desired.
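The fork-and-wait pattern described above can be sketched roughly as follows. This is only an illustration in Python on a POSIX system, not the VM's implementation: the real VM is written in C and would call fork() and waitpid() directly, and the names `run_in_lemming`, `guarded` and `buggy_gc` are hypothetical, made up for this sketch.

```python
import os


def run_in_lemming(operation):
    """Run operation() in a forked child and report how the child ended.

    Returns (ok, detail). The parent's memory is untouched, so a failing
    operation can be retried as often as needed.
    """
    pid = os.fork()
    if pid == 0:
        # Child (the "lemming"): try the risky operation, then exit at once
        # without running any of the parent's cleanup code.
        try:
            operation()
        except BaseException:
            os._exit(1)
        os._exit(0)
    # Parent: wait for the child and decode its exit status.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return False, "killed by signal %d" % os.WTERMSIG(status)
    code = os.WEXITSTATUS(status)
    return code == 0, "exit status %d" % code


def guarded(operation):
    """Try operation() in a child first; repeat it here only if that worked."""
    ok, detail = run_in_lemming(operation)
    if ok:
        operation()
    return ok, detail


if __name__ == "__main__":
    def buggy_gc():
        # Hypothetical stand-in for a compaction pass that fails.
        raise RuntimeError("heap corrupted")

    print(guarded(buggy_gc))
```

Because the failure happens in the child, the parent still holds the pre-GC state and could, for example, write it out under a different name before giving up.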

In any case let's hope the issue is moot :-).


Ben Coman

17 Apr

9:51 a.m.

On 29 Mar 2017 10:25 PM, "Eliot Miranda" eliot.miranda@gmail.com wrote:

> This sounds like a good idea but the machinations involved in loading an image make it non-trivial. I'd much rather implement lemming debugging in the real VM. In the simulator the VM is cloned on every GC and the GC is run in the clone, and repeated in the original if it succeeds. In the real VM it would fork and execute the GC in the child, waiting for the exit status.

A slightly different idea: considering the case of save-and-continue with potentially very large 64-bit images, I was wondering how feasible/worthwhile it might be to fork a process to do the save, so that the main process only needs to pause long enough to make a copy-on-write (COW) clone of the page table.

Cheers -ben


Nicolas Cellier

8:26 p.m.

2017-04-17 9:51 GMT+02:00 Ben Coman benjamin.t.coman@gmail.com:

> A slightly different idea: considering the case of save-and-continue with potentially very large 64-bit images, I was wondering how feasible/worthwhile it might be to fork a process to do the save, so that the main process only needs to pause long enough to make a copy-on-write (COW) clone of the page table.

The fork has another advantage: we can do whatever clean-up before saving (close files, free heap, etc...).
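To make the fork-to-save idea concrete, here is a minimal sketch in Python on a POSIX system. It is an illustration under stated assumptions, not how any VM does it: JSON stands in for the image file format, and `save_in_background`, `wait_for_save` and the ".tmp" naming are hypothetical. The point it shows is that the child sees the state as of the fork, so the parent can keep running and mutating while the child writes.

```python
import json
import os


def save_in_background(state, path):
    """Fork; the child writes `state` as it was at fork time to `path`.

    Returns the child's pid immediately so the caller can continue working.
    """
    pid = os.fork()
    if pid == 0:
        # Child: it owns a copy-on-write view of the parent's memory, so any
        # clean-up done here does not affect the parent.
        status = 1
        try:
            tmp = path + ".tmp"  # hypothetical temp name; renamed when complete
            with open(tmp, "w") as f:
                json.dump(state, f)
            os.replace(tmp, path)
            status = 0
        finally:
            os._exit(status)
    return pid


def wait_for_save(pid):
    """Return True if the saving child exited normally with status 0."""
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

Writing to a temporary file and renaming it only on success means a crash in the child should leave the previous file intact, which is the property people in this thread were missing.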


Juan Vuletich

27 Mar

5:26 a.m.

New subject: [Cuis-dev] Crashes on snapshot with the new compactor

Hi Eliot,

Nobody has reported crashes on image save on Cuis. I never experienced one. So, I guess it is ok to wait for the VM fix, as the extra GC as a workaround doesn't seem needed in Cuis.

Thanks,


-- Juan Vuletich www.cuis-smalltalk.org https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev @JuanVuletich

Eliot Miranda

28 Mar

4:16 a.m.

Hi All,

I have fixed a bug in the compactor that accounts for the two cases I've analysed and the two fairly repeatable crashes I have at hand (three cases in all). I hope that all those who have been experiencing crashes can start using the latest build asap.

It is fixed in these commits:

Name: VMMaker.oscog-eem.2187
Author: eem
Time: 27 March 2017, 3:00:06.676146 pm
UUID: 2259d299-65a4-42d0-a01b-4b25f5a89745
Ancestors: VMMaker.oscog-rsf.2186

SpurPlanningCompactor: Fix a bug in resetting the free chunk used for the firstUnusedFieldsSpace after non-final passes (i.e. on snapshot). The old code didn't check to see if a free chunk was actually found(!!).

and

Branch: refs/heads/Cog Home: https://github.com/OpenSmalltalk/opensmalltalk-vm Commit: 4ceff23323bcd0f2d3d0a4a43c2995f43d09c98a https://github.com/OpenSmalltalk/opensmalltalk-vm/commit/4ceff23323bcd0f2d3d... Author: Eliot Miranda eliot.miranda@gmail.com Date: 2017-03-27 (Mon, 27 Mar 2017)

The bintray files are here: https://bintray.com/opensmalltalk/vm/cog/201703272314

_,,,^..^,,,_ (phone)


Juan Vuletich

17 Apr

3:46 a.m.

New subject: [Cuis-dev] Crashes on snapshot with the new compactor

Thanks Eliot!


-- Juan Vuletich www.cuis-smalltalk.org https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev @JuanVuletich
