Reproducible Cog crash from image startup


Mariano Martinez Peck

26 Feb 2012

5:58 p.m.

Hi. I ran into a VM crash while using the Nautilus browser. It took me a while, but I finally managed to make the crash reproducible from image startup. You can find the image here: https://gforge.inria.fr/frs/download.php/30280/Marea.104-Crash.1.image.zip

The code that the image runs at startup, and that causes the crash, is:

    | nautilus model ui |
    Nautilus instVarNamed: 'groups' put: nil.
    model := Nautilus open.
    ui := model ui.
    ui groupsButtonAction.

If you need to know more about the "domain", we can ask Ben, the Nautilus developer.

From what I can see in GDB, it crashes in #mapStackPages because it does a remap to an OOP that is 0 (zero):

    while (theSP <= frameRcvrOffset) {
        oop = longAt(theSP);
        if (!((oop & 1))) {
            longAtput(theSP, remap(oop));
        }
        theSP += BytesPerWord;
    }
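A side note on the loop above: its only filter is the low-bit test, so a stack word that is 0 is not skipped and ends up being passed to remap(). Here is a minimal C sketch of that check, assuming the low bit is the SmallInteger tag (which the test in the loop suggests). The names oop_t, is_small_integer, would_be_remapped and check_examples are hypothetical and are not VM functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for one machine word holding an object pointer. */
typedef uintptr_t oop_t;

/* Mirrors the test in the quoted loop: a set low bit is assumed to mark
   an immediate SmallInteger; anything else is treated as a heap pointer. */
bool is_small_integer(oop_t oop) { return (oop & 1) != 0; }

/* True when the quoted loop would hand this stack word to remap(). */
bool would_be_remapped(oop_t oop) { return !is_small_integer(oop); }

/* Self-check of the cases discussed in the thread. */
void check_examples(void) {
    assert(would_be_remapped(0));       /* a zero word is untagged, so it is remapped */
    assert(!would_be_remapped(7));      /* odd word: skipped as a SmallInteger */
    assert(would_be_remapped(0x1000));  /* even non-zero word: treated as an object */
}
```

So a zero on the stack is not filtered out by the tag test alone; whatever put the zero there is the real problem.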

Any ideas?

Thanks,

-- Mariano http://marianopeck.wordpress.com


Eliot Miranda

27 Feb 2012

12:51 a.m.

Hi Mariano,

I need the changes file to reproduce this without a notifier stating that the changes file is missing. Could you send it to me ASAP? BTW, it does crash on my Cog, but it will be easier to debug with a changes file. Thx


-- best, Eliot

Eliot Miranda

5:20 a.m.

Hi Mariano,

On Sun, Feb 26, 2012 at 8:58 AM, Mariano Martinez Peck < marianopeck@gmail.com> wrote:

Any ideas?

The image overflows the weakRoots table in scanning stack pages. The weakRoots table registers weak objects for scanning at the end of a GC. It is, unfortunately, fixed size (~2600 entries), and there are lots of WeakMessageSends and WeakAnnouncementSubscriptions on the stack.

I found this using a Debug VM with asserts enabled (i.e. compiled with NDEBUG /not/ defined). I increased the table size to 3000, then 6000, before finding it no longer crashed with a weakRoots table size of 12000.

a) Looks like weakRoots' size should be configurable either via a start-up flag or an image header constant (with e.g. vmParameter accessors).

b) overflowing the weakRoots table (and possibly other tables) should probably cause the VM to abort with a useful error message.
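A minimal C sketch of what (a) and (b) could look like together. This is not the actual VM code: the capacity is supplied at startup rather than hardcoded, and a full table is reported instead of being silently overrun. WeakRootTable and the weak_roots_* functions are hypothetical names chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical table of weak objects registered during a GC. */
typedef struct {
    uintptr_t *entries;   /* oops of the registered weak objects */
    size_t count;         /* number of entries currently in use */
    size_t capacity;      /* chosen at startup, e.g. from a flag or image header */
} WeakRootTable;

/* Allocates the table; returns 1 on success, 0 if allocation failed. */
int weak_roots_init(WeakRootTable *t, size_t capacity) {
    assert(t != NULL);
    t->entries = malloc(capacity * sizeof *t->entries);
    t->count = 0;
    t->capacity = t->entries ? capacity : 0;
    return t->entries != NULL;
}

/* Registers one weak object; returns 0 instead of writing past the end. */
int weak_roots_try_add(WeakRootTable *t, uintptr_t oop) {
    if (t->count >= t->capacity) return 0;
    t->entries[t->count++] = oop;
    return 1;
}

/* Variant for point (b): stop with a useful message on overflow. */
void weak_roots_add_or_abort(WeakRootTable *t, uintptr_t oop) {
    if (!weak_roots_try_add(t, oop)) {
        fprintf(stderr, "weak roots table overflow (capacity %zu); "
                        "restart with a larger table\n", t->capacity);
        abort();
    }
}
```

The point of the split between weak_roots_try_add and weak_roots_add_or_abort is only that the bounds check happens in one place; whether the VM should abort or grow the table is a separate design decision.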

cheers, Eliot


-- best, Eliot

stephane ducasse

9:31 a.m.


Eliot

Do you think it could be due to heavy use of weak announcements that are not garbage collected?

Stef

Mariano Martinez Peck

9:46 a.m.

On Mon, Feb 27, 2012 at 12:51 AM, Eliot Miranda eliot.miranda@gmail.com wrote:

Hi Mariano,

I need the changes file to reproduce this without a notifier stating that the changes file is missing. Could you send it to me ASAP? BTW, it does crash on my Cog, but it will be easier to debug with a changes file. Thx

Hi Eliot. Even though it seems you have discovered the cause, just in case, I have uploaded the .changes: https://gforge.inria.fr/frs/download.php/30284/Marea.104-Crash.1.changes.zip

thanks


-- Mariano http://marianopeck.wordpress.com

Mariano Martinez Peck

9:53 a.m.

On Mon, Feb 27, 2012 at 5:20 AM, Eliot Miranda eliot.miranda@gmail.com wrote:


The image overflows the weakRoots table in scanning stack pages. The weakRoots table registers weak objects for scanning at the end of a GC. It is, unfortunately, fixed size (~2600 entries), and there are lots of WeakMessageSends and WeakAnnouncementSubscriptions on the stack.

I found this using a Debug VM with asserts enabled (i.e. compiled with NDEBUG /not/ defined). I increased the table size to 3000, then 6000, before finding it no longer crashed with a weakRoots table size of 12000.

Wow, I never imagined that.

a) Looks like weakRoots' size should be configurable either via a start-up flag or an image header constant (with e.g. vmParameter accessors).

Yes, a vmParameter would be nice, like the external semaphore table.

b) overflowing the weakRoots table (and possibly other tables) should probably cause the VM to abort with a useful error message.

please! :)

I checked in the image before reproducing the bug, and it is not that bad:

    WeakMessageSend instanceCount 755.
    WeakAnnouncementSubscription instanceCount 538

So... maybe when I do the stuff that reproduces the crash there is ANOTHER bug (a loop, for example) that creates many more instances of those weak objects?


-- Mariano http://marianopeck.wordpress.com

Igor Stasenko

9:03 p.m.

On 27 February 2012 10:53, Mariano Martinez Peck marianopeck@gmail.com wrote:

So... maybe when I do the stuff that reproduces the crash there is ANOTHER bug (a loop, for example) that creates many more instances of those weak objects?

Hmm, I can hardly believe that the UI needs that many weak messages to wire things up, but it is hard to tell since I'm not the author.

Also, answering Stephane's question: AFAIK, the weak roots table size does not depend linearly on the total number of weak containers in your image. But I might be wrong. Eliot, can you please explain how this weak roots table is populated, what triggers the addition of new elements to it, and what frees an entry? And is the weak roots table size limit reasonably good? Needless to say, nobody likes it when a system hits the wall of hardcoded limits.

-- Best regards, Igor Stasenko.

Eliot Miranda

10:06 p.m.

On Mon, Feb 27, 2012 at 12:03 PM, Igor Stasenko siguctua@gmail.com wrote:

Hmm, I can hardly believe that the UI needs that many weak messages to wire things up, but it is hard to tell since I'm not the author.

Take a look at the attached. It is taken from the image at a point where an incrementalGC is performed when the weakRootTable has 6000 or more elements. It shows a very deep call stack full of WeakAnnouncementSubscriptions.

Also, answering Stephane's question: AFAIK, the weak roots table size does not depend linearly on the total number of weak containers in your image. But I might be wrong. Eliot, can you please explain how this weak roots table is populated, what triggers the addition of new elements to it, and what frees an entry?

So when a GC is performed, any weak collections encountered must be scanned later, after the mark phase of non-weak objects has completed, so that the GC can discover which elements of weak collections are unmarked and nil those elements. So in markAndTrace any encountered weak objects get added as "roots" to the weakRootsTable. Later (either in incrementalGC or fullGC) the weak table is processed and unmarked referents in the weak arrays in the weak table are nilled. Hence the weak table fills during the mark phase and is emptied in the nilling phase.

And is the weak roots table size limit reasonably good? Needless to say, nobody likes it when a system hits the wall of hardcoded limits.

Hmmm... In VisualWorks, which has a two-space copying generational GC, there is no weak root table during incremental GC. Instead the list of weak objects is threaded through the corpses left behind in from space. So at least for some GC designs a weak roots table isn't even needed. What the right solution is for the longer term I don't know. For example, if a weak roots table is required, the VM can keep track of the count of weak container instances and base the table size on the number of instances. This is something I will solve in my new object representation/GC. But for now I think just providing a parameter to determine the maximum size is sufficient.


-- cheers, Eliot

Eliot Miranda

10:15 p.m.

let me retry *with* the attachment :(

On Mon, Feb 27, 2012 at 12:03 PM, Igor Stasenko siguctua@gmail.com wrote:


Also, answering Stephane's question: AFAIK, the weak roots table size does not depend linearly on the total number of weak containers in your image.

The number of weak containers does define an upper bound on the size of the table. It doesn't necessarily correlate to how many containers are encountered in an incremental GC.

But I might be wrong.

Eliot, can you please explain how this weak roots table is populated, what triggers the addition of new elements to it, and what frees an entry?

So when an incremental GC is performed, any weak collections encountered must be scanned later, after the mark phase of non-weak objects has completed, so that the GC can discover which elements of weak collections are unmarked and nil those elements. So in markAndTrace any encountered weak objects get added as "roots" to the weakRootsTable. Later (either in incrementalGC or fullGC) the weak table is processed and unmarked referents in the weak arrays in the weak table are nilled. Hence the weak table fills during the mark phase and is emptied in the nilling phase.

But in reading the code more carefully I notice that the weak roots table is not used during a full GC. Instead, during a fullGC nilling is done as each weak container is encountered. I don't understand how this works yet. Anyone care to explain?
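The incremental-GC behaviour described above (weak containers are registered while marking, then the table is drained and dead referents are nilled) can be sketched in C roughly as follows. This is a simplified toy model, not the VM's code: Obj, WeakTable, mark_and_trace and nil_unmarked_referents are hypothetical names, every slot of a weak container is treated as weak, and the table is a small fixed-size array whose overflow is reported to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS 4
#define WEAK_TABLE_SIZE 8   /* illustrative size only */

typedef struct Obj {
    bool marked;
    bool weak;                      /* weak container: slots do not keep referents alive */
    size_t nslots;
    struct Obj *slots[MAX_SLOTS];
} Obj;

typedef struct {
    Obj *entries[WEAK_TABLE_SIZE];
    size_t count;
} WeakTable;

/* Mark phase: strong references are traced; weak containers are only
   registered for later. Returns false if the weak table would overflow. */
bool mark_and_trace(Obj *o, WeakTable *wt) {
    if (o == NULL || o->marked) return true;
    assert(o->nslots <= MAX_SLOTS);
    o->marked = true;
    if (o->weak) {
        if (wt->count == WEAK_TABLE_SIZE) return false;
        wt->entries[wt->count++] = o;   /* the table fills during marking */
        return true;                    /* do not trace through weak slots */
    }
    for (size_t i = 0; i < o->nslots; i++)
        if (!mark_and_trace(o->slots[i], wt)) return false;
    return true;
}

/* Nilling phase: clear weak slots whose referents were never marked,
   then empty the table. Returns the number of slots nilled. */
size_t nil_unmarked_referents(WeakTable *wt) {
    size_t nilled = 0;
    for (size_t i = 0; i < wt->count; i++) {
        Obj *w = wt->entries[i];
        for (size_t j = 0; j < w->nslots; j++) {
            if (w->slots[j] != NULL && !w->slots[j]->marked) {
                w->slots[j] = NULL;
                nilled++;
            }
        }
    }
    wt->count = 0;
    return nilled;
}
```

In this model the table only ever holds the weak containers reached in one marking pass, which is why its occupancy depends on what the GC encounters rather than directly on the total number of weak containers in the image.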


-- best, Eliot

Eliot Miranda

10:32 p.m.

Ignore this. There's a repost with corrections and the attachments on its way, but it's so large (800k stack trace) that it awaits moderator approval...

On Mon, Feb 27, 2012 at 1:06 PM, Eliot Miranda eliot.miranda@gmail.com wrote:


-- best, Eliot

Eliot Miranda

11:25 p.m.

and in fact the issue is an infinite recursion in Nautilus class>groupsManager:

    0xbff5ead0 M Nautilus class>groupsManager 363174868: a(n) Nautilus class
    0xbff5eae8 M Nautilus>groupsManager 397856104: a(n) Nautilus
    0xbff5eb00 M NautilusUI(AbstractNautilusUI)>groupsManager 397856244: a(n) NautilusUI
    0xbff5eb1c M NautilusUI(AbstractNautilusUI)>aGroupHasBeenAdded: 397856244: a(n) NautilusUI
    0xbff5eb38 M WeakMessageSend>value: 397858000: a(n) WeakMessageSend
    0xbff5eb54 M WeakMessageSend>cull: 397858000: a(n) WeakMessageSend
    0xbff5eb70 M WeakMessageSend>cull:cull: 397858000: a(n) WeakMessageSend
    0xbff5eb94 M [] in WeakAnnouncementSubscription>deliver: 397858036: a(n) WeakAnnouncementSubscription
    0xbff5ebb0 M BlockClosure>on:do: 402448660: a(n) BlockClosure
    0xbff5ebd0 M BlockClosure>on:fork: 402448660: a(n) BlockClosure
    0xbff5ebf0 M WeakAnnouncementSubscription>deliver: 397858036: a(n) WeakAnnouncementSubscription
    0xbff5ec14 M [] in SubscriptionRegistry>deliver:to: 363283080: a(n) SubscriptionRegistry
    0xbff5ec34 M BlockClosure>ifCurtailed: 402448516: a(n) BlockClosure
    0xbff5ec58 M [] in SubscriptionRegistry>deliver:to: 363283080: a(n) SubscriptionRegistry
    0xbff5ec78 M OrderedCollection>do: 402427332: a(n) OrderedCollection
    0xbff5ec94 M SubscriptionRegistry>deliver:to: 363283080: a(n) SubscriptionRegistry
    0xbff5ecb8 M SubscriptionRegistry>deliver: 363283080: a(n) SubscriptionRegistry
    0xbff5ecd8 M Announcer>announce: 363283068: a(n) Announcer
    0xbff5ecf8 M GroupsHolder>addADynamicClassGroupSilentlyNamed:block: 402407104: a(n) GroupsHolder
    0xbff5ed1c M Nautilus class>buildGroupManager 363174868: a(n) Nautilus class
    0xbff5ed34 M Nautilus class>groupsManager 363174868: a(n) Nautilus class
    0xbff5ed4c M Nautilus>groupsManager 397856104: a(n) Nautilus

On Mon, Feb 27, 2012 at 1:15 PM, Eliot Miranda eliot.miranda@gmail.com wrote:

And is the weak roots table size limit reasonably good? Needless to say, nobody likes it when a system hits the wall of hardcoded limits.

Hmmm... In VisualWorks, which has a two-space copying generational GC there is no weak root table during incremental GC. Instead the list of weak objects is threaded through the corpses left behind in from space. So at least for some GC designs a weak roots table isn't even needed. I will copy this scheme in my new object representation/GC. But for now I think just providing a parameter to determine the maximum size is sufficient.


-- best, Eliot

Igor Stasenko

28 Feb

1:13 p.m.

On 27 February 2012 22:15, Eliot Miranda eliot.miranda@gmail.com wrote:

...

let me retry *with* the attachment :(

On Mon, Feb 27, 2012 at 12:03 PM, Igor Stasenko siguctua@gmail.com wrote:

...
On 27 February 2012 10:53, Mariano Martinez Peck marianopeck@gmail.com wrote:

...
On Mon, Feb 27, 2012 at 5:20 AM, Eliot Miranda eliot.miranda@gmail.com wrote:

...
Hi Mariano,

On Sun, Feb 26, 2012 at 8:58 AM, Mariano Martinez Peck marianopeck@gmail.com wrote:

...
Hi. I have faced a VM crash while using Nautilus browser. It took me a while, but I finally could make a reproducible crash from image startup. You can find the image here:

https://gforge.inria.fr/frs/download.php/30280/Marea.104-Crash.1.image.zip

What the image is running at startup that causes the crash is:

    | nautilus model ui |
    Nautilus instVarNamed: 'groups' put: nil.
    model := Nautilus open.
    ui := model ui.
    ui groupsButtonAction.

If you need more about the "domain", we can ask Ben, Nautilus developer. From what I can see in GDB, it crashes in #mapStackPages because it does a remap to an OOP that is 0 (zero)

    while (theSP <= frameRcvrOffset) {
        oop = longAt(theSP);
        if (!((oop & 1))) {
            longAtput(theSP, remap(oop));
        }
        theSP += BytesPerWord;
    }

Any ideas?

The image overflows the weakRoots table in scanning stack pages. The weakRoots table registers weak objects for scanning at the end of a GC. It is, unfortunately, fixed size (~2600 entries), and there are lots of WeakMessageSends and WeakAnnouncementSubscriptions on the stack.

I found this using a Debug VM with asserts enabled (i.e. compiled with NDEBUG /not/ defined). I increased the table size to 3000 then 6000 before finding it no longer crashed with a weakRoots table size of 12000.

Wow, I never imagined that.

...
a) Looks like weakRoots' size should be configurable either via a start-up flag or an image header constant (with e.g. vmParameter accessors).

yes, with vmParameter would be nice, like the external semaphore table.

...
b) overflowing the weakRoots table (and possibly other tables) should probably cause the VM to abort with a useful error message.

please! :)

I have checked in the image, before reproducing the bug, and it is not that bad:

    WeakMessageSend instanceCount 755.
    WeakAnnouncementSubscription instanceCount 538

So... maybe when I do the stuff that reproduces the crash there is ANOTHER bug (say a loop, for example) that causes many more instances of those weak objects to be created?

Hmm.. I can hardly believe that the UI needs that many weak messages to wire things up.. but it is hard to tell, since I'm not the author.

Take a look at the attached. It is taken from the image at a point where an incrementalGC is performed when the weakRootTable has 6000 or more elements. It shows a very deep call stack full of WeakAnnouncementSubscriptions.

...
Also, answering Stephane's question: AFAIK, the weak roots table size does not depend linearly on the total number of weak containers in your image.

The number of weak containers does define an upper bound on the size of the table. It doesn't necessarily correlate to how many containers are encountered in an incremental GC.

...
But I might be wrong. Eliot, can you please explain how this weak roots table is populated, what triggers the addition of new elements to it, and when an entry is freed?

So when an incremental GC is performed, any weak collections encountered must be scanned later, after the mark phase for non-weak objects has completed, so that the GC can discover which elements of weak collections are unmarked and nil those elements. So in markAndTrace any encountered weak objects get added as "roots" to the weakRootsTable. Later (either in incrementalGC or fullGC) the weak table is processed and unmarked referents in the weak arrays in the weak table are nilled. Hence the weak table fills during the mark phase and is emptied in the nilling phase.

But in reading the code more carefully I notice that the weak roots table is not used during a full GC. Instead, during a fullGC nilling is done as each weak container is encountered. I don't understand how this works yet. Anyone care to explain?

yes, i noticed that too.

Weak roots are used only during incremental GC but not full GC, which makes sense:

    incrementalGC
        ...
        weakRootCount := 0.
        statSweepCount := statMarkCount := statMkFwdCount := statCompMoveCount := 0.
        self markPhase: false.
        self assert: weakRootCount <= WeakRootTableSize.
        1 to: weakRootCount do: [:i| self finalizeReference: (weakRoots at: i)].
        ...

    fullGC
        ...
        statSweepCount := statMarkCount := statMkFwdCount := statCompMoveCount := 0.
        self clearRootsTable.
        youngStart := self startOfMemory. "process all of memory"
        self markPhase: true.
        "Sweep phase returns the number of survivors. Use the up-to-date version instead the one from startup."
        totalObjectCount := self sweepPhaseForFullGC.
        ...

The bad thing is that #markAndTrace: is used in both #incrementalGC and #fullGC, and it calls:

    self lastPointerOf: oop recordWeakRoot: true.

That last 'true' is evil, because it populates the weak roots table even during a full GC, which means that the total number of weak containers in the whole image must not exceed the table size (err.. how much did you say?), otherwise *!crash!*. Also note that #fullGC doesn't even reset weakRootCount to zero, so it starts from whatever value the last incremental GC left behind. So it is enough for an incremental GC to discover (1+WeakRootTableSize/2) weak containers for the following full GC to start corrupting memory, not to mention higher numbers.

To fix that, #fullGC should either use a different #markAndTrace: method that doesn't record weak roots (or anything at all), or the existing call should change from:

    self lastPointerOf: oop recordWeakRoot: true.

to be:

    self lastPointerOf: oop recordWeakRoot: (fullGC not).

I think it is easy to imagine an image with lots of weak containers, of which only a few (a reasonable number ;) are discovered during a single incremental GC.

Another thing I would do is trigger a full GC if the weak roots table overflows during an incremental GC. But it is hard to tell how easy it would be to abort the marking phase on table overflow..

...

...
And is the weak roots table size limit reasonably good? Needless to say, nobody likes it when a system hits the wall of hardcoded limits.

Hmmm... In VisualWorks, which has a two-space copying generational GC, there is no weak root table during incremental GC. Instead the list of weak objects is threaded through the corpses left behind in from-space. So at least for some GC designs a weak roots table isn't even needed. I will copy this scheme in my new object representation/GC. But for now I think just providing a parameter to determine the maximum size is sufficient.

I actually wonder whether we really need this structure at all. An incremental GC works similarly to a full one, except that it operates on a smaller part of the heap. So this piece of code:

    1 to: weakRootCount do: [:i| self finalizeReference: (weakRoots at: i)].

can be replaced by a heap-walking procedure from:

    youngStart to: memoryEnd

finding all weak containers and checking them. (We can even keep the counter: if count = 0 you skip the heap walk, and if it is > 0 you decrement it by 1 each time you discover a weak container, so the heap walk terminates once the counter reaches 0.)
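The counter-bounded heap walk suggested above could look roughly like the following C sketch. The flat `HeapObj` array, its fields, and the function name are hypothetical stand-ins for the real heap; the `visited` flag merely stands in for applying finalizeReference: to the container.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flat heap: each slot describes one object. */
typedef struct {
    int isWeak;       /* object is a weak container */
    int marked;       /* object survived the mark phase */
    int visited;      /* stands in for "finalizeReference: was applied" */
} HeapObj;

/* Walk [youngStart, memoryEnd) and process marked weak containers, stopping
   as soon as `weakCount` of them (the number seen while marking) have been
   found. Returns the number of slots examined. */
static size_t finalizeYoungWeak(HeapObj *heap, size_t youngStart,
                                size_t memoryEnd, size_t weakCount)
{
    size_t examined = 0;
    for (size_t i = youngStart; i < memoryEnd && weakCount > 0; i++) {
        examined++;
        if (heap[i].isWeak && heap[i].marked) {
            heap[i].visited = 1;
            weakCount--;
        }
    }
    return examined;
}
```

The trade-off in this sketch is that no table is needed, so nothing can overflow, at the price of a linear scan of young space whenever at least one weak container was seen during marking.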

...

-- best, Eliot

-- Best regards, Igor Stasenko.

Igor Stasenko

1:51 p.m.

On 28 February 2012 13:13, Igor Stasenko siguctua@gmail.com wrote:

A dirty fix for this would be directly in:

    lastPointerOf: oop recordWeakRoot: recordWeakRoot "<Boolean>"
        ....
        [recordWeakRoot ifTrue:
            ["And remember as weak root"
            weakRootCount := weakRootCount + 1.
            (weakRootCount <= WeakRootTableSize)
                ifTrue: [weakRoots at: weakRootCount put: oop]
                ifFalse: [self isFullGC ifFalse: [self handleWeakRootsTableOverflow] "and just ignore if full gc"]].

Now if we put:

    weakRootCount := WeakRootTableSize + 1.

in the #fullGC method, this will result in weak roots being ignored during the full GC mark phase.
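Transcribed into C, the idea above might be sketched as follows. This is only an illustration under assumptions: the table capacity is shrunk to 4, `beginFullGC` and `beginIncrementalGC` are invented entry points, and `handleWeakRootsTableOverflow` just sets a flag because the thread does not say what the handler should do.

```c
#include <assert.h>
#include <stddef.h>

enum { WEAK_ROOT_TABLE_SIZE = 4 };   /* illustrative; the thread quotes 2625 */

static const void *weakRoots[WEAK_ROOT_TABLE_SIZE];
static size_t weakRootCount;
static int isFullGC;
static int overflowHandled;          /* hypothetical: records that the handler ran */

/* Placeholder handler; what it should really do is left open in the thread. */
static void handleWeakRootsTableOverflow(void) { overflowHandled = 1; }

/* Bounds-checked version of "remember as weak root". */
static void recordWeakRoot(const void *oop)
{
    weakRootCount++;
    if (weakRootCount <= WEAK_ROOT_TABLE_SIZE)
        weakRoots[weakRootCount - 1] = oop;   /* C arrays are 0-based */
    else if (!isFullGC)
        handleWeakRootsTableOverflow();       /* a full GC just ignores the oop */
}

/* A full GC starts with the count already past the limit, so nothing is stored. */
static void beginFullGC(void)
{
    isFullGC = 1;
    weakRootCount = WEAK_ROOT_TABLE_SIZE + 1;
}

static void beginIncrementalGC(void)
{
    isFullGC = 0;
    weakRootCount = 0;
    overflowHandled = 0;
}
```

With this shape the table can never be written out of bounds, and only an incremental GC ever reaches the overflow handler.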

Ohh... well, since I'm using #isFullGC, it actually makes sense to just put:

    (recordWeakRoot and: [isFullGC not])

to get the bug busted. I'm just not sure whether the extra memory read (checking the full GC flag) will impact performance that much..

Actually, for performance it is better to 'call' #markAndTrace: with the right argument; then it will be inlined and the compiler will produce a no-op there.

-- Best regards, Igor Stasenko.

Igor Stasenko

6:26 p.m.

Investigating a bit further: actually #markAndTrace: is the only sender of

    self lastPointerOf: oop recordWeakRoot: true.

The other sender is #startObj, inlined into #markAndTrace: itself, but with a false argument. So it means that only "roots" are affected by this bug, i.e. the entry points where the VM uses this method to start tracing the object graph. The bug shows itself only in cases like Nautilus, which has a very deep (though possibly finite) stack with references to weak objects on it, because #markAndTraceStackPage: uses #markAndTrace: for all oops on the stack, and each of them contributes to the weak roots table.

I suspect that markAndTraceStackPage: doesn't need to record roots, because a stack page, despite not being a regular object on the heap, actually acts as a root and won't be GCed, since it is already reachable: (self assert: (stackPages isFree: thePage) not.)

which means that recording every weak object found on the stack as a weak root is probably overkill (you don't do it for regular ones, do you?). This is easy to modify by adding an extra keyword to #markAndTrace: that says whether it should record the incoming oop as a weak root, instead of always assuming "true". Because indeed, the weak roots table size is calculated by

    WeakRootTableSize := RootTableSize + RemapBufferSize + 100.

but it doesn't take into account an arbitrary stack depth :)
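To make the arithmetic concrete, here is a small C sketch of that sizing rule. The two component values are assumptions, picked only because they add up to the WeakRootTableSize of 2625 quoted at the end of this message; the helper `fitsInWeakRootTable` is likewise invented for illustration.

```c
#include <assert.h>

enum {
    ROOT_TABLE_SIZE      = 2500,  /* assumed value */
    REMAP_BUFFER_SIZE    = 25,    /* assumed value */
    WEAK_ROOT_TABLE_SIZE = ROOT_TABLE_SIZE + REMAP_BUFFER_SIZE + 100
};

/* The capacity is a compile-time constant, while the number of weak oops
   reachable from the stack grows with stack depth and has no fixed bound. */
static int fitsInWeakRootTable(long weakOopsRecorded)
{
    return weakOopsRecorded <= WEAK_ROOT_TABLE_SIZE;
}
```

Under these assumed values, the 6000 or more entries reported earlier in the thread would not fit, whatever the exact split between the two component sizes.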

Here is the code to crash the VM (attached):

It reliably crashes my VM at depth ~1450:

    CrashMe new crashMe: 1450

(As I said before, you need only half of the weak roots table to get things overflowing; WeakRootTableSize is 2625.)

-- Best regards, Igor Stasenko.

vm-dev@lists.squeakfoundation.org

participants (4)

Eliot Miranda
Igor Stasenko
Mariano Martinez Peck
stephane ducasse