[Vm-dev] Reproducible Cog crash from image startup

Eliot Miranda eliot.miranda at gmail.com
Mon Feb 27 22:25:39 UTC 2012


and in fact the issue is an infinite recursion in Nautilus
class>groupsManager:

0xbff5ead0 M Nautilus class>groupsManager 363174868: a(n) Nautilus class
0xbff5eae8 M Nautilus>groupsManager 397856104: a(n) Nautilus
0xbff5eb00 M NautilusUI(AbstractNautilusUI)>groupsManager 397856244: a(n)
NautilusUI
0xbff5eb1c M NautilusUI(AbstractNautilusUI)>aGroupHasBeenAdded: 397856244:
a(n) NautilusUI
0xbff5eb38 M WeakMessageSend>value: 397858000: a(n) WeakMessageSend
0xbff5eb54 M WeakMessageSend>cull: 397858000: a(n) WeakMessageSend
0xbff5eb70 M WeakMessageSend>cull:cull: 397858000: a(n) WeakMessageSend
0xbff5eb94 M [] in WeakAnnouncementSubscription>deliver: 397858036: a(n)
WeakAnnouncementSubscription
0xbff5ebb0 M BlockClosure>on:do: 402448660: a(n) BlockClosure
0xbff5ebd0 M BlockClosure>on:fork: 402448660: a(n) BlockClosure
0xbff5ebf0 M WeakAnnouncementSubscription>deliver: 397858036: a(n)
WeakAnnouncementSubscription
0xbff5ec14 M [] in SubscriptionRegistry>deliver:to: 363283080: a(n)
SubscriptionRegistry
0xbff5ec34 M BlockClosure>ifCurtailed: 402448516: a(n) BlockClosure
0xbff5ec58 M [] in SubscriptionRegistry>deliver:to: 363283080: a(n)
SubscriptionRegistry
0xbff5ec78 M OrderedCollection>do: 402427332: a(n) OrderedCollection
0xbff5ec94 M SubscriptionRegistry>deliver:to: 363283080: a(n)
SubscriptionRegistry
0xbff5ecb8 M SubscriptionRegistry>deliver: 363283080: a(n)
SubscriptionRegistry
0xbff5ecd8 M Announcer>announce: 363283068: a(n) Announcer
0xbff5ecf8 M GroupsHolder>addADynamicClassGroupSilentlyNamed:block:
402407104: a(n) GroupsHolder
0xbff5ed1c M Nautilus class>buildGroupManager 363174868: a(n) Nautilus class
0xbff5ed34 M Nautilus class>groupsManager 363174868: a(n) Nautilus class
0xbff5ed4c M Nautilus>groupsManager 397856104: a(n) Nautilus


On Mon, Feb 27, 2012 at 1:15 PM, Eliot Miranda <eliot.miranda at gmail.com> wrote:

> let me retry *with* the attachment :(
>
> On Mon, Feb 27, 2012 at 12:03 PM, Igor Stasenko <siguctua at gmail.com> wrote:
>
>> On 27 February 2012 10:53, Mariano Martinez Peck <marianopeck at gmail.com>
>> wrote:
>> >
>> >
>> >
>> > On Mon, Feb 27, 2012 at 5:20 AM, Eliot Miranda <eliot.miranda at gmail.com>
>> wrote:
>> >>
>> >> Hi Mariano,
>> >>
>> >> On Sun, Feb 26, 2012 at 8:58 AM, Mariano Martinez Peck <
>> marianopeck at gmail.com> wrote:
>> >>>
>> >>>
>> >>> Hi. I have faced a VM crash while using Nautilus browser. It took me
>> a while, but I finally could make a reproducible crash from image startup.
>> You can find the image here:
>> >>>
>> https://gforge.inria.fr/frs/download.php/30280/Marea.104-Crash.1.image.zip
>> >>>
>> >>> What the image is running at startup that causes the crash is:
>> >>>
>> >>> | nautilus model ui|
>> >>> Nautilus instVarNamed: 'groups' put: nil.
>> >>> model := Nautilus open.
>> >>> ui := model ui.
>> >>> ui groupsButtonAction.
>> >>>
>> >>> If you need more about the "domain", we can ask Ben, Nautilus
>> developer.  From what I can see in GDB, it crashes in #mapStackPages
>> because it does a remap to an OOP that is 0 (zero)
>> >>>
>> >>> while (theSP <= frameRcvrOffset) {
>> >>>                     oop = longAt(theSP);
>> >>>                     if (!((oop & 1))) {
>> >>>                         longAtput(theSP, remap(oop));
>> >>>                     }
>> >>>                     theSP += BytesPerWord;
>> >>>                 }
>> >>>
>> >>>
>> >>> Any ideas?
>> >>
>> >>
>> >> The image overflows the weakRoots table in scanning stack pages.  The
>> weakRoots table registers weak objects for scanning at the end of a GC.  It
>> is, unfortunately, fixed size (~2600 entries), and there are lots of
>> WeakMessageSends and WeakAnnouncementSubscriptions on the stack.
>> >>
>> >> I found this using a Debug VM with asserts enabled (i.e. compiled with
>> NDEBUG /not/ defined).  I increased the table size to 3000 then 6000 before
>> finding it no longer crashed with a weakRoots table size of 12000.
>> >>
>> >
>> > wow, I never imagined that.
>> >
>> >>
>> >> a) Looks like weakRoots' size should be configurable either via a
>> start-up flag or an image header constant (with e.g. vmParameter accessors).
>> >
>> >
>> > yes, with vmParameter would be nice, like the external semaphore table.
>> >
>> >>
>> >>
>> >> b) overflowing the weakRoots table (and possibly other tables) should
>> probably cause the VM to abort with a useful error message.
>> >>
>> >
>> > please!  :)
>> >
>> > I have checked in the image, before reproducing the bug, and it is not
>> that bad:
>> >
>> > WeakMessageSend instanceCount 755.
>> > WeakAnnouncementSubscription instanceCount 538
>> >
>> > So...maybe when I do the stuff that reproduces the crash there is
>> ANOTHER bug (say a loop, for example) that creates many more
>> instances of those weak objects?
>> >
>> >
>> hmm.. I can hardly believe that the UI needs that many weak messages to
>> wire things up.. but it is hard to tell, since I'm not the author.
>>
>
>
> Take a look at the attached.  It is taken from the image at a point where
> an incrementalGC is performed when the weakRootTable has 6000 or more
> elements.  It shows a very deep call stack full
> of WeakAnnouncementSubscriptions.
>
>
>
>> Also, answering Stephane's question: AFAIK, the weak roots table size
>> does not depend linearly on the total number of weak containers in
>> your image.
>>
>
> The number of weak containers does define an upper bound on the size of
> the table.  It doesn't necessarily correlate to how many containers are
> encountered in an incremental GC.
>
>
>> But I might be wrong.
>> Eliot, can you please explain how this weak roots table is populated,
>> what triggers the addition of new elements to it, and what frees an entry?
>>
>
>
> So when an incremental GC is performed, any weak collections encountered
> must be scanned later, after the mark phase of non-weak objects has completed,
> so that the GC can discover which elements of weak collections are unmarked
> and nil those elements.  So in markAndTrace any encountered weak objects
> get added as "roots" to the weakRootsTable.  Later (either in incrementalGC
> or fullGC) the weak table is processed and unmarked referents in the weak
> arrays in the weak table are nilled.  Hence the weak table fills during the
> mark phase and is emptied in the nilling phase.
>
> But in reading the code more carefully I notice that the weak roots table
> is not used during a full GC.  Instead, during a fullGC nilling is done as
> each weak container is encountered.  I don't understand how this works yet.
>  Anyone care to explain?
>
>
>> And is the weak roots table size limit reasonably good? Needless to
>> say, nobody likes it when the system hits the wall of hardcoded limits.
>>
>
>
> Hmmm... In VisualWorks, which has a two-space copying generational GC,
> there is no weak root table during incremental GC.  Instead the list of
> weak objects is threaded through the corpses left behind in from space.  So
> at least for some GC designs a weak roots table isn't even needed.  I will
> copy this scheme in my new object representation/GC.  But for now I think
> just providing a parameter to determine the maximum size is sufficient.
>
>
>>
>> --
>> Best regards,
>> Igor Stasenko.
>>
>
>
>
> --
> best,
> Eliot
>
>


-- 
best,
Eliot