[Vm-dev] Reproducible Cog crash from image startup

Eliot Miranda eliot.miranda at gmail.com
Mon Feb 27 21:06:29 UTC 2012


On Mon, Feb 27, 2012 at 12:03 PM, Igor Stasenko <siguctua at gmail.com> wrote:

> On 27 February 2012 10:53, Mariano Martinez Peck <marianopeck at gmail.com>
> wrote:
> >
> >
> >
> > On Mon, Feb 27, 2012 at 5:20 AM, Eliot Miranda <eliot.miranda at gmail.com>
> wrote:
> >>
> >> Hi Mariano,
> >>
> >> On Sun, Feb 26, 2012 at 8:58 AM, Mariano Martinez Peck <
> marianopeck at gmail.com> wrote:
> >>>
> >>>
> >>> Hi. I have faced a VM crash while using Nautilus browser. It took me a
> while, but I finally could make a reproducible crash from image startup.
> You can find the image here:
> >>>
> https://gforge.inria.fr/frs/download.php/30280/Marea.104-Crash.1.image.zip
> >>>
> >>> What the image is running at startup that causes the crash is:
> >>>
> >>> | nautilus model ui|
> >>> Nautilus instVarNamed: 'groups' put: nil.
> >>> model := Nautilus open.
> >>> ui := model ui.
> >>> ui groupsButtonAction.
> >>>
> >>> If you need more about the "domain", we can ask Ben, Nautilus
> developer.  From what I can see in GDB, it crashes in #mapStackPages
> because it does a remap to an OOP that is 0 (zero)
> >>>
> >>> while (theSP <= frameRcvrOffset) {
> >>>                     oop = longAt(theSP);
> >>>                     if (!((oop & 1))) {
> >>>                         longAtput(theSP, remap(oop));
> >>>                     }
> >>>                     theSP += BytesPerWord;
> >>>                 }
> >>>
> >>>
> >>> Any ideas?
> >>
> >>
> >> The image overflows the weakRoots table in scanning stack pages.  The
> weakRoots table registers weak objects for scanning at the end of a GC.  It
> is, unfortunately, fixed size (~2600 entries), and there are lots of
> WeakMessageSends and WeakAnnouncementSubscriptions on the stack.
> >>
> >> I found this using a Debug VM with asserts enabled (i.e. compiled with
> NDEBUG /not/ defined).  I increased the table size to 3000, then 6000, before
> finding it no longer crashed with a weakRoots table size of 12000.
> >>
> >
> > wow, I never imagined that.
> >
> >>
> >> a) Looks like weakRoots' size should be configurable either via a
> start-up flag or an image header constant (with e.g. vmParameter accessors).
> >
> >
> > yes, with vmParameter would be nice, like the external semaphore table.
> >
> >>
> >>
> >> b) overflowing the weakRoots table (and possibly other tables) should
> probably cause the VM to abort with a useful error message.
> >>
> >
> > please!  :)
> >
> > I have checked in the image, before reproducing the bug, and it is not
> that bad:
> >
> > WeakMessageSend instanceCount 755.
> > WeakAnnouncementSubscription instanceCount 538
> >
> > So...maybe when I do the stuff that reproduces the crash there is
> ANOTHER bug (say a loop, for example) that creates many more
> instances of those weak objects?
> >
> >
> hmm.. i hardly believe that UI needs such amount of weak messages to
> wire the stuff.. but it is hard to tell, since i'm not an author.
>

Take a look at the attached.  It is taken from the image at a point where
an incrementalGC is performed when the weakRootTable has 6000 or more
elements.  It shows a very deep call stack full
of WeakAnnouncementSubscriptions.


>
> Also, answering Stephane's question: AFAIK, the weak roots table size is
> not linearly dependent on the total number of all weak containers in
> your image.
> But i might be wrong.
> Eliot, can you please explain how this weak roots table is populated,
> what triggers addition of new element(s) to it, and what frees an entry?
>

When a GC is performed, any weak collections encountered must be scanned
later, after the mark phase for non-weak objects has completed, so that the GC
can discover which elements of weak collections are unmarked and nil those
elements.  So in markAndTrace any encountered weak objects get added as
"roots" to the weakRootsTable.  Later (either in incrementalGC or fullGC)
the weak table is processed and unmarked referents in the weak arrays in
the weak table are nilled.  Hence the weak table fills during the mark
phase and is emptied in the nilling phase.
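
As a rough illustration of that fill-then-empty cycle, here is a minimal
sketch in C.  It is not the actual Cog/Interpreter code: the Obj layout,
WEAK_ROOTS_MAX, addWeakRoot and nilUnmarkedWeakReferents are hypothetical
names, and it assumes every slot of a weak object is a weak reference (the
real VM also has to deal with strong fixed fields, tagged integers and so on).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, deliberately tiny limit; the thread mentions ~2600 entries. */
#define WEAK_ROOTS_MAX 4

/* Simplified object: a mark bit, a weak flag and an array of slots. */
typedef struct Obj {
    int marked;
    int isWeak;
    size_t nSlots;
    struct Obj **slots;
} Obj;

static Obj *weakRoots[WEAK_ROOTS_MAX];
static size_t weakRootCount = 0;

/* Register a weak object for later scanning; fail loudly on overflow
   instead of writing past the end of the table. */
void addWeakRoot(Obj *o)
{
    assert(o != NULL);
    if (weakRootCount >= WEAK_ROOTS_MAX) {
        fprintf(stderr, "weak roots table overflow (%d entries)\n",
                WEAK_ROOTS_MAX);
        abort();
    }
    weakRoots[weakRootCount++] = o;
}

/* Mark phase: weak objects are marked and queued, but their referents
   are not traced through them. */
void markAndTrace(Obj *o)
{
    if (o == NULL || o->marked)
        return;
    o->marked = 1;
    if (o->isWeak) {
        addWeakRoot(o);
        return;
    }
    for (size_t i = 0; i < o->nSlots; i++)
        markAndTrace(o->slots[i]);
}

/* Nilling phase: clear every weak slot whose referent was never marked,
   then empty the table.  Returns the number of slots nilled. */
size_t nilUnmarkedWeakReferents(void)
{
    size_t nilled = 0;
    for (size_t i = 0; i < weakRootCount; i++) {
        Obj *w = weakRoots[i];
        for (size_t j = 0; j < w->nSlots; j++) {
            if (w->slots[j] != NULL && !w->slots[j]->marked) {
                w->slots[j] = NULL;
                nilled++;
            }
        }
    }
    weakRootCount = 0;
    return nilled;
}
```

In this sketch the table's high-water mark is the number of distinct weak
objects reached during one mark phase, so a deep stack full of weak
subscriptions fills it quickly.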

> And is the weak roots table size limit reasonably good? Needless to
> say, that nobody likes when system hits the wall of hardcoded limits.
>

Hmmm... In VisualWorks, which has a two-space copying generational GC, there
is no weak root table during incremental GC.  Instead the list of weak
objects is threaded through the corpses left behind in from-space.  So at
least for some GC designs a weak roots table isn't even needed.  What the
right solution is for the longer term I don't know.  For example, if a
weak roots table is required the VM can keep track of the count of weak
container instances and base the table size on the number of instances.
This is something I will solve in my new object representation/GC.  But
for now I think just providing a parameter to determine the maximum size is
sufficient.
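
To make the short-term proposal concrete, here is a hedged sketch in C of a
weak roots table whose maximum size is supplied at start-up rather than
compiled in, and which reports overflow instead of corrupting memory.  All
names (WeakRootTable, weakRootTableInit, weakRootTableAdd,
weakRootTableAddOrDie, weakRootTableFree) are hypothetical; how the size
would actually reach the VM (command-line flag, image header field or
vmParameter) is left open here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical table whose capacity is chosen when the VM starts. */
typedef struct {
    void **entries;
    size_t count;
    size_t capacity;
} WeakRootTable;

/* Allocate room for maxEntries roots.  Returns 0 on success, -1 if the
   requested size is zero or the allocation fails. */
int weakRootTableInit(WeakRootTable *t, size_t maxEntries)
{
    assert(t != NULL);
    if (maxEntries == 0)
        return -1;
    t->entries = calloc(maxEntries, sizeof *t->entries);
    if (t->entries == NULL)
        return -1;
    t->count = 0;
    t->capacity = maxEntries;
    return 0;
}

/* Append one root.  Returns 0 on success, -1 if the table is full. */
int weakRootTableAdd(WeakRootTable *t, void *oop)
{
    if (t->count >= t->capacity)
        return -1;
    t->entries[t->count++] = oop;
    return 0;
}

/* Variant for use inside the GC: abort with a useful message rather
   than continuing with a corrupted table. */
void weakRootTableAddOrDie(WeakRootTable *t, void *oop)
{
    if (weakRootTableAdd(t, oop) != 0) {
        fprintf(stderr,
                "weak roots table overflow: capacity is %zu entries; "
                "restart with a larger table size\n", t->capacity);
        abort();
    }
}

/* Release the table's storage. */
void weakRootTableFree(WeakRootTable *t)
{
    free(t->entries);
    t->entries = NULL;
    t->count = 0;
    t->capacity = 0;
}
```

Basing the size on a running count of weak container instances, as suggested
above, would amount to computing maxEntries from that count before each GC
instead of taking it from a parameter.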

> --
> Best regards,
> Igor Stasenko.
>

-- 
cheers,
Eliot