[Vm-dev] Reproducible Cog crash from image startup

Eliot Miranda eliot.miranda at gmail.com
Mon Feb 27 21:32:31 UTC 2012


Ignore this.  There's a repost with corrections and the attachments on its
way, but it's so large (800k stack trace) that it awaits moderator approval...

On Mon, Feb 27, 2012 at 1:06 PM, Eliot Miranda <eliot.miranda at gmail.com> wrote:

>
>
> On Mon, Feb 27, 2012 at 12:03 PM, Igor Stasenko <siguctua at gmail.com> wrote:
>
>> On 27 February 2012 10:53, Mariano Martinez Peck <marianopeck at gmail.com>
>> wrote:
>> >
>> >
>> >
>> > On Mon, Feb 27, 2012 at 5:20 AM, Eliot Miranda <eliot.miranda at gmail.com>
>> wrote:
>> >>
>> >> Hi Mariano,
>> >>
>> >> On Sun, Feb 26, 2012 at 8:58 AM, Mariano Martinez Peck <
>> marianopeck at gmail.com> wrote:
>> >>>
>> >>>
>> >>> Hi. I hit a VM crash while using the Nautilus browser. It took me
>> a while, but I finally managed to make a reproducible crash from image startup.
>> You can find the image here:
>> >>>
>> https://gforge.inria.fr/frs/download.php/30280/Marea.104-Crash.1.image.zip
>> >>>
>> >>> What the image is running at startup that causes the crash is:
>> >>>
>> >>> | nautilus model ui|
>> >>> Nautilus instVarNamed: 'groups' put: nil.
>> >>> model := Nautilus open.
>> >>> ui := model ui.
>> >>> ui groupsButtonAction.
>> >>>
>> >>> If you need more about the "domain", we can ask Ben, the Nautilus
>> developer.  From what I can see in GDB, it crashes in #mapStackPages
>> because it tries to remap an OOP that is 0 (zero):
>> >>>
>> >>> while (theSP <= frameRcvrOffset) {
>> >>>                     oop = longAt(theSP);
>> >>>                     if (!((oop & 1))) {
>> >>>                         longAtput(theSP, remap(oop));
>> >>>                     }
>> >>>                     theSP += BytesPerWord;
>> >>>                 }
>> >>>
>> >>>
>> >>> Any ideas?
>> >>
>> >>
>> >> The image overflows the weakRoots table in scanning stack pages.  The
>> weakRoots table registers weak objects for scanning at the end of a GC.  It
>> is, unfortunately, fixed size (~2600 entries), and there are lots of
>> WeakMessageSends and WeakAnnouncementSubscriptions on the stack.
>> >>
>> >> I found this using a Debug VM with asserts enabled (i.e. compiled with
>> NDEBUG /not/ defined).  I increased the table size to 3000 then 6000 before
>> finding it no longer crashed with a weakRoots table size of 12000.
>> >>
>> >
>> > wow, I never imagined that.
>> >
>> >>
>> >> a) Looks like weakRoots' size should be configurable either via a
>> start-up flag or an image header constant (with e.g. vmParameter accessors).
>> >
>> >
>> > yes, with vmParameter would be nice, like the external semaphore table.
>> >
>> >>
>> >>
>> >> b) overflowing the weakRoots table (and possibly other tables) should
>> probably cause the VM to abort with a useful error message.
>> >>
>> >
>> > please!  :)
>> >
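
To illustrate point (b) above, here is a minimal sketch of a fixed-size table whose append operation checks for overflow instead of writing past the end. All names here (WeakRootsTable, weak_roots_try_add, weak_roots_add_or_abort, WEAK_ROOTS_CAPACITY) are hypothetical and are not taken from the Cog VM source; the capacity of 2600 simply echoes the approximate figure mentioned in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the Cog VM source. */
#define WEAK_ROOTS_CAPACITY 2600   /* approximate fixed size quoted in the thread */

typedef uintptr_t oop_t;

typedef struct {
    oop_t  entries[WEAK_ROOTS_CAPACITY];
    size_t count;
} WeakRootsTable;

/* Try to record a weak object; report failure rather than overrun the array. */
bool weak_roots_try_add(WeakRootsTable *t, oop_t weakObj)
{
    assert(t != NULL);
    if (t->count >= WEAK_ROOTS_CAPACITY)
        return false;
    t->entries[t->count++] = weakObj;
    return true;
}

/* Variant that stops the process with a diagnostic when the table is full. */
void weak_roots_add_or_abort(WeakRootsTable *t, oop_t weakObj)
{
    if (!weak_roots_try_add(t, weakObj)) {
        fprintf(stderr,
                "weak roots table overflow: capacity %d exceeded during GC\n",
                WEAK_ROOTS_CAPACITY);
        abort();
    }
}
```

With a check like this, an overflow would show up as an explicit message at the point of insertion rather than as a later crash while remapping a zero oop.
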
>> > I have checked in the image, before reproducing the bug, and it is not
>> that bad:
>> >
>> > WeakMessageSend instanceCount 755.
>> > WeakAnnouncementSubscription instanceCount 538
>> >
>> > So...maybe when I do the stuff that reproduces the crash there is
>> ANOTHER bug (say a loop for example), that causes many more
>> instances of those weak objects to be created?
>> >
>> >
>> hmm.. I can hardly believe that the UI needs that many weak messages to
>> wire things up.. but it is hard to tell, since I'm not the author.
>>
>
> Take a look at the attached.  It is taken from the image at a point where
> an incrementalGC is performed when the weakRootTable has 6000 or more
> elements.  It shows a very deep call stack full
> of WeakAnnouncementSubscriptions.
>
>
>>
>> Also, answering Stephane's question: AFAIK, the weak roots table size does
>> not depend linearly on the total number of weak containers in
>> your image.
>> But I might be wrong.
>> Eliot, can you please explain how this weak roots table is populated,
>> what triggers the addition of new element(s) to it, and what frees an entry?
>>
>
> So when a GC is performed, any weak collections encountered must be
> scanned later, after the mark phase of non-weak objects has completed, so that
> the GC can discover which elements of weak collections are unmarked and nil
> those elements.  So in markAndTrace any encountered weak objects get
> added as "roots" to the weakRootsTable.  Later (either in incrementalGC or
> fullGC) the weak table is processed and unmarked referents in the weak
> arrays in the weak table are nilled.  Hence the weak table fills during the
> mark phase and is emptied in the nilling phase.
>
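
The fill-then-empty cycle described above can be sketched on a toy object model. This is only an illustration under simplifying assumptions, not the VM's markAndTrace: the Obj and WeakTable types, the function names, and the table size are invented for the example, every slot of a weak object is treated as weak, and NULL stands in for nil.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy object model; all names and sizes are hypothetical. */
#define MAX_SLOTS       4
#define WEAK_TABLE_SIZE 16

typedef struct Obj {
    bool        marked;
    bool        isWeak;
    size_t      nSlots;
    struct Obj *slots[MAX_SLOTS];
} Obj;

typedef struct {
    Obj   *roots[WEAK_TABLE_SIZE];
    size_t count;
} WeakTable;

/* Mark phase: follow strong references; weak objects are only registered. */
void mark_and_trace(Obj *o, WeakTable *wt)
{
    if (o == NULL || o->marked)
        return;
    o->marked = true;
    if (o->isWeak) {
        assert(wt->count < WEAK_TABLE_SIZE);  /* overflow check */
        wt->roots[wt->count++] = o;
        return;  /* weak slots do not keep their referents alive */
    }
    for (size_t i = 0; i < o->nSlots; i++)
        mark_and_trace(o->slots[i], wt);
}

/* Nilling phase: clear slots whose referents stayed unmarked, then empty the table. */
void process_weak_table(WeakTable *wt)
{
    for (size_t r = 0; r < wt->count; r++) {
        Obj *w = wt->roots[r];
        for (size_t i = 0; i < w->nSlots; i++) {
            if (w->slots[i] != NULL && !w->slots[i]->marked)
                w->slots[i] = NULL;
        }
    }
    wt->count = 0;
}
```

In this sketch the table grows by one entry per weak object reached during marking, which is why a deep stack full of weak subscriptions can fill a fixed-size table within a single GC.
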
>> And is the weak roots table size limit reasonably good? Needless to
>> say, nobody likes it when the system hits the wall of hardcoded limits.
>>
>
> Hmmm... In VisualWorks, which has a two-space copying generational GC
> there is no weak root table during incremental GC.  Instead the list of
> weak objects is threaded through the corpses left behind in from space.  So
> at least for some GC designs a weak roots table isn't even needed.  What
> the right solution is for the longer term I don't know.  For example, if
> a weak roots table is required the VM can keep track of the count of weak
> container instances and base the table size on the number of instances.
>  This is something I will solve in my new object representation/GC.  But
> for now I think just providing a parameter to determine the maximum size is
> sufficient.
>
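
As a rough sketch of the short-term option mentioned above (a parameter that sets the maximum size), the table could be allocated once at start-up from a requested capacity. The names WeakRoots, weak_roots_init, weak_roots_free and DEFAULT_WEAK_ROOTS are assumptions made for this example; how the requested value would actually reach the VM (start-up flag, image header field, or vmParameter) is left open, as it is in the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical default, echoing the approximate size quoted in the thread. */
#define DEFAULT_WEAK_ROOTS 2600

typedef struct {
    uintptr_t *entries;
    size_t     capacity;
    size_t     count;
} WeakRoots;

/* Allocate the table; a request of 0 means "use the default".
   Returns 0 on success, -1 if the allocation fails. */
int weak_roots_init(WeakRoots *t, size_t requested)
{
    size_t cap = requested ? requested : DEFAULT_WEAK_ROOTS;

    assert(t != NULL);
    t->entries = calloc(cap, sizeof *t->entries);
    if (t->entries == NULL)
        return -1;
    t->capacity = cap;
    t->count = 0;
    return 0;
}

/* Release the table's storage and reset its fields. */
void weak_roots_free(WeakRoots *t)
{
    free(t->entries);
    t->entries = NULL;
    t->capacity = 0;
    t->count = 0;
}
```

For instance, calling weak_roots_init(&table, 12000) would give the capacity that stopped the crash in the experiment described earlier in the thread.
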
>> --
>> Best regards,
>> Igor Stasenko.
>>
>
> --
> cheers,
> Eliot
>
>


-- 
best,
Eliot