[BUG][FIX] WeakGCFix-wbk

Andreas Raab andreas.raab at gmx.de
Thu Mar 25 00:03:29 UTC 2004


Hi Bryce,

> That my fix does fix my problems however does isolate it. It's
> something that can stop a root weak object from being collected.
> That implies that the mark bit is set, and yes that bit should not
> be set. I very much doubt that my code is setting that bit, that
> would involve it producing an otherwise good header work with a
> bad mark bit which is highly unlikely.

There's a simple way to find out - just scan all the objects right *before*
GC to see if any of them already has the mark bit set. If one does, you
know you're dead in the water, and triggering an IGC (incremental GC) at
various places would allow you to pinpoint where it happens.
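
A minimal sketch of such a pre-GC scan, in C. This is not the actual VM
code: the flat array of header words, the position of `MARK_BIT` and both
function names are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: the mark bit is the top bit of a
   32-bit object header word. */
#define MARK_BIT 0x80000000u

/* Return the index of the first object whose mark bit is already set,
   or -1 if every header is clean. */
long first_premarked(const uint32_t *headers, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (headers[i] & MARK_BIT) {
            return (long)i;
        }
    }
    return -1;
}

/* Meant to be called right before the collector starts marking:
   aborts if any object enters the GC with a stale mark bit. */
void assert_heap_unmarked(const uint32_t *headers, size_t count)
{
    assert(first_premarked(headers, count) == -1);
}
```

If the assertion fires, the corruption happened somewhere between the
previous clean check and this one, which is what narrows the search.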

> So my situation is this: I have a bug that is possibly caused by the
> garbage collector and I have a fix that works. Unfortunately the fix
> works for the wrong reasons which is at least enlightening especially
> with your help. I can continue working using my fix but that leaves the
> real bug undiscovered. I can also spend more time chasing a better
> fix. Given that my fix fixes my problem it really isolates the kind of
> issue which is not the sort of thing that my VM modifications could
> do, especially as I've single stepped through the machine code I'm
> running.

It is *always* the case for GC problems that they show up in completely
unrelated places. *ALWAYS*! Don't waste your time investigating that
particular place which just happens to trigger a GC. Run IGCs at every
allocation! Add sanity checks! The only thing you can say for sure is that
the problem occurred "some time before" the GC was triggered.
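
To make that concrete, here is a rough sketch in C of an allocator with a
stress mode that runs a collection and a sanity check on every single
allocation. Everything in it is hypothetical: `ToyHeap`, `toy_allocate`,
`toy_incremental_gc`, `heap_is_sane` and the `MARK_BIT` position are
stand-ins for illustration, not the real VM interface, and the "GC" here
only checks and counts instead of reclaiming anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MARK_BIT   0x80000000u  /* assumed mark-bit position */
#define HEAP_SLOTS 64

typedef struct {
    uint32_t headers[HEAP_SLOTS]; /* one header word per toy object */
    size_t used;                  /* number of allocated objects */
    unsigned long gc_runs;        /* how many collections have run */
    int gc_on_every_alloc;        /* stress mode switch */
} ToyHeap;

/* Sanity check: bounds are intact and no object carries a mark bit
   outside of a collection. Returns 1 if sane, 0 otherwise. */
int heap_is_sane(const ToyHeap *h)
{
    if (h->used > HEAP_SLOTS) {
        return 0;
    }
    for (size_t i = 0; i < h->used; i++) {
        if (h->headers[i] & MARK_BIT) {
            return 0;
        }
    }
    return 1;
}

/* Stand-in for an incremental GC: verifies the heap on entry, then
   just counts the run. */
void toy_incremental_gc(ToyHeap *h)
{
    assert(heap_is_sane(h));
    h->gc_runs++;
}

/* Allocate one object; in stress mode, collect first. Returns the new
   object's index, or -1 if the heap is full. */
long toy_allocate(ToyHeap *h, uint32_t header)
{
    if (h->gc_on_every_alloc) {
        toy_incremental_gc(h);
    }
    if (h->used == HEAP_SLOTS) {
        return -1;
    }
    h->headers[h->used] = header & ~MARK_BIT;
    return (long)h->used++;
}
```

With the stress mode on, a failed check points at whatever ran since the
previous allocation rather than at an arbitrary later point.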

> Currently, I feel that I should release the next Exupery version with
> a Linux VM that includes my fix. See how that VM works in real use
> rather than just under explicit testing for a few weeks.

That's entirely your choice - if you have faith that the fix you're using
solves the problem, go for it. Though I have to admit, having chased GC
bugs before, that it is dangerous to rely on a fix when you don't
understand why it works.

> Exupery does involve a few VM modifications to run. First, it needs to
> get the addresses of various VM variables for code generation. Second,
> it needs to modify the message sending code so it can override methods
> with compiled code. This is why until I had that fix I assumed the bug
> was due to my code.

I am still convinced it is in your code ;-)

> However to test rootTable updating I do run global
> collects frequently, this produces the bug that I see.
> I run identical code elsewhere without the garbage collect
> when testing the assignment which does not crash. The test
> that causes the crash does not update the rootTable,
> I've checked both by reading the assembly generated and
> also by single stepping through the machine code while watching
> the contents of the rootTable (only four entries in this case).

That doesn't mean anything. The only thing you can say for sure is that it
happened some time before the GC. The code you're looking at might be
*completely* unrelated.

> If there is interest, I'm happy to chase this further now. If it
> isn't impacting anybody else then I'll leave it until a better time.
> A better time would be when working with the Exupery/VM integration
> which is the guts of the next release. Or on things that involve GC
> interaction such as inlining code where type tests need types which
> are objects which the garbage collector can move.

That's entirely your decision to make.

Cheers,
  - Andreas



