[BUG][FIX] WeakGCFix-wbk

Bryce Kampjes bryce at kampjes.demon.co.uk
Wed Mar 24 22:43:50 UTC 2004


Hi Andreas,
First, I agree with you: my fix should NOT be included in the main
VM/image.

Unfortunately, it does fix my personal problems. When they first
appeared about three months ago, I worked around them by compiling
the VM without optimisation. That this helped is a bad sign: it
indicates that the difference between working and non-working code
lies either in an area where the compiler has a right to choose, or
in a compiler bug. A compiler bug is very unlikely, but so is a bug
in the interpreter/garbage collector.

It is unfortunate that the fix works, but fortunate that it narrows
my problems down so much. Yes, I'm working with a custom VM. But I
managed to run my version of your test and produce a nil even with a
stock VM, though not with a stock image. Thinking about it, I don't
think my version (with the message send) should behave differently
from yours. Weird.

That my fix works does, however, isolate the problem. It's
something that can stop a root weak object from being collected.
That implies that the mark bit is set when it should not be. I very
much doubt that my code is setting that bit: that would involve it
producing an otherwise good header word with a bad mark bit, which
is highly unlikely.
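
To illustrate the reasoning above, here is a toy mark phase showing how
a stale mark bit keeps a weakly referenced object alive. This is only a
sketch, not the Squeak collector: the Obj class, the function names and
the bit position used for MARK_BIT are all assumptions made for the
example.

```python
MARK_BIT = 0x80000000  # assumed bit position, for illustration only


class Obj:
    """Toy heap object: a header word plus strong references."""

    def __init__(self, name, header=0):
        self.name = name
        self.header = header
        self.strong = []


def is_marked(obj):
    return bool(obj.header & MARK_BIT)


def mark(root):
    """Set the mark bit on everything strongly reachable from root."""
    stack = [root]
    while stack:
        cur = stack.pop()
        if is_marked(cur):
            continue
        cur.header |= MARK_BIT
        stack.extend(cur.strong)


def full_gc(heap, strong_roots, weak_slots):
    """Mark from the strong roots, nil weak slots whose referent is
    unmarked, then clear mark bits on the survivors."""
    for root in strong_roots:
        mark(root)
    new_weak = [t if t is not None and is_marked(t) else None
                for t in weak_slots]
    live = [o for o in heap if is_marked(o)]
    for o in live:
        o.header &= ~MARK_BIT
    return live, new_weak
```

An object that is only weakly referenced normally has its weak slot
nilled. If the same object enters the collection with its mark bit
already set, the mark phase never has to touch it and it still survives.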

I deal with headers in only one place, and that is the part of the
test suite that crashes. However, the test that crashes does
nothing; I've stepped through the machine code, instruction by
instruction, multiple times over a three month period, so I know
this. Actually, that specific test verifies that Exupery is not adding
anything to the root table when both objects in an assignment are
old. Unfortunately, this also rules out unexpected GCs, because a
call instruction would definitely be noticeable.
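
For readers unfamiliar with the root table check mentioned above, a
minimal sketch of a generational store check follows. It assumes the
usual rule that an old object is recorded as a root only when a young
object is stored into it. The names (Obj, store_pointer, root_table) are
hypothetical and are not taken from the VM or from Exupery.

```python
class Obj:
    """Toy object that knows whether it lives in old space."""

    def __init__(self, name, old):
        self.name = name
        self.old = old
        self.is_root = False  # already recorded in the root table?
        self.slots = {}


def store_pointer(holder, slot, value, root_table):
    """Store value into holder, recording holder as a root only for an
    old -> young store that has not been recorded before."""
    holder.slots[slot] = value
    if holder.old and not value.old and not holder.is_root:
        holder.is_root = True
        root_table.append(holder)


root_table = []
old_a = Obj("oldA", old=True)
old_b = Obj("oldB", old=True)
young = Obj("young", old=False)

store_pointer(old_a, 0, old_b, root_table)  # old -> old: nothing added
store_pointer(old_a, 1, young, root_table)  # old -> young: old_a added
store_pointer(old_a, 2, young, root_table)  # already a root: no duplicate
```

The first store is the case the test covers: both objects are old, so
the table must stay unchanged.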

So my situation is this: I have a bug that is possibly caused by the
garbage collector, and I have a fix that works. Unfortunately, the fix
works for the wrong reasons, which is at least enlightening, especially
with your help. I can continue working using my fix, but that leaves the
real bug undiscovered. I can also spend more time chasing a better
fix. Given that my fix solves my problem, it really isolates the kind of
issue, and it is not the sort of thing that my VM modifications could
cause, especially as I've single stepped through the machine code I'm
running.

Currently, I feel that I should release the next Exupery version with
a Linux VM that includes my fix, see how that VM works in real use
for a few weeks rather than just under explicit testing, and hope
inspiration strikes or (more likely) a better time comes to chase
this bug further. Releasing a modified VM is necessary to let people
play with it without needing to compile the VM themselves.

Oh, the stock VM was a 3.4-2 Linux VM from Ian's site. My compiled
VMs were modified versions built from Ned's SourceForge VM branch
with the latest version of VMMaker.

Exupery does need a few VM modifications to run. First, it needs to
get the addresses of various VM variables for code generation. Second,
it needs to modify the message sending code so it can override methods
with compiled code. This is why, until I had that fix, I assumed the bug
was due to my code. However, to test rootTable updating I run global
collects frequently, and this produces the bug that I see. When testing
the assignment elsewhere, I run identical code without the garbage
collect, and it does not crash. The test that causes the crash does not
update the rootTable; I've checked both by reading the generated
assembly and by single stepping through the machine code while watching
the contents of the rootTable (only four entries in this case).

If there is interest, I'm happy to chase this further now. If it isn't
impacting anybody else then I'll leave it until a better time. A
better time would be when working on the Exupery/VM integration,
which is the guts of the next release, or on things that involve GC
interaction, such as inlining code where type tests need types, which
are objects that the garbage collector can move.

Bryce



More information about the Squeak-dev mailing list