[squeak-dev] WeakRegistry deadlock [was: JNIPort preview for Squeak and Pharo available on SqueakSource]

Joachim Geidel joachim.geidel at onlinehome.de
Sat Jun 12 14:41:49 UTC 2010


> On 09.06.10 16:11, Levente Uzonyi wrote:
>>> * Starting a JVM seems to lead to a deadlock in Squeak 4.1, but not in
>>> Pharo. When you start a JVM, nothing happens until you hit command-. (on a
>>> Mac), wait until a notifier for the user interrupt appears, and proceed. The
>>> problem seems to be somewhere in WeakRegistry; it disappears when you
>>> replace Squeak's WeakRegistry by the WeakRegistry class from a Pharo image.
>> 
>> That's really interesting. I tried to reproduce it on windows without
>> success, the jvm just failed to start, then alien crashed the image. Can
>> you send a stack trace of the deadlock?
>> 
>> 
>> Levente

I have some news, and maybe a solution.

I managed to produce a log file for the #wait and #signal messages sent to
the Semaphore of the WeakRegistry (see below). I still can't explain why the
deadlock occurs and why proceeding after a user interrupt helps.

What I did find, however, is that changing the priority of WeakArray's
finalizationProcess from userInterruptPriority to systemBackgroundPriority
solved the problem: no more deadlocks when starting JNIPort's JVM. I have
no idea why this works, and I also don't know whether changing the priority
of the process can have any negative side effects.

I find WeakKeyDictionary>>finalizeValues suspicious. It does a linear scan
of the hash table, and when it nils a slot, it rehashes objects found above
this slot. I think this can corrupt the hash table when there are colliding
hashes. Here is a simplified example with a 5-slot hash table, which ignores
the fact that the elements are actually WeakKeyAssociations:

Initial state:
[nil nil nil nil nil]
Add object A with hash 4:
[nil nil nil A nil]
Add object B with hash 5:
[nil nil nil A B]
Add object C with hash 4 (slots 4 and 5 are taken, so it wraps around to slot 1):
[C nil nil A B]
Expire object A, finalizeValues:
[C nil nil nil B]

WeakKeyDictionary>>finalizeValues will detect that the object in slot 4 was
garbage collected, and try to rehash objects from there to the end. However,
it will not detect that object C needs to be moved to slot 4. This means
that C will not be found by scanFor: (a test for its presence will give the
wrong answer), and it can be added again to the WeakKeyDictionary. If C
expires after being added a second time, it will be finalized twice, which
can lead to errors.
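
To make the scenario concrete, here is a minimal sketch in Python (not
Smalltalk) of a 5-slot table with linear probing. It is a toy model, not the
Squeak implementation: the class ToyTable and its methods are hypothetical
names, and expire_without_wraparound simply assumes the behaviour described
above (rehash only from the nilled slot to the end of the table). Whether
finalizeValues in Squeak 4.1 really behaves this way is not verified here.

```python
# Toy model of the example above; all names are hypothetical.
class ToyTable:
    def __init__(self, size=5):
        self.size = size
        self.slots = [None] * size  # each slot holds (name, hash) or None

    def _start(self, h):
        # Map the 1-based hash used in the example to a 0-based index.
        return (h - 1) % self.size

    def scan_for(self, name, h):
        """Index of name, or of the first empty slot on its probe path."""
        i = self._start(h)
        for _ in range(self.size):
            entry = self.slots[i]
            if entry is None or entry[0] == name:
                return i
            i = (i + 1) % self.size  # linear probing with wraparound
        raise RuntimeError("table full")

    def add(self, name, h):
        self.slots[self.scan_for(name, h)] = (name, h)

    def includes(self, name, h):
        return self.slots[self.scan_for(name, h)] is not None

    def expire_without_wraparound(self, name, h):
        """Assumed behaviour: nil the slot, then rehash only the entries
        between that slot and the END of the table (no wraparound)."""
        i = self.scan_for(name, h)
        self.slots[i] = None
        for j in range(i + 1, self.size):
            entry = self.slots[j]
            if entry is not None:
                self.slots[j] = None
                self.add(*entry)

    def names(self):
        return [e[0] if e else None for e in self.slots]


table = ToyTable()
table.add("A", 4)
table.add("B", 5)
table.add("C", 4)                        # wraps around to the first slot
before = table.names()
table.expire_without_wraparound("A", 4)
after = table.names()
found_after_expiry = table.includes("C", 4)  # probe stops at the empty slot 4
table.add("C", 4)                        # C is stored a second time
after_readd = table.names()

print(before)
print(after)
print(found_after_expiry)
print(after_readd)
```

Under these assumptions the lookup for C stops at the now-empty slot 4, so C
is reported as absent and a second copy ends up in the table.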

Or did I get something wrong here?

Best regards,
Joachim Geidel

----

In the log, + stands for #wait, - for #signal. The numbers are the hashes of
the active process. 1033371648 is the Process executing the JVM startup,
560201728 is the hash of the finalization process.

+1033371648
-1033371648
[many repetitions]
+1033371648
-1033371648
+560201728
-560201728
+1033371648
-1033371648
+1033371648
-1033371648
+1033371648
-1033371648
+1033371648
-1033371648
+1033371648
-1033371648
+560201728
-560201728
[more repetitions: 1 access by the finalization process after 6-8 accesses
by the JVM process]
+560201728
-560201728
+1033371648
-1033371648
+560201728
-560201728
+1033371648
-1033371648
+560201728
-560201728
+1033371648
-1033371648
[more repetitions]
+1033371648
-1033371648
+560201728
+1033371648
[Deadlock, User Interrupt & Proceed here]
-1033371648
-560201728
+560201728
-560201728
Etc.
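
A log in this format can be scanned mechanically for the interesting spot. The
following Python sketch is only an illustration: the function name
find_contended_waits is hypothetical, and the sample lines are the tail of the
log above. It reports every #wait that is logged while another process still
has a #wait without a matching #signal. It says nothing about whether a log
entry was written before or after the #wait returned.

```python
def find_contended_waits(lines):
    """Return (line_index, waiter, pending) for each '+' entry logged while
    some other process has a '+' without a matching '-'."""
    pending = []    # process hashes with an unmatched '+'
    contended = []
    for idx, line in enumerate(lines):
        line = line.strip()
        if not line or line[0] not in "+-":
            continue  # skip annotations such as "[many repetitions]"
        op, pid = line[0], line[1:]
        if op == "+":
            if pending:
                contended.append((idx, pid, list(pending)))
            pending.append(pid)
        elif pid in pending:
            pending.remove(pid)
    return contended


# Tail of the log above, around the deadlock.
sample = [
    "+1033371648",
    "-1033371648",
    "+560201728",
    "+1033371648",
    "[Deadlock, User Interrupt & Proceed here]",
    "-1033371648",
    "-560201728",
    "+560201728",
    "-560201728",
]

contended = find_contended_waits(sample)
for idx, waiter, holders in contended:
    print(f"line {idx}: {waiter} waits while {holders} has not signalled")
```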




