[squeak-dev] Re: WeakRegistry deadlock [was: JNIPort preview for Squeak and Pharo available on SqueakSource]

Levente Uzonyi leves at elte.hu
Sat Jun 12 16:26:08 UTC 2010


On Sat, 12 Jun 2010, Joachim Geidel wrote:

>> On 9 June 2010 at 16:11, Levente Uzonyi wrote:
>>>> * Starting a JVM seems to lead to a deadlock in Squeak 4.1, but not in
>>>> Pharo. When you start a JVM, nothing happens until you hit command-. (on a
>>>> Mac), wait until a notifier for the user interrupt appears, and proceed. The
>>>> problem seems to be somewhere in WeakRegistry; it disappears when you
>>>> replace Squeak's WeakRegistry with the WeakRegistry class from a Pharo image.
>>>
>>> That's really interesting. I tried to reproduce it on Windows without
>>> success: the JVM just failed to start, and then Alien crashed the image.
>>> Can you send a stack trace of the deadlock?
>>>
>>>
>>> Levente
>
> I have some news, and maybe a solution.
>
> I managed to produce a log file for the #wait and #signal messages sent to
> the Semaphore of the WeakRegistry (see below). I still can't explain why the
> deadlock occurs and why proceeding after a user interrupt helps.
>
> What I found, however, is that changing the priority of WeakArray's
> finalizationProcess from userInterruptPriority to systemBackgroundPriority
> solved the problem: no more deadlocks when starting JNIPort's JVM. I have
> no idea why this works, and I also have no idea whether changing the
> priority of the process can have any negative side-effects.
>
> I find WeakKeyDictionary>>finalizeValues suspicious. It does a linear scan
> of the hash table, and when it nils a slot, it rehashes the objects found
> in the slots after it. I think this can corrupt the hash table when there
> are colliding hashes. Here is a simplified example with a 5-slot hash table
> that ignores the fact that the elements are actually WeakKeyAssociations:
>
> Initial state:
> [nil nil nil nil nil]
> Add object A with hash 4:
> [nil nil nil A nil]
> Add object B with hash 5:
> [nil nil nil A B]
> Add object C with hash 4 (slots 4 and 5 are taken, so C wraps to slot 1):
> [C nil nil A B]
> Object A expires; after finalizeValues:
> [C nil nil nil B]
>
> WeakKeyDictionary>>finalizeValues will detect that the object in slot 4 was
> garbage collected and will try to rehash the objects from there to the end
> of the array. Because it does not wrap around to slot 1, however, it will
> not detect that object C needs to be moved to slot 4. As a result, C will
> not be found by scanFor: (the search starts at slot 4, finds nil and
> stops), so tests for its presence give the wrong answer, and C can be
> added to the WeakKeyDictionary a second time. If C expires after being
> added a second time, it will be finalized twice, which can lead to errors.
>
> Or did I get something wrong here?

Very nice find; this is definitely the bug. Here is a snippet that
reproduces it:

| objectWithHashModulo w a b c |
"Answer a fresh object whose hash modulo the given modulus is requestedHash."
objectWithHashModulo := [ :requestedHash :modulo |
	| object |
	[
		object := Object new.
		object hash \\ modulo = requestedHash ] whileFalse.
	object ].
w := WeakKeyDictionary new.
a := objectWithHashModulo value: 3 value: 5.
w at: a put: 1.
b := objectWithHashModulo value: 4 value: 5.
w at: b put: 2.
c := objectWithHashModulo value: 3 value: 5.
w at: c put: 3.
"a and c both want slot 4; b takes slot 5, so c wraps around to slot 1."
self assert: w capacity = 5.
self assert: (w array at: 4) key == a.
self assert: (w array at: 5) key == b.
self assert: (w array at: 1) key == c.
"Let a be collected, then let the dictionary clean up after it."
a := nil.
Smalltalk garbageCollect.
w finalizeValues.
"Fails with the current finalizeValues: c's home slot 4 is now empty."
self assert: (w includesKey: c)

I will fix it soon.
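For anyone who wants to see the failure outside Squeak, the 5-slot example
from the quoted message can be modeled with a toy linear-probing table. This
is a minimal sketch in Python, not Squeak code: the names LinearProbingTable,
remove_without_wraparound and remove are hypothetical, each key maps directly
to a 0-based home slot, and remove_without_wraparound only models Joachim's
description of finalizeValues (clear the slot, then rehash up to the end of
the array). The remove method shows one standard fix, re-inserting the rest
of the cluster while wrapping around until an empty slot is reached; it is
not necessarily how WeakKeyDictionary itself will be fixed.

```python
from typing import Callable, Iterator, List, Optional


class LinearProbingTable:
    """Toy open-addressing hash table with linear probing (0-based slots)."""

    def __init__(self, capacity: int, home: Callable[[str], int]) -> None:
        self.slots: List[Optional[str]] = [None] * capacity
        self.home = home  # hypothetical: maps a key to its preferred slot

    def _probe(self, key: str) -> Iterator[int]:
        # Visit the home slot first, then the following slots, wrapping around.
        n = len(self.slots)
        start = self.home(key) % n
        for step in range(n):
            yield (start + step) % n

    def add(self, key: str) -> int:
        # Store the key in the first empty slot on its probe path.
        for i in self._probe(key):
            if self.slots[i] is None or self.slots[i] == key:
                self.slots[i] = key
                return i
        raise RuntimeError("table is full")

    def includes(self, key: str) -> bool:
        # A lookup stops at the first empty slot, so a hole on the path hides the key.
        for i in self._probe(key):
            if self.slots[i] is None:
                return False
            if self.slots[i] == key:
                return True
        return False

    def remove_without_wraparound(self, index: int) -> None:
        # Models the suspected bug: clear the slot, then re-insert only the
        # entries between index + 1 and the physical end of the array.
        self.slots[index] = None
        for i in range(index + 1, len(self.slots)):
            key = self.slots[i]
            if key is not None:
                self.slots[i] = None
                self.add(key)

    def remove(self, index: int) -> None:
        # Correct deletion: clear the slot, then re-insert every entry of the
        # cluster that follows it, wrapping around, until an empty slot is hit.
        n = len(self.slots)
        self.slots[index] = None
        i = (index + 1) % n
        while self.slots[i] is not None:
            key = self.slots[i]
            self.slots[i] = None
            self.add(key)
            i = (i + 1) % n


if __name__ == "__main__":
    homes = {"A": 3, "B": 4, "C": 3}  # 0-based versions of hashes 4, 5, 4
    for fixed in (False, True):
        table = LinearProbingTable(5, homes.__getitem__)
        for key in ("A", "B", "C"):
            table.add(key)
        if fixed:
            table.remove(3)
        else:
            table.remove_without_wraparound(3)
        print(table.slots, table.includes("C"))
    # prints: ['C', None, None, None, 'B'] False
    #         [None, None, None, 'C', 'B'] True
```

Adding A, B and C gives ['C', None, None, 'A', 'B'], the same layout as
[C nil nil A B] above. Removing slot 3 without wrapping around leaves C
unreachable (and a second add of C lands in slot 3, so C is stored twice),
while remove moves C back to its home slot.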


Thanks,
Levente


>
> Best regards,
> Joachim Geidel
>
> ----
>
> In the log, + stands for #wait and - for #signal. The numbers are the
> hashes of the active process: 1033371648 is the process executing the JVM
> startup, and 560201728 is the finalization process.
>
> +1033371648
> -1033371648
> [many repetitions]
> +1033371648
> -1033371648
> +560201728
> -560201728
> +1033371648
> -1033371648
> +1033371648
> -1033371648
> +1033371648
> -1033371648
> +1033371648
> -1033371648
> +1033371648
> -1033371648
> +560201728
> -560201728
> [more repetitions: 1 access by the finalization process after 6-8 accesses
> by the JVM process]
> +560201728
> -560201728
> +1033371648
> -1033371648
> +560201728
> -560201728
> +1033371648
> -1033371648
> +560201728
> -560201728
> +1033371648
> -1033371648
> [more repetitions]
> +1033371648
> -1033371648
> +560201728
> +1033371648
> [Deadlock, User Interrupt & Proceed here]
> -1033371648
> -560201728
> +560201728
> -560201728
> Etc.


