[squeak-dev] Improving WeakKeyDictionary comment

Levente Uzonyi leves at caesar.elte.hu
Sun Jun 5 21:22:54 UTC 2022

Hi Chris,

On Tue, 31 May 2022, Chris Muller wrote:

> Hi Jakob, hi Levente,
>       > Whenever I read the class comment of WeakKeyDictionary, I know that I have to take care of something, but I do not know what. Can we improve the comment please to tell the readers more directly what is expected
>       of them?
>       >
>       > - Must one ensure that #finalizeValues is sent to one's WeakKeyDictionary instance regularly?
> Yes.  Either that, or you need to otherwise somehow remove the garbage collected entries from its internal array.
> Otherwise, the WeakKeyDictionary will only ever grow in size internally, with a bunch of nil keys.  Here's a short script showing the mechanics and assumptions of this:

No, that is not the case. Associations with nil keys get removed 
automatically, which is why you cannot have nil keys in 
WeakKeyDictionaries (and because there was no demand to implement 
something like what has been done for Sets).

What you mean is that associations do not get removed immediately.
Here's is a snippet demonstrating the situation by only sending #at:put: 
to the dictionary and always with a new key:

| w |
w := WeakKeyDictionary new.
1000 timesRepeat: [ w at: Object new put: 0 ].
Transcript open; showln: { w size. w slowSize. w capacity }.
w capacity - w size timesRepeat: [ w at: Object new put: 0 ].
Smalltalk garbageCollectMost.
w at: Object new put: 0.
Smalltalk garbageCollectMost.
Transcript showln: { w size. w slowSize. w capacity }.

The following lines should appear on the transcript:
#(1000 0 1817)
#(1 0 2)

The dictionary will eventually shrink to the smallest possible 
capacity, 2, holding one association to be garbage collected and no 
associations with a valid key.

As with all weak hashed collections, #isEmpty, #size, #notEmpty are not 
reliable because the entries may vanish any time, so there's no effort to 
even try to make those correct.
If you need accurate information, you must check against #slowSize, which 
means O(n) runtime. Using ephemerons may help with that.

> |d|
> d:={'hello' copy -> 'world'.  'hello' copy -> 'there'} as: WeakIdentityKeyDictionary.
> Smalltalk garbageCollect.
> self assert: (d includesKey: 'hello') not.   "good"
> self assert: d notEmpty.    "bad"
> self assert: d array asSet size > 1.   "bad"
> d finalizeValues.   "required!"
> self assert: d array asSet size = 1.   "good"
> self assert: d isEmpty.    "good"
>       Only if you are interested in finalization and want to do it your own way.
>       And even then, only if you want to have your finalizer executed as soon as
>       possible when a key of your dictionary has been garbage collected.
>       So, if you just want a dictionary with weak keys, the answer is no.
>       > - Should one register the dictionary instance via WeakArray addWeakDependent:?
>       Same as above. I'm not even sure it would work with
>       WeakKeyDictionaries. It's intended to work with WeakRegistries, which are
>       the main providers of finalization.
> I believe it would.  Check out WeakArray class>>#finalizationProcess and, perhaps doing so would allow those WeakKey dictionary's to clean themselves up automatically so you don't have to time sending #finalizeValues
> yourself.  But, I don't trust it, because it's hard to tell, because #finalizationProcess waits on FinalizationSemaphore, which is an opaque VM-controlled semaphore which I have no idea when it's signaled.
>       > - Which VM feature is the WeakRegistry class comment referring to; is it the semaphore used on the class side of WeakArray; and is it relevant to the use of WeakKeyDictionary?
>       One[1] that has never really been used in Squeak and has been
>       obsoleted by ephemerons. Even though we do not use ephemerons for
>       finalization yet.
>       > - If the answers to the questions above do not give it away, what must one do to use a WeakKeyDictionary safely?
>       The difference between a regular Dictionary and a WeakKeyDictionary is
>       that the latter cannot hold nil as a key, and your associations may
>       disappear when their keys are not referenced strongly.
> They only disappear from the perspective of using the object API (excluding #isEmpty and #notEmpty as demonstrated by the script above).  Internally, the former Association objects don't disappear, and still take up memory.
> Igor made some alternative-implementation WeakKeyDictionary's for Magma back in 2007 which are faster than the one's built into Squeak to this day (last I checked, anyway).  If interested, load "MaBase" from SqueakMap and
> then see MaWeakIdentityKeyDictionary.

Is it really faster?
Here's a basic benchmark:

 	| m |
 	Smalltalk garbageCollect.
 	m := MaDictionary new.
 		#insert -> [ 1 to: 1000000 do: [ :i | m at: i hashMultiply put: i ] ] timeToRun.
 		#access -> [ 1 to: 1000000 do: [ :i | m at: i hashMultiply ] ] timeToRun.
 		#update -> [ 1 to: 1000000 do: [ :i | m at: i hashMultiply put: i + 1 ] ] timeToRun.
 		#positiveLookup -> [ 1 to: 1000000 do: [ :i | m includesKey: i hashMultiply ] ] timeToRun.
 		#negativeLookup -> [ 1000001 to: 2000000 do: [ :i | m includesKey: i hashMultiply ] ] timeToRun.
 		#remove -> [ 1 to: 1000000 do: [ :i | m removeKey: i hashMultiply ] ] timeToRun
 	} ] value.

"{#insert->535 . #access->1062 . #update->250 . #positiveLookup->967 . #negativeLookup->1055 . #remove->2353}"
Even with MaDictionary >> #checkForUnderflow: patched to avoid the creation of Fractions.

The same benchmark for Dictionary yields {#insert->469 . #access->945 . #update->198 . #positiveLookup->174 . #negativeLookup->301 . #remove->719} on my machine.


More information about the Squeak-dev mailing list