[squeak-dev] Improving WeakKeyDictionary comment

Chris Muller ma.chris.m at gmail.com
Mon Jun 6 05:51:10 UTC 2022


Hi Levente,

> >       > - Must one ensure that #finalizeValues is sent to one's WeakKeyDictionary instance regularly?
> >
> > Yes.  Either that, or you need to otherwise somehow remove the garbage collected entries from its internal array.
> >
> > Otherwise, the WeakKeyDictionary will only ever grow in size internally, with a bunch of nil keys.  Here's a short script showing the mechanics and assumptions of this:
>
> No, that is not the case. Associations with nil keys get removed
> automatically, which is why you cannot have nil keys in
> WeakKeyDictionaries (and because there was no demand to implement
> something like what has been done for Sets).
>
> What you mean is that associations do not get removed immediately.
> Here's is a snippet demonstrating the situation by only sending #at:put:
> to the dictionary and always with a new key:
>
> | w |
> w := WeakKeyDictionary new.
> 1000 timesRepeat: [ w at: Object new put: 0 ].
> Transcript open; showln: { w size. w slowSize. w capacity }.
> w capacity - w size timesRepeat: [ w at: Object new put: 0 ].
> Smalltalk garbageCollectMost.
> w at: Object new put: 0.
> Smalltalk garbageCollectMost.
> Transcript showln: { w size. w slowSize. w capacity }.
>
>
> The following lines should appear on the transcript:
> #(1000 0 1817)
> #(1 0 2)

Okay, that must be new.  In 5.3, I still get:

#(1000 1000 2423)
#(2424 0 5119)

but, in trunk, yes!  That's great, thanks for letting me know, but I'm
having trouble identifying _why_ it works like that in trunk -- what
change actually enabled that?  It's something Igor and I worked on
back in 2010 (I incorrectly said 2007, before), but it involved a VM
change we couldn't get pushed through.

> The dictionary will eventually shrink to the smallest possible
> capacity, 2,

Wow, even smaller than the default initial capacity, 3, now there's a
pleasant surprise. ;)  Does that mean if I do a "WeakKeyDictionary
new" and never add any entries to it, its capacity will eventually
shrink to 2?

> holding one association to be garbage collected and no
> associations with a valid key.
>
> As with all weak hashed collections, #isEmpty, #size, #notEmpty are not
> reliable because the entries may vanish any time, so there's no effort to
> even try to make those correct.

Hm, Marcel fixed #isEmpty for Weak collections in 2020.  There are
contexts where those methods are useful even with the understanding of
the potential inaccuracy.  Wouldn't you consider #at: and
#includesKey: to also be affected (e.g., "not reliable") by this
logic, as well?

> If you need accurate information, you must check against #slowSize, which
> means O(n) runtime. Using ephemerons may help with that.

I hope to one day understand ephemerons and why they're useful,
because I sense they provide some kind of magic that could help my
stack.  But, I never grok'd them so far.

> > Igor made some alternative-implementation WeakKeyDictionary's for Magma back in 2007 which are faster than the one's built into Squeak to this day (last I checked, anyway).  If interested, load "MaBase" from SqueakMap and
> > then see MaWeakIdentityKeyDictionary.
>
> Is it really faster?
> Here's a basic benchmark:
>
> [
>         | m |
>         Smalltalk garbageCollect.
>         m := MaDictionary new.
>         {
>                 #insert -> [ 1 to: 1000000 do: [ :i | m at: i hashMultiply put: i ] ] timeToRun.
>                 #access -> [ 1 to: 1000000 do: [ :i | m at: i hashMultiply ] ] timeToRun.
>                 #update -> [ 1 to: 1000000 do: [ :i | m at: i hashMultiply put: i + 1 ] ] timeToRun.
>                 #positiveLookup -> [ 1 to: 1000000 do: [ :i | m includesKey: i hashMultiply ] ] timeToRun.
>                 #negativeLookup -> [ 1000001 to: 2000000 do: [ :i | m includesKey: i hashMultiply ] ] timeToRun.
>                 #remove -> [ 1 to: 1000000 do: [ :i | m removeKey: i hashMultiply ] ] timeToRun
>         } ] value.
>
> "{#insert->535 . #access->1062 . #update->250 . #positiveLookup->967 . #negativeLookup->1055 . #remove->2353}"
> The same benchmark for Dictionary yields
>  {#insert->469 . #access->945 . #update->198 . #positiveLookup->174 . #negativeLookup->301 . #remove->719} on my machine.
> Even with MaDictionary >> #checkForUnderflow: patched to avoid the creation of Fractions.

Yeah, one Fraction per resize isn't going to be noticeable, but thanks
for pointing it out, I patched it.

MaDictionary is actually just an abstract superclass Igor made to
contain his work, it wasn't meant to be used.  The real purpose of his
work was the replacement of Squeak's woeful WeakKeyIdentityDictionary
from the days when the identityHash was only 12-bits.  Your bench
didn't have any collisions, but that's what the Ma variant's optimized
for.

But Squeak has come a long way since the 12-bit identityHash and
#hashMultiply, so the benefit is not nearly as pronounced, but I
enhanced your basic bench to try to more-closely emulate how Magma
would be using them for, and shows that it's still a _little_ (albeit,
not much) better than WeakKeyIdentityDictionary with #at: and
#at:put:...

{ WeakIdentityKeyDictionary. MaWeakIdentityKeyDictionary}
 collect: [ : cls |
[
        | m r o testSize |
        testSize := 1e6.
        o := Object new.
        r := (1 to: testSize) collect: [ : e | Object new ].
        r := r shuffled.  "mix up identityHash's to simulate
real-world access, just in case consecutive creation above might
artificially help the hash algo"
        m := cls new.
        Smalltalk garbageCollect.
        {        m class -> testSize.
                #insert -> [ r do: [ :i | m at: i put: i ] ] timeToRun.
                r := r shuffled.
                #access -> [ r do: [ :i | m at: i ] ] timeToRun.
                r := r shuffled.
                #negativeaccess -> [ r do: [ :i | m at: o ifAbsent:
[nil] ] ] timeToRun.
                r := r shuffled.
                #update -> [ r do: [ :i | m at: i put: 0 ] ] timeToRun.
                r := r shuffled.
                #positiveLookup -> [ r do: [ :i | m includesKey: i ] ]
timeToRun.
                r := r shuffled.
                #negativeLookup -> [ 1000001 to: 2000000 do: [ :i | m
includesKey: o ] ] timeToRun.
        } ] value select: [ : e | e isVariableBinding ] ].

{{WeakIdentityKeyDictionary->1000000 . #insert->1830 . #access->1260 .
#negativeaccess->705 . #update->532 . #positiveLookup->524 .
#negativeLookup->42} .
{MaWeakIdentityKeyDictionary->1000000 . #insert->1286 . #access->1170
. #negativeaccess->739 . #update->435 . #positiveLookup->1131 .
#negativeLookup->800}}

I never noticed #includesKey: to be so slow on Ma (especially the
negative case, wow!), and I think Magma does use that in one place,
but most of the usage is for #at: and #at:put: which get even better
as the sizes get beyond 1M elements.

But with 6.0's new auto-cleaning, I will certainly retire these Ma dictionary's!

I always learn useful stuff from you, thanks a lot for your help.

 - Chris


More information about the Squeak-dev mailing list