Sets, hashing and #fullCheck

Andreas Raab andreas.raab at gmx.de
Wed Mar 10 23:01:19 UTC 2004


Hi Guys,

I recently ran into some situations where identity set operations became
horribly slow and since I know there are some people out there who really
understand about this, I thought maybe there's something we can do.

First of all, here's a little benchmark which you can run to see what I'm
talking about:

objList := CompiledMethod allInstances.
objList size < 10000 ifTrue:[self error:'Not enough objects'].
times := Array new: 1000.
1 to: 1000 do:[:max|
    set := IdentitySet new.
    1 to: max * 10 do:[:i| set add: (objList at: i)].
    time := [objList do:[:each| set includes: each]] timeToRun.
    times at: max put: time.
    Display getCanvas line: max at 0 to: max@(time//10) color: Color black.
].

The test simply puts a varying number of objects into an identity set and
then does a (constant) number of inclusion queries. If you run it, you will
notice that after a while we get some *really* huge spikes in there. Playing
with it more by replacing the "IdentitySet new" with "IdentitySet new:
20000" shows that the queries can be run in pretty much constant time (as
they should) so it seems that this has to do with the number of free slots
in the set.

I then went further and made a custom identity set class that used 1/3rd
(instead of the standard 1/4th) of the size as the trigger for growing
inside #fullCheck. It helped a little bit, but basically the spikes were
still there. Changing it to 1/2 however did the trick nicely.

The questions I have are basically: Is there a reasonable explanation for
the behavior I'm seeing when changing it to 1/2 or is this just something
that worked for this specific experiment? More generally, is it possible to
establish a reasonable upper bound for what the "expected fill rate" of an
IDSet should be (considering things like number of hash bits, hash function
used etc)? And finally, would it be worthwhile to change the identity based
sets/dictionaries to grow more agressively?

Thanks for any insights,
  - Andreas




More information about the Squeak-dev mailing list