[squeak-dev] 4.1 - hashed collections still a problem

Levente Uzonyi leves at elte.hu
Wed Mar 24 22:26:13 UTC 2010


On Wed, 24 Mar 2010, Igor Stasenko wrote:

> On a board's meeting, Chris mentioned my implementation, and he says
> that it outperforms the LargeIdentitySet.
> I'm not sure how Chris compared it, because i implemented this scheme
> only for dictionaries , not sets.

It's really hard to believe that a dictionary can be faster than 
LargeIdentitySet (I even doubt that a set could be faster), so I'm waiting 
for the numbers.

As you pointed out it's unfair to compare sets to dictionaries, so I 
wrote LargeIdentityDictionary 
(http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionary.st 
) which works just like LargeIdentitySet, but it also stores the values 
besides the keys. I modified Chris' benchmark just like you suggested: 
store nil as the value 
(http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionaryBenchmark.wst 
). The benchmark compares LargeIdentityDictionary to IdentityDictionary. 
It uses two diffent lookup methods (#at: and #includesKey:). I ran the 
benchmarks and copied the results to a single file 
(http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionaryBenchmark.txt 
), but I'm sure the nice charts are more interesting, so here they are:
http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionary.png
http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionary2.png

The first chart shows that IdentityDictionary has a big spike at 970000, 
probably because of clustering (we need even better primes). So the second 
chart is the same as the first, but it only shows the 0-5000ms range. What 
we can see is #at: and #includesKey: have similar performance for 
IdentityDictionary (both use #scanFor:). LargeIdentityDictionary >> #at: 
is ~3.5-5x faster than IdentityDictionary >> #at:. And what's more 
interresing is that #includesKey: is so fast that it seems to be totally 
flat (though it's not). That's because #includesKey: can use the 
#pointsTo: primitive, but #at: can't, because it has to return the value 
for the key. So if a dictionary is faster than LargeIdentitySet, than it's 
graph has to be even flatter than #includesKey:'s graph (it's really hard 
to believe that such dictionary exists).

I wonder how LargeIdentityDictionary compares to your dictionaries'.


Levente

P.S. LargeIdentityDictionary is hardly tested and doesn't implement every 
"public" method that IdentityDictionary does.

>
> If you interested in details, you can check it in Magma.
> Brief description:
> i using a linked list for storing associations which keys having same hash value
> so, in addition to key/value pair there is next ivar.
> And dictionary's array entries look like this:
> ...
> ...
> ...
> e1 -> e2 -> e3
> ...
> ...
>
> where e1,e2,e3 is associations , which having same hash value, but
> different keys.
> So, when it doing lookup for a key, say e3, initially by doing hash
> lookup it finds e1, and then since #= for e1 answers false, it looks
> for the next linked association, until e3 is found.
>
> If i remember, the last time i benched this implementation, it almost
> same in speed as an original implementation for small sizes, and
> outperforms on a large sizes, especially, when you got more than 4096
> elements.
>
> So, Chris, can you give us details how you compared the speed between
> LargeIdentitySet and my dictionaries?
> I don't think its correct to compare them, because dictionaries need
> to handle more data than sets, and we should expect some slowdown
> because of it.
> But sure, you can imitate the Set behavior with dicts, by simply
> ignoring the value field of association (set it to nil), and use only
> keys, i.e.
> set add: object
> is like:
> dict at: object put: nil.
>
> -- 
> Best regards,
> Igor Stasenko AKA sig.
>
>



More information about the Squeak-dev mailing list