[squeak-dev] 4.1 - hashed collections still a problem
Levente Uzonyi
leves at elte.hu
Wed Mar 24 22:26:13 UTC 2010
On Wed, 24 Mar 2010, Igor Stasenko wrote:
> On a board's meeting, Chris mentioned my implementation, and he says
> that it outperforms the LargeIdentitySet.
> I'm not sure how Chris compared it, because i implemented this scheme
> only for dictionaries , not sets.
It's really hard to believe that a dictionary can be faster than
LargeIdentitySet (I even doubt that a set could be faster), so I'm waiting
for the numbers.
As you pointed out it's unfair to compare sets to dictionaries, so I
wrote LargeIdentityDictionary
(http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionary.st
) which works just like LargeIdentitySet, but it also stores the values
besides the keys. I modified Chris' benchmark just like you suggested:
store nil as the value
(http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionaryBenchmark.wst
). The benchmark compares LargeIdentityDictionary to IdentityDictionary.
It uses two diffent lookup methods (#at: and #includesKey:). I ran the
benchmarks and copied the results to a single file
(http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionaryBenchmark.txt
), but I'm sure the nice charts are more interesting, so here they are:
http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionary.png
http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionary2.png
The first chart shows that IdentityDictionary has a big spike at 970000,
probably because of clustering (we need even better primes). So the second
chart is the same as the first, but it only shows the 0-5000ms range. What
we can see is #at: and #includesKey: have similar performance for
IdentityDictionary (both use #scanFor:). LargeIdentityDictionary >> #at:
is ~3.5-5x faster than IdentityDictionary >> #at:. And what's more
interresing is that #includesKey: is so fast that it seems to be totally
flat (though it's not). That's because #includesKey: can use the
#pointsTo: primitive, but #at: can't, because it has to return the value
for the key. So if a dictionary is faster than LargeIdentitySet, than it's
graph has to be even flatter than #includesKey:'s graph (it's really hard
to believe that such dictionary exists).
I wonder how LargeIdentityDictionary compares to your dictionaries'.
Levente
P.S. LargeIdentityDictionary is hardly tested and doesn't implement every
"public" method that IdentityDictionary does.
>
> If you interested in details, you can check it in Magma.
> Brief description:
> i using a linked list for storing associations which keys having same hash value
> so, in addition to key/value pair there is next ivar.
> And dictionary's array entries look like this:
> ...
> ...
> ...
> e1 -> e2 -> e3
> ...
> ...
>
> where e1,e2,e3 is associations , which having same hash value, but
> different keys.
> So, when it doing lookup for a key, say e3, initially by doing hash
> lookup it finds e1, and then since #= for e1 answers false, it looks
> for the next linked association, until e3 is found.
>
> If i remember, the last time i benched this implementation, it almost
> same in speed as an original implementation for small sizes, and
> outperforms on a large sizes, especially, when you got more than 4096
> elements.
>
> So, Chris, can you give us details how you compared the speed between
> LargeIdentitySet and my dictionaries?
> I don't think its correct to compare them, because dictionaries need
> to handle more data than sets, and we should expect some slowdown
> because of it.
> But sure, you can imitate the Set behavior with dicts, by simply
> ignoring the value field of association (set it to nil), and use only
> keys, i.e.
> set add: object
> is like:
> dict at: object put: nil.
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>
>
More information about the Squeak-dev
mailing list
|