[squeak-dev] 4.1 - hashed collections still a problem

Igor Stasenko siguctua at gmail.com
Wed Mar 24 22:58:02 UTC 2010


On 25 March 2010 00:26, Levente Uzonyi <leves at elte.hu> wrote:
> On Wed, 24 Mar 2010, Igor Stasenko wrote:
>
>> On a board's meeting, Chris mentioned my implementation, and he says
>> that it outperforms the LargeIdentitySet.
>> I'm not sure how Chris compared it, because i implemented this scheme
>> only for dictionaries , not sets.
>
> It's really hard to believe that a dictionary can be faster than
> LargeIdentitySet (I even doubt that a set could be faster), so I'm waiting
> for the numbers.
>
> As you pointed out it's unfair to compare sets to dictionaries, so I wrote
> LargeIdentityDictionary
> (http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionary.st
> ) which works just like LargeIdentitySet, but it also stores the values
> besides the keys. I modified Chris' benchmark just like you suggested: store
> nil as the value
> (http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionaryBenchmark.wst
> ). The benchmark compares LargeIdentityDictionary to IdentityDictionary. It
> uses two diffent lookup methods (#at: and #includesKey:). I ran the
> benchmarks and copied the results to a single file
> (http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionaryBenchmark.txt
> ), but I'm sure the nice charts are more interesting, so here they are:
> http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionary.png
> http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionary2.png
>
> The first chart shows that IdentityDictionary has a big spike at 970000,
> probably because of clustering (we need even better primes). So the second
> chart is the same as the first, but it only shows the 0-5000ms range. What
> we can see is #at: and #includesKey: have similar performance for
> IdentityDictionary (both use #scanFor:). LargeIdentityDictionary >> #at: is
> ~3.5-5x faster than IdentityDictionary >> #at:. And what's more interresing
> is that #includesKey: is so fast that it seems to be totally flat (though
> it's not). That's because #includesKey: can use the #pointsTo: primitive,
> but #at: can't, because it has to return the value for the key. So if a
> dictionary is faster than LargeIdentitySet, than it's graph has to be even
> flatter than #includesKey:'s graph (it's really hard to believe that such
> dictionary exists).
>
i think that #pointsTo: is a cheat :), which you can use in Sets but
not dictionaries, because
it contains associations. Also, it works only for identity-based collections.

> I wonder how LargeIdentityDictionary compares to your dictionaries'.
>
me too.
>
> Levente
>
> P.S. LargeIdentityDictionary is hardly tested and doesn't implement every
> "public" method that IdentityDictionary does.
>
>>
>> If you interested in details, you can check it in Magma.
>> Brief description:
>> i using a linked list for storing associations which keys having same hash
>> value
>> so, in addition to key/value pair there is next ivar.
>> And dictionary's array entries look like this:
>> ...
>> ...
>> ...
>> e1 -> e2 -> e3
>> ...
>> ...
>>
>> where e1,e2,e3 is associations , which having same hash value, but
>> different keys.
>> So, when it doing lookup for a key, say e3, initially by doing hash
>> lookup it finds e1, and then since #= for e1 answers false, it looks
>> for the next linked association, until e3 is found.
>>
>> If i remember, the last time i benched this implementation, it almost
>> same in speed as an original implementation for small sizes, and
>> outperforms on a large sizes, especially, when you got more than 4096
>> elements.
>>
>> So, Chris, can you give us details how you compared the speed between
>> LargeIdentitySet and my dictionaries?
>> I don't think its correct to compare them, because dictionaries need
>> to handle more data than sets, and we should expect some slowdown
>> because of it.
>> But sure, you can imitate the Set behavior with dicts, by simply
>> ignoring the value field of association (set it to nil), and use only
>> keys, i.e.
>> set add: object
>> is like:
>> dict at: object put: nil.
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>>
>>
>
>



-- 
Best regards,
Igor Stasenko AKA sig.



More information about the Squeak-dev mailing list