[squeak-dev] 4.1 - hashed collections still a problem
siguctua at gmail.com
Wed Mar 24 22:58:02 UTC 2010
On 25 March 2010 00:26, Levente Uzonyi <leves at elte.hu> wrote:
> On Wed, 24 Mar 2010, Igor Stasenko wrote:
>> On a board's meeting, Chris mentioned my implementation, and he says
>> that it outperforms the LargeIdentitySet.
>> I'm not sure how Chris compared it, because i implemented this scheme
>> only for dictionaries , not sets.
> It's really hard to believe that a dictionary can be faster than
> LargeIdentitySet (I even doubt that a set could be faster), so I'm waiting
> for the numbers.
> As you pointed out it's unfair to compare sets to dictionaries, so I wrote
> ) which works just like LargeIdentitySet, but it also stores the values
> besides the keys. I modified Chris' benchmark just like you suggested: store
> nil as the value
> ). The benchmark compares LargeIdentityDictionary to IdentityDictionary. It
> uses two diffent lookup methods (#at: and #includesKey:). I ran the
> benchmarks and copied the results to a single file
> ), but I'm sure the nice charts are more interesting, so here they are:
> The first chart shows that IdentityDictionary has a big spike at 970000,
> probably because of clustering (we need even better primes). So the second
> chart is the same as the first, but it only shows the 0-5000ms range. What
> we can see is #at: and #includesKey: have similar performance for
> IdentityDictionary (both use #scanFor:). LargeIdentityDictionary >> #at: is
> ~3.5-5x faster than IdentityDictionary >> #at:. And what's more interresing
> is that #includesKey: is so fast that it seems to be totally flat (though
> it's not). That's because #includesKey: can use the #pointsTo: primitive,
> but #at: can't, because it has to return the value for the key. So if a
> dictionary is faster than LargeIdentitySet, than it's graph has to be even
> flatter than #includesKey:'s graph (it's really hard to believe that such
> dictionary exists).
i think that #pointsTo: is a cheat :), which you can use in Sets but
not dictionaries, because
it contains associations. Also, it works only for identity-based collections.
> I wonder how LargeIdentityDictionary compares to your dictionaries'.
> P.S. LargeIdentityDictionary is hardly tested and doesn't implement every
> "public" method that IdentityDictionary does.
>> If you interested in details, you can check it in Magma.
>> Brief description:
>> i using a linked list for storing associations which keys having same hash
>> so, in addition to key/value pair there is next ivar.
>> And dictionary's array entries look like this:
>> e1 -> e2 -> e3
>> where e1,e2,e3 is associations , which having same hash value, but
>> different keys.
>> So, when it doing lookup for a key, say e3, initially by doing hash
>> lookup it finds e1, and then since #= for e1 answers false, it looks
>> for the next linked association, until e3 is found.
>> If i remember, the last time i benched this implementation, it almost
>> same in speed as an original implementation for small sizes, and
>> outperforms on a large sizes, especially, when you got more than 4096
>> So, Chris, can you give us details how you compared the speed between
>> LargeIdentitySet and my dictionaries?
>> I don't think its correct to compare them, because dictionaries need
>> to handle more data than sets, and we should expect some slowdown
>> because of it.
>> But sure, you can imitate the Set behavior with dicts, by simply
>> ignoring the value field of association (set it to nil), and use only
>> keys, i.e.
>> set add: object
>> is like:
>> dict at: object put: nil.
>> Best regards,
>> Igor Stasenko AKA sig.
Igor Stasenko AKA sig.
More information about the Squeak-dev