[squeak-dev] 4.1 - hashed collections still a problem

Igor Stasenko siguctua at gmail.com
Tue Mar 23 23:29:46 UTC 2010


On 24 March 2010 01:22, Bert Freudenberg <bert at freudenbergs.de> wrote:
> On 23.03.2010, at 23:57, Levente Uzonyi wrote:
>>
>> On Tue, 23 Mar 2010, Bert Freudenberg wrote:
>>
>>> On 23.03.2010, at 16:01, Lukas Renggli wrote:
>>>>
>>>>>> Just an idea: we could get rid of compact classes, which would give us
>>>>>> additional 6 bits (5 bits from the compact class index plus 1 bit from the
>>>>>> header type because there would only be 2 header types left). This would
>>>>>> increase the identity hash values from 4096 to 262144. In a PharoCore1.0
>>>>>> image there are 148589 instances of compact classes, hence this would cost
>>>>>> 580k. Or, we could just add an additional word and use the spare bits from
>>>>>> the old identity hash for other stuff, e.g., immutability ;)
>>>>>
>>>>> I like the first idea, we could even have the 17 continuous bits for
>>>>> identity hash the 1 separate bit for immutability.
>>>>
>>>> Yes please, I love it :-)
>>>>
>>>> Lukas
>>>
>>> Well, someone should code it up, and then lets's see macro benchmarks :)
>>
>> That's a great idea, but I'm sure it'll take a while until that happens. Fortunately identityhash related benchmarks can be done without changing the vm. I rewrote a bit the benchmark from Chris, created three classes which have 17, 18 and 30 bits for #scaledIdentityHash. Ran the benchmark with these three classes + Object, collected the data and created some diagrams. I'm sure most people don't care about the code/data[1], so here are the diagrams:
>> http://leves.web.elte.hu/identityHashBits/identityHashBits.png
>> http://leves.web.elte.hu/identityHashBits/identityHashBits2.png
>> http://leves.web.elte.hu/identityHashBits/identityHashBits3.png
>>
>> The first one contains the four graphs. It clearly shows that 12 bits (Object) are insufficient for #identityHash. Even 5 more bits gives 8-9x speedup and a dramatic change in behavior.
>>
>> The second is the same as the first, but it shows only the 17, 18 and 30 bits case. Note that the primes (hashtable sizes) are now optimized for 12 bits. If they are optimized for 17/18 bits then the results can be better for larger set sizes (130+/260+) where they show worse behavior compared to the 18/30 bits case.
>>
>> The third graph shows how an optimized data structure (LargeIdentitySet) compares to the 17, 18 and 30 bits case using only 12 bits.
>>
>> [1] All the code/data that were used to generate these graphs can be found here http://leves.web.elte.hu/identityHashBits
>>
>>
>> Levente
>>
>> P.S. I also created a 12 bit version of the 17-18-30 bit classes and found that it's 1.2-2.0x slower than Object, so the values after the vm changes are expected to be even better than what these graphs show.
>
> So this seems to indicate your specialized data structure beats all VM hash bits extension?
>
> Also, we don't know yet how getting rid of compact classes would affect performance.
>
i expect a slight speed increase, because of more uniform header handling :)

Concerning LargeIdentitySet - this is good argument to the point that
for large collections we should use different data structures.

> - Bert -
>

-- 
Best regards,
Igor Stasenko AKA sig.



More information about the Squeak-dev mailing list