[squeak-dev] Re: Hash changes
Nicolas Cellier
nicolas.cellier.aka.nice at gmail.com
Tue Dec 1 07:42:21 UTC 2009
2009/12/1 Andreas Raab <andreas.raab at gmx.de>:
> Thanks. Now I see what's happening. The problem is that adding objects
> without scaled hash to a "regular" Dictionary creates hugely disproportional
> collision chain(s) as soon as we get north of 4096 objects. At this point
> all the possible entries that any new object could have by default are
> already taken and since we use a linear collision chain they combine into
> one gigantonormeous (hope I spelled that correctly :-) collision chain.
>
> That explains the difference - for larger number of elements we end up (with
> your changes) with an average length of the collision chain being (dict size
> // 4096) for both adding and accessing where without the scale we end up
> with a collision chain of (dict size) length (for adding) and (dict size //
> 2) for accessing elements once all possible initial slots are filled up and
> spill over.
>
> (and we should probably add a comment explaining the issue)
>
> Cheers,
> - Andreas
>
Yes, that's what I tried to explain previously...
The change does not change the RATE of collisions, only the COST of collisions.
But I must be just bad in English ;)
Cheers
Nicolas
> Levente Uzonyi wrote:
>>
>> On Mon, 30 Nov 2009, Andreas Raab wrote:
>>
>>>> | test array |
>>>> Transcript open; cr.
>>>> test := Dictionary new.
>>>
>>> "There used to be an old issue with growth of Sets that were initially
>>> sized with goodSizeFor: 3. Try working around it by starting at 5."
>>> test := Dictionary new: 5.
>>>
>>
>> This shouldn't be an issue in this case. The capacity of new
>> HashedCollections are already modified to be carefully selected primes.
>> AFAIK only hashMultiply doesn't like 5 as capacity but it's not involved
>> here. Anyway I added this to the new test.
>>
>>> "Having sequences of objects with strictly consecutive hashes is not
>>> representative for real systems. Instead, create 4096 objects and choose 100
>>> atRandom."
>>>
>>> objSet := Array new: 4096 streamContents: [ :stream |
>>> 4096 timesRepeat: [ stream nextPut: Object new ] ].
>>> array := (1 to: 100) collect:[:i| objSet atRandom].
>>>
>>
>> Good point but the hashes are not consecutive (at least with my vm). Since
>> we need 10000 different Objects I added this idea to the test with some
>> modifications. 10000 Objects are created, then shuffled and used when
>> needed.
>>
>> The modified test:
>>
>> | test array objects |
>> Transcript open; cr.
>> test := Dictionary new: 5.
>> objects := ((1 to: 10000) collect: [ :each | Object new ]) shuffled
>> readStream.
>> [ test size >= 10000 ] whileFalse: [
>> Smalltalk garbageCollect.
>> array := Array new: 100 streamContents: [ :stream |
>> 100 timesRepeat: [ stream nextPut: objects next ] ]. "Create 100
>> new Objects"
>> Transcript
>> print: [ "Add the 100 new Objects to the Dictionary as keys with
>> nil values"
>> 1 to: array size do: [ :index |
>> test at: (array at: index) put: nil ] ] timeToRun;
>> tab;
>> print: test size;
>> tab;
>> print: [ "Access the added Objects 1000 times"
>> 1 to: 1000 do: [ :iteration |
>> 1 to: array size do: [ :index |
>> test at: (array at: index) ] ] ] timeToRun;
>> tab;
>> print: [ "This should be substracted from the previous value"
>> 1 to: 1000 do: [ :iteration |
>> 1 to: array size do: [ :index |
>> array at: index ] ] ] timeToRun;
>> cr;
>> flush ]
>>
>> The results:
>> Updated image:
>> 0 100 93 4
>> 0 200 144 4
>> 0 300 125 4
>> 0 400 105 3
>> 0 500 111 4
>> 0 600 150 4
>> 0 700 115 3
>> -- snip --
>> 0 9600 297 3
>> 0 9700 296 4
>> 0 9800 299 5
>> 1 9900 312 4
>> 0 10000 309 4
>>
>> Values in the third column are increasing somewhat, but not significantly,
>> no spikes etc.
>>
>> Current image:
>> 0 100 76 3
>> 1 200 119 4
>> 0 300 105 4
>> 0 400 80 4
>> 0 500 112 4
>> 0 600 111 3
>> 1 700 98 4
>> -- snip --
>> 1 3700 1279 4
>> 2 3800 1998 4
>> 3 3900 3196 4
>> 5 4000 4473 4
>> 11 4100 11301 4
>> 19 4200 18823 4
>> 36 4300 36391 3
>> 93 4400 45294 4
>> 47 4500 46674 3
>> ...
>>
>> Around 4000 things get really slow. There are also smaller spikes near
>> 2500. I didn't wait for 4600. As you can see in the first column #at:put: is
>> consuming more time every iteration.
>>
>>
>> Levente
>>
>>
>
>
>
More information about the Squeak-dev
mailing list
|