[squeak-dev] Re: Hash changes
Andreas Raab
andreas.raab at gmx.de
Tue Dec 1 07:56:13 UTC 2009
Nicolas Cellier wrote:
> Yes, that's what I tried to explain previously...
> The change does not change the RATE of collisions, only the COST of collisions.
> But I must be just bad in English ;)
Nope, my disconnect was in not realizing the devastating effect of
filling *all* the initial slots and creating a single collision chain as
a result. I thought you were saying something along the lines that
creating objects with strictly consecutive hashes, i.e., hash(n+1) =
hash(n)+1, would lead to the creation of some degenerate collision
chain, which is why I asked to randomize the input to see whether the
behavior was caused by that dependency.
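To make that concrete, here is a minimal sketch of the effect
(FixedHashKey and the HashDemo category are hypothetical, for
illustration only): distinct keys whose hashes are pinned to a small
range fill every home slot, and the linear probe in scanFor: then walks
one ever-growing run, just as happens once the identityHash range is
exhausted.

Object subclass: #FixedHashKey
    instanceVariableNames: 'hashValue'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'HashDemo'.
FixedHashKey compile: 'setHash: anInteger
    hashValue := anInteger'.
FixedHashKey compile: 'hash
    ^hashValue'.

| dict keys |
"2000 distinct keys, but only 16 distinct hash values"
keys := (1 to: 2000) collect: [ :i | FixedHashKey new setHash: i \\ 16 + 1 ].
dict := Dictionary new.
Transcript
    print: [ keys do: [ :each | dict at: each put: nil ] ] timeToRun;
    cr;
    flush.
"The insertion time grows quadratically with the number of keys,
because every insert rescans the merged collision chain."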
Cheers,
- Andreas
>
> Cheers
>
> Nicolas
>
>> Levente Uzonyi wrote:
>>> On Mon, 30 Nov 2009, Andreas Raab wrote:
>>>
>>>>> | test array |
>>>>> Transcript open; cr.
>>>>> test := Dictionary new.
>>>> "There used to be an old issue with growth of Sets that were initially
>>>> sized with goodSizeFor: 3. Try working around it by starting at 5."
>>>> test := Dictionary new: 5.
>>>>
>>> This shouldn't be an issue in this case. The capacities of new
>>> HashedCollections are already adjusted to carefully selected primes.
>>> AFAIK only hashMultiply doesn't like 5 as a capacity, but it's not
>>> involved here. Anyway, I added this to the new test.
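>>>
>>> For reference, the selected capacity can be checked directly (capacity
>>> is simply the size of the underlying array; the exact primes depend on
>>> the image):
>>>
>>> (Dictionary new: 5) capacity. "answers the adjusted prime, not necessarily 5"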
>>>
>>>> "Having sequences of objects with strictly consecutive hashes is not
>>>> representative for real systems. Instead, create 4096 objects and choose 100
>>>> atRandom."
>>>>
>>>> objSet := Array new: 4096 streamContents: [ :stream |
>>>> 4096 timesRepeat: [ stream nextPut: Object new ] ].
>>>> array := (1 to: 100) collect:[:i| objSet atRandom].
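>>>>
>>>> "Note: atRandom samples with replacement, so a few of the 100 picks
>>>> may repeat; drawing from a shuffled copy instead would give 100
>>>> distinct objects, e.g.:"
>>>> array := objSet shuffled first: 100.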
>>>>
>>> Good point, but the hashes are not consecutive (at least with my VM).
>>> Since we need 10000 different Objects, I added this idea to the test
>>> with some modifications: 10000 Objects are created up front, shuffled,
>>> and used when needed.
>>>
>>> The modified test:
>>>
>>> | test array objects |
>>> Transcript open; cr.
>>> test := Dictionary new: 5.
>>> objects := ((1 to: 10000) collect: [ :each | Object new ]) shuffled
>>>     readStream.
>>> [ test size >= 10000 ] whileFalse: [
>>>     Smalltalk garbageCollect.
>>>     array := Array new: 100 streamContents: [ :stream |
>>>         100 timesRepeat: [ stream nextPut: objects next ] ]. "Take the
>>>             next 100 Objects"
>>>     Transcript
>>>         print: [ "Add the 100 new Objects to the Dictionary as keys
>>>                 with nil values"
>>>             1 to: array size do: [ :index |
>>>                 test at: (array at: index) put: nil ] ] timeToRun;
>>>         tab;
>>>         print: test size;
>>>         tab;
>>>         print: [ "Access the added Objects 1000 times"
>>>             1 to: 1000 do: [ :iteration |
>>>                 1 to: array size do: [ :index |
>>>                     test at: (array at: index) ] ] ] timeToRun;
>>>         tab;
>>>         print: [ "Loop overhead; subtract this from the previous value"
>>>             1 to: 1000 do: [ :iteration |
>>>                 1 to: array size do: [ :index |
>>>                     array at: index ] ] ] timeToRun;
>>>         cr;
>>>         flush ]
>>>
>>> The results (the four columns are: at:put: time in ms, dictionary
>>> size, access time in ms, loop overhead in ms):
>>> Updated image:
>>> 0 100 93 4
>>> 0 200 144 4
>>> 0 300 125 4
>>> 0 400 105 3
>>> 0 500 111 4
>>> 0 600 150 4
>>> 0 700 115 3
>>> -- snip --
>>> 0 9600 297 3
>>> 0 9700 296 4
>>> 0 9800 299 5
>>> 1 9900 312 4
>>> 0 10000 309 4
>>>
>>> Values in the third column (access time) increase somewhat, but not
>>> significantly, and there are no spikes.
>>>
>>> Current image:
>>> 0 100 76 3
>>> 1 200 119 4
>>> 0 300 105 4
>>> 0 400 80 4
>>> 0 500 112 4
>>> 0 600 111 3
>>> 1 700 98 4
>>> -- snip --
>>> 1 3700 1279 4
>>> 2 3800 1998 4
>>> 3 3900 3196 4
>>> 5 4000 4473 4
>>> 11 4100 11301 4
>>> 19 4200 18823 4
>>> 36 4300 36391 3
>>> 93 4400 45294 4
>>> 47 4500 46674 3
>>> ...
>>>
>>> Around size 4000 things get really slow. There are also smaller spikes
>>> near 2500. I didn't wait for 4600. As you can see in the first column,
>>> #at:put: is consuming more time with every iteration.
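>>>
>>> A quick sanity check of why the slowdown starts near 4000 (a sketch;
>>> it counts the distinct identityHash values among 10000 fresh objects,
>>> which the VM's 12-bit identity hash field caps at 4096):
>>>
>>> ((1 to: 10000) collect: [ :each | Object new identityHash ]) asSet size.
>>> "at most 4096, so by size ~4000 nearly every home slot below that
>>> bound is already taken"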
>>>
>>>
>>> Levente
>>>
>>>
>>
>>
>
>