[squeak-dev] Re: Hash changes

Andreas Raab andreas.raab at gmx.de
Tue Dec 1 07:56:13 UTC 2009


Nicolas Cellier wrote:
> Yes, that's what I tried to explain previously...
> The change does not change the RATE of collisions, only the COST of collisions.
> But I must be just bad in English ;)

Nope, my disconnect was in not realizing the devastating effect of 
filling *all* the initial slots and thereby creating a single collision 
chain. I thought you were saying something along the lines that creating 
objects with strictly consecutive hashes, i.e., hash(n+1) = hash(n)+1, 
would lead to some degenerate collision chain, which is why I was asking 
to randomize the input to see whether the behavior was caused by that 
dependency.
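
To make this concrete, here is a minimal workspace sketch (the capacity 
of 11 is arbitrary), assuming the usual open-addressing start index of 
hash \\ capacity + 1: strictly consecutive hashes target strictly 
consecutive slots, so once that run of slots fills up, every further 
colliding key extends one and the same linear chain.

| capacity |
capacity := 11.
"Start slots for five consecutive hashes: #(2 3 4 5 6). Adjacent slots 
fill up first; the next key whose slot is taken must walk past all of 
them, turning the whole run into a single chain."
(1 to: 5) collect: [ :hash | hash \\ capacity + 1 ]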

Cheers,
   - Andreas

> 
> Cheers
> 
> Nicolas
> 
>> Levente Uzonyi wrote:
>>> On Mon, 30 Nov 2009, Andreas Raab wrote:
>>>
>>>>> | test array |
>>>>> Transcript open; cr.
>>>>> test := Dictionary new.
>>>> "There used to be an old issue with growth of Sets that were initially
>>>> sized with goodSizeFor: 3. Try working around it by starting at 5."
>>>> test := Dictionary new: 5.
>>>>
>>> This shouldn't be an issue in this case. The capacities of new
>>> HashedCollections have already been changed to carefully selected primes.
>>> AFAIK only hashMultiply doesn't like 5 as a capacity, but it's not involved
>>> here. Anyway, I added this to the new test.
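
A side note on the hashMultiply remark, assuming the usual Squeak 
definition (equivalent to multiplying by 1664525 and keeping the low 
28 bits): a capacity of 5 is awkward because the multiplier 1664525 is 
itself divisible by 5, so small consecutive hashes all collapse onto 
the same slot. A quick workspace check:

"Answers all zeros: while h * 1664525 stays below 2 raisedTo: 28, the 
masked value is still divisible by 5, so every key maps to one slot."
(1 to: 10) collect: [ :h | (h * 1664525 bitAnd: 16rFFFFFFF) \\ 5 ]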
>>>
>>>> "Having sequences of objects with strictly consecutive hashes is not
>>>> representative for real systems. Instead, create 4096 objects and choose 100
>>>> atRandom."
>>>>
>>>> objSet := Array new: 4096 streamContents: [ :stream |
>>>>     4096 timesRepeat: [ stream nextPut: Object new ] ].
>>>> array := (1 to: 100) collect: [ :i | objSet atRandom ].
>>>>
>>> Good point, but the hashes are not consecutive (at least with my VM). Since
>>> we need 10000 different Objects, I added this idea to the test with some
>>> modifications: 10000 Objects are created up front, then shuffled and consumed
>>> as needed.
>>>
>>> The modified test:
>>>
>>> | test array objects |
>>> Transcript open; cr.
>>> test := Dictionary new: 5.
>>> objects := ((1 to: 10000) collect: [ :each | Object new ]) shuffled readStream.
>>> [ test size >= 10000 ] whileFalse: [
>>>     Smalltalk garbageCollect.
>>>     "Take the next 100 Objects"
>>>     array := Array new: 100 streamContents: [ :stream |
>>>         100 timesRepeat: [ stream nextPut: objects next ] ].
>>>     Transcript
>>>         print: [ "Add the 100 Objects to the Dictionary as keys with nil values"
>>>             1 to: array size do: [ :index |
>>>                 test at: (array at: index) put: nil ] ] timeToRun;
>>>         tab;
>>>         print: test size;
>>>         tab;
>>>         print: [ "Access the added Objects 1000 times"
>>>             1 to: 1000 do: [ :iteration |
>>>                 1 to: array size do: [ :index |
>>>                     test at: (array at: index) ] ] ] timeToRun;
>>>         tab;
>>>         print: [ "Loop overhead; this should be subtracted from the previous value"
>>>             1 to: 1000 do: [ :iteration |
>>>                 1 to: array size do: [ :index |
>>>                     array at: index ] ] ] timeToRun;
>>>         cr;
>>>         flush ]
>>>
>>> The results (columns: #at:put: time, dictionary size, 1000x access time, loop overhead; all times in milliseconds):
>>> Updated image:
>>> 0    100    93    4
>>> 0    200    144    4
>>> 0    300    125    4
>>> 0    400    105    3
>>> 0    500    111    4
>>> 0    600    150    4
>>> 0    700    115    3
>>> -- snip --
>>> 0    9600    297    3
>>> 0    9700    296    4
>>> 0    9800    299    5
>>> 1    9900    312    4
>>> 0    10000    309    4
>>>
>>> Values in the third column increase somewhat, but not significantly, and
>>> there are no spikes etc.
>>>
>>> Current image:
>>> 0    100    76    3
>>> 1    200    119    4
>>> 0    300    105    4
>>> 0    400    80    4
>>> 0    500    112    4
>>> 0    600    111    3
>>> 1    700    98    4
>>> -- snip --
>>> 1    3700    1279    4
>>> 2    3800    1998    4
>>> 3    3900    3196    4
>>> 5    4000    4473    4
>>> 11    4100    11301    4
>>> 19    4200    18823    4
>>> 36    4300    36391    3
>>> 93    4400    45294    4
>>> 47    4500    46674    3
>>> ...
>>>
>>> Around 4000 entries things get really slow. There are also smaller spikes
>>> near 2500. I didn't wait for 4600. As you can see in the first column,
>>> #at:put: is consuming more time with every iteration.
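
For what it's worth, these numbers fit the single-chain picture: once 
the table degenerates into one linear chain of roughly n entries, every 
insertion and, on average, every lookup scans on the order of n slots, 
so the per-batch costs in the first and third columns grow with the 
dictionary size. A rough, illustrative estimate of the probe count 
behind the ~45-second entries:

"1000 rounds over 100 keys, each walking a chain of about 4000 
entries, is on the order of 400 million probe steps."
1000 * 100 * 4000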
>>>
>>>
>>> Levente