[squeak-dev] Re: Hash changes
Andreas Raab
andreas.raab at gmx.de
Tue Dec 1 01:45:27 UTC 2009
Thanks. Now I see what's happening. The problem is that adding objects
without scaled hashes to a "regular" Dictionary creates hugely
disproportionate collision chains as soon as we get north of 4096
objects. At that point all the slots that any new object could initially
hash to are already taken, and since we use linear probing for collision
resolution, the chains merge into one gigantonormeous (hope I spelled
that correctly :-) collision chain.
That explains the difference: for larger numbers of elements we end up
(with your changes) with an average collision-chain length of
(dict size // 4096) for both adding and accessing, whereas without the
scaling we end up with a collision chain of length (dict size) for
adding and (dict size // 2) for accessing, once all possible initial
slots are filled up and spill over.
(and we should probably add a comment explaining the issue)
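The effect described above can be modeled outside of Squeak. The following Python sketch is an illustrative model only, not Squeak's actual implementation: it assumes, per the description above, that unscaled identity hashes take only 4096 distinct values and that collisions are resolved by linear probing. It compares the average probe count per insertion with and without a hash range that covers the whole table.

```python
import random

def insert_linear(table, key_hash):
    """Insert into an open-addressing table with linear probing;
    return the number of probes needed (the collision-chain length)."""
    capacity = len(table)
    index = key_hash % capacity
    probes = 1
    while table[index] is not None:
        index = (index + 1) % capacity
        probes += 1
    table[index] = True
    return probes

def average_probes(num_objects, hash_range, capacity):
    """Fill a table with objects whose hashes are drawn from a limited
    range (modeling a 12-bit identity hash) and report the mean probe
    count per insertion."""
    table = [None] * capacity
    total = 0
    for _ in range(num_objects):
        total += insert_linear(table, random.randrange(hash_range))
    return total / num_objects

random.seed(42)
# Unscaled: only 4096 distinct hash values; once the first 4096 slots
# fill up, the chains merge and probe counts explode.
print(average_probes(8000, 4096, 32768))
# Scaled: hashes spread over the whole table; chains stay short.
print(average_probes(8000, 32768, 32768))
```

In this model the unscaled run averages orders of magnitude more probes per insertion than the scaled run, matching the slowdown past ~4000 elements reported below.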
Cheers,
- Andreas
Levente Uzonyi wrote:
> On Mon, 30 Nov 2009, Andreas Raab wrote:
>
>>> | test array |
>>> Transcript open; cr.
>>> test := Dictionary new.
>>
>> "There used to be an old issue with growth of Sets that were initially
>> sized with goodSizeFor: 3. Try working around it by starting at 5."
>> test := Dictionary new: 5.
>>
>
> This shouldn't be an issue in this case. The capacities of new
> HashedCollections are already carefully selected primes.
> AFAIK only hashMultiply doesn't like 5 as a capacity, but it's not
> involved here. Anyway, I added this to the new test.
>
>> "Having sequences of objects with strictly consecutive hashes is not
>> representative of real systems. Instead, create 4096 objects and
>> choose 100 atRandom."
>>
>> objSet := Array new: 4096 streamContents: [ :stream |
>>     4096 timesRepeat: [ stream nextPut: Object new ] ].
>> array := (1 to: 100) collect: [ :i | objSet atRandom ].
>>
>
> Good point, but the hashes are not consecutive (at least with my VM).
> Since we need 10000 different Objects, I added this idea to the test
> with some modifications: 10000 Objects are created up front, shuffled,
> and then used as needed.
>
> The modified test:
>
> | test array objects |
> Transcript open; cr.
> test := Dictionary new: 5.
> objects := ((1 to: 10000) collect: [ :each | Object new ]) shuffled readStream.
> [ test size >= 10000 ] whileFalse: [
>     Smalltalk garbageCollect.
>     "Take the next 100 Objects from the shuffled pool"
>     array := Array new: 100 streamContents: [ :stream |
>         100 timesRepeat: [ stream nextPut: objects next ] ].
>     Transcript
>         print: [ "Add the 100 Objects to the Dictionary as keys with nil values"
>             1 to: array size do: [ :index |
>                 test at: (array at: index) put: nil ] ] timeToRun;
>         tab;
>         print: test size;
>         tab;
>         print: [ "Access the added Objects 1000 times"
>             1 to: 1000 do: [ :iteration |
>                 1 to: array size do: [ :index |
>                     test at: (array at: index) ] ] ] timeToRun;
>         tab;
>         print: [ "Loop overhead; subtract this from the previous value"
>             1 to: 1000 do: [ :iteration |
>                 1 to: array size do: [ :index |
>                     array at: index ] ] ] timeToRun;
>         cr;
>         flush ]
>
> The results:
> Updated image:
> 0 100 93 4
> 0 200 144 4
> 0 300 125 4
> 0 400 105 3
> 0 500 111 4
> 0 600 150 4
> 0 700 115 3
> -- snip --
> 0 9600 297 3
> 0 9700 296 4
> 0 9800 299 5
> 1 9900 312 4
> 0 10000 309 4
>
> Values in the third column increase somewhat, but not significantly,
> and there are no spikes.
>
> Current image:
> 0 100 76 3
> 1 200 119 4
> 0 300 105 4
> 0 400 80 4
> 0 500 112 4
> 0 600 111 3
> 1 700 98 4
> -- snip --
> 1 3700 1279 4
> 2 3800 1998 4
> 3 3900 3196 4
> 5 4000 4473 4
> 11 4100 11301 4
> 19 4200 18823 4
> 36 4300 36391 3
> 93 4400 45294 4
> 47 4500 46674 3
> ...
>
> Around 4000 elements things get really slow. There are also smaller
> spikes near 2500. I didn't wait for 4600. As you can see in the first
> column, #at:put: consumes more time with every iteration.
>
>
> Levente
>
>