[squeak-dev] Re: Hash changes
Levente Uzonyi
leves at elte.hu
Mon Nov 30 23:54:15 UTC 2009
On Mon, 30 Nov 2009, Andreas Raab wrote:
>> | test array |
>> Transcript open; cr.
>> test := Dictionary new.
>
> "There used to be an old issue with growth of Sets that were initially
> sized with goodSizeFor: 3. Try working around it by starting at 5."
> test := Dictionary new: 5.
>
This shouldn't be an issue in this case. The capacity of
new HashedCollections are already modified to be carefully selected
primes. AFAIK only hashMultiply doesn't like 5 as capacity but it's not
involved here. Anyway I added this to the new test.
> "Having sequences of objects with strictly consecutive hashes is not
> representative for real systems. Instead, create 4096 objects and choose 100
> atRandom."
>
> objSet := Array new: 4096 streamContents: [ :stream |
> 4096 timesRepeat: [ stream nextPut: Object new ] ].
> array := (1 to: 100) collect:[:i| objSet atRandom].
>
Good point but the hashes are not consecutive (at least with my vm). Since
we need 10000 different Objects I added this idea to the test with some
modifications. 10000 Objects are created, then shuffled and used when needed.
The modified test:
| test array objects |
Transcript open; cr.
test := Dictionary new: 5.
objects := ((1 to: 10000) collect: [ :each | Object new ]) shuffled readStream.
[ test size >= 10000 ] whileFalse: [
Smalltalk garbageCollect.
array := Array new: 100 streamContents: [ :stream |
100 timesRepeat: [ stream nextPut: objects next ] ]. "Create 100 new Objects"
Transcript
print: [ "Add the 100 new Objects to the Dictionary as keys with nil values"
1 to: array size do: [ :index |
test at: (array at: index) put: nil ] ] timeToRun;
tab;
print: test size;
tab;
print: [ "Access the added Objects 1000 times"
1 to: 1000 do: [ :iteration |
1 to: array size do: [ :index |
test at: (array at: index) ] ] ] timeToRun;
tab;
print: [ "This should be substracted from the previous value"
1 to: 1000 do: [ :iteration |
1 to: array size do: [ :index |
array at: index ] ] ] timeToRun;
cr;
flush ]
The results:
Updated image:
0 100 93 4
0 200 144 4
0 300 125 4
0 400 105 3
0 500 111 4
0 600 150 4
0 700 115 3
-- snip --
0 9600 297 3
0 9700 296 4
0 9800 299 5
1 9900 312 4
0 10000 309 4
Values in the third column are increasing somewhat, but not
significantly, no spikes etc.
Current image:
0 100 76 3
1 200 119 4
0 300 105 4
0 400 80 4
0 500 112 4
0 600 111 3
1 700 98 4
-- snip --
1 3700 1279 4
2 3800 1998 4
3 3900 3196 4
5 4000 4473 4
11 4100 11301 4
19 4200 18823 4
36 4300 36391 3
93 4400 45294 4
47 4500 46674 3
...
Around 4000 things get really slow. There are also smaller spikes near
2500. I didn't wait for 4600. As you can see in the first column #at:put:
is consuming more time every iteration.
Levente
More information about the Squeak-dev
mailing list
|