Nicolas Cellier wrote:
Yes, that's what I tried to explain previously... The change does not change the RATE of collisions, only the COST of collisions. But I must be just bad in English ;)
Nope, my disconnect was in not realizing the devastating effect of filling *all* the initial slots and thereby creating a single collision chain. I thought you were saying something along the lines that creating objects with strictly consecutive hashes, i.e., hash(n+1) = hash(n) + 1, would lead to some degenerate collision chain, which is why I was asking to randomize the input to see whether the behavior was caused by that dependency.
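To make that effect concrete, here is a minimal sketch (a toy simulation, not the actual HashedCollection code; the table size, hash range and variable names are made up): when every key's hash is confined to a small fixed range while the table is larger, the first slots all fill up, and every later insertion has to walk past the same occupied run, i.e., one single collision chain.

| slots limit probes |
limit := 64.	"stand-in for a small fixed hash range (think of the 4096 identityHash values)"
slots := Array new: 128.	"table larger than the hash range"
probes := OrderedCollection new.
1 to: 100 do: [ :n |
	| index count |
	index := n - 1 \\ limit + 1.	"hash confined to 1..limit"
	count := 1.
	[ (slots at: index) isNil ] whileFalse: [
		index := index \\ slots size + 1.	"linear probing with wrap-around"
		count := count + 1 ].
	slots at: index put: n.
	probes add: count ].
Transcript print: probes first; tab; print: probes last; cr
"prints 1 and 65: early insertions find a free slot immediately, while every insertion after the hash range is exhausted scans the whole occupied run"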
Cheers, - Andreas
Cheers
Nicolas
Levente Uzonyi wrote:
On Mon, 30 Nov 2009, Andreas Raab wrote:
| test array |
Transcript open; cr.
test := Dictionary new.
"There used to be an old issue with growth of Sets that were initially sized with goodSizeFor: 3. Try working around it by starting at 5." test := Dictionary new: 5.
This shouldn't be an issue in this case. The capacities of new HashedCollections have already been changed to carefully selected primes. AFAIK only hashMultiply doesn't like 5 as a capacity, but it's not involved here. Anyway, I added this to the new test.
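For reference, assuming the image answers HashedCollection>>#capacity as the size of the internal slot array (recent Squeak/Pharo images do), the capacity actually chosen for a given creation expression can be inspected directly; the exact prime depends on the image version:

(Dictionary new: 5) capacity	"size of the internal slot array picked for a requested size of 5"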
"Having sequences of objects with strictly consecutive hashes is not representative for real systems. Instead, create 4096 objects and choose 100 atRandom."
objSet := Array new: 4096 streamContents: [ :stream |
	4096 timesRepeat: [ stream nextPut: Object new ] ].
array := (1 to: 100) collect: [ :i | objSet atRandom ].
Good point, but the hashes are not consecutive (at least with my VM). Since we need 10000 different Objects, I added this idea to the test with some modifications: 10000 Objects are created up front, then shuffled and used as needed.
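For anyone who wants to verify this on their own VM, a throwaway one-liner (not part of the benchmark) is enough:

Transcript print: ((1 to: 10) collect: [ :i | Object new identityHash ]); cr
"if the hashes were strictly consecutive, these would print as n, n+1, n+2, ..."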
The modified test:
| test array objects |
Transcript open; cr.
test := Dictionary new: 5.
objects := ((1 to: 10000) collect: [ :each | Object new ]) shuffled readStream.
[ test size >= 10000 ] whileFalse: [
	Smalltalk garbageCollect.
	array := Array new: 100 streamContents: [ :stream |
		100 timesRepeat: [ stream nextPut: objects next ] ]. "Take the next 100 Objects"
	Transcript
		print: [
			"Add the 100 new Objects to the Dictionary as keys with nil values"
			1 to: array size do: [ :index | test at: (array at: index) put: nil ] ] timeToRun;
		tab;
		print: test size;
		tab;
		print: [
			"Access the added Objects 1000 times"
			1 to: 1000 do: [ :iteration |
				1 to: array size do: [ :index | test at: (array at: index) ] ] ] timeToRun;
		tab;
		print: [
			"This should be subtracted from the previous value"
			1 to: 1000 do: [ :iteration |
				1 to: array size do: [ :index | array at: index ] ] ] timeToRun;
		cr;
		flush ]
The results (columns: ms to add 100 keys with #at:put:, dictionary size, ms for the 1000 x 100 #at: loop, ms for the bare array-access baseline to subtract from the previous column):

Updated image:
 0     100     93   4
 0     200    144   4
 0     300    125   4
 0     400    105   3
 0     500    111   4
 0     600    150   4
 0     700    115   3
-- snip --
 0    9600    297   3
 0    9700    296   4
 0    9800    299   5
 1    9900    312   4
 0   10000    309   4
Values in the third column increase somewhat, but not significantly; there are no spikes, etc.
Current image:
 0     100     76   3
 1     200    119   4
 0     300    105   4
 0     400     80   4
 0     500    112   4
 0     600    111   3
 1     700     98   4
-- snip --
 1    3700   1279   4
 2    3800   1998   4
 3    3900   3196   4
 5    4000   4473   4
11    4100  11301   4
19    4200  18823   4
36    4300  36391   3
93    4400  45294   4
47    4500  46674   3
...
Around 4000 entries things get really slow. There are also smaller spikes near 2500. I didn't wait for 4600. As you can see in the first column, #at:put: is consuming more time with every iteration.
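To put those numbers in perspective: each third-column figure covers 1000 * 100 = 100000 #at: sends, so a rough per-lookup cost in milliseconds (taking the rows above at face value and subtracting the fourth-column baseline) works out as:

(45294 - 4) / 100000.0.	"current image, 4400 entries: roughly 0.45 ms per #at:"
(299 - 5) / 100000.0.	"updated image, 9800 entries: roughly 0.003 ms per #at:"

That is a difference of roughly two orders of magnitude, even though the updated image's dictionary holds more than twice as many entries.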
Levente