[BUG] Unscalability in collections.

Scott A Crosby crosby at qwes.math.cmu.edu
Sun Oct 7 03:51:25 UTC 2001


Resubmitted so that it appears under the [BUG] heading.

The fix is trivial, getting it into the image isn't. See the thread for
gory details of why, or just trust my word. :)

Scott



---------- Forwarded message ----------
Date: Fri, 5 Oct 2001 15:37:22 -0400 (EDT)
From: Scott A Crosby <crosby at qwes.math.cmu.edu>
Reply-To: squeak-dev at lists.squeakfoundation.org
To: Squeak List <squeak-dev at lists.squeakfoundation.org>
Subject: Horrible unscalability, how do I change hashBitsOf: ?

Hello, I've been continuing my profiling and I found one 'interesting'
surprise.

The hashing strategy used by dictionaries and sets is non-optimal if the
hashcode returned is not over a wide range..

Thats because, you search from

 'hash \\ array size' in a circular loop through the dictionary.

But, if hash is 'primitiveAsOop', its max range is 4096, so if you store
10,000 objects in a set, dictionary, other, your dictionary will change
from O((1/.25)*10,000/4096) to O(10000) to scan the dictionary.

Fortunately, symbols use the string hash code, and not hashBitOf, thus
they create a full range of outputs, but if you just want to store a
dictionary or set of other objects, the above code will fall over as soon
as you try to store more than a couple of thousand.  (I saw this when I
noticed how much overhead macroBenchmark2 spent on hashing symbols. If
symbol dictionaries/sets are relatively small, you can get away with it.

The fix is to make the following change:


```
hashBitsOf: oop
	^ ((self baseHeader: oop)
	 	>> 17 bitAnd: 4095) << 10
'''

To artificially increase the range. Now, this change will corrupt every
Set, Dictionary, and who knows what else in the image.

But, as a bonus, it would nicely IdentitySet, IdentityDictionary, and
make them almost no-ops, except for invoking identityHash instead of
hash on the keys. (Make a method called 'hashKey' which gets overloaded by
Identity*, and the two classes have but one method.)


Think its doable, or something I should flee like the plague, cause it is
too hard and won't make too much difference?


Scott


--
No DVD movie will ever enter the public domain, nor will any CD. The last CD
and the last DVD will have moldered away decades before they leave copyright.
This is not encouraging the creation of knowledge in the public domain.







More information about the Squeak-dev mailing list