Horrible unscalability, how do I change hashBitsOf: ?
Martin McClure
martin at hand2mouse.com
Sun Oct 7 03:32:33 UTC 2001
At 3:37 PM -0400 10/5/01, Scott A Crosby wrote:
[...]
>
>The hashing strategy used by dictionaries and sets is non-optimal if the
>hashcode returned is not over a wide range..
This is true. I believe that large hashed collections would be faster
if the hash values were spread more widely, even if the number of
values is still only 4096. However your solution...
>
>
>```
>hashBitsOf: oop
> ^ ((self baseHeader: oop)
> >> 17 bitAnd: 4095) << 10
>'''
...making each hash value a multiple of 1024, is not a good answer.
It creates a problem for many small collections. Those whose array
size is a power of two would all hash to the same index. And
MethodDictionaries are always sized at a power of two, and method
lookup uses identity hash.
Other power of two array sizes probably aren't very common, but "Set
new" sets the array size to 6, and Sets grow by doubling, so Sets and
Dictionaries created without a specific size will always have an
array size of 6 times a power of two. For small powers of two and
hash values that are always a multiple of 1024, only three hash
buckets ever get used.
The result of your fix, then, is a lot more linear probing within
small hashed collections, and a lot less linear probing within large
collections. The improvement to large collection lookup would
probably be overshadowed by the loss of efficiency in method lookup.
For a better answer I'd want to look at the considerable literature
on hashing, and also look at the previous discussion threads
previously mentioned:
At 9:55 AM -0700 10/6/01, Tim Rowledge wrote:
>I think a long conversation about hashing can be found in the list
>archive; 'fraid I don't remember any dates to help you. It's a perennial
>topic in Smalltalk.
...but it does occur to me that you might get better results by
multiplying by, say, 65537 than by 1024. Having only 4096 hash
values, but having them be (0..4095)*65537 would bring the overall
range to 28 bits, within shouting distance of SmallInteger limits
without bumping into them. Multiplication by 65537 is also fast even
on machines without fast integer multiply, it's a number or'ed with
itself shifted by 16 bits. 65537 is also prime, which avoids the
really degenerate cases where the collection's array size is evenly
divisible by the hash multiple, and many of the other undesirable
cases.
If you actually want to try it and see how well it works, the rest of
this message is relevant.
I'm slightly uncomfortable about modifying hashBitsOf:, just because
it would no longer be doing what its name suggests. Perhaps putting
the modifications in Interpreter>>primitiveAsOop would be better,
though you'd also have to change
Interpreter>>lookupMethodInDictionary. Maybe just changing the name
hashBitsOf: to something like identityHashOf: would make me happier.
At 3:37 PM -0400 10/5/01, Scott A Crosby wrote:
>
>To artificially increase the range. Now, this change will corrupt every
>Set, Dictionary, and who knows what else in the image.
Rehashing the collections isn't hard, the methods are there in Set to
find them all and rehash.
You have to make sure method lookup works during the transition.
Making a special transitional VM with
Interpreter>>lookupMethodInDictionary modified to always scan the
entire dictionary before returning false might be all that's
required. Commenting out the line
nextSelector=nilObj ifTrue: [^false].
looks like a way to do that, though the speed hit would mean you
wouldn't want to leave that change in after you'd converted your
image.
I don't know if anything else would break. Try it and see.
>
>But, as a bonus, it would nicely IdentitySet, IdentityDictionary, and
>make them almost no-ops, except for invoking identityHash instead of
>hash on the keys. (Make a method called 'hashKey' which gets overloaded by
>Identity*, and the two classes have but one method.)
>
>
>Think its doable, or something I should flee like the plague, cause it is
>too hard and won't make too much difference?
Magic 8 ball says... proceed with caution. :-)
-Martin
More information about the Squeak-dev
mailing list
|