Hash rehashed (was: [SqF]Report of VI4 Project for Feb '02)
Martin McClure
martin at hand2mouse.com
Tue Feb 5 04:01:27 UTC 2002
At 4:18 PM -0800 2/4/02, Tim Rowledge wrote:
>Is there any value in making the putative hash-multiplier be dependant
>upon the size of the collection into which the object is being put?
>The downside that immediately pops up is that this value would then
>change as you grow the collection which would presumably require moving
>all the previously placed objects; except of course when you grow a
>collection you have to copy all the entries to the new array anyway, at
>least in most cases.
>
>This could be implemented by adding an instvar to all collections,
>though I fear that would be rather wasteful since so many collections
>are small enough that the multiplier would be 1. If a single list of
>values would be good (ie all varieties of collection can use the same
>values) then a trivial prim could take the array size, the object hash
>and do the entire lookup/algorithm - multiply - modulo operation.
Hmm. Possible, but I don't find it attractive. The main reasons, at
least initially, are aesthetic.
Looking at the overall hashed collection design, it feels a lot more
like what's broken is identityHash, and that the collections are
fine. Thus, it feels more 'right' to fix identityHash.
It also looks like if we choose a multiplier (or other simple
algorithm) for producing a hash value from the header hash bits, and
we choose it *very* carefully, it will work for almost all collection
array sizes. The 'almost' could lead to a slightly uglier design,
having the collections avoid certain sizes (as Scott mentioned in his
reply.)
Another, more technical and less aesthetic, argument against fixing
it in the collections is that many collections are based on #hash,
not on #identityHash, but since the default implementation of #hash
is #identityHash the collection can't tell which objects it contains
would need to have their hash values multiplied. You don't want to
multiply other hash values, because if the implementers of that
object's #hash have been good the hash value spans the entire
SmallInteger range, and you'd end up pushing the computation into
large integer math, which would be a slowdown that is unnecessary in
this case.
-Martin
More information about the Squeak-dev
mailing list
|