From what I understand:
It does not reduce the rate of hash collisions. It just reduces the cost of hash collision by avoiding consecutive hash. Consecutive hash are evil because in case of collision scanFor: tend to be of linear-cost (scan all consecutive slots until an empty one)
Oh, but you participated to this discussion, didn't you ? http://bugs.squeak.org/view.php?id=1876
( via http://code.google.com/p/pharo/issues/detail?id=213 )
Nicolas
2009/11/30 Andreas Raab andreas.raab@gmx.de:
I'll be happy to do this but can someone summarize the expected effects of those changes? The main thing I don't understand is why it is advantageous to simply scale the id-hash given that it doesn't actually increase the range of hashes, i.e., id-hash is still limited to 12 bits so no matter how much you multiply these values the result will still only be 4096 distinct values. Why is it advantageous to make these numbers artificially larger without increasing the number of hash bits?
Cheers, - Andreas
David T. Lewis wrote:
I think that this is important work. Most of the background discussion took place on the Pharo list, so some may not have seen it. I personally am not qualified to do anything with this, but hopefully others can take a look at it and incorporate it in trunk if appropriate.
Dave
On Mon, Nov 30, 2009 at 06:26:21AM +0100, Levente Uzonyi wrote:
Hi,
I uploaded the second part of Andr?s' hash changes to the inbox. Every package requires a new mcm. The load order is:
Collections-ul.217 "Initialization" Kernel-ul.308 "Object >> #hash" Kernel-ul.309 Kernel-ul.310 Collections-ul.218 "IdentityDictionary" Collections-ul.219 Collections-ul.220 "KeyedIdentitySet" Collections-ul.221 Collections-ul.222 "WeakIdentityKeyDictionary" Collections-ul.223 Collections-ul.224 "IdentitySet" Collections-ul.225 System-ul.185 "SystemDictionary" System-ul.186 System-ul.187 Collections-ul.226 "Cleanup" Collections-ul.227
These packages modify the following:
- Object >> #hash returns an integer between 0 and 1073479680 instead of
0 and 4096, though the range is not continuous
- identity-based collections use the same range extension method as
Object >> #hash
- SystemDictionary uses #hash instead of #identityHash
- all preambles and postscripts are removed from Collections, Kernel and
System packages
Cheers, Levente