[squeak-dev] The Inbox: Collections-cbc.812.mcz
leves at caesar.elte.hu
Fri Nov 16 10:42:11 UTC 2018
On Fri, 16 Nov 2018, commits at source.squeak.org wrote:
> Chris Cunningham uploaded a new version of Collections to project The Inbox:
> Item was changed:
> ----- Method: Interval>>hash (in category 'comparing') -----
> "Hash is reimplemented because = is implemented."
> ^(((start hash bitShift: 2)
> + bitOr: self last hash)
> - bitOr: stop hash)
> bitShift: 1)
> bitOr: self size!
As Bert pointed out, #bitOr: is not a good choice for hashing, because
it loses entropy. Also, I'd use #hashMultiply instead of ad-hoc bit
shifts to maximize the entropy of the hash value. E.g.:
^((start hash hashMultiply bitXor: self last hash) hashMultiply
bitXor: self size) hashMultiply
Below is a snippet comparing various hash functions:
hashes := Set new. "Original #hash"
hashes2 := Set new. "#hash from Collections-cbc.812"
hashes3 := Set new. "The above proposed #hash using #hashMultiply and #bitXor:"
intervals := Set new. "The maximum number of different #hash values"
Interval allInstancesDo: [ :each |
hashes add: each hash.
hashes2 add: each hash2.
hashes3 add: each hash3.
intervals add: each ].
intervals size }
"==> #(584 584 1113 1113)"
The sample is too small, but it suggests that the original #hash and the
#hash in Collections-cbc.812 are suboptimal.
If we create a few more intervals in a workspace:
newIntervals := (1 to: 10000) collect: [ :each | (1 to: each) ].
newIntervals2 := (1 to: 10000) collect: [ :each | (1 to: each by: 2) ].
newIntervals3 := (1 to: 10000) collect: [ :each | (each to: 1 by: -1) ].
then the numbers will be more accurate:
#(6122 5368 26244 27043)
More information about the Squeak-dev