[squeak-dev] 32 vs 64 bits and large integer hash
eliot.miranda at gmail.com
Mon Nov 26 16:02:33 UTC 2018
> On Nov 26, 2018, at 2:34 AM, Luciano Notarfrancesco <luchiano at gmail.com> wrote:
> Perhaps it is best to change Set>>scanFor:, as I suggested in other mail. I think it might be best because: 1) adheres better to KISS principles, keeping Integer>>hash unchanged and simple; 2) it might address similar performance problems with other types of objects, not only SmallIntegers; 3) leaves the responsibility of 'mixing' the hash bits to the hashed collections where it seems to belong.
> If we do this, then we can have simpler requirements for hash; it has to be a SmallInteger, equal objects must have equal hash, and ideally it has to be well distributed. But it doesn't need to look random, so similar objects can have similar hashes (as it is currently the case with SmallInteger, and maybe with Strings too? it might be even desirable for similar Strings to have similar hashes). Then the only required change is in Dictionary and Set scanFor:
> start _ (anObject hash \\ array size) + 1.
> should be changed to:
> start _ (anObject hash hashMultiply \\ array size) + 1.
> (or something similar, anything that 'mixes' the bits of the hash, perhaps with a primitive that does this and the modulus all at once). This would ensure that lookup for hashed collections is done in constant time and doesn't degenerate into linear searches, at a very small cost (the lookup time is increased slightly due to the extra hashMultiply).
I find this much more compelling. I have no objection to this and it suggests an obvious extension to the hashMultiply primitive, which is to support large Integer receivers.
(p.s. _ in place of := is deprecated).
> And there's always the alternative of PluggableSet / PluggableDictionary for the cases when a custom hash is desired.
> My other 2 cents,
>> On Mon, Nov 26, 2018 at 4:14 AM Luciano Notarfrancesco <luchiano at gmail.com> wrote:
>>> On Mon, Nov 26, 2018 at 4:05 AM Luciano Notarfrancesco <luchiano at gmail.com> wrote:
>>> 1 to 254 have probability 1/254 while 0 has probability 2/254
>> Oops, that should be 1/256 and 2/256, sorry.
>> When probabilities don't sum 1, unexpected things can happen ;)
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Squeak-dev