[squeak-dev] 32 vs 64 bits and large integer hash
luchiano at gmail.com
Mon Nov 26 10:34:48 UTC 2018
Perhaps it is best to change Set>>scanFor:, as I suggested in other mail. I
think it might be best because: 1) adheres better to KISS principles,
keeping Integer>>hash unchanged and simple; 2) it might address similar
performance problems with other types of objects, not only SmallIntegers;
3) leaves the responsibility of 'mixing' the hash bits to the hashed
collections where it seems to belong.
If we do this, then we can have simpler requirements for hash; it has to be
a SmallInteger, equal objects must have equal hash, and ideally it has to
be well distributed. But it doesn't need to look random, so similar objects
can have similar hashes (as it is currently the case with SmallInteger, and
maybe with Strings too? it might be even desirable for similar Strings to
have similar hashes). Then the only required change is in Dictionary and
start _ (anObject hash \\ array size) + 1.
should be changed to:
start _ (anObject hash hashMultiply \\ array size) + 1.
(or something similar, anything that 'mixes' the bits of the hash, perhaps
with a primitive that does this and the modulus all at once). This would
ensure that lookup for hashed collections is done in constant time and
doesn't degenerate into linear searches, at a very small cost (the lookup
time is increased slightly due to the extra hashMultiply).
And there's always the alternative of PluggableSet / PluggableDictionary
for the cases when a custom hash is desired.
My other 2 cents,
On Mon, Nov 26, 2018 at 4:14 AM Luciano Notarfrancesco <luchiano at gmail.com>
> On Mon, Nov 26, 2018 at 4:05 AM Luciano Notarfrancesco <luchiano at gmail.com>
>> 1 to 254 have probability 1/254 while 0 has probability 2/254
> Oops, that should be 1/256 and 2/256, sorry.
> When probabilities don't sum 1, unexpected things can happen ;)
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Squeak-dev