Hash rehashed (was: [SqF]Report of VI4 Project for Feb '02)

Scott A Crosby crosby at qwes.math.cmu.edu
Sun Feb 3 09:04:11 UTC 2002


On Sat, 2 Feb 2002, Martin McClure wrote:

> At 4:36 PM -0500 2/2/02, Scott A Crosby wrote:
> [...]
> >

> The reason is that using 65537 as a factor always yields hash values
> that have bits 12-15 zero.

... when computed modulo 2^k k>12


>
> Scott, last time I looked you were using a factor of 4097. Are you
> still using this factor?
>

No. 200thousand-odd something.

Just pick a prime number p such that it* the hash value is large, doesn't
overflow, and p-1 doesn't have many small or obviously bad factors.

> Things that make 4097 less than ideal (though neither of these may be
> a big problem):
> * Produces hash values in the range (0 to: (2 raisedTo: 24) - 1); 24
> bits. The ideal range is the entire SmallInteger range, 31 bits
> (though in either case we can only have 4096 values spread sparsely
> throughout the range, unless we add more header bits.)

You want them evenly distributed over the size of the collection. If the
largest collection you think you'll see is 1m, then a prime in the 300's
would work OK. (Except for the problem where the collection array size and
the prime have a common factor.) Ergo, a nice big beefy prime number
thats unlikely to divide into the collection size by accident is a
better choice.

Scott.





More information about the Squeak-dev mailing list