Color>>#hash

David Chase chase at world.std.com
Mon Mar 1 03:25:01 UTC 1999


Arrrrgggg.

Some of us (basically, friends and friends of friends, where
the FoFs are hashing experts) proposed a better hash function
for Java Strings, apparently with no luck.  It would be
REALLY NICE if hash functions actually did a decent job of
splattering almost every set of inputs into what looks like
noise (such functions exist) first, and then people worried
about performance of the hash function.  In the case of the
three values R, G, B, a good hash might be something
like 

  K1 * R + K2 * G + K3 * B

where K1, K2, and K3 are random numbers, and NOT powers of
two minus 1 (as is the case for Java Strings).  It's 
better if K1, K2, and K3 happen to be largish primes.
If the hash value is supposed to be positive, then you 
can do slightly better by taking the raw result (treated
as an unsigned number) and take its value modulo 0x7fffffff
(a prime, I think) which can be computed with (in Java)

    t0 += (t0>>31) & 0x7fffffff;        /* -1 <= t0 <= 2^{31}-1      */

If the input (t0) can have the value 0x80000000 then
you need to do this a second time, since doing it once
only gets you to

   0xffffffff (== -1 == 0x7ffffffe mod 0x7fffffff)

Another possibility is residues in GF(2**N), if you want 
to treat the color as 4 8-bit numbers and do the table
lookup, but in our work with hash functions for Java
strings, we got better performance from multiplication
by constants, though we did have to fool around quite
a bit to get the performance up.

And yes, I think that in general the DEFAULT hash function
should be a good one.  If an "expert" wishes to substitute
a fast one with a rotten distribution, that is the sort
of foot-shooting game that "experts" often play.

Sorry for the rant, but I've about had it with the sort
of "golly, if we don't do this brain-dead thing, it will
be slow" by alleged computer scientists who don't even
take the time to see how well a proper hash function
(or algorithm, or whatever) can be made to run if they
really think about the problem.

David Chase
NaturalBridge LLC





More information about the Squeak-dev mailing list