New method cache, 30% faster macrobenchmarks and inefficiencies.

Mon Dec 10 16:16:37 UTC 2001

Perhaps I am misunderstanding this proposed longer hash
function, but you might be better served (in general) by
a "better" hash function, that did not use a number that
happens to be a power-of-two plus one.  You might try
a "universal hash function" along the lines described
by Wegman and Carter (I don't have the reference).
I spent some idle time a couple of years ago playing with
"better" hash functions for Java strings, working from
discussions with Wegman, and it was possible to arrive at
functions that were still quite fast, yet had "good" statistical
properties (insensitive to table size, low probability of
collisions of hash values, etc).  (Java's hash functions
are generally poor-to-horrible.)

If you're just looking for a "better" factor, you might
try (say) 0x1461bf27.  It's prime, it's also the lower 32
bits of an irreducible polynomial in GF(2**32), it's also
got 16 set bits.  If you need a smaller number, you might
try something like 0x1a89 -- it's prime, it's the lower
bits of an irreducible poly in GF(2**13), it's got 6 set
bits.  (I can make more of these, obviously).

At 04:07 AM 12/10/2001 +0100, Stephan Rudlof wrote:

>[100000 timesRepeat: ['hy!' longIdentityHash]] timeToRun
>  586 586 586
>[100000 timesRepeat: ['hy!' identityHash]] timeToRun
>  467 466 471
>
>The methods for this test in ProtoObject:
>
>longIdentityHash
>         ^ self identityHash * 4097
>
>identityHash
>...
>This isn't worth introducing a new prim, isn't it?