identiyHash discussion

Scott A Crosby crosby at qwes.math.cmu.edu
Tue Dec 11 23:25:29 UTC 2001


On Tue, 11 Dec 2001, Tim Rowledge wrote:

> Not sure if this small point was ever addressed:-
>
> Scott A Crosby wrote:
>
>
> > Thus, the added layer of 'longHash' seems ill-defined and pointless. If an
> > object needs more hash bits, can it not store those bits as an internal
> > field, and override 'hash' with a new function returning those bits?
> Not in the sense of keeping a larger hash in the object header, no.
> There is a limit on the space available and 12 bits is it.
>

This was not what I meant.. See below:

> Now, we could extend the object format to address this; I seem to
> remember that at least one Smalltalk (Tek?) allowed for an extra word to
> be added to the object to hold a 32bit hash if needed. However, it is
> pretty rarely needed and so I'm not sure the complication would be worth
> it. In Squeak one could conceivably add an optional fourth header word

Its unnecessary. We can do all of this in smalltalk with no vm
changes.

> to allow a similar capability, but that would also add complications.
> The annoying part of any of these possibilties is that the hash value is
> not often needed and so it hurts to pay the price.
>

Exactly. Extra hash bits are unlikely to be needed in practice. I fully
agree, there should not be any VM changes... Fortunately, code like the
following would work fine.

--------

FooClass>>new
   super new.
   hashCode := SmallInteger returnRandomInteger.

FooClass>>hash
   ^hashCode

FooClass>>= object
   ^self == object

----

And/or, we can make a new method:

Class>>largeHashSubclass:instanceVariableNames:classVariableNames:poolDictionaries:category:

which creates the two above methods automatically.

(Should I overload #identityHash instead of #hash, but I'm not sure if
this breaks things. Essentially, we're faking an identityHash with that
extra field)

--

I'm not offering a solution for handling all large collections
effectively. I'm just offering a solution to *fix a bug* in collections
that are >4000 elements.

Assuming a load factor of .5:

                  ---      # probes into array needed #
Collection size   ---  old code ---  my code --- ideal.
1000                          2            2         2
2000                          2            2         2
3000                          4            2         2
4000                       2000            2         2
5000                       2500            2.5       2
8000                       4000            4         2
10000                      5000            5         2
100000                    50000           50         2
1000000                  500000          500         2

(where ideal requires a hash that can result in 2^30 distinct values)

I'm not offering a solution that gives us the last column. I'm fixing the
bug that gives us the first column.

Nor am I suggesting that we, in general, need to change the VM to give us
the last column. In the general case, it'll be a rarity.

And, if, the rarity occurs, and someone has a collection so large that
they feel they need the last column instead of the 'my code' column, they
can use the code above to add a seperate hashCode field to their class.

Scott






More information about the Squeak-dev mailing list