identiyHash discussion
Scott A Crosby
crosby at qwes.math.cmu.edu
Tue Dec 11 23:25:29 UTC 2001
On Tue, 11 Dec 2001, Tim Rowledge wrote:
> Not sure if this small point was ever addressed:-
>
> Scott A Crosby wrote:
>
>
> > Thus, the added layer of 'longHash' seems ill-defined and pointless. If an
> > object needs more hash bits, can it not store those bits as an internal
> > field, and override 'hash' with a new function returning those bits?
> Not in the sense of keeping a larger hash in the object header, no.
> There is a limit on the space available and 12 bits is it.
>
This was not what I meant.. See below:
> Now, we could extend the object format to address this; I seem to
> remember that at least one Smalltalk (Tek?) allowed for an extra word to
> be added to the object to hold a 32bit hash if needed. However, it is
> pretty rarely needed and so I'm not sure the complication would be worth
> it. In Squeak one could conceivably add an optional fourth header word
Its unnecessary. We can do all of this in smalltalk with no vm
changes.
> to allow a similar capability, but that would also add complications.
> The annoying part of any of these possibilties is that the hash value is
> not often needed and so it hurts to pay the price.
>
Exactly. Extra hash bits are unlikely to be needed in practice. I fully
agree, there should not be any VM changes... Fortunately, code like the
following would work fine.
--------
FooClass>>new
super new.
hashCode := SmallInteger returnRandomInteger.
FooClass>>hash
^hashCode
FooClass>>= object
^self == object
----
And/or, we can make a new method:
Class>>largeHashSubclass:instanceVariableNames:classVariableNames:poolDictionaries:category:
which creates the two above methods automatically.
(Should I overload #identityHash instead of #hash, but I'm not sure if
this breaks things. Essentially, we're faking an identityHash with that
extra field)
--
I'm not offering a solution for handling all large collections
effectively. I'm just offering a solution to *fix a bug* in collections
that are >4000 elements.
Assuming a load factor of .5:
--- # probes into array needed #
Collection size --- old code --- my code --- ideal.
1000 2 2 2
2000 2 2 2
3000 4 2 2
4000 2000 2 2
5000 2500 2.5 2
8000 4000 4 2
10000 5000 5 2
100000 50000 50 2
1000000 500000 500 2
(where ideal requires a hash that can result in 2^30 distinct values)
I'm not offering a solution that gives us the last column. I'm fixing the
bug that gives us the first column.
Nor am I suggesting that we, in general, need to change the VM to give us
the last column. In the general case, it'll be a rarity.
And, if, the rarity occurs, and someone has a collection so large that
they feel they need the last column instead of the 'my code' column, they
can use the code above to add a seperate hashCode field to their class.
Scott
More information about the Squeak-dev
mailing list
|