HashBits, a lazy way

Hannes Hirzel hannes.hirzel.squeaklist at bluewin.ch
Thu Jul 10 01:39:29 UTC 2003


John

Thank you for the additional clarification. I have some additional
questions  though ....

John M McIntosh wrote:
>> From:   Hannes Hirzel < hannes.hirzel.squeaklist at b... >
>>
>> Which kind of problem are you thinking of
>> solving? It seems to have something to do with generating the internal
>> id each object has. What is wrong with the current solution and which
>> kind of enhancement are you thinking of?


Hash values used in sets
========================

> Ok, a rule is when you stick an object into a Set it needs two things.  
> A compare method
> and an hash method. The hash method is used to place the object in the  
> dictionary, and of course an
> interation thru the elements might occur if there is a collision, quite  
> basic computer science.
> 
> See for example Float and ChessMove for example of = & hash. The idea  
> of the hash of course is to distribute
> a set of objects evenly within a range, versus having many objects of  
> the same class but different characteristics
> end up being hashed to the same location.  See issues with large  
> dictionaries for issues with hash. Failure
> to sync hash & = also lead to problems.


How are hash values to objects?
===============================


> Now if the object does not have a hash method defined by the
> author of a class, then we get to Object>>hash,
> when invokes Object>>identityHash
> which ends up calling a primitive to fetch the 12 bits of hash from the  
> object header. This hash is created by
> ObjectMemory>>newObjectHash. (VisualWorks uses a hash from  Knuth).

A question aside: Are 4096 possible hash values enough? (I remember
vaguely that this has been discussed on the list but I do not remember
the conclusion reached if any.)


> Now the optimization is that the identiy hash is rarely used because  
> most classes have hashs, and we only
> need to do the hash for storing the item in a Set. IE doing  
> (1.0+2.0)*3.0 generates hashs on all the objects, plus
> working objects. This takes time.

I do not understand why (1.0+2.0)*3.0 generates hashs on all the
objects. What does this expression mean in this context?


> The lazyness here is we only want to generate the identity hash if we  
> need to. That is the hash value in the object header remains zero
> until it is needed.

> In a perfect world, I would have said if the hash of the object
> ObjectMemory>hashBitsOf: is zero, then
> generate one and stick it in the header.


> Problem then is when to decide if we want to do that. 

> Problem is there are already  
> 71 objects in the image that
> have a hash of zero. To change these require rehashing all sets in the  
> image and external segments, but that's
> tied to this VM change.

Isn't this a one time action within each image?
How is it tied to a VM change?


> So I'm looking for a brilliant idea of course to automate that.

To automate what? Rehasing all the sets?


> PS I've thought of saying if the object location is < youngspace, then  
> do nothing, Perhaps today I'll see
> if iterating over all Set objects triggers any usage of identity hash...


Thank you for your elaboration

Hannes Hirzel




More information about the Squeak-dev mailing list