HashBits, a lazy way

John M McIntosh johnmci at mac.com
Wed Jul 9 17:32:05 UTC 2003


> From:   Hannes Hirzel < hannes.hirzel.squeaklist at b... >
> ...

> Hi John
>
> The things you write, are interesting, but I do not get the point which
> topic you are writing about. Which kind of problem are you thinking of
> solving? It seems to have something to do with generating the internal
> id each object has. What is wrong with the current solution and which
> kind of enhancement are you thinking of?
>
> Thanks for clarification in advance.
>
> Regards
>
> Hannes Hirzel
>
> John M McIntosh wrote:
> >> The solution of course is to do it with lazy initialization, only  
> when
> >> someone actually needs to use it do we
> >> build the hash number just for them.
> >>

Ok, a rule is when you stick an object into a Set it needs two things.  
A compare method
and an hash method. The hash method is used to place the object in the  
dictionary, and of course an
interation thru the elements might occur if there is a collision, quite  
basic computer science.

See for example Float and ChessMove for example of = & hash. The idea  
of the hash of course is to distribute
a set of objects evenly within a range, versus having many objects of  
the same class but different characteristics
end up being hashed to the same location.  See issues with large  
dictionaries for issues with hash. Failure
to sync hash & = also lead to problems.

Now if the object does not have a hash, then we get to Object>>hash,  
when invokes Object>>identityHash
which ends up calling a primitive to fetch the 12 bits of hash from the  
object header. This hash is created by
ObjectMemory>>newObjectHash. (VisualWorks uses a hash from  Knuth).

Now the optimization is that the identiy hash is rarely used because  
most classes have hashs, and we only
need to do the hash for storing the item in a Set. IE doing  
(1.0+2.0)*3.0 generates hashs on all the objects, plus
working objects. This takes time.

The lazyness here is we only want to generate the identity hash if we  
need to.

Problem then is when to decide if we want to do that.

In a perfect world, I would have said if the hash of the object  
ObjectMemory>hashBitsOf: is zero, then
generate one and stick it in the header. Problem is there are already  
71 objects in the image that
have a hash of zero. To change these require rehashing all sets in the  
image and external segments, but that's
tied to this VM change.

So I'm looking for a brilliant idea of course to automate that.

PS I've thought of saying if the object location is < youngspace, then  
do nothing, Perhaps today I'll see
if iterating over all Set objects triggers any usage of identity hash...

--
======================================================================== 
===
John M. McIntosh <johnmci at smalltalkconsulting.com> 1-800-477-2659
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
======================================================================== 
===


More information about the Squeak-dev mailing list