identiyHash discussion

Andres Valloud sqrmax at prodigy.net
Wed Dec 12 19:36:33 UTC 2001


Hi Scott.

> I'm not offering a solution for handling all large collections
> effectively. I'm just offering a solution to *fix a bug* in
> collections that are >4000 elements.

I see what the problem is.  I don't have your previous postings here,
however, how about this?

* Where IdentitySet/IdentityDictionary send #identityHash, it should be
multiplied by a large prime to spread out the hash buckets when the
array is large.

* In MethodDictionary, #identityHash shouldn't be multiplied so it's not
necessary to change the VM (ie, 4096 methods/class is extremely generous
already plus we save an epsilon by avoiding a multiplication per
selector lookup).

So it's not necessary to change any #identityHash/#hash methods, etc.

To put this in the image, you could try a series of ordered changesets,
like this:

1. Add an instance variable called, say, #oldArray, to all identity
collections to be modified.  Put up an accessor which lazily initializes
oldArray to the current array.  I don't know if MethodDictionaries will
like the added instance variable, but if so a workaround will be fine
(*).  

2. Insert the modifications in the methods, with an extra provision: if
self oldArray == the current array, perform the old hashing.

3. Insert a modification in #rehash so that setting up a new array uses
the new hashing.

4. Execute Set>>rehashAllSets (IIRC).

5. Now, all IdentitySets/IdentityDictionaries have their instance
variable rehashed set to the old array, so they're doing the new
hashing.  Note that the logic change and the array change happen
instantaneously, so everything's fine.  Remove the references to the
rehashed instance variable.  Revert to the old version of #rehash. 
Stomp all old code with the corresponding new versions.

6. Remove the instance variable rehashed.

Done.

Andres.

(*)  For instance, the accessor lazily sets up a global array of
associations between the instances of set/dictionary and their current
array or array of keys.  Let's don't use a dictionary or set for this. 
Yes it's somewhat slow, but it's not relevant.  At any rate, in this way
it's not necessary to use the instance variable.




More information about the Squeak-dev mailing list