help!! - now "Hash the set"

Mike Klein mike at twinsun.com
Thu Apr 16 19:53:47 UTC 1998


> >The number of Sets in a system and its medium size can vary from system to 
> system...
> >Probably we need a more secure hash for Sets.
> 
> Well yes, what do we do with its hash? Certainly it's not a good thing to 
> depend on the hash of its elements, because its unordered. This also renders 
> kicking the problem to the implementation of the sets into a bad thing.

I don't see why Set >># hash would be a bad thing.

> >Or a NOT so clever #=...
> 
> Like detecting an object not in the other... size squared complexity, fast 
> for small sets, terribly slow for big sets.
> 
> >How many times do we compare collections for equal contents?
> >How many times do we compare by identity tests?

> If you plan waiting until eternity comes when sometime you finally send 
> #=... no.

I often compare sets for equality.  Usually these sets are very small,
so my brain-dead N^2 algorithms have never needed improvement.

> I'm not convinced in this case. I think I now have it:
> 
> 1. Add an instance variable to sets, namely hash.
> 2. Initialize to zero on instance creation.
> 3. Upon every add: anObject, bitXor it with anObject hash.
> 4. Upon every remove: anObject, bitXor it with anObject hash.
> 
> The steps 3 and 4 only update when something happens. Hmmmm... this kills 
> all the problems, I guess. In fact, it's a fast way to check if aSet ~= 
> anotherSet, by checking their sizes first, then its hashes and then bad luck.
> 
> Could this work for any other collection?... well... ;-)... it's fast 
> enough so that it doesn't turn add:, remove: and so into number crunchers.
> 
> Andres.

I like 1,2.  No need to pay for 3,4 untill #hash is sent, in which case
the computation time is roughly akin to the case with 3,4 (although in one big
lump).

What, you may ask, happens, if you try to hash, modify the set, and then
hash again?  You shouldn't.  Modifying objects after they have been hashed
indicates a confusion between a functional and imperitive approach.

I would hope the set would raise an error on any adds or removes after it
had been hashed.  I talk about this a little more in an other post.

(Do you call them posts in a mailing list?)


-- Mike Klein





More information about the Squeak-dev mailing list