help!! - now "Hash the set"

Gerardo Richarte gr at fuzzy.uba.ar
Fri Apr 17 04:35:12 UTC 1998


sqrmax at cvtci.com.ar wrote:
> 
> Hi.
> 
> >The number of Sets in a system and its medium size can vary from system to
> system...
> >Probably we need a more secure hash for Sets.
> 
> Well yes, what do we do with its hash? Certainly it's not a good thing to
> depend on the hash of its elements, because its unordered. This also renders
> kicking the problem to the implementation of the sets into a bad thing.
> 
> 1. Add an instance variable to sets, namely hash.
> 2. Initialize to zero on instance creation.
> 3. Upon every add: anObject, bitXor it with anObject hash.
> 4. Upon every remove: anObject, bitXor it with anObject hash.
> 
> The steps 3 and 4 only update when something happens. Hmmmm... this kills
> all the problems, I guess. In fact, it's a fast way to check if aSet ~=
> anotherSet, by checking their sizes first, then its hashes and then bad luck.

    Don't run so fast! hashes are really susceptible to bad implementations.

    Q: What if the hash of anObject changes while it's included in aSet?
    A: The hash will no longer be valid.
    Ex:
        aSet _ Set new add: (bSet _ Set new).
		cSet _ Set new add: (dSet _ Set new).
		aSet = cSet. <Alt-p> true
		aSet hash = cSet hast. <Alt-p> true
		bSet add: 1.
		aSet = cSet. <Alt-p> false
		aSet hash = cSet hash. <Alt-p> true
		"Ups!"

	Q: Why will somebody do anything like this?
	A: Why not?
	Note: Is Smalltalk a language? or it has all the messages it can
comunicate already encoded within itself? We don't have to imagine all
its posible uses.

	All this makes me think that using Objects whos hashes can change
through time is not good at all for keys or indices, becouse if you, or
aDictionary, are using hashes to locate keys, may be that you would not
find an existing key. Be careful with this!

> Could this work for any other collection?... well... ;-)... it's fast
	Yes it's fast... but I'm almost sure that it's not safe.

    Anyway, as your statistics show, there are only few sets with many elements,
so, big sets are not used as keys (if this stats are valid for uncommon uses).
IMHO hashes should be computed from scratch every time (and must be meaningful).
When there are few big sets, as your stats said,
    ((aSet size > 10) and:
    ("there are 2 sets with suze = 11, 1 with 12 elements and 1 with 13"))
        ifTrue: ["there is 1/3 chance of being wrong when comparing sizes"]

    Careful Bye!
	Richie++





More information about the Squeak-dev mailing list