[BUG][FIX] WeakKeyDictionary>>keysAndValuesDo:

Richard A. O'Keefe ok at cs.otago.ac.nz
Tue Jun 15 00:54:13 UTC 2004

Martin Wirblat <sql.mawi at t-link.de> wrote:
	don't you consider it to be a bug that Squeak Sets 
	change their hash values when their elements are changed?

No.  In fact, I think it would be a very serious error if the hash
value *didn't* change.  Remember the fundamental rule about #hash:

    x = y implies: x hash = y hash

Now, suppose we have two different Set objects x, y, which have the
same elements:

    x := #(1 2 3 4) asSet.
    y := #(3 1 4 2 4 1 3) asSet.

We expect that x = y.  After all, two sets are equal if and only if they
have equal elements, and if Sets don't act like sets, they have no right
to the name.  If we *ever* hope to use Sets as elements of other Sets or
as keys in a Dictionary, they had *better* get equality right.

So since x = y, we MUST have x hash = y hash.

Now suppose that x hash does not change when we add a new element.
Starting from empty Sets, we can prove that ALL sets have the same hash

We really are driven to this conclusion:
EITHER (1) aSet hash must change (sometimes) when the contents of aSet change,
    OR (2) all Sets must have the same hash value
    OR (3) the relationship between #= and #hash is broken
    OR (4) Sets get #= very badly wrong.

Myself, I don't see any problem with option (1), the way Squeak does things.

I just tried this in Ambrai Smalltalk 1.0 Beta 1 on a G4 Mac.
x hash = y hash   came out false,
x = y             came out false too.
x hash            did not change when I added an element.
What this means is that Ambrai (4) gets #= very badly wrong for Sets.

What does the ANSI Smalltalk standard say about this?  It says
    [#=] tests whether the receiver and the comparand are *equivalent*
    objects at the time the message is processed.  Return true if the
    receiver is equivalent to comparand.  Otherwise return false.
    The meaning of "equivalent" cannot be precisely defined but the
    intent is that two objects are considered equivalent if they can
    be used interchangeably.  Conforming protocols *may* choose to
    more precisely define the meaning of "equivalent".

    The value of
        receiver = comparand
    is true if and only if the value of        
        comparand = receiver
    would also be true.  If the value of
        receiver = comparand
    is true then the receiver and comparand must have equivalent hash
    values.  Or more formally,
        receiver = comparand =>
        receiver hash = comparand hash

    The equivalence of objects need not be temporally invariant.  Two
    independent invocations of #= with the same receiver and operand [sic.;
    should have been comparand] objects may not always yield the same value.
    Note that a collection that uses #= to discriminate objects may only
    reliably store objects whose hash values do not change *while the
    objects are contained in the collection*.

The *emphasis* marks were added by me.  It is true that the ANSI Smalltalk
standard does NOT further specify #= for Sets or Dictionaries, but the
base definition only says that "Conforming protocols *may* choose to more
precisely define" #=; a defensible reading is that two Sets ought to be #=
if they represent equal sets and this is so obvious that it shouldn't have
needed saying.  There is certainly nothing in the ANSI specification that
*forbids* getting Set>>= right.

	I just tested 
	    | hash s |
	    hash := ( s := #( 1 2 3 ) asSet ) hash.
	    s add: 4.
	    hash = s hash
	in Dolphin, VSE and VW and it always evaluated to true.

Can you imagine how useless #hash would be for Strings if the hash value
of a String didn't depend on what its elements were, only how many?  That's
how useless that kind of a #hash is for sets.

	Only Squeak thinks it is false.

Squeak gets it right.

	For me it is totally nonintuitive not to be able 
	to put any Set into another one.

No, you've got it wrong.  What I said was that you cannot put a Set
object inside ITSELF.  You can put a Set inside a DIFFERENT set with
no trouble at all.  In fact, from what you have said, out of Ambrai,
Dolphin, VSE, VW, and Squeak, Squeak is the *only* one where you *can*
usefully put a Set inside another Set.

	In fact every object that is explicitly designed to change its
	contents but which is not maintaining its hash value will give
	the programmer no intuitive clue that it can't be put in a Set.
I am always suspicious of appeals to 'intuition'.  One can reasonably
expect a Smalltalk programmer to know the two fundamental rules of hashing:

(1) x = y implies: x hash = y hash
(2) A mutable object should not be mutated while its hash value is useful,
    that is, while it is a member of a set or bag or a key in a dictionary.

Make no mistake, it is quite explicit in the ANSI standard that the
equality of *sequences* depends on the elements.  If the hash value of
a String didn't depend on the String's elements, it wouldn't be much use
for hashing, but because it does depend on the elements, a String may not
be changed *while* it is a member or key in a hashed collection.  The
same goes for Arrays, OrderedCollections, SortedCollections, &c.  Why
should Sets be any different?  You *can* put an Array, an OrderedCollection,
a String, a Set, or any mutable object, in a (different) set; you just have
to refrain from changing it while it's there.

Note that you *can* usefully put mutable objects into IdentitySets and use
them as keys in IdentityDictionaries and still keep on changing them to your
heart's content, and you *can* add an IdentitySet to itself without any
special problems.

More information about the Squeak-dev mailing list