[BUG][FIX] WeakKeyDictionary>>keysAndValuesDo:

Richard A. O'Keefe ok at cs.otago.ac.nz
Wed Jun 16 04:41:13 UTC 2004


I wrote:
	>We expect that x = y.  After all, two sets are equal if and only if 
	>they have equal elements, and if Sets don't act like sets, they have 
	>no right to the name.  If we *ever* hope to use Sets as elements of 
	>other Sets or as keys in a Dictionary, they had *better* get equality 
	>right.

Martin Wirblat defended the status quo:
	I guess you are referring with "the name" to how mathematicians have 
	defined their Sets. They have platonic Sets, meaning = and == are not 
	distinguished. Here on earth representation or materialization 
	matters, so programmers can have different sets with the same contents.
	 
And in precisely the same way they can have physically different Arrays
with equal contents.

	Programmers' Sets are different from mathematicians'.

Let's see:
    SETL	nope, not true.
    Haskell	nope, not true.
    ML		nope, not true.
    Prolog	nope, not true (the 'ordset' library goes to the trouble
		of storing sets in canonical form, so that "same shape"
		iff "same value")
    Mercury	nope, not true.
    Java	nope, not true.

What do I mean by "nope, not true"?  I mean that in all those languages,
set equality is defined to be "having equivalent elements", just like in
mathematics.  Here, for example, is the actual comment from
java/util/Set.java:

    /**
     * Compares the specified object with this set for equality.  Returns
     * <tt>true</tt> if the specified object is also a set, the two sets
     * have the same size, and every member of the specified set is
     * contained in this set (or equivalently, every member of this set is
     * contained in the specified set).  This definition ensures that the
     * equals method works properly across different implementations of the
     * set interface.
     *
     * @param o Object to be compared for equality with this set.
     * @return <tt>true</tt> if the specified Object is equal to this set.
     */

That's right, aSet.equals(anotherSet) in Java precisely when the two
sets have equal elements.  Oh, and the hash code for a Java Set is
computed from the hash codes of its elements, pretty much the way it is
in Squeak.  In fact, the Java 1.2 collection classes were clearly modelled
fairly closely on Smalltalk, and much though it pains (AAAARGH) me to say
this, they are fairly well thought out.

	A mathematician can't add something to a set without getting
	another set.

A programmer cannot add something to a Smalltalk Set without getting
a different *STATE* for that object, and the #= selector exists precisely
to look at states.  That's the point of the #== (same identity) -vs- #=
(same state) distinction, after all.

	After adding something to a set a programmer has of course
	the same object and that may lead him to think that he has the same set.

There's an old children's rhyme:
    "Moses supposes his toeses are roses,
     but Moses supposes erroneously."

A programmer who goes out of his way to change the set that is stored in
a Set and then thinks he has the same set clearly belongs in a Home for
the Bewildered.  The distinction between objects and states is a pretty
fundamental one in Smalltalk; if you don't understand that you will never
be quite sure when to use #== and when to use #= .

	I know what you said. But you are implying that a set is the Set of 
	the mathematicians.

I am doing nothing of the kind.
I am implying that the *STATE* of a Set object represents something
extremely similar to a mathematical set.

	Only then you can put a set into another set.

This is a good example of a non-sequitur.

	Such a Set must not be changed, which contradicts to the
	programmer's day to day use of Set.

Again, you are confusing STATES (sets) with OBJECTS (Sets).
The basic point of #= is to compare the STATES held in objects,
so aSet = anotherSet should be true when the two Sets hold equal sets.

As a matter of fact, _this_ programmer's day-to-day use of sets is
usually in declarative languages where there is no mutation of any kind,
and remarkably simple it makes this programmer's life.

You see, you really CANNOT attack what Squeak does with Sets (making
#= compare states and #hash depend on states) without providing an
argument, which, if sound, would just as strongly condemn the fact that
#= compares the states of Array objects and #hash depends on the state
of an Array object.  Squeak's behaviour is consistent and useful.

For those Smalltalks where #= and #hash are the same as #== and #identityHash
for Sets, their behaviour is inconsistent and markedly less useful.

The fact that you shouldn't change a Set while you have it in another Set
is no different from the fact that you shouldn't change a String or an
Array while you have it in a Set.

	>I am always suspicious of appeals to 'intuition'.  One can reasonably
	>expect a Smalltalk programmer to know the two fundamental rules of 
	>hashing: 
	>
	>(1) x = y implies: x hash = y hash
	>(2) A mutable object should not be mutated while its hash value is 
	>useful, 
	>    that is, while it is a member of a set or bag or a key in a 
	>    dictionary. 
	>
	That may be clear to you, but I bet that many if not most people who 
	are programming Smalltalk (i.e. including many newcomers) are not 
	aware of this problem, which is pretty good demonstrated by the fact 
	that different Smalltalk dialects implement Sets differently (and only 
	Squeak gets it right in your opinion). 
	
Your inference here is unsound.  Those other Smalltalk dialects *do* get
the relationship between #= and #hash consistent, so the fact that their
definition makes Set>>= pretty much useless does *NOT* demonstrate anything
about their understanding of the relationship between #= and #hash.  In
fact, my study of the code in Ambrai Smalltalk shows that although they
cannot spell Week (in DateAndTime they spell it Weak) they do understand
the ANSI standard quite thoroughly and have gone to a lot of trouble to
follow its details pretty darn closely.  They have even done this where it
is silent:  if it didn't say "do something useful", then by gosh, they didn't.

Any good Smalltalk book explains the two fundamental rules above at the
point where it describes #= and #==, or possibly at the point where it
describes Set and Dictionary.  A beginner who is unaware of this simply
hasn't bothered to read any of those wonderful free Smalltalk books.

	After all one criterion for the quality of a programming language is 
	how intuitive it is.

It's not a good criterion, because what is intuitive depends on what you
knew before.  For example, because I already knew SNOBOL and resolution
theorem proving, I found Prolog completely obvious.  I even understood the
cut (!) operation (it's identical to FENCE in SNOBOL).  But students who
only know Java find Prolog about as baffling as they come.

	A Set and even more a Bag is a brown sack for me.  I stuff
	things in it and from time to time I ask if something specific
	is in it.  I would be astonished when I order to put something
	new into the sack and it vanishes from the cellar of my
	warehouse and pops up in the loft.
	
This DOESN'T happen in Squeak and nobody is suggesting that it should.

Note that with the large number of students coming through who have learned
Java as their first language, there are a lot of people out there who would
regard Squeak's definition of #= and #hash for Set as obvious (just the way
Java does it) and Ambrai/Dolphin/VW etc as bafflingly stupid.  "Intuitive"
means "doesn't suprise ME" and is of very limited utility as a criterion.

	Arrays and Strings can be created literally and can not change size. 

So what?  OrderedCollections cannot be created literally and can change size,
yet they define #= and #hash to depend on the elements just the same way that
Arrays and (Squeak) Sets do.  And it's not just Squeak that makes
OrderedCollection work like that:  the ANSI standard requires it.

	There is a difference to Sets and that's what I wanted to
	express with "explicitly designed to change".  Arrays and
	Strings are more alike the mathematician's Set than the
	programmer's Set is.
	
This is not true in any meaningful sense.  #(1) and #(1 1) are not equal.
Arrays are very much "explicitly designed to change"; what else is the
point of #at:put:?

	A Squeak Set implements the mathematician's equality for #=. My brown 
	sack would be better suited with implementing identity for #= and 
	#hash accordingly and an additional #equals: would serve the 
	mathematicians.

If you want identity tests and identity-based hashing, you HAVE it.
#== and #identityHash are lying there obediently, awaiting your command.
Squeak offers a choice:  do you want to compare the IDENTITY of Sets
(use #==) or do you want to compare STATES (use #=), just exactly like
it does for Arrays and OrderedCollections.

	Perhaps the question is, if the average programmer is 
	more a mathematician or a warehouse worker? 
	
Well, the average programmer doesn't use Smalltalk.
I would _hope_ that the typical Smalltalk programmer is sufficiently
above the average to appreciate good design, to wish to produce
correct code, and to understand clearly the distinction between objects
and their states.




More information about the Squeak-dev mailing list