Richard A. O'Keefe
ok at cs.otago.ac.nz
Tue Jun 15 00:54:13 UTC 2004
Martin Wirblat <sql.mawi at t-link.de> wrote:
don't you consider it to be a bug that Squeak Sets
change their hash values when their elements are changed?
No. In fact, I think it would be a very serious error if the hash
value *didn't* change. Remember the fundamental rule about #hash:
x = y implies: x hash = y hash
Now, suppose we have two different Set objects x, y, which have the
x := #(1 2 3 4) asSet.
y := #(3 1 4 2 4 1 3) asSet.
We expect that x = y. After all, two sets are equal if and only if they
have equal elements, and if Sets don't act like sets, they have no right
to the name. If we *ever* hope to use Sets as elements of other Sets or
as keys in a Dictionary, they had *better* get equality right.
So since x = y, we MUST have x hash = y hash.
Now suppose that x hash does not change when we add a new element.
Starting from empty Sets, we can prove that ALL sets have the same hash
We really are driven to this conclusion:
EITHER (1) aSet hash must change (sometimes) when the contents of aSet change,
OR (2) all Sets must have the same hash value
OR (3) the relationship between #= and #hash is broken
OR (4) Sets get #= very badly wrong.
Myself, I don't see any problem with option (1), the way Squeak does things.
I just tried this in Ambrai Smalltalk 1.0 Beta 1 on a G4 Mac.
x hash = y hash came out false,
x = y came out false too.
x hash did not change when I added an element.
What this means is that Ambrai (4) gets #= very badly wrong for Sets.
What does the ANSI Smalltalk standard say about this? It says
[#=] tests whether the receiver and the comparand are *equivalent*
objects at the time the message is processed. Return true if the
receiver is equivalent to comparand. Otherwise return false.
The meaning of "equivalent" cannot be precisely defined but the
intent is that two objects are considered equivalent if they can
be used interchangeably. Conforming protocols *may* choose to
more precisely define the meaning of "equivalent".
The value of
receiver = comparand
is true if and only if the value of
comparand = receiver
would also be true. If the value of
receiver = comparand
is true then the receiver and comparand must have equivalent hash
values. Or more formally,
receiver = comparand =>
receiver hash = comparand hash
The equivalence of objects need not be temporally invariant. Two
independent invocations of #= with the same receiver and operand [sic.;
should have been comparand] objects may not always yield the same value.
Note that a collection that uses #= to discriminate objects may only
reliably store objects whose hash values do not change *while the
objects are contained in the collection*.
The *emphasis* marks were added by me. It is true that the ANSI Smalltalk
standard does NOT further specify #= for Sets or Dictionaries, but the
base definition only says that "Conforming protocols *may* choose to more
precisely define" #=; a defensible reading is that two Sets ought to be #=
if they represent equal sets and this is so obvious that it shouldn't have
needed saying. There is certainly nothing in the ANSI specification that
*forbids* getting Set>>= right.
I just tested
| hash s |
hash := ( s := #( 1 2 3 ) asSet ) hash.
s add: 4.
hash = s hash
in Dolphin, VSE and VW and it always evaluated to true.
Can you imagine how useless #hash would be for Strings if the hash value
of a String didn't depend on what its elements were, only how many? That's
how useless that kind of a #hash is for sets.
Only Squeak thinks it is false.
Squeak gets it right.
For me it is totally nonintuitive not to be able
to put any Set into another one.
No, you've got it wrong. What I said was that you cannot put a Set
object inside ITSELF. You can put a Set inside a DIFFERENT set with
no trouble at all. In fact, from what you have said, out of Ambrai,
Dolphin, VSE, VW, and Squeak, Squeak is the *only* one where you *can*
usefully put a Set inside another Set.
In fact every object that is explicitly designed to change its
contents but which is not maintaining its hash value will give
the programmer no intuitive clue that it can't be put in a Set.
I am always suspicious of appeals to 'intuition'. One can reasonably
expect a Smalltalk programmer to know the two fundamental rules of hashing:
(1) x = y implies: x hash = y hash
(2) A mutable object should not be mutated while its hash value is useful,
that is, while it is a member of a set or bag or a key in a dictionary.
Make no mistake, it is quite explicit in the ANSI standard that the
equality of *sequences* depends on the elements. If the hash value of
a String didn't depend on the String's elements, it wouldn't be much use
for hashing, but because it does depend on the elements, a String may not
be changed *while* it is a member or key in a hashed collection. The
same goes for Arrays, OrderedCollections, SortedCollections, &c. Why
should Sets be any different? You *can* put an Array, an OrderedCollection,
a String, a Set, or any mutable object, in a (different) set; you just have
to refrain from changing it while it's there.
Note that you *can* usefully put mutable objects into IdentitySets and use
them as keys in IdentityDictionaries and still keep on changing them to your
heart's content, and you *can* add an IdentitySet to itself without any
More information about the Squeak-dev