[BUG][FIX] WeakKeyDictionary>>keysAndValuesDo:

Richard A. O'Keefe ok at cs.otago.ac.nz
Tue Jun 22 03:23:29 UTC 2004


Martin Wirblat <sql.mawi at t-link.de> still doesn't get it.
	First and again, there are enough classes left to show that your 
	premise is not universally applied.

The number of classes reported by the snippet means NOTHING.
All that counts, the ONLY thing that counts, is the number of
classes satisfying
    (a) they don't redefine #=
    (b) they should
    (c) but they are good programming style nonetheless.
Your snippet could report five hundred million classes and still
miss the point as completely as it does now.

	Second, it is not at all that clear that you 
	simply can subtract Morph and its subclasses. Think a bit further and 
	imagine a Morphic implementation, in which morphs are not guaranteed 
	to be unique by their state.

Now we are headed off on a trip to fantasy land.
We *can* remove Morph and its descendants because they *do* have
unique state.

If they didn't have unique state, they would be *DIFFERENT* classes.
And who knows, perhaps those entirely imaginary classes *would* have
a suitable definition of #=.

	For performance reasons - if not for other ones - it is very
	well possible that #= still would have been implemented as #==.

I find the idea of knowingly delivering incorrect answers in the name
of 'performance reasons' hard to distinguish from 'culpable professional
negligence'.

This must be clearly distinguished from "offering documented approximate
results *when explicitly called for*".  I have no quarrels with any number
of #isApproximatelyEqualToGiveOrTakeACountryMile: methods.

I also have no quarrel with designers who decide that while equality _is_
definable on their classes it is too expensive and therefore write

    =
        "Equality could be defined like this BLAH BLAH
         but we thought that was too expensive."
        self notYetImplemented.
    hash
        "See #= in this class."
        self notYetImplemented.

There are many cases where you simply don't care what #= does because you
do not expect that anyone will ever ask.  Pity the unfortunate users of
your code who *do* ask and get the wrong answer!  If you think that a
correct equality-of-state method is too expensive to implement and that
nobody will want it, plug in those two not-yet-implemented methods and
at least your users will find out that they don't have a working #= .

	Here is the clarification of what I wanted to express with
	choosing "intuitively" state- or identity-comparing for #=.
	Let's consider the sequence:
	
	String Array | Set | Bag DataBase
	
Presumably that should be

	String | Array | Set | Bag | Database

	DataBase is a fictive class with a) a complex structure of state, but 
	totally contained in the image, or b) a class which is wrapping 
	something external, which would be very slow to be tested for state-
	equality, because the existence of two dataBases with completely equal 
	state is allowed, so that the identity test can't be used as a 
	shortcut. 
	
	I am going to formulate the cardinal intention of a #= check like 
	this: I put the object in question into a variable and elsewhere I 
	want to check if some unknown object IS THE ONE I put in the variable. 
	I consider the main intention of Set to be reflected by this 
	formulation. 
	
Nonononono!  If you 'want to check if some unknown object IS THE ONE that'
you 'put in' a 'variable', the test you want is #== .

The whole point of the Set -vs- IdentitySet distinction is that
IdentitySet *does* ask "is this object THE ONE you put in before"
but Set does *not* ask that; Set asks "is this object currently
LIKE the one you put in before".

	A Set should be able to hold objects of arbitrary classes. Choosing an 
	IdentitySet vs an "EqualStateSet" (not implemented in Squeak right 
	now)

Wrong.  "EqualStateSet" is *PRECISELY* what Set is and is designed to be.
That's why Set>>indexOfElement: (Ambrai) or Set>>scanFor: (Squeak) uses
#hash and #= to find an object whereas IdentitySet>>indexOfElement: (scanFor:)
uses #identityHash and #==.  (In fact, in Ambrai Smalltalk, that's the
*only* difference between Set and IdentitySet; Squeak also reimplements
#asIdentitySet, sensibly enough.)

It simply isn't true that "A Set should be able to hold objects of
arbitrary classes".  A true statement is that "A Set should be able
to hold any objects which respond to the #= and #hash methods in a
way that is consistent with their contracts."  Here, for example,
is the beginning of the class comment for Set in Squeak:

    I represent a set of objects without duplicates.
    I can hold anything THAT RESPONDS TO #hash AND #=,
    except for nil.  ...
    NOTE THAT I RELY ON #=, not #==.
    IF YOU WANT A SET USING #==, USE IdentitySet.

	is not an option, if Sets are about "IS THE ONE" in it and the 
	answer to this question has to be implemented differently for 
	different classes. 
	
I do not see how anyone reading the class comment for Set in Squeak
could possibly think that 'Sets are about "IS THE ONE" in it'.  They
are not.  You are explicitly advised to use IdentitySet if that's what
you want.

As a specific example, consider

    x := 10 raisedToInteger: 1000.
    y := x + 1 - 1.
    s := Set with: x.
    {s identityIncludes: x. s identityIncludes: y. s includes: y}
==> #(true false true)

When we ask whether y is in s, no, y is *not* "THE ONE" in s.
(#identityIncludes: answers false.)  But there *is* an element of
s which is #= to y.  (#includes: answers true.)

	Let's start with String. There are String methods which are returning 
	either a copy of the receiver, or the receiver itself unaltered or the 
	receiver itself but altered "in place".

True.  However, precisely because some of the methods *do* alter a string
"in place", you MUST care about String identity.  In particular, if you
put a String into a Set (in *any* Smalltalk) you *must* ensure that that
specific copy will not be altered while it is in the Set.
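
To make the hazard concrete, here is a minimal workspace sketch, assuming
a Squeak-style Set that files its elements by #hash; the variable names
are illustrative only:

    str := 'abc' copy.
    s := Set with: str.        "str is filed under the hash of 'abc'"
    str at: 1 put: $x.         "alter the element in place: its hash changes"
    s includes: 'abc'.         "false: no element has that state any more"
    s includes: str.           "most likely false: lookup starts from the new hash"
    s identityIncludes: str.   "true: the object itself is still in there"

The outcome of the second test depends on where the old and new hash
values happen to land in the table, which is exactly why the set can no
longer be trusted once an element has been altered.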

	Very often the intention of 
	the program is only to get the string's bytes written to a file or the 
	screen. Programming with Strings is mostly about their state, rarely 
	their pointer is of interest, seen from the POV of the result. The 
	question "IS IT THE ONE" is mostly answered yes, if the two objects 
	have equal states.

Not true.  With the exception of Symbols, *most* of the time when two Strings
have equal states the question "IS IT THE ONE" should be answered NO, they
are NOT the same object.  The question which gets the answer yes is "is
this one LIKE that one", and that's a different question.

	They may be identical or not, mostly it doesn't matter.

It doesn't matter right up to the point where you change one of the strings.
When you change a String, it matters a heck of a lot whether it is this
copy or that copy which gets changed.
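
A small sketch of that point, assuming an ordinary Squeak workspace (the
names a, b and c are made up for the illustration):

    a := 'hello' copy.
    b := a.               "another REFERENCE to the same String"
    c := a copy.          "a different String with the same STATE"
    a at: 1 put: $j.      "change the object that a and b both name"
    {a == b.  b.  c}
==> #(true 'jello' 'hello')

Before the change, a = c held; afterwards it no longer does, while b
follows a because it never was a separate object.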

	Now think of DataBase.

I've thought of it.  Equality of state between data bases is perfectly
well defined.  Given a suitable DUMP operation, it's even straightforward
to implement.  It may be hideously expensive.  For very large data bases,
it will probably be far too expensive to use.  The probability that anyone
will want to do it approaches zero.

THAT DOES NOT MAKE THE #== IMPLEMENTATION A CORRECT IMPLEMENTATION OF #=
FOR DATABASES.

As I wrote above, it is perfectly reasonable to decide that equality of
state should NOT be offered by some class, in which case you should do

    =    self notYetImplemented.
    hash self notYetImplemented.

(A subtle point here:  I take it that #notYetImplemented means that the
operation *could* be programmed but hasn't been, while #shouldNotImplement
means that the operation doesn't make sense even in principle.)

	There should be no methods in its offered 
	protocol for intended standard operations, which makes a copy of the 
	whole database, to simply change, add or remove something of it.

I'm having a bit of difficulty with that, because it isn't quite English.

Real data bases most definitely need an operation that makes a copy
for dump/restore.  I think you mean that INSERT, DELETE, and UPDATE
should not make copies.

	There should be only one exemplar of a specific database,
	duplicates other than for special intentions are nonsense.  If I
	add to DataBase for thingsOfTypeA something, I expect it to be
	still DataBase for thingsOfTypeA.
	
Yes, but you are still equivocating.  We've been over this before.
It is the same data base OBJECT but after a change it is in a different STATE.

	dbCopy := db copy.  "make a copy of the STATE"
	dbRef := db.        "make another REFERENCE to the OBJECT"
	db makeSomeSmallChange.
	{db == dbRef.  db = dbRef.  db = dbCopy}
 ===>   #(true true false)

	Now I am creating two DataBases, one is for thingsOfTypeA and the 
	other shall contain thingsOfTypeB. At the beginning they are empty and 
	not to distinguish by their state, so the question "IS IT THE ONE" can 
	only be answered correctly by testing for identity. 
	
That is because "IS IT THE ONE" *means* "are they the same object."
The question "do they have the same state" might be correctly answered
yes or it might be correctly answered no.

In particular, you have said that one is for 'thingsOfTypeA' and the
other is for 'thingsOfTypeB'.  This raises the question:
"is an empty collection that could only hold Bees
 the same as
 an empty collection that could only hold Wasps?"
In some type systems, you can't ask.
In some type systems, the answer is "yes".
In some type systems, the answer is "no".
If, for example, we have

    Object subclass: #Relation
      instanceVariableNames: 'constraint collection ...'

      methods:
        add: aTuple
          (constraint accepts: aTuple) ifFalse: [self error: 'bad type'].
          ...

then an empty Relation that holds only thingsOfTypeA and an empty Relation
that holds only thingsOfTypeB would NOT have the same state.  There is, as
it happens, a perfect analogue of this in Squeak already:  PluggableSets.
Two PluggableSets *should* be equal if and only if
    - they are both PluggableSets
    - they have equal hashBlocks and equal equalBlocks
    - they have the same size
    - they have equal elements (as reported by their common equalBlock)
Since equality of functions is not implementable, equality of PluggableSets
cannot be determined, and in my view,

    =     self shouldNotImplement.
    hash  self shouldNotImplement.

is the right implementation of equality for PluggableSets.

	Even if this were for some reason not so clear, the implementor would 
	possibly be forced by performance reasons to test for identity. 
	
Wrong.  If performance reasons are the consideration, you DON'T implement
a method WRONGLY, you mark it as unimplemented. 

	My sequence starts with a class which suggests a state comparison for 
	answering the question "IS IT THE ONE" end ends with a class which 
	needs identity comparison.

No, it doesn't.  Identity comparison is available for ALL the classes in
your sequence, and it is at various times appropriate for ALL of them.
Similarly, equality of states is well defined for ALL of the classes in
your sequence, it is simply more expensive and less likely to be used for
some than others.

The answer is that Database should mark equality as unimplemented,
and you should not put Database objects in Sets, only in IdentitySets.

	There has to be borderline, where the switch is made.

There is no borderline that says "at this point equality should cease
being equality".  The borderline is an ECONOMIC borderline that says
"at this point the payoff from implementing equality isn't big enough,
so equality should be stubbed out."

	In other Smalltalks it is drawn between Array and Set 

Well, no.  The line between "#= works" and "#= doesn't work" may
HAPPEN TO FALL there, but it is far from clear that anyone ever
consciously *drew* the line there or anywhere else.  I strongly
suspect that if #= had not been needed for String, we'd never have
had it for Array.  I'm quite sure that the vast majority of classes
that don't say anything one way or another about #= are that way
because their hacker forgot to ask what #= should do.

	and in Squeak between Set and Bag. I think the existence of this 
	borderline in major Smalltalks, alone shows that your premise can only 
	be the question and not the answer. 
	
Wrong.  You are assuming that all other Smalltalks got it right.
You are assuming that all other Smalltalks are that way because of
conscious choice.  This is most unlikely.  There's a whole lot of
code that has survived into modern Smalltalks from the late 70s.  Heck,
Squeak even still includes MappedCollection, which nothing in the image
uses, and is missing from several other Smalltalks; it's ancient blue book
code.

Squeak is not a commercial product for which backwards bug compatibility
is a requirement.  

	Perhaps it is debatable where the borderline should be drawn.

I deny that such a debate would be legitimate.  I deny that a border
between "#= gives right answers" and "#= gives wrong answers" should
ever be drawn.  The border I *will* tolerate is between "#= gives
right answers" and "#= reports an error because it cannot give any answers".
Draw that wherever you please.

	I was used to see a Set more like a DataBase, an opaque
	container, you are more mathematical - you think of it as a
	pattern, in a transparent cover, like a String.

Why do I have this feeling that "mathematical" is used as an insult?

My point of view is severely pragmatic:  the *point* of #= is to
compare states, not identities; if there is a situation where comparing
identities is appropriate, #== exists and should be used; and classes
whose job is to implement mathematical structures (points, rectangles,
sequences, sets, finite maps) should behave consistently with their
models and with each other.  Above all, SUnit should be as useful as
possible.

Huh?  What has SUnit to do with this?

Suppose you are implementing a query language for XML (how hard can it be
to do better than XPath, eh?).  You have developed a parser and interpreter
for your query language.  Your main task was to get something that is so
simple and direct that it is obviously right.  But it isn't terribly fast,
so you are developing a new interpreter which should be faster.  In your
query language, answers are sets of nodes, not sequences as in XPath.  So
you have a TestCase whose setup calls the parser, and then you do

    self assert: (SafeEngine interpret: query) = (FastEngine interpret: query).

What you want is a way to see if you get the same set of nodes.
If #= acts like #==, this won't work.
If you *wanted* to ask whether they were the same object,
you would have used #== in the first place.
If you have a Smalltalk where there is no working #= for Sets,
you have to write your own, perhaps

    Set>>
      equals: anotherSet
        ^anotherSet class == self class and: [
         anotherSet size  =  self size and: [
         anotherSet allSatisfy: [:each | self includes: each]]]

which of course means that it is less efficient than it could be.

Or take another example.  You are writing a compiler.  (I have a particular
programming language in mind for this compiler.)  There's an "alternative"
construct which may define variables.  It's an error if there is some
variable that is live after this construct which is defined in some of
the choices but not others.

    s := alternative choices first definitions intersection: live.
    alternative choices do: [:each |
       (each definitions intersection: live) = s
         ifFalse: [alternative semanticError: 'inconsistent definitions']].
    live := live difference: s.

Here again, a true set equality operation is essential.  I have *oodles*
of examples where true set equality is a useful operation.

	Aside from that, compatibility and performance are an important
	issue, which favors the version of the other Smalltalks.
	
Wrong.  Wrong answers could be given infinitely fast and they would
still be BAD answers.  Compatibility means that if you write code that
works on sets, you should never use #=.  The ANSI standard gives *no*
useful guidance on it, many Smalltalks get it wrong for sets, bags,
and dictionaries (heck, one I know of even gets it wrong for Intervals),
so you can't use it.

Of course, you are then stuck with no easy way to compare sets for
equality and no guaranteed way to plug your own #equals: method into Set.
SHAME on those other Smalltalk providers, SHAME on them!



