[BUG]Collection>>removeAll:

Tue Aug 20 20:36:21 UTC 2002

> Message: 33
> Date: Tue, 20 Aug 2002 12:35:28 +1200 (NZST)
> From: "Richard A. O'Keefe" <ok at cs.otago.ac.nz>
> [...]
>
> The library is broken in at least two ways:
> (1) The documentation says NOTHING about the two collections
> having to be
>     different.

Virtually *all* collection operations in Smalltalk assume that the
collections being operated on don't change unexpectedly during
execution; this bug is just one of the infinite number of ways that can
happen.  It is not documented because it is an obvious and universal
principle which experienced programmers understand. I'm sure you
understood it yourself- did you really need it told to you?  You
probably just didn't think that it applied to this case- but it does.

> (2) If an operation isn't going to work, it should bring that to our
>     attention instead of quietly producing incorrect answers.
>
> If removeAll: had the form
>
>     removeAll: aCollection
> 	"Remove each element of aCollection from the receiver.
> 	 If successful for each, answer aCollection.
> 	 Otherwise, create an error notification.
> 	 ArraysCollections cannot respond to this message.
> 	 This message cannot work if aCollection == the receiver,
> 	 because then the result would have to be empty and would
> 	 have to contain all the original elements at the same time."
>
> 	aCollection == self
> 	    ifTrue: [self error: 'Cannot remove a collection
> from itself'].
> 	aCollection do: [:each | self remove: each].
> 	^aCollection
>
> then David Griswold _would_ be right, the library _would_ be fine.
> But it simply won't do for the operation to quietly do the
> wrong thing.

Software that is not used for external, checked interfaces (e.g. a GUI,
file, or network interface) does *not* have the responsibility of
detecting and reporting all possible violations of the software's
interface contracts.  It would be nice if that was possible, but it is
simply theoretically impossible to do for imperative languages.   When
you pass the receiver to removeAll:, you have broken the contract, and
that is your problem, not the collection's.  Only if such a bug is so
common that it becomes a real nuisance should error checking code that
complicates or slows down execution of correct code be added.

A thought experiment: If you think that detecting and reporting all
these kind of bugs is the collection's responsibility, then tell me how
you will detect and report this similar and even more common kind of
bug:  The contract for a Dictionary or Set includes the requirement that
the object being hashed on not change its hash value or the results of
any = operation on it, during the time that it is in the collection.
For example, if a dictionary key changes its hash while it is in the
dictionary, the dictionary is corrupted.  This is a real bug, that
happens not infrequently, and which can be very hard to debug.

Now, according to the principles you state, it is the dictionary's
responsibility both to note that fact in its documentation, and to check
to make sure it doesn't happen.  To do that, the dictionary would need
to keep a separate record of the hash value for each key, and the
results of any = operations applied to it, and to check to see if they
have changed every time the collection is operated on, and then report a
'key mutation' error if that ever happens.  But that is clearly
ridiculous to do.  I can come up with an endless stream of bugs like
this, that are at least as important as the removeAll: bug, just as easy
to detect, and if you tried to detect them all, the libraries would be a
slow, bloated mess.  Reducto ad absurdum.

I sympathize with your wish to have all possible assumptions that the
libraries make documented and checked.  I've been down this path myself
before.  I think almost every programmer goes through a phase where he
wants to put these sorts of runtime checks in as many places as
possible, since detecting interfaces errors is so clearly a good thing.
But believe me, once you get started on trying to catch all possible
contract bugs this way at runtime (as opposed to with assertions or
other separate consistency checks, unit tests, typecheckers, etc. that
can be run statically or in 'stress test' mode, which is the right way
to do this sort of thing and which I strongly encourage), there is an
infinite series of similar bugs that are just as important to detect, so
how are you going to choose which ones to check?   That is why there are
*very* few checks like this in the Smalltalk libraries; one example
being the range check on OrderedCollection indexing, because it is so
important.

> Now, testing whether aCollection == self is pretty darned
> cheap compared
> with everything else that's going on.  It appears to be unavoidable if
> it's to behave sanely.  So when there is a fix that makes the
> operation
> WORK in the previously forbidden case, thus simplifying what
> a programmer
> has to understand, and this fix does not increase the cost of
> the cases
> that used to work, why NOT fix it?

It increases the complexity and size of the libraries, that's why not.
The point is that although this check doesn't seem like a big deal in
itself, if you did the same for all similar kinds of bugs (just as
important as this one), it *would* result in a problem.  So unless this
test can convincingly pass the "this bug is important" test, then it
doesn't deserve to go in.  If everyone thinks that it really is
important enough, that's fine- put it in.  But you *do* have to jump
over that "Very Important Bug" hurdle to justify putting it in the
common execution path, because otherwise you are on the slippery slope
to having to do validity checks on every argument to every method in the
system.  Programmers simply can't expect objects to fully check for
semantically correct interface usage at runtime, given today's
technology.

I think a better place for this sort of test is in some place out of the
production execution path, such as in an assertion that can be turned on
for testing and off for production, with no runtime cost when it is
turned off.  A easy mechanism for doing that in Smalltalk would be nice.

Cheers,
Dave