[BUG]Collection>>removeAll:

Wed Aug 21 01:34:07 UTC 2002

Ian Piumarta <ian.piumarta at inria.fr> wrote:
	FWIW, having taken the time to go look what the debate is actually about,
	and then having thought about it a little, I find myself totally in
	agreement with David Griswold.

No, #removeAll: is clearly broken.
There are two ways to fix it:
(1) *DOCUMENT* the restriction and *ENFORCE* it at run time.
(2) *REMOVE* the restriction.

The important thing here is that there is a perfectly natural way to
invoke #removeAll: which doesn't work and instead of giving you a
helpful error notification, it quietly gives you complete rubbish.

There have been times when I have looked for a way to make a collection
empty.  Obviously you can't do this with a collection that has a fixed
size (descendants of ArrayedCollection), but for anything else, it's
quite natural to look for a method

    aCollection removeAll.  "note no colon and no argument."

In my own collection ADTs in imperative languages, I have always
included such an operation, and it has always proved useful.
There isn't one in Smalltalk, but there IS an obvious thing to try:

	myCollection removeAll: myCollection "make it empty".

It's meaningful, it's occasionally useful, and it doesn't work.

When you discover that it doesn't work, the next thing is to sacrifice
the identity of the object and do

	myCollection := myCollection copyEmpty

except that oddly enough, that's only defined for SequenceableCollections.
Why don't we have

	Collection>>
	copyEmpty
	    "Answer a new collection just like the receiver but with
	     no elements.  Overridden in SequenceableCollection."

	    ^self class new

So we end up doing

	myCollection := #() as: myCollection class "make it empty".

(At least I hope this works.)

	Considering this method is identical in so
	many Smalltalks (I haven't looked at VW, but I'm almost confident that...

What exactly does this prove?  Of the Smalltalk textbooks I have checked while
thinking about this problem, one of them doesn't as much as *mention*
#removeAll:.  So buggy code has survived for 20 years.  What is surprising
about that?  Is this evidence that it works?  Not on this planet.  It clearly
is NOT identical in GemStone (or wasn't in 1991).

	;) and given that from my own experience the first question one
	must ask when iterating over any collection is "is this
	collection stable under this iteration (in the context of the
	given argument[s])?",

I've just been reading some notes "Why do we make it all so difficult?"
by Bob Doran of the University of Auckland.  He's been reflecting on the
teaching of programming, especially using Java, and has found some truly
HORRIBLE examples of code that is thrown at first year students, code which
is extremely confusing but never needed to be.

This question about "stable under this iteration" is one that many Smalltalk
beginners won't even begin to understand.  If Smalltalk is truly to be
"suitable for children of all ages", then it is the library designer's job
to make the library user's job as easy as reasonably possible.

For what it's worth, one of the reasons I still use Mawk (an implementation
of AWK with good performance) is the fact that when I write
	for (key in table) {
	    # do absolutely anything I like to table[]
	}
it is just plain going to WORK.  I wondered how come.  I looked at the code,
and discovered that whenever you iterate over a hash table in AWK, it makes
a temporary array holding all the keys and iterates over that instead.  My
Mawk code has been known to outperform Python and Perl, despite this copy.

One of the reasons I like functional languages such as Haskell is again the
fact that this is just plain not a problem I *ever* have to worry about.

	then I believe it should be considered "pragmatically correct"

It isn't clear to me what "pragmatically correct" means.
It appears to mean "some lazy programmer has got away with this for 20 years".

	-- and any "error" that might exist is
	indeed in the method comment which should include some phrase
	along the lines of:

		"If aCollection == self then the result is undefined."

That's a good start, but it is nowhere near enough.  Someone who has not
looked at the implementation because they've read a book (if they can find
one that tells them what #removeAll: returns; some do but they weren't in
my office at the time) and "know" what should happen has a right to expect
that the thing will either work or complain.

It is unforgiveable for a core library operation to QUIETLY return
garbage.  Forgiveness is obtainable only one condition that the code
is repaired.

A repair may take the form of raising an exception.  That's fine.

But quietly returning garbage is not tolerable.

	Otherwise we are starting to open a far larger can of worms than
	I think anybody wants to deal with.

I don't personally want to deal with it now that the body of code exists.
In the 20kSLOC Pascal data structure library I once wrote, I *did* take
pains to deal with it.  In fact the analogue of removeAll: was one of the
cases I considered back then.  While the job of slogging through the
collection classes (even *just* the collection classes) is daunting, if
we do it one method at a time we'll be doing *something* to make life
sweeter for Squeakers.

	For example, I challenge
	ANYBODY on this list to tell me (WITHOUT trying it and WITHOUT
	reading the code involved) the behaviour and/or result of the
	following:

		| oc |
		oc _ OrderedCollection with: #foo.
		^oc addAll: oc.

It's obvious what the answer SHOULD be.
It's obvious what the answer COULD be.
What the answer *is* I shudder to think.  Since addAll: adds things to
the end of an OrderedCollection, it could even be an infinite loop, so
I am *not* going to try this.

	While I concede that #removeAll:  might just have a
	mathematically "sound" solution (modulo identity complications),
	the #addAll:  case is far worse.

Surely we expect that
	x addAll: y
should have (except for the identity of its result)
the same effect as
	x addAll: y copy

#addAll: in OrderedCollection is handled by code that is specific to
OrderedCollection, so let's fix one class at a time.

Change these two methods:

    addAllFirst: anOrderedCollection
	"Add each element of anOrderedCollection at the beginning of
	 the receiver.  Answer an ordered collection containing the
	 elements added."
	|c|
	(c := anOrderedCollection) == self ifTrue: [c := c copy].
	c reverseDo: [:each | self addFirst: each].
	^c

    addAllLast: anOrderedCollection
	"Add each element of anOrderedCollection at the end of the
	 receiver.  Answer an ordered collection containing the
	 elements added."
	|c|
	(c := anOrderedCollection) == self ifTrue: [c := c copy].
	c do: [:each | self addLast: each].
	^c

	For one thing, the "correct"
	answer depends intimately on the semantics of the underlying
	collection (much more so than in the #removeAll:  case) and
	reconciling correctness with performance in all cases is
	probably (i) difficult, and (ii) highly undesirable (from
	pragmatic considerations).

I challenge all of this.

    c1 addAll: c2

is supposed to have the same effect as

    c2 copy someUnspecifiedPermutationDo: [:each | c1 add: each]

except for the result that is returned.

So whatever the implementation happens to be,

    (c := aCollection) == self ifTrue: [c := c copy].
    original code changed to use c instead of aCollection
    return c instead of aCollection

will do something sensible.

There WILL be an overhead:
    push receiver
    push method argument 1
    ==
    jump if false
but that is negligible compared with the other costs of #removeAll:    
and #addAll:.  Practically speaking, performance isn't harmed at all,
EXCEPT in the case where performance was previously infinitely bad
because the behaviour was wrong.

	Pragmatics aside, it's in the interests of portability (between
	Smalltalk dialects, supposing just for a moment that the
	denizens of Pleasuredome Squeak care about such things)
	explicitly not to rely on any particular behaviour in such
	"degenerate cases".

This amounts to an argument that we should never ever fix any bug that
any other Smalltalk implementation might happen to share.  I don't buy it.

	Otherwise I will inisit that Jecel's "assignment is a binary selector" be
	adopted and that in "ProcessorScheduler>>:= anObject" we test for
	"anObject isNil" and raise a "BlueBookExampleForHumourOnly" exception
	when needed.  Reductio Ad Absurdum, Batman!

Yes, the argument that we shouldn't fix Squeak's collection classes because
other Smalltalks are broken too is indeed absurd.

While I think that the BEST thing is that library operations should quietly
WORK in all reasonable cases, the next best thing is for them to put up an
error notification when they are about to fail.  But it is intolerable for
them to quietly return rubbish.