[BUG]Collection>>removeAll:

Richard A. O'Keefe ok at cs.otago.ac.nz
Wed Aug 21 00:51:10 UTC 2002


Chris Norton <chrisn at Kronos.com> wrote:
	You will not find an argument with me, Richard; your experience in computing
	far outshines mine.
	
It doesn't follow that I can't be dead wrong;
I have been before and I will be again, guaranteed.
That's why my personal motto isn't Kierkegaard's
"Fear most of all to be in error" (taken from Plato's Socrates)
but
"Fear most of all to REMAIN in error."
And rational/critical debate is a good way to get out of some errors.

	However, I will point out that the original comment, which I
	pasted into my version of the method, indicated that the returned
	collection should be the collection that was used as the argument to
	#removeAll:.

Absolutely right.  However, there's an "except" hidden in there,
"EXCEPT when aCollection == receiver", because the code doesn't
actually make sense then.
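
To make the problem case concrete, here is a workspace sketch.  The
variable name is only for illustration, and what actually happens
depends on the collection class and the Smalltalk in use, so no result
is claimed here.

```smalltalk
| pending |
pending := OrderedCollection withAll: #(1 2 3 4).

"The argument IS the receiver, so the #do: inside #removeAll: enumerates
 the very collection that #remove: is shrinking underneath it.
 Depending on the implementation, elements may be skipped or an error
 may be raised part-way through; neither outcome is useful."
pending removeAll: pending.
```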

	I too was confused by this, so I looked at the implementation of
	#removeAll:  in VA Smalltalk and in VisualSmalltalk Enterprise
	-- all of these Smalltalks do the same thing -- they return the
	collection that was passed in.

You didn't need to do that; the interface, the problem, and very probably
the code all go back to Smalltalk-80.

	VA:
	
	Collection>>removeAll: aCollection
		"For each element in aCollection, remove the first element
		 from the receiver which is equal to this element."
	
		^aCollection do: [:element | self remove: element]

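This code is broken when aCollection is the receiver: #do: is then
enumerating the very collection that #remove: is changing.
The *comment* doesn't say what it is meant to return.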
	
	VSE:
	
	Collection>>removeAll: aCollection
	    "Answer aCollection.  Remove all the elements
	     contained in aCollection from the receiver collection."
	
	    aCollection do: [:element | self remove: element].
	    ^aCollection
	
This code is broken.  The comment about what is removed is ambiguous
when aCollection contains any repeated elements, e.g., #(1 1).

I well recall when I met Smalltalk in 1984 thinking that the interface
of the #remove* methods was fairly seriously inconvenient; the result I
would have preferred from all the #remove* methods is the receiver.

Interestingly enough, looking in my 1991 GemStone "Programming in OPAL"
manual, I find on page 10-2:
    remove: anObject
	Removes anObject from the receiver and returns THE RECEIVER.
	...
    remove: anObject ifAbsent: exceptionBlock
	Removes anObject from the receiver and returns THE RECEIVER.
	...
    removeAll: aCollection.
	Removes one occurrence of each element of aCollection from
	the receiver and returns THE RECEIVER.
	...

So I'm not the only one who thinks that the #remove* methods could have
a better result.  Certainly the Squeak 3.2 sources contain no sends to
#removeAll: where the result is even noticed, let alone depended on.

	What do you ANSI Smalltalk people think about this issue?

In the 1.9 draft (I'm too poor to buy standards; the pay of NZ academics
has grown at about half the inflation rate for at least the last 10 years)
section 5.7.5.5 says that the result of #removeAll: is "UNSPECIFIED".
I doubt that they changed this.  So we're both OK according to ANSI.

	Has this been wrong since the dawn of Smalltalk?
	
As we have seen, the code in so many Smalltalks is broken in the same way
that it looks very much as though it has always been broken.

Let's see, what does the VisualWorks Application Developers' Guide say?

    Removing a Subcollection
	The removeAll: message allows you to remove all memebers (sic.!)
	of one collection from a target collection.  Send removeAll: to
	the collection from which you want elements removed.  The
	argument is a collection containing the elements to be removed.

	    | list |
	    list := List withAll: ColorValue constantNames.
	    list removeAll: #(#red #green #blue).
	    ^list

	If an element is not found, an error is reported.
	
	Because removeAll: is defined in Collection, it can be used with
	any collections as receiver and argument.

Someone reading that wouldn't suspect that #removeAll: *had* a result.
They certainly wouldn't care to depend on it.  I checked three Smalltalk
textbooks to see what they said about the result, and again, they said
nothing about it.

So we have to go back to first principles.

Suppose someone DOES care about the result of #removeAll:.
Why would they care, and what would they care about?

Suppose I cared about the identity of aCollection.
If it's an object I already HAVE a reference to, so that I can TELL
whether it's that object or a copy, then I don't NEED it.
There is nothing to stop me writing
    pending removeAll: justCompleted.
    removed := justCompleted.

If I DON'T have a reference to aCollection, then I can't TELL
whether I got it or a copy back.
    removed := pending removeAll: (oracle tellMeWhatToRemove).

Could the contents be useful if the identity is not?
Yes, I could do this:
    processed addAll: (pending removeAll: (worker makeProgressOn: pending)).

The idea here is that the worker returns some subcollection of the pending
tasks that it has been able to complete.  Very well, what if it completes
ALL of them?  A diligent worker that always completes all the tasks it
knows of might very well return the pending object itself.

With the fix I posted, this last code fragment will work.

Or suppose we want to know how much smaller the receiver got?

    decrease := (pending removeAll: (oracle tellMeWhatToRemove)) size.

If the oracle happens to return pending as what to remove, this will
break UNLESS #removeAll: returns a copy in that case.
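
For concreteness, here is a minimal sketch of a #removeAll: that behaves
that way.  It only illustrates the idea and is not necessarily the fix as
posted earlier in this thread.  It assumes the method lives in Collection
and that #copy answers an independent collection with the same elements.

```smalltalk
removeAll: aCollection
	"Remove one occurrence of each element of aCollection from the
	 receiver.  Answer the collection of elements removed.  When
	 aCollection is the receiver itself, enumerate (and answer) a copy,
	 so that #do: never walks the collection that #remove: is shrinking."

	| toRemove |
	toRemove := aCollection == self
		ifTrue: [aCollection copy]
		ifFalse: [aCollection].
	toRemove do: [:each | self remove: each].
	^toRemove
```

With a version like this, when the oracle answers pending itself, the
result is a copy that still holds everything that was removed, so its
size is the decrease, and pending ends up empty.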



