[BUG]Collection>>removeAll:

Richard A. O'Keefe ok at cs.otago.ac.nz
Mon Aug 26 01:34:13 UTC 2002


Chris Norton <chrisn at Kronos.com> wrote:
	I am only opposed to the copying that can be plainly seen in
	your method (below).

Then I apologise for mis-reading what you wrote.
I thought you were (very reasonably) objecting to copy-in-#do:,
which was not my suggestion at all.

I honestly cannot see any reason to object to the copy in #removeAll:.

	I don't *know* what kind of penalties that that
	innocent looking #copy will result in.

Why does *THAT* bother you mean you don't know what kind of penalties
the innocent looking #remove: will result in?

In all the cases where #removeAll: is currently defined, #copy is O(N).
In all the cases where #removeAll: is currently defined, #removeAll: itself
is at least O(N); for OrderedCollection it is quadratic.

For new kinds of collections, who knows what #copy might cost?
But then, for new kinds of collections, who knows what #remove: might cost?
Why boggle at the mere _possibility_ that #copy _might_ be expensive,
but smile happily at an implementation which _is_ quadratic for
OrderedCollection?

	And, as I've said before, I
	would prefer to retain the identity of method argument.

Do you, personally, have any code that uses the result of #removeAll: at all?
Have you checked whether returning the right state would break it?
If not you personally, is there any code you have seen that works in Squeak
and depends on the result of #removeAll:, and would _not_ break if the
identity were right but the state wrong?

	In an effort to remain on a constructive plain, I'd like to promote Stephan
	Rudlof's implementation suggestion; it looks very promising to me.
	
	Collection>>removeAll: aCollection
	  self isEmpty
	    ifFalse: [self assert: [self ~~ aCollection].
	             aCollection do: [:each | self remove: each]].
	  ^aCollection
	
	In his implementation, he retains the original functionality and he adds a
	warning.  Without actually trying it, the code looks good to me.
	
Is this really a practical improvement on the code that I suggested?
Remember, I have repeatedly suggested

    self == aCollection ifTrue: [self error: 'some error message'].
    ^aCollection do: [:each | self remove: each]

as an acceptable alternative.  All Rudlof added to that was an efficiency
hack (isEmpty) that doesn't look as though it will pay off all that often,
and *does* most undesirably block the error message when the receiver is
empty.

Here's a demonstration that the current "hole" in the definition
is entirely an artefact of that way that #removeAll: happens to be
implemented.

Suppose someone was concerned about the quadractic cost of #removeAll:
for OrderedCollections and decided to do something about it.

Here is the code I wrote.  The variable names are short to save typing
time.  I am *not* proposing this for incorporation in Squeak.  It is to
make a point only.  There could well be transcription errors.

    removeAllLinearly: aCollection
	|s d e n z m|
	
	m := Dictionary new: aCollection size.
	z := [0].
	aCollection do: [:each | m at: each put: (m at: each ifAbsent: z) + 1].
	s := d := firstIndex - 1.
	[s < lastIndex] whileFalse: [
	    s := s + 1.
	    e := array at: s.
	    n := m at: t ifAbsent: z.
	    n == 0
	        ifTrue: [d := d + 1. array at: d put: e]
	        ifFalse: [m at: t put: n - 1]].
	lastIndex := d.
	^aCollection

I have checked the original of this, and it worked, and DID scale linearly.
For small OrderedCollections, it was slower than #removeAll:, but since it
scales linearly, it is eventually better.  If the overheads in Dictionary
were not so high, it could be competitive with #removeAll:.  (I am tempted
to add incrementAt: and decrementAt: methods...  Yep.  That shaved 25% off
the total.)

The thing is, this code _doesn't_ have an aliasing problem.  If we had
started with code like this, there would be a torrent of abuse hurled at
the head of anyone who dared to suggest the current #removeAll: implementation.




More information about the Squeak-dev mailing list