[BUG]Collection>>removeAll:

Richard A. O'Keefe ok at cs.otago.ac.nz
Mon Aug 26 23:55:08 UTC 2002


Stephan Rudlof <sr at evolgo.de> wrote concerning:
	> 	Collection>>removeAll: aCollection
	> 	  self isEmpty
	> 	    ifFalse: [self assert: [self ~~ aCollection].
	> 	             aCollection do: [:each | self remove: each]].
	> 	  ^aCollection

	I don't view this as a 'hack', if I'm looking for cheap and easy to
	understand efficiency improvements.
	
This
 - adds an extra full message send to *every* invocation (cost C)
 - speeds up cases where self is empty (benefit Bx where x is the
   average length of aCollection in this case)
Suppose that proportion p of invocations to #removeAll: have an
empty receiver and 1-p do not.

Then the overall change so C - pBx.
This is an efficiency *improvement* if and only if pBx > C iff p > C/Bx.

Now message sends are surprisingly expensive in Squeak.  
In fact you _can_ speed up #removeAll: quite substantially by removing
one, and if you are looking for cheap and easy to understand efficiency
improvements, you should start by REMOVING message sends like this:

	removeAll: aCollection
	    self == aCollection ifTrue: [... do something else ...]
	    ^aCollection do: [:each |
		self remove: each ifAbsent: [self errorNotFound: each]].

I've measured that and for the case of OrderedCollection receiver,
Array aCollection, and SmallInteger elements, the speedup was a consistent
30% over a range of sizes, with *no* slowdown anyhere.

	> that doesn't look as though it will pay off all that often,
	
	Who knows...
	
Well, we know C is large.  We have reason to suspect that x is small (since
#removeAll: applied to OrderedCollections is so inefficient that someone
would have noticed if x were in general large).  We also have reason to
suspect that #removeAll: is unlikely to be sent to a collection that the
programmer _knows_ to be empty, to p is going to be the proportion of cases
that were thought to be possibly non-empty but weren't, which is likely
quite small.

In short, in the absence of measurements, we have reason to expect this
to be a performance loss.

	> and *does* most undesirably block the error message when the receiver is
	> empty.
	
	OK, this is a point: my first suggestion treats an empty collection with
	itself as argument different from a not empty one.

What we are talking about here is "conceptual type errors".
It's the reason that the FIX I posted adds
removeAll: aCollection self shouldNotImplement
and related methods to ArrayedCollection; if the method won't work when
it _does_ remove something, it should not delay debugging by appearing to
be successful when it just happens not to remove anything this time.

	I have thought, the result of removing something empty from itself is
	obvious.

*If* it is the kind of thing that could *ever* remove something from itself,
yes.  The main point of this thread is that operations should be as simple
for programmers to understand and use safely as they reasonably can be;
every special case in the description of a method makes it harder to
understand and use correctly.r

    x removeAll: y
    removes all the elements of y (which can be any kind of collection)
    from x (which must be a kind of collection that can change size)

is simple to understand.

    x removeAll: y
    removes all the elements of y (which can be any kind of collection)
    from x (which can be any kind of collection)
    EXCEPT that y must not be x
    EXCEPT that if x is not empty it must be able to change size
    EXCEPT that if none of y's elements occurs in x, x can be any collection

is hard to understand and use correctly.

	But I have to admit, that this could cover programming errors.
	
For me, the issue is preventing errors in the first place by making
operations as simple to understand and use as we can afford to.

	For beeing consistent in the semantics for as empty as not empty
	Collections, I'd also be happy with
	
	Collection>>removeAll: aCollection
	  self assert: [self ~~ aCollection].
	  self isEmpty
	    ifFalse: [aCollection do: [:each | self remove: each]].
	  ^ aCollection
	
	; which should give the best from both worlds.
	
Except that there is no evidence that removing things from empty collections
is currently a performance problem that _needs_ fixing or that this change
_does_ fix it.  It could be true, so how about some numbers?




More information about the Squeak-dev mailing list