[BUG]Collection>>removeAll:

Tue Aug 27 09:31:53 UTC 2002

Richard A. O'Keefe wrote:
> Stephan Rudlof <sr at evolgo.de> wrote concerning:
> 	> 	Collection>>removeAll: aCollection
> 	> 	  self isEmpty
> 	> 	    ifFalse: [self assert: [self ~~ aCollection].
> 	> 	             aCollection do: [:each | self remove: each]].
> 	> 	  ^aCollection
> 
> 	I don't view this as a 'hack', if I'm looking for cheap and easy to
> 	understand efficiency improvements.
> 	
> This
>  - adds an extra full message send to *every* invocation (cost C)
>  - speeds up cases where self is empty (benefit Bx where x is the
>    average length of aCollection in this case)
> Suppose that proportion p of invocations to #removeAll: have an
> empty receiver and 1-p do not.
> 
> Then the overall change so C - pBx.
> This is an efficiency *improvement* if and only if pBx > C iff p > C/Bx.
> 
> Now message sends are surprisingly expensive in Squeak.  
> In fact you _can_ speed up #removeAll: quite substantially by removing
> one, and if you are looking for cheap and easy to understand efficiency
> improvements, you should start by REMOVING message sends like this:
> 
> 	removeAll: aCollection
> 	    self == aCollection ifTrue: [... do something else ...]
> 	    ^aCollection do: [:each |
> 		self remove: each ifAbsent: [self errorNotFound: each]].
> 
> I've measured that and for the case of OrderedCollection receiver,
> Array aCollection, and SmallInteger elements, the speedup was a consistent
> 30% over a range of sizes, with *no* slowdown anyhere.
> 
> 	> that doesn't look as though it will pay off all that often,
> 	
> 	Who knows...
> 	
> Well, we know C is large.  We have reason to suspect that x is small (since
> #removeAll: applied to OrderedCollections is so inefficient that someone
> would have noticed if x were in general large).  We also have reason to
> suspect that #removeAll: is unlikely to be sent to a collection that the
> programmer _knows_ to be empty, to p is going to be the proportion of cases
> that were thought to be possibly non-empty but weren't, which is likely
> quite small.
> 
> In short, in the absence of measurements, we have reason to expect this
> to be a performance loss.
> 
> 	> and *does* most undesirably block the error message when the receiver is
> 	> empty.
> 	
> 	OK, this is a point: my first suggestion treats an empty collection with
> 	itself as argument different from a not empty one.
> 
> What we are talking about here is "conceptual type errors".
> It's the reason that the FIX I posted adds
> removeAll: aCollection self shouldNotImplement
> and related methods to ArrayedCollection; if the method won't work when
> it _does_ remove something, it should not delay debugging by appearing to
> be successful when it just happens not to remove anything this time.
> 
> 	I have thought, the result of removing something empty from itself is
> 	obvious.
> 
> *If* it is the kind of thing that could *ever* remove something from itself,
> yes.  The main point of this thread is that operations should be as simple
> for programmers to understand and use safely as they reasonably can be;
> every special case in the description of a method makes it harder to
> understand and use correctly.r
> 
>     x removeAll: y
>     removes all the elements of y (which can be any kind of collection)
>     from x (which must be a kind of collection that can change size)
> 
> is simple to understand.
> 
>     x removeAll: y
>     removes all the elements of y (which can be any kind of collection)
>     from x (which can be any kind of collection)
>     EXCEPT that y must not be x
>     EXCEPT that if x is not empty it must be able to change size
>     EXCEPT that if none of y's elements occurs in x, x can be any collection
> 
> is hard to understand and use correctly.
> 
> 	But I have to admit, that this could cover programming errors.
> 	
> For me, the issue is preventing errors in the first place by making
> operations as simple to understand and use as we can afford to.
> 
> 	For beeing consistent in the semantics for as empty as not empty
> 	Collections, I'd also be happy with
> 	
> 	Collection>>removeAll: aCollection
> 	  self assert: [self ~~ aCollection].
> 	  self isEmpty
> 	    ifFalse: [aCollection do: [:each | self remove: each]].
> 	  ^ aCollection
> 	
> 	; which should give the best from both worlds.
> 	
> Except that there is no evidence that removing things from empty collections
> is currently a performance problem that _needs_ fixing or that this change
> _does_ fix it.  It could be true, so how about some numbers?
> 
> 
> 

Trial and error: I have tried, but

| colls emptyColl |
colls _ Collection allSubInstances.
emptyColl _ OrderedCollection new.
Smalltalk garbageCollect.
{
	[colls do: [:coll | emptyColl removeAll: coll]] timeToRun.
	[colls do: [:coll | emptyColl removeAllSr: coll]] timeToRun.
}

gives an error (for >>removeAll:)! We both have overlooked, that in the
current implementation it is *not* allowed to try to remove something from
an empty collection. So my speed up idea is nonsense with the current
semantics, which automatically gives an error notifier.

What works?

| colls emptyColl |
colls _ Collection allSubInstances.
emptyColl _ OrderedCollection new.
Smalltalk garbageCollect.
{
	[colls do: [:coll | emptyColl copyWithoutAll: coll]] timeToRun.
	[colls do: [:coll | emptyColl copyWithoutAllSr: coll]] timeToRun.
}

gives the timings
  #(4598 3984) #(4480 3820) #(4469 3861) #(4476 4307)
for
  copyWithoutAllSr: aCollection
	"Answer a copy of the receiver that does not contain any elements
	equal to those in aCollection."

	^ self isEmpty
		ifTrue: [self species new]
		ifFalse: [self reject: [:each | aCollection includes: each]]

; but this haven't been the problem.

What remains?

Just your speedup by omitting one message send per element via calling
>>remove:ifAbsent: directly together with the additional argument check by
  self == aCollection ifTrue: [... do something else ...]
or
  self assert: [self ~~ aCollection]
.

Greetings,

Stephan
-- 
Stephan Rudlof (sr at evolgo.de)
   "Genius doesn't work on an assembly line basis.
    You can't simply say, 'Today I will be brilliant.'"
    -- Kirk, "The Ultimate Computer", stardate 4731.3