[BUG]Collection>>removeAll:
Stephan Rudlof
sr at evolgo.de
Tue Aug 27 09:31:53 UTC 2002
Richard A. O'Keefe wrote:
> Stephan Rudlof <sr at evolgo.de> wrote concerning:
> > Collection>>removeAll: aCollection
> > self isEmpty
> > ifFalse: [self assert: [self ~~ aCollection].
> > aCollection do: [:each | self remove: each]].
> > ^aCollection
>
> I don't view this as a 'hack', if I'm looking for cheap and easy to
> understand efficiency improvements.
>
> This
> - adds an extra full message send to *every* invocation (cost C)
> - speeds up cases where self is empty (benefit Bx where x is the
> average length of aCollection in this case)
> Suppose that proportion p of invocations to #removeAll: have an
> empty receiver and 1-p do not.
>
> Then the overall change so C - pBx.
> This is an efficiency *improvement* if and only if pBx > C iff p > C/Bx.
>
> Now message sends are surprisingly expensive in Squeak.
> In fact you _can_ speed up #removeAll: quite substantially by removing
> one, and if you are looking for cheap and easy to understand efficiency
> improvements, you should start by REMOVING message sends like this:
>
> removeAll: aCollection
> self == aCollection ifTrue: [... do something else ...]
> ^aCollection do: [:each |
> self remove: each ifAbsent: [self errorNotFound: each]].
>
> I've measured that and for the case of OrderedCollection receiver,
> Array aCollection, and SmallInteger elements, the speedup was a consistent
> 30% over a range of sizes, with *no* slowdown anyhere.
>
> > that doesn't look as though it will pay off all that often,
>
> Who knows...
>
> Well, we know C is large. We have reason to suspect that x is small (since
> #removeAll: applied to OrderedCollections is so inefficient that someone
> would have noticed if x were in general large). We also have reason to
> suspect that #removeAll: is unlikely to be sent to a collection that the
> programmer _knows_ to be empty, to p is going to be the proportion of cases
> that were thought to be possibly non-empty but weren't, which is likely
> quite small.
>
> In short, in the absence of measurements, we have reason to expect this
> to be a performance loss.
>
> > and *does* most undesirably block the error message when the receiver is
> > empty.
>
> OK, this is a point: my first suggestion treats an empty collection with
> itself as argument different from a not empty one.
>
> What we are talking about here is "conceptual type errors".
> It's the reason that the FIX I posted adds
> removeAll: aCollection self shouldNotImplement
> and related methods to ArrayedCollection; if the method won't work when
> it _does_ remove something, it should not delay debugging by appearing to
> be successful when it just happens not to remove anything this time.
>
> I have thought, the result of removing something empty from itself is
> obvious.
>
> *If* it is the kind of thing that could *ever* remove something from itself,
> yes. The main point of this thread is that operations should be as simple
> for programmers to understand and use safely as they reasonably can be;
> every special case in the description of a method makes it harder to
> understand and use correctly.r
>
> x removeAll: y
> removes all the elements of y (which can be any kind of collection)
> from x (which must be a kind of collection that can change size)
>
> is simple to understand.
>
> x removeAll: y
> removes all the elements of y (which can be any kind of collection)
> from x (which can be any kind of collection)
> EXCEPT that y must not be x
> EXCEPT that if x is not empty it must be able to change size
> EXCEPT that if none of y's elements occurs in x, x can be any collection
>
> is hard to understand and use correctly.
>
> But I have to admit, that this could cover programming errors.
>
> For me, the issue is preventing errors in the first place by making
> operations as simple to understand and use as we can afford to.
>
> For beeing consistent in the semantics for as empty as not empty
> Collections, I'd also be happy with
>
> Collection>>removeAll: aCollection
> self assert: [self ~~ aCollection].
> self isEmpty
> ifFalse: [aCollection do: [:each | self remove: each]].
> ^ aCollection
>
> ; which should give the best from both worlds.
>
> Except that there is no evidence that removing things from empty collections
> is currently a performance problem that _needs_ fixing or that this change
> _does_ fix it. It could be true, so how about some numbers?
>
>
>
Trial and error: I have tried, but
| colls emptyColl |
colls _ Collection allSubInstances.
emptyColl _ OrderedCollection new.
Smalltalk garbageCollect.
{
[colls do: [:coll | emptyColl removeAll: coll]] timeToRun.
[colls do: [:coll | emptyColl removeAllSr: coll]] timeToRun.
}
gives an error (for >>removeAll:)! We both have overlooked, that in the
current implementation it is *not* allowed to try to remove something from
an empty collection. So my speed up idea is nonsense with the current
semantics, which automatically gives an error notifier.
What works?
| colls emptyColl |
colls _ Collection allSubInstances.
emptyColl _ OrderedCollection new.
Smalltalk garbageCollect.
{
[colls do: [:coll | emptyColl copyWithoutAll: coll]] timeToRun.
[colls do: [:coll | emptyColl copyWithoutAllSr: coll]] timeToRun.
}
gives the timings
#(4598 3984) #(4480 3820) #(4469 3861) #(4476 4307)
for
copyWithoutAllSr: aCollection
"Answer a copy of the receiver that does not contain any elements
equal to those in aCollection."
^ self isEmpty
ifTrue: [self species new]
ifFalse: [self reject: [:each | aCollection includes: each]]
; but this haven't been the problem.
What remains?
Just your speedup by omitting one message send per element via calling
>>remove:ifAbsent: directly together with the additional argument check by
self == aCollection ifTrue: [... do something else ...]
or
self assert: [self ~~ aCollection]
.
Greetings,
Stephan
--
Stephan Rudlof (sr at evolgo.de)
"Genius doesn't work on an assembly line basis.
You can't simply say, 'Today I will be brilliant.'"
-- Kirk, "The Ultimate Computer", stardate 4731.3
More information about the Squeak-dev
mailing list
|