[BUG]Collection>>removeAll:

Sun Aug 25 23:29:28 UTC 2002

Stephan Rudlof <sr at evolgo.de> wrote
    What about:

	Collection>>removeAll: aCollection
	  self isEmpty
	    ifFalse: [self assert: [self ~~ aCollection].
	             aCollection do: [:each | self remove: each]].
	  ^ aCollection

	Would also be a shortcut for removing some - potentially huge - collection
	from an already emptied one...

On Saturday I tried a few benchmarks.

Remember how sokme people were alleging that adding the test
    aCollection == self ifTrue: [" handle it somehow "].
    "more or less previous code"
would be too slow?

Thanks to the O(self size * aCollection size) behaviour of #removeAll:,
it really does turn out that even for relatively small collections, it
is practically impossible to measure any slowdown.

Not surprising really, because it turns out that for

    removeAllSafely: aCollection
	aCollection == self ifTrue: [^self removeAllSafely: aCollection copy].
	^aCollection do: [:each | self remove: each]

the total overhead for the normal is only two bytecodes, because changing
from
    aCollection do: [...].
    ^aCollection
to  ^aCollection do: [...]
turns out to save two bytecodes.

The real surprise came when I discovered that I could speed #removeAll:
up by 30% with very little effort.  (The technique is really really obvious
and the Self compiler would do it automatically.  I leave it as an exercise
for the reader.)

So:
(1) #removeAll: is quadratic, although it could be linear time
    for all the classes where it currently makes sense.
    (Yes, I am aware that it uses #= for deciding what to remove,
    not #==.  I am also aware of what we could easily do about that.)

(2) Even given its structure, there is a trivial change that would deliver
    the same results in all cases that could speed it up by 30% across the
    board.

I do NOT think that it is appropriate for people to worry too much about
hacks to #removeAll: that do not address any of its real problems, nor that
it is reasonable to complain overmuch about an added overhead of 2 bytecodes.

The end result of my benchmarking was a version of #removeAll: that could
handle the (x removeAll: x) case while running 30% faster for all other
cases.  So much for overhead!