[BUG]Collection>>removeAll:

Richard A. O'Keefe ok at cs.otago.ac.nz
Thu Aug 29 01:58:11 UTC 2002


NEW RESULTS!

There are people in this mailing list who are of the opinion
that the quirk in the implementation of #removeAll: is not a bug
but an obvious precondition that programmers should always have been
aware of, so it doesn't need fixing.

I am deeply indebted to mwai at geico.com for telling me that

	|coll|
	coll := OrderedCollection with: 1 with: 2.
	coll removeAll: coll
	coll inspect --> OrderedCollection()  
	
_works_ in IBM Smalltalk, despite IBM Smalltalk having the same
implementation of #removeAll:.

This raised the extremely interesting possibility that it might have
worked in Squeak once upon a time.  I decided to look for evidence.
If IBM Smalltalk got it right despite having the same #removeAll:,
the difference must be in #remove:ifAbsent:.

Here is how #remove:ifAbsent: works in Squeak.

    remove: oldObject ifAbsent: absentBlock
	|index|
	index := firstIndex.
	[index <= lastIndex] whileTrue: [
	    oldObject = (array at: index)
		ifTrue:  [self removeIndex: index.  ^oldObject]
		ifFalse: [index := index + 1]].
	^absentBlock value

    removeIndex: removedIndex
	array
	    replaceFrom: removedIndex
	    to:          lastIndex - 1
	    with:        array
	    startingAt:  removedIndex + 1.
	array at: lastIndex put: nil.
	lastIndex := lastIndex - 1.

Now removeIndex: is NOT original Holy Writ, it was changed by 'ar'
in 2002.05.22.  Presumably it was changed in order to exploit the
#replaceFrom:to:with:startingAt: primitive (105).  I have no idea
what used to be there before.

While the comment on Array>>replaceFrom:to:with:startingAt: does
not say so, I presume that the primitive does the same as
SequenceableCollection>>replaceFrom:to:with:startingAt:,
only faster.

That's a pity, because that means that we can only move array elements
DOWN to fill a hole, not UP, so that the first element of an ordered
collection is the most expensive one to remove.

A consequence of this is that
    oc removeAll: (oc copy)
takes quadratic time, when it _could_ take linear time.

Suppose we decide to make removing the _first_ element cheap by
changing removeIndex: to

    removeIndex: removedIndex
	removedIndex = firstIndex
	    ifTrue: [
	        array at: firstIndex put: nil.
	        firstIndex := firstIndex + 1.
	        ^self].
	array
	    replaceFrom: removedIndex
	    to:          lastIndex - 1
	    with:        array
	    startingAt:  removedIndex + 1.
	array at: lastIndex put: nil.
	lastIndex := lastIndex - 1.

NOW anOrderedCollection removeAll: anOrderedCollection works.
Without any change to #removeAll: at all!

Now, I don't _know_ whether removing the first element of an ordered
collection used to be cheap, but if ever it was, then nobody would have
thought to document the hole in the implementation of
OrderedCollection>>removeAll: because there wouldn't have _been_ any
hole, aliasing or no aliasing.

In the absence of <primitive: 105>, how might removeIndex: have been
written?  Obviously, to do the fewest moves:

    removeIndex: removedIndex
	(lastIndex + firstIndex) // 2 < removedIndex
	    ifTrue: [
		removedIndex + 1 to: lastIndex do: [:index |
		    array at: index - 1 put: array at: index].
		array at: lastIndex put: nil.
		lastIndex := lastIndex - 1]
	    ifFalse: [
		removedIndex - 1 to: 1 by: -1 do: [:index |
		    array at: index + 1 put: array at: index].
		array at: firstIndex put: nil.
		firstIndex := firstIndex + 1].

And there would not have been any hole in the implementation of
OrderedCollection>>removeAll:.

So the "don't change it" brigade may be fighting to preserve a bug
which was introduced comparatively recently.

I'm checking this stuff in Squeak 2.8.  I notice that
OrderedCollection>>removeallSuchThat: used
#removeIndex: to remove elements, thus taking a method that _could_
very easily have been linear time and making it take quadratic time.
That was added or changed by 'sma' in 2002.05.12.  It looks as though
all the methods in OrderedCollection that call #removeIndex: went in
at about the same time.

At any rate, it is now clear that Squeak cannot be precisely compatible 
with *both* VW (where coll removeAll: coll is said not to work)
*and* IBM Smalltalk (where it is said to work).  So we might as well be
compatible with the one that works.




More information about the Squeak-dev mailing list