Place for efficiency improvement [was: Re: [BUG]Collection>>removeAll:]

Stephan Rudlof sr at evolgo.de
Thu Aug 29 15:38:16 UTC 2002


Richard,

Wow, this is dawn interesting!

Independent of the discussion regarding the semantics of the
[remove|add]All: family you have found an important place for possible
efficiency improvements.

Richard A. O'Keefe wrote:
> NEW RESULTS!
> 
> There are people in this mailing list who are of the opinion
> that the quirk in the implementation of #removeAll: is not a bug
> but an obvious precondition that programmers should always have been
> aware of, so it doesn't need fixing.
> 
> I am deeply indebted to mwai at geico.com for telling me that
> 
> 	|coll|
> 	coll := OrderedCollection with: 1 with: 2.
> 	coll removeAll: coll
> 	coll inspect --> OrderedCollection()  
> 	
> _works_ in IBM Smalltalk, despite IBM Smalltalk having the same
> implementation of #removeAll:.
> 
> This raised the extremely interesting possibility that it might have
> worked in Squeak once upon a time.  I decided to look for evidence.
> If IBM Smalltalk got it right despite having the same #removeAll:,
> the difference must be in #remove:ifAbsent:.
> 
> Here is how #remove:ifAbsent: works in Squeak.
> 
>     remove: oldObject ifAbsent: absentBlock
> 	|index|
> 	index := firstIndex.
> 	[index <= lastIndex] whileTrue: [
> 	    oldObject = (array at: index)
> 		ifTrue:  [self removeIndex: index.  ^oldObject]
> 		ifFalse: [index := index + 1]].
> 	^absentBlock value
> 
>     removeIndex: removedIndex
> 	array
> 	    replaceFrom: removedIndex
> 	    to:          lastIndex - 1
> 	    with:        array
> 	    startingAt:  removedIndex + 1.
> 	array at: lastIndex put: nil.
> 	lastIndex := lastIndex - 1.
> 
> Now removeIndex: is NOT original Holy Writ, it was changed by 'ar'
> in 2002.05.22.  Presumably it was changed in order to exploit the
> #replaceFrom:to:with:startingAt: primitive (105).  I have no idea
> what used to be there before.
> 
> While the comment on Array>>replaceFrom:to:with:startingAt: does
> not say so, I presume that the primitive does the same as
> SequenceableCollection>>replaceFrom:to:with:startingAt:,
> only faster.
> 
> That's a pity, because that means that we can only move array elements
> DOWN to fill a hole, not UP, so that the first element of an ordered
> collection is the most expensive one to remove.
> 
> A consequence of this is that
>     oc removeAll: (oc copy)
> takes quadratic time, when it _could_ take linear time.
> 
> Suppose we decide to make removing the _first_ element cheap by
> changing removeIndex: to
> 
>     removeIndex: removedIndex
> 	removedIndex = firstIndex
> 	    ifTrue: [
> 	        array at: firstIndex put: nil.
> 	        firstIndex := firstIndex + 1.
> 	        ^self].
> 	array
> 	    replaceFrom: removedIndex
> 	    to:          lastIndex - 1
> 	    with:        array
> 	    startingAt:  removedIndex + 1.
> 	array at: lastIndex put: nil.
> 	lastIndex := lastIndex - 1.
> 
> NOW anOrderedCollection removeAll: anOrderedCollection works.
> Without any change to #removeAll: at all!
> 
> Now, I don't _know_ whether removing the first element of an ordered
> collection used to be cheap, but if ever it was, then nobody would have
> thought to document the hole in the implementation of
> OrderedCollection>>removeAll: because there wouldn't have _been_ any
> hole, aliasing or no aliasing.
> 
> In the absence of <primitive: 105>, how might removeIndex: have been
> written?  Obviously, to do the fewest moves:


> 
>     removeIndex: removedIndex
> 	(lastIndex + firstIndex) // 2 < removedIndex
> 	    ifTrue: [
> 		removedIndex + 1 to: lastIndex do: [:index |
> 		    array at: index - 1 put: array at: index].
> 		array at: lastIndex put: nil.
> 		lastIndex := lastIndex - 1]
> 	    ifFalse: [
> 		removedIndex - 1 to: 1 by: -1 do: [:index |
> 		    array at: index + 1 put: array at: index].
> 		array at: firstIndex put: nil.
> 		firstIndex := firstIndex + 1].

What about the ifFalse: case realized by a new Array prim (the ifTrue: case
corresponds to the current implementation); e.g.:

Array>>removeIndex: removedIndex
  (lastIndex + firstIndex) // 2 < removedIndex
    ifTrue:
      [array
        replaceFrom: removedIndex
        to: lastIndex - 1
        with: array
        startingAt: removedIndex+1.
      array at: lastIndex put: nil.
      lastIndex _ lastIndex - 1]
    ifFalse:
      [array
        replaceBackwardsFrom: removedIndex
        to: firstIndex + 1
        with: array
        startingAt: removedIndex-1.
      array at: firstIndex put: nil.
      firstIndex _ firstIndex + 1]

(note: code not tested); the new prim would be needed for the new method
  Array>>replaceBackwardsFrom:to:startingAt:
.

I think this would be a valuable improvement.

Comments?


Greetings,

Stephan

> 
> And there would not have been any hole in the implementation of
> OrderedCollection>>removeAll:.
> 
> So the "don't change it" brigade may be fighting to preserve a bug
> which was introduced comparatively recently.
> 
> I'm checking this stuff in Squeak 2.8.  I notice that
> OrderedCollection>>removeallSuchThat: used
> #removeIndex: to remove elements, thus taking a method that _could_
> very easily have been linear time and making it take quadratic time.
> That was added or changed by 'sma' in 2002.05.12.  It looks as though
> all the methods in OrderedCollection that call #removeIndex: went in
> at about the same time.
> 
> At any rate, it is now clear that Squeak cannot be precisely compatible 
> with *both* VW (where coll removeAll: coll is said not to work)
> *and* IBM Smalltalk (where it is said to work).  So we might as well be
> compatible with the one that works.
> 
> 
> 


-- 
Stephan Rudlof (sr at evolgo.de)
   "Genius doesn't work on an assembly line basis.
    You can't simply say, 'Today I will be brilliant.'"
    -- Kirk, "The Ultimate Computer", stardate 4731.3




More information about the Squeak-dev mailing list