Place for efficiency improvement [was: Re: [BUG]Collection>>removeAll:]
Stephan Rudlof
sr at evolgo.de
Thu Aug 29 15:38:16 UTC 2002
Richard,
Wow, this is dawn interesting!
Independent of the discussion regarding the semantics of the
[remove|add]All: family you have found an important place for possible
efficiency improvements.
Richard A. O'Keefe wrote:
> NEW RESULTS!
>
> There are people in this mailing list who are of the opinion
> that the quirk in the implementation of #removeAll: is not a bug
> but an obvious precondition that programmers should always have been
> aware of, so it doesn't need fixing.
>
> I am deeply indebted to mwai at geico.com for telling me that
>
> |coll|
> coll := OrderedCollection with: 1 with: 2.
> coll removeAll: coll
> coll inspect --> OrderedCollection()
>
> _works_ in IBM Smalltalk, despite IBM Smalltalk having the same
> implementation of #removeAll:.
>
> This raised the extremely interesting possibility that it might have
> worked in Squeak once upon a time. I decided to look for evidence.
> If IBM Smalltalk got it right despite having the same #removeAll:,
> the difference must be in #remove:ifAbsent:.
>
> Here is how #remove:ifAbsent: works in Squeak.
>
> remove: oldObject ifAbsent: absentBlock
> |index|
> index := firstIndex.
> [index <= lastIndex] whileTrue: [
> oldObject = (array at: index)
> ifTrue: [self removeIndex: index. ^oldObject]
> ifFalse: [index := index + 1]].
> ^absentBlock value
>
> removeIndex: removedIndex
> array
> replaceFrom: removedIndex
> to: lastIndex - 1
> with: array
> startingAt: removedIndex + 1.
> array at: lastIndex put: nil.
> lastIndex := lastIndex - 1.
>
> Now removeIndex: is NOT original Holy Writ, it was changed by 'ar'
> in 2002.05.22. Presumably it was changed in order to exploit the
> #replaceFrom:to:with:startingAt: primitive (105). I have no idea
> what used to be there before.
>
> While the comment on Array>>replaceFrom:to:with:startingAt: does
> not say so, I presume that the primitive does the same as
> SequenceableCollection>>replaceFrom:to:with:startingAt:,
> only faster.
>
> That's a pity, because that means that we can only move array elements
> DOWN to fill a hole, not UP, so that the first element of an ordered
> collection is the most expensive one to remove.
>
> A consequence of this is that
> oc removeAll: (oc copy)
> takes quadratic time, when it _could_ take linear time.
>
> Suppose we decide to make removing the _first_ element cheap by
> changing removeIndex: to
>
> removeIndex: removedIndex
> removedIndex = firstIndex
> ifTrue: [
> array at: firstIndex put: nil.
> firstIndex := firstIndex + 1.
> ^self].
> array
> replaceFrom: removedIndex
> to: lastIndex - 1
> with: array
> startingAt: removedIndex + 1.
> array at: lastIndex put: nil.
> lastIndex := lastIndex - 1.
>
> NOW anOrderedCollection removeAll: anOrderedCollection works.
> Without any change to #removeAll: at all!
>
> Now, I don't _know_ whether removing the first element of an ordered
> collection used to be cheap, but if ever it was, then nobody would have
> thought to document the hole in the implementation of
> OrderedCollection>>removeAll: because there wouldn't have _been_ any
> hole, aliasing or no aliasing.
>
> In the absence of <primitive: 105>, how might removeIndex: have been
> written? Obviously, to do the fewest moves:
>
> removeIndex: removedIndex
> (lastIndex + firstIndex) // 2 < removedIndex
> ifTrue: [
> removedIndex + 1 to: lastIndex do: [:index |
> array at: index - 1 put: array at: index].
> array at: lastIndex put: nil.
> lastIndex := lastIndex - 1]
> ifFalse: [
> removedIndex - 1 to: 1 by: -1 do: [:index |
> array at: index + 1 put: array at: index].
> array at: firstIndex put: nil.
> firstIndex := firstIndex + 1].
What about the ifFalse: case realized by a new Array prim (the ifTrue: case
corresponds to the current implementation); e.g.:
Array>>removeIndex: removedIndex
(lastIndex + firstIndex) // 2 < removedIndex
ifTrue:
[array
replaceFrom: removedIndex
to: lastIndex - 1
with: array
startingAt: removedIndex+1.
array at: lastIndex put: nil.
lastIndex _ lastIndex - 1]
ifFalse:
[array
replaceBackwardsFrom: removedIndex
to: firstIndex + 1
with: array
startingAt: removedIndex-1.
array at: firstIndex put: nil.
firstIndex _ firstIndex + 1]
(note: code not tested); the new prim would be needed for the new method
Array>>replaceBackwardsFrom:to:startingAt:
.
I think this would be a valuable improvement.
Comments?
Greetings,
Stephan
>
> And there would not have been any hole in the implementation of
> OrderedCollection>>removeAll:.
>
> So the "don't change it" brigade may be fighting to preserve a bug
> which was introduced comparatively recently.
>
> I'm checking this stuff in Squeak 2.8. I notice that
> OrderedCollection>>removeallSuchThat: used
> #removeIndex: to remove elements, thus taking a method that _could_
> very easily have been linear time and making it take quadratic time.
> That was added or changed by 'sma' in 2002.05.12. It looks as though
> all the methods in OrderedCollection that call #removeIndex: went in
> at about the same time.
>
> At any rate, it is now clear that Squeak cannot be precisely compatible
> with *both* VW (where coll removeAll: coll is said not to work)
> *and* IBM Smalltalk (where it is said to work). So we might as well be
> compatible with the one that works.
>
>
>
--
Stephan Rudlof (sr at evolgo.de)
"Genius doesn't work on an assembly line basis.
You can't simply say, 'Today I will be brilliant.'"
-- Kirk, "The Ultimate Computer", stardate 4731.3
More information about the Squeak-dev
mailing list
|