Stephan Rudlof sr@evolgo.de wrote: your proposal implementing SortedCollection with 'lazy data structures' is very interesting. ...
One question is: How is the relation between changes of the SortedCollection and looking >>at: it, since some time is spend for the checks there. If accessing it outweights the changes by magnitudes there could be a *small* performance penalty.
The first question is whether the overhead can be made small. Retaining the names a, m, z, instead of calling self pvtEnsureSorted one would do m == z ifFalse: [self pvtEnsureSorted] which costs push, push, compare, branch.
I made a subclass SlowerCollection of SortedCollection. The only change was that I made a copy of #at: and stuffed firstIndex == firstIndex ifFalse: [self error: 'Impossible']. into it at the appropriate place.
SortedCollection time for a million sends of #at: 1.901 seconds. SlowerCollection time for a million sends of #at: 2.071 seconds. That is, adding this check made #at: about 9% slower. BUT WE CAN DO BETTER!
SortedCollection inherits #at: from OrderedCollection, which does some dynamic bounds checking. Use four variables: firstIndex (from OrderedCollection), lastIndex (from OrderedCollection), lastSortedIndex (for laziness), and fakeLastIndex (to make #at: fast). Ensure that fakeLastIndex = (lastSortedIndex = lastIndex ifTrue: [lastIndex] ifFalse: [-1]). Then have #at: check against fakeLastIndex instead of lastIndex. In the branch that currently handles out-of-range indices, #at: would restore the stronger invariant and then resend to super.
With that hack the time penalty would be ZERO except for the first call to #at: after an addition.
The basic reason we can make the relative time penalty for SortedCollection effectively zero is that the time penalty for #at: in OrderedCollection is already rather high. If anyone was really worried about the speed of #at:, they should be thinking about writing a primitive for OrderedCollection>>at:.
squeak-dev@lists.squeakfoundation.org