I should make it clear that I am in complete and total agreement with Stephan Rudlof sr@evolgo.de that clients of SortedCollection should not be using an *UNCHECKED* #addLast:.
He found some code that was using #addLast: The first one is IndexFile>>readFrom: aStream messageFile: messageFile, its comment says: "Initialize myself from the given text stream. It is assumed that the .index file was written in order of ascending message timestamps, although this method is only less efficient, not incorrect, if this is not the case." A lazy implementation of SortedCollection with a sorting method that exploits existing order (and there are lots of those) would provide linear time building *without* any special bypass.
Let me make this clear: there are several known sorting methods, including some that are quite efficient, which have the pleasant property that if you ask them to sort N items that are already in sorted order, they only do O(N) comparisons; N-1 in some cases. This should be an affordable overhead. Dictionary>>keysSortedSafely "Answer a SortedCollection containing the receiver's keys." | sortedKeys | sortedKeys _ SortedCollection new: self size. sortedKeys sortBlock: [:x :y | "Should really be use <obj, string, num> compareSafely..." ((x isString and: [y isString]) or: [x isNumber and: [y isNumber]]) ifTrue: [x < y] ifFalse: [x class == y class ifTrue: [x printString < y printString] ifFalse: [x class name < y class name]]]. self keysDo: [:each | sortedKeys addLast: each]. ^ sortedKeys reSort This would seem to be a case for #addAll:.
SortedCollection>> addAll: aCollection aCollection size > (self size // 3) ifTrue: [aCollection do: [:each | super addLast: each]. self reSort] ifFalse: [aCollection do: [:each | self add: each]]. ^ aCollection This code is striking a balance between number of comparisons (ifFalse: case better) and number of moves (ifTrue: case usually better, but sort M then merge N+M would be better still, and for all but small N would do better than the ifTrue: code on comparisons as well).
The odd thing about it is that the cross-over point is not at all well chosen. If we let M = N/4, then the cost of this method is O(N**2) even though O(NlgN) is achievable.
Now I see a problem: the implementor (sma) of Dictionary>>keysSortedSafely possibly has seen, that in his case each element would be added one by one, instead of adding all unsorted followed by a reSort. And then he has tried to optimize it... How to deal with this case? Use a lazy implementation with an efficient order-exploiting sort and get #reSort out of the public interface. I see, I've opened up a can of worms... But my conviction stays, that my suggestion is - evolutionary - better than the current status. And an evolutionary change in the right direction - though admittedly not ideal - is better than no change.
The "evolutionary" change here would be to allow #addLast: but make it check that the class invariant would not be broken.
Everything can be made to go smoothly if we have *stronger* encapsulation than now (no addFirst:, no addLast:, and definitely no reSort) and a better (but still quite simple) implementation inside.
squeak-dev@lists.squeakfoundation.org