Re: [FIX] SortedCollectionFix-sr - Squeak-dev

30 Sep 2002

      I should make it clear that I am in complete and total agreement
with Stephan Rudlof sr@evolgo.de that clients of SortedCollection
should not be using an *UNCHECKED* #addLast:.
He found some code that was using #addLast:
    The first one is IndexFile>>readFrom: aStream messageFile: messageFile, its
    comment says:
      "Initialize myself from the given text stream. It is assumed that the
      .index file was written in order of ascending message timestamps,
      although this method is only less efficient, not incorrect, if this is not
      the case."

A lazy implementation of SortedCollection with a sorting method that
exploits existing order (and there are lots of those) would provide
linear time building *without* any special bypass.
Let me make this clear:  there are several known sorting methods, including
some that are quite efficient, which have the pleasant property that if you
ask them to sort N items that are already in sorted order, they only do
O(N) comparisons; N-1 in some cases.  This should be an affordable overhead.

    Dictionary>>keysSortedSafely
    	"Answer a SortedCollection containing the receiver's keys."
    	| sortedKeys |
    	sortedKeys _ SortedCollection new: self size.
    	sortedKeys sortBlock:
    		[:x :y |  "Should really be use <obj, string, num> compareSafely..."
    		((x isString and: [y isString])
    			or: [x isNumber and: [y isNumber]])
    			ifTrue: [x < y]
    			ifFalse: [x class == y class
    				ifTrue: [x printString < y printString]
    				ifFalse: [x class name < y class name]]].
    	self keysDo: [:each | sortedKeys addLast: each].
    	^ sortedKeys reSort

This would seem to be a case for #addAll:.
SortedCollection>>
    addAll: aCollection
    	aCollection size > (self size // 3)
    		ifTrue:
    			[aCollection do: [:each | super addLast: each].
    			self reSort]
    		ifFalse: [aCollection do: [:each | self add: each]].
    	^ aCollection

This code is striking a balance between number of comparisons
(ifFalse: case better) and number of moves (ifTrue: case usually better,
but sort M then merge N+M would be better still, and for all but small N
would do better than the ifTrue: code on comparisons as well).
The odd thing about it is that the cross-over point is not at all well
chosen.  If we let M = N/4, then the cost of this method is O(N**2)
even though O(NlgN) is achievable.
Now I see a problem: the implementor (sma) of Dictionary>>keysSortedSafely
    possibly has seen, that in his case each element would be added one by one,
    instead of adding all unsorted followed by a reSort. And then he has tried
    to optimize it...

    How to deal with this case?

Use a lazy implementation with an efficient order-exploiting sort
and get #reSort out of the public interface.

    I see, I've opened up a can of worms...
    But my conviction stays, that my suggestion is - evolutionary - better than
    the current status. And an evolutionary change in the right direction -
    though admittedly not ideal - is better than no change.
The "evolutionary" change here would be to allow #addLast: but make it
check that the class invariant would not be broken.
Everything can be made to go smoothly if we have *stronger* encapsulation
than now (no addFirst:, no addLast:, and definitely no reSort) and a better
(but still quite simple) implementation inside.