[FIX] SortedCollectionFix-sr

Richard A. O'Keefe squeak-dev at lists.squeakfoundation.org
Mon Sep 30 02:52:03 UTC 2002


I should make it clear that I am in complete and total agreement
with Stephan Rudlof <sr at evolgo.de> that clients of SortedCollection
should not be using an *UNCHECKED* #addLast:.

He found some code that was using #addLast:
	The first one is IndexFile>>readFrom: aStream messageFile: messageFile, its
	comment says:
	  "Initialize myself from the given text stream. It is assumed that the
	  .index file was written in order of ascending message timestamps,
	  although this method is only less efficient, not incorrect, if this is not
	  the case."
	
A lazy implementation of SortedCollection with a sorting method that
exploits existing order (and there are lots of those) would provide
linear time building *without* any special bypass.

Let me make this clear:  there are several known sorting methods, including
some that are quite efficient, which have the pleasant property that if you
ask them to sort N items that are already in sorted order, they only do
O(N) comparisons; N-1 in some cases.  This should be an affordable overhead.
	
	Dictionary>>keysSortedSafely
		"Answer a SortedCollection containing the receiver's keys."
		| sortedKeys |
		sortedKeys _ SortedCollection new: self size.
		sortedKeys sortBlock:
			[:x :y |  "Should really be use <obj, string, num> compareSafely..."
			((x isString and: [y isString])
				or: [x isNumber and: [y isNumber]])
				ifTrue: [x < y]
				ifFalse: [x class == y class
					ifTrue: [x printString < y printString]
					ifFalse: [x class name < y class name]]].
		self keysDo: [:each | sortedKeys addLast: each].
		^ sortedKeys reSort
	
This would seem to be a case for #addAll:. 

	SortedCollection>>
	addAll: aCollection
		aCollection size > (self size // 3)
			ifTrue:
				[aCollection do: [:each | super addLast: each].
				self reSort]
			ifFalse: [aCollection do: [:each | self add: each]].
		^ aCollection
	
This code is striking a balance between number of comparisons
(ifFalse: case better) and number of moves (ifTrue: case usually better,
but sort M then merge N+M would be better still, and for all but small N
would do better than the ifTrue: code on comparisons as well).

The odd thing about it is that the cross-over point is not at all well
chosen.  If we let M = N/4, then the cost of this method is O(N**2)
even though O(NlgN) is achievable.

	Now I see a problem: the implementor (sma) of Dictionary>>keysSortedSafely
	possibly has seen, that in his case each element would be added one by one,
	instead of adding all unsorted followed by a reSort. And then he has tried
	to optimize it...
	
	How to deal with this case?
	
Use a lazy implementation with an efficient order-exploiting sort
and get #reSort out of the public interface.
	
	I see, I've opened up a can of worms...
	But my conviction stays, that my suggestion is - evolutionary - better than
	the current status. And an evolutionary change in the right direction -
	though admittedly not ideal - is better than no change.

The "evolutionary" change here would be to allow #addLast: but make it
check that the class invariant would not be broken.

Everything can be made to go smoothly if we have *stronger* encapsulation
than now (no addFirst:, no addLast:, and definitely no reSort) and a better
(but still quite simple) implementation inside.



More information about the Squeak-dev mailing list