[FIX] SortedCollectionFix-sr

Stephan Rudlof sr at evolgo.de
Fri Sep 27 12:14:50 UTC 2002


Dear Andreas and others,

Andreas Raab wrote:
> Hi Stephan,
> 
> While I agree mostly with your reasoning (not entirely though because I
> think Richard has a number of equally good arguments wrt. to consistency
> of collection operations) there's one issue I strongly disagree with:
> 
> 

>>So there remains the performance thing: using SortedCollection>>add:
>>multiple times means, that each object has to be sorted in 
>>immediately. It is probably more performant to have a private
>>add method, which just adds an element, while leaving the
>>SortedCollection in an inconsistent state, and reSort it afterwards,
>>when all elements have been added. That has been my
>>idea of introducing the 'bypass' method for clients which 
>>really know about the internals.

What I have written here is not entirely correct: it has been an
interpretation of the *current* usage  and intent of using addLast: for
SortedCollections. My *idea* was to let it exist further.
Note: I do *not* use >>addLast: (or >>addLastWithoutSorting in my fix) at
the client side for calling SortedCollections.

> 
> 
> You don't have to use #add; you can use #addAll: which will choose the
> right way to add the elements dynamically. Your proposal is actually
> _really_ bad for optimization since it assumes "outside knowledge" about
> the way SortedCollection is implemented and renders any attempt to
> really optimize _it_ (instead of lots of clients guessing what a
> reasonably good optimization would be) pretty much useless. Spreading
> out "meager client side optimizations" while breaking the polymorphic
> protocol is never a good idea - if you want to optimize then choose a
> central, high-level, intent revealing place (such as, in fact,
> SortedCollection>>addAll:) and don't ever have the client
> second-guessing the structure and behavior of the object in question.
<...>

I agree.

Now I know better, why I have had some stomache going this way.

And why I wrote:
> Another - cleaner - variant avoiding these internal methods:
> collecting the
> to be added elements in another - not sorted! - Collection and calling
> SortedCollection>>addAll: for them.

But the world is not perfect: there *are* clients trying to optimize from
outside (not written by me). I just wanted to fix the problem without
breaking them and don't wanted to think deeper about them.
But now I think I have to...


Which are they?

The first one is IndexFile>>readFrom: aStream messageFile: messageFile, its
comment says:
  "Initialize myself from the given text stream. It is assumed that the
  .index file was written in order of ascending message timestamps,
  although this method is only less efficient, not incorrect, if this is not
  the case."

Its code uses the assumption, that if starting with an empty
SortedCollection >>addLast: is called repeatedly for a sequence of already
sorted elements, this would result into a correct SortedCollection without
calling >>reSort!

It uses knowledge of the internal representation of SortedCollection and
omits the class invariant ensuring >>reSort in some cases: dangerous!

This should be fixed *independently* of my fix suggestions. (Note: I don't
use Celeste: this may be a reason for another person to fix it.)

But since performance may be *very* important in this client case:
What about introducing a very special
  SortedCollection>>addAllProbablyPreSorted:
(or >>addAllCheckForPreSortedFirst:) to give it a hint?

Otherwise I hear other people crying...


The other one is

Dictionary>>keysSortedSafely
	"Answer a SortedCollection containing the receiver's keys."
	| sortedKeys |
	sortedKeys _ SortedCollection new: self size.
	sortedKeys sortBlock:
		[:x :y |  "Should really be use <obj, string, num> compareSafely..."
		((x isString and: [y isString])
			or: [x isNumber and: [y isNumber]])
			ifTrue: [x < y]
			ifFalse: [x class == y class
				ifTrue: [x printString < y printString]
				ifFalse: [x class name < y class name]]].
	self keysDo: [:each | sortedKeys addLast: each].
	^ sortedKeys reSort

This seems to be safe, since >>reSort is called in every case, but using
SortedCollection>>addAll: seems to be better here.

But let's take a look at
SortedCollection>>
addAll: aCollection
	aCollection size > (self size // 3)
		ifTrue:
			[aCollection do: [:each | super addLast: each].
			self reSort]
		ifFalse: [aCollection do: [:each | self add: each]].
	^ aCollection

Now I see a problem: the implementor (sma) of Dictionary>>keysSortedSafely
possibly has seen, that in his case each element would be added one by one,
instead of adding all unsorted followed by a reSort. And then he has tried
to optimize it...

How to deal with this case?


I see, I've opened up a can of worms...
But my conviction stays, that my suggestion is - evolutionary - better than
the current status. And an evolutionary change in the right direction -
though admittedly not ideal - is better than no change.


Greetings,

Stephan




More information about the Squeak-dev mailing list