[FIX] SortedCollectionFix-sr

Andreas Raab Andreas.Raab at gmx.de
Fri Sep 27 13:31:05 UTC 2002


Hi,

> I see, I've opened up a can of worms...

Not really. There are two independent issues here which make the
discussion much more complicated than it has to be. One is whether
#addFirst:/#addLast: should be allowed or not. I won't comment on it
since there are good arguments on either side but I think we all agree
that _if_ implemented they must not break the class invariant.

The other issue is that of optimization. Here I would argue that first
of all, "you" (which means not really you personally but anyone who
thinks that there needs to be an optimization for the cases you are
quoting) have to show me that reverting those "optimizations" really has
an impact. I wouldn't even bother thinking about optimizing these cases
before there is hard evidence for the problem.

So, for example, it would be trivial to rewrite

Dictionary>>keysSortedSafely
	^SortedCollection 
		withAll: self keys 
		sortBlock:[...]

SortedCollection class>>withAll: aCollection sortBlock: aBlock 
	"Answer an instance of me such that its elements
	are sorted according to the criterion specified in aBlock."
	^(super new: aCollection size) 
		sortBlock: aBlock; 
		addAll: aCollection

which is much more readable anyways and I would doubt that there's
really any speed impact. Same thing for Celeste - it is possible to
break the invariant in the creation method and whoever wrote that method
might as well not have used a SortedCollection to begin with. After all,
what's the point in using a class with a specific invariant if the first
thing you do is to break it?! Show me that it's slow and I'll show you
how to make it fast ;-)

Cheers,
  - Andreas

> -----Original Message-----
> From: squeak-dev-admin at lists.squeakfoundation.org 
> [mailto:squeak-dev-admin at lists.squeakfoundation.org] On 
> Behalf Of Stephan Rudlof
> Sent: Friday, September 27, 2002 2:15 PM
> To: squeak-dev at lists.squeakfoundation.org
> Subject: Re: [FIX] SortedCollectionFix-sr
> 
> 
> Dear Andreas and others,
> 
> Andreas Raab wrote:
> > Hi Stephan,
> > 
> > While I agree mostly with your reasoning (not entirely 
> though because I
> > think Richard has a number of equally good arguments wrt. 
> to consistency
> > of collection operations) there's one issue I strongly 
> disagree with:
> > 
> > 
> 
> >>So there remains the performance thing: using SortedCollection>>add:
> >>multiple times means, that each object has to be sorted in 
> >>immediately. It is probably more performant to have a private
> >>add method, which just adds an element, while leaving the
> >>SortedCollection in an inconsistent state, and reSort it afterwards,
> >>when all elements have been added. That has been my
> >>idea of introducing the 'bypass' method for clients which 
> >>really know about the internals.
> 
> What I have written here is not entirely correct: it has been an
> interpretation of the *current* usage  and intent of using 
> addLast: for
> SortedCollections. My *idea* was to let it exist further.
> Note: I do *not* use >>addLast: (or >>addLastWithoutSorting 
> in my fix) at
> the client side for calling SortedCollections.
> 
> > 
> > 
> > You don't have to use #add; you can use #addAll: which will 
> choose the
> > right way to add the elements dynamically. Your proposal is actually
> > _really_ bad for optimization since it assumes "outside 
> knowledge" about
> > the way SortedCollection is implemented and renders any attempt to
> > really optimize _it_ (instead of lots of clients guessing what a
> > reasonably good optimization would be) pretty much useless. 
> Spreading
> > out "meager client side optimizations" while breaking the 
> polymorphic
> > protocol is never a good idea - if you want to optimize 
> then choose a
> > central, high-level, intent revealing place (such as, in fact,
> > SortedCollection>>addAll:) and don't ever have the client
> > second-guessing the structure and behavior of the object in 
> question.
> <...>
> 
> I agree.
> 
> Now I know better, why I have had some stomache going this way.
> 
> And why I wrote:
> > Another - cleaner - variant avoiding these internal methods:
> > collecting the
> > to be added elements in another - not sorted! - Collection 
> and calling
> > SortedCollection>>addAll: for them.
> 
> But the world is not perfect: there *are* clients trying to 
> optimize from
> outside (not written by me). I just wanted to fix the problem without
> breaking them and don't wanted to think deeper about them.
> But now I think I have to...
> 
> 
> Which are they?
> 
> The first one is IndexFile>>readFrom: aStream messageFile: 
> messageFile, its
> comment says:
>   "Initialize myself from the given text stream. It is 
> assumed that the
>   .index file was written in order of ascending message timestamps,
>   although this method is only less efficient, not incorrect, 
> if this is not
>   the case."
> 
> Its code uses the assumption, that if starting with an empty
> SortedCollection >>addLast: is called repeatedly for a 
> sequence of already
> sorted elements, this would result into a correct 
> SortedCollection without
> calling >>reSort!
> 
> It uses knowledge of the internal representation of 
> SortedCollection and
> omits the class invariant ensuring >>reSort in some cases: dangerous!
> 
> This should be fixed *independently* of my fix suggestions. 
> (Note: I don't
> use Celeste: this may be a reason for another person to fix it.)
> 
> But since performance may be *very* important in this client case:
> What about introducing a very special
>   SortedCollection>>addAllProbablyPreSorted:
> (or >>addAllCheckForPreSortedFirst:) to give it a hint?
> 
> Otherwise I hear other people crying...
> 
> 
> The other one is
> 
> Dictionary>>keysSortedSafely
> 	"Answer a SortedCollection containing the receiver's keys."
> 	| sortedKeys |
> 	sortedKeys _ SortedCollection new: self size.
> 	sortedKeys sortBlock:
> 		[:x :y |  "Should really be use <obj, string, 
> num> compareSafely..."
> 		((x isString and: [y isString])
> 			or: [x isNumber and: [y isNumber]])
> 			ifTrue: [x < y]
> 			ifFalse: [x class == y class
> 				ifTrue: [x printString < y printString]
> 				ifFalse: [x class name < y 
> class name]]].
> 	self keysDo: [:each | sortedKeys addLast: each].
> 	^ sortedKeys reSort
> 
> This seems to be safe, since >>reSort is called in every 
> case, but using
> SortedCollection>>addAll: seems to be better here.
> 
> But let's take a look at
> SortedCollection>>
> addAll: aCollection
> 	aCollection size > (self size // 3)
> 		ifTrue:
> 			[aCollection do: [:each | super addLast: each].
> 			self reSort]
> 		ifFalse: [aCollection do: [:each | self add: each]].
> 	^ aCollection
> 
> Now I see a problem: the implementor (sma) of 
> Dictionary>>keysSortedSafely
> possibly has seen, that in his case each element would be 
> added one by one,
> instead of adding all unsorted followed by a reSort. And then 
> he has tried
> to optimize it...
> 
> How to deal with this case?
> 
> 
> I see, I've opened up a can of worms...
> But my conviction stays, that my suggestion is - evolutionary 
> - better than
> the current status. And an evolutionary change in the right 
> direction -
> though admittedly not ideal - is better than no change.
> 
> 
> Greetings,
> 
> Stephan
> 
> 




More information about the Squeak-dev mailing list