[FIX] SortedCollectionFix-sr

Stephan Rudlof sr at evolgo.de
Fri Sep 27 14:48:51 UTC 2002


Andreas,

I have made some tests (>>keysSortedSafelyAR is according your suggestion):

| t1 t2 dicts |
Smalltalk garbageCollect.
dicts _ Dictionary allInstances.
Smalltalk garbageCollect.
t1 _ [10 timesRepeat: [dicts do: [:dict | dict keysSortedSafelyAR]]] timeToRun.
Smalltalk garbageCollect.
t2 _ [10 timesRepeat: [dicts do: [:dict | dict keysSortedSafely]]] timeToRun.
{t1.t2}
 #(25972 22073) #(25942 22996) #(25575 22139)

| t1 t2 bigDict |
Smalltalk garbageCollect.
bigDict _ (Dictionary allInstances asSortedCollection: [:d1 :d2 | d1 size >
d2 size]) first.
Smalltalk garbageCollect.
t1 _ [10 timesRepeat: [bigDict keysSortedSafelyAR]] timeToRun.
Smalltalk garbageCollect.
t2 _ [10 timesRepeat: [bigDict keysSortedSafely]] timeToRun.
{t1.t2. bigDict size}
 #(1716 1297 952) #(1718 1297 952) #(1742 1298 952)


One change was necessary:

Andreas Raab wrote:
> Hi,
> 
> 
>>I see, I've opened up a can of worms...
> 
> 
> Not really. There are two independent issues here which make the
> discussion much more complicated than it has to be. One is whether
> #addFirst:/#addLast: should be allowed or not. I won't comment on it
> since there are good arguments on either side but I think we all agree
> that _if_ implemented they must not break the class invariant.
> 
> The other issue is that of optimization. Here I would argue that first
> of all, "you" (which means not really you personally but anyone who
> thinks that there needs to be an optimization for the cases you are
> quoting) have to show me that reverting those "optimizations" really has
> an impact. I wouldn't even bother thinking about optimizing these cases
> before there is hard evidence for the problem.
> 
> So, for example, it would be trivial to rewrite
> 
> Dictionary>>keysSortedSafely
> 	^SortedCollection 
> 		withAll: self keys 
> 		sortBlock:[...]
> 
> SortedCollection class>>withAll: aCollection sortBlock: aBlock 
> 	"Answer an instance of me such that its elements
> 	are sorted according to the criterion specified in aBlock."
> 	^(super new: aCollection size) 
> 		sortBlock: aBlock; 
> 		addAll: aCollection

; yourself


Greetings,

Stephan

> 
> which is much more readable anyways and I would doubt that there's
> really any speed impact. Same thing for Celeste - it is possible to
> break the invariant in the creation method and whoever wrote that method
> might as well not have used a SortedCollection to begin with. After all,
> what's the point in using a class with a specific invariant if the first
> thing you do is to break it?! Show me that it's slow and I'll show you
> how to make it fast ;-)
> 
> Cheers,
>   - Andreas
> 
> 
>>-----Original Message-----
>>From: squeak-dev-admin at lists.squeakfoundation.org 
>>[mailto:squeak-dev-admin at lists.squeakfoundation.org] On 
>>Behalf Of Stephan Rudlof
>>Sent: Friday, September 27, 2002 2:15 PM
>>To: squeak-dev at lists.squeakfoundation.org
>>Subject: Re: [FIX] SortedCollectionFix-sr
>>
>>
>>Dear Andreas and others,
>>
>>Andreas Raab wrote:
>>
>>>Hi Stephan,
>>>
>>>While I agree mostly with your reasoning (not entirely 
>>
>>though because I
>>
>>>think Richard has a number of equally good arguments wrt. 
>>
>>to consistency
>>
>>>of collection operations) there's one issue I strongly 
>>
>>disagree with:
>>
>>>
>>>>So there remains the performance thing: using SortedCollection>>add:
>>>>multiple times means, that each object has to be sorted in 
>>>>immediately. It is probably more performant to have a private
>>>>add method, which just adds an element, while leaving the
>>>>SortedCollection in an inconsistent state, and reSort it afterwards,
>>>>when all elements have been added. That has been my
>>>>idea of introducing the 'bypass' method for clients which 
>>>>really know about the internals.
>>>
>>What I have written here is not entirely correct: it has been an
>>interpretation of the *current* usage  and intent of using 
>>addLast: for
>>SortedCollections. My *idea* was to let it exist further.
>>Note: I do *not* use >>addLast: (or >>addLastWithoutSorting 
>>in my fix) at
>>the client side for calling SortedCollections.
>>
>>
>>>
>>>You don't have to use #add; you can use #addAll: which will 
>>
>>choose the
>>
>>>right way to add the elements dynamically. Your proposal is actually
>>>_really_ bad for optimization since it assumes "outside 
>>
>>knowledge" about
>>
>>>the way SortedCollection is implemented and renders any attempt to
>>>really optimize _it_ (instead of lots of clients guessing what a
>>>reasonably good optimization would be) pretty much useless. 
>>
>>Spreading
>>
>>>out "meager client side optimizations" while breaking the 
>>
>>polymorphic
>>
>>>protocol is never a good idea - if you want to optimize 
>>
>>then choose a
>>
>>>central, high-level, intent revealing place (such as, in fact,
>>>SortedCollection>>addAll:) and don't ever have the client
>>>second-guessing the structure and behavior of the object in 
>>
>>question.
>><...>
>>
>>I agree.
>>
>>Now I know better, why I have had some stomache going this way.
>>
>>And why I wrote:
>>
>>>Another - cleaner - variant avoiding these internal methods:
>>>collecting the
>>>to be added elements in another - not sorted! - Collection 
>>
>>and calling
>>
>>>SortedCollection>>addAll: for them.
>>
>>But the world is not perfect: there *are* clients trying to 
>>optimize from
>>outside (not written by me). I just wanted to fix the problem without
>>breaking them and don't wanted to think deeper about them.
>>But now I think I have to...
>>
>>
>>Which are they?
>>
>>The first one is IndexFile>>readFrom: aStream messageFile: 
>>messageFile, its
>>comment says:
>>  "Initialize myself from the given text stream. It is 
>>assumed that the
>>  .index file was written in order of ascending message timestamps,
>>  although this method is only less efficient, not incorrect, 
>>if this is not
>>  the case."
>>
>>Its code uses the assumption, that if starting with an empty
>>SortedCollection >>addLast: is called repeatedly for a 
>>sequence of already
>>sorted elements, this would result into a correct 
>>SortedCollection without
>>calling >>reSort!
>>
>>It uses knowledge of the internal representation of 
>>SortedCollection and
>>omits the class invariant ensuring >>reSort in some cases: dangerous!
>>
>>This should be fixed *independently* of my fix suggestions. 
>>(Note: I don't
>>use Celeste: this may be a reason for another person to fix it.)
>>
>>But since performance may be *very* important in this client case:
>>What about introducing a very special
>>  SortedCollection>>addAllProbablyPreSorted:
>>(or >>addAllCheckForPreSortedFirst:) to give it a hint?
>>
>>Otherwise I hear other people crying...
>>
>>
>>The other one is
>>
>>Dictionary>>keysSortedSafely
>>	"Answer a SortedCollection containing the receiver's keys."
>>	| sortedKeys |
>>	sortedKeys _ SortedCollection new: self size.
>>	sortedKeys sortBlock:
>>		[:x :y |  "Should really be use <obj, string, 
>>num> compareSafely..."
>>		((x isString and: [y isString])
>>			or: [x isNumber and: [y isNumber]])
>>			ifTrue: [x < y]
>>			ifFalse: [x class == y class
>>				ifTrue: [x printString < y printString]
>>				ifFalse: [x class name < y 
>>class name]]].
>>	self keysDo: [:each | sortedKeys addLast: each].
>>	^ sortedKeys reSort
>>
>>This seems to be safe, since >>reSort is called in every 
>>case, but using
>>SortedCollection>>addAll: seems to be better here.
>>
>>But let's take a look at
>>SortedCollection>>
>>addAll: aCollection
>>	aCollection size > (self size // 3)
>>		ifTrue:
>>			[aCollection do: [:each | super addLast: each].
>>			self reSort]
>>		ifFalse: [aCollection do: [:each | self add: each]].
>>	^ aCollection
>>
>>Now I see a problem: the implementor (sma) of 
>>Dictionary>>keysSortedSafely
>>possibly has seen, that in his case each element would be 
>>added one by one,
>>instead of adding all unsorted followed by a reSort. And then 
>>he has tried
>>to optimize it...
>>
>>How to deal with this case?
>>
>>
>>I see, I've opened up a can of worms...
>>But my conviction stays, that my suggestion is - evolutionary 
>>- better than
>>the current status. And an evolutionary change in the right 
>>direction -
>>though admittedly not ideal - is better than no change.
>>
>>
>>Greetings,
>>
>>Stephan
>>
>>
> 
> 
> 
> 


-- 
Stephan Rudlof (sr at evolgo.de)
   "Genius doesn't work on an assembly line basis.
    You can't simply say, 'Today I will be brilliant.'"
    -- Kirk, "The Ultimate Computer", stardate 4731.3




More information about the Squeak-dev mailing list