[FIX] SortedCollectionFix-sr
Stephan Rudlof
sr at evolgo.de
Fri Sep 27 14:48:51 UTC 2002
Andreas,
I have made some tests (>>keysSortedSafelyAR is according your suggestion):
| t1 t2 dicts |
Smalltalk garbageCollect.
dicts _ Dictionary allInstances.
Smalltalk garbageCollect.
t1 _ [10 timesRepeat: [dicts do: [:dict | dict keysSortedSafelyAR]]] timeToRun.
Smalltalk garbageCollect.
t2 _ [10 timesRepeat: [dicts do: [:dict | dict keysSortedSafely]]] timeToRun.
{t1.t2}
#(25972 22073) #(25942 22996) #(25575 22139)
| t1 t2 bigDict |
Smalltalk garbageCollect.
bigDict _ (Dictionary allInstances asSortedCollection: [:d1 :d2 | d1 size >
d2 size]) first.
Smalltalk garbageCollect.
t1 _ [10 timesRepeat: [bigDict keysSortedSafelyAR]] timeToRun.
Smalltalk garbageCollect.
t2 _ [10 timesRepeat: [bigDict keysSortedSafely]] timeToRun.
{t1.t2. bigDict size}
#(1716 1297 952) #(1718 1297 952) #(1742 1298 952)
One change was necessary:
Andreas Raab wrote:
> Hi,
>
>
>>I see, I've opened up a can of worms...
>
>
> Not really. There are two independent issues here which make the
> discussion much more complicated than it has to be. One is whether
> #addFirst:/#addLast: should be allowed or not. I won't comment on it
> since there are good arguments on either side but I think we all agree
> that _if_ implemented they must not break the class invariant.
>
> The other issue is that of optimization. Here I would argue that first
> of all, "you" (which means not really you personally but anyone who
> thinks that there needs to be an optimization for the cases you are
> quoting) have to show me that reverting those "optimizations" really has
> an impact. I wouldn't even bother thinking about optimizing these cases
> before there is hard evidence for the problem.
>
> So, for example, it would be trivial to rewrite
>
> Dictionary>>keysSortedSafely
> ^SortedCollection
> withAll: self keys
> sortBlock:[...]
>
> SortedCollection class>>withAll: aCollection sortBlock: aBlock
> "Answer an instance of me such that its elements
> are sorted according to the criterion specified in aBlock."
> ^(super new: aCollection size)
> sortBlock: aBlock;
> addAll: aCollection
; yourself
Greetings,
Stephan
>
> which is much more readable anyways and I would doubt that there's
> really any speed impact. Same thing for Celeste - it is possible to
> break the invariant in the creation method and whoever wrote that method
> might as well not have used a SortedCollection to begin with. After all,
> what's the point in using a class with a specific invariant if the first
> thing you do is to break it?! Show me that it's slow and I'll show you
> how to make it fast ;-)
>
> Cheers,
> - Andreas
>
>
>>-----Original Message-----
>>From: squeak-dev-admin at lists.squeakfoundation.org
>>[mailto:squeak-dev-admin at lists.squeakfoundation.org] On
>>Behalf Of Stephan Rudlof
>>Sent: Friday, September 27, 2002 2:15 PM
>>To: squeak-dev at lists.squeakfoundation.org
>>Subject: Re: [FIX] SortedCollectionFix-sr
>>
>>
>>Dear Andreas and others,
>>
>>Andreas Raab wrote:
>>
>>>Hi Stephan,
>>>
>>>While I agree mostly with your reasoning (not entirely
>>
>>though because I
>>
>>>think Richard has a number of equally good arguments wrt.
>>
>>to consistency
>>
>>>of collection operations) there's one issue I strongly
>>
>>disagree with:
>>
>>>
>>>>So there remains the performance thing: using SortedCollection>>add:
>>>>multiple times means, that each object has to be sorted in
>>>>immediately. It is probably more performant to have a private
>>>>add method, which just adds an element, while leaving the
>>>>SortedCollection in an inconsistent state, and reSort it afterwards,
>>>>when all elements have been added. That has been my
>>>>idea of introducing the 'bypass' method for clients which
>>>>really know about the internals.
>>>
>>What I have written here is not entirely correct: it has been an
>>interpretation of the *current* usage and intent of using
>>addLast: for
>>SortedCollections. My *idea* was to let it exist further.
>>Note: I do *not* use >>addLast: (or >>addLastWithoutSorting
>>in my fix) at
>>the client side for calling SortedCollections.
>>
>>
>>>
>>>You don't have to use #add; you can use #addAll: which will
>>
>>choose the
>>
>>>right way to add the elements dynamically. Your proposal is actually
>>>_really_ bad for optimization since it assumes "outside
>>
>>knowledge" about
>>
>>>the way SortedCollection is implemented and renders any attempt to
>>>really optimize _it_ (instead of lots of clients guessing what a
>>>reasonably good optimization would be) pretty much useless.
>>
>>Spreading
>>
>>>out "meager client side optimizations" while breaking the
>>
>>polymorphic
>>
>>>protocol is never a good idea - if you want to optimize
>>
>>then choose a
>>
>>>central, high-level, intent revealing place (such as, in fact,
>>>SortedCollection>>addAll:) and don't ever have the client
>>>second-guessing the structure and behavior of the object in
>>
>>question.
>><...>
>>
>>I agree.
>>
>>Now I know better, why I have had some stomache going this way.
>>
>>And why I wrote:
>>
>>>Another - cleaner - variant avoiding these internal methods:
>>>collecting the
>>>to be added elements in another - not sorted! - Collection
>>
>>and calling
>>
>>>SortedCollection>>addAll: for them.
>>
>>But the world is not perfect: there *are* clients trying to
>>optimize from
>>outside (not written by me). I just wanted to fix the problem without
>>breaking them and don't wanted to think deeper about them.
>>But now I think I have to...
>>
>>
>>Which are they?
>>
>>The first one is IndexFile>>readFrom: aStream messageFile:
>>messageFile, its
>>comment says:
>> "Initialize myself from the given text stream. It is
>>assumed that the
>> .index file was written in order of ascending message timestamps,
>> although this method is only less efficient, not incorrect,
>>if this is not
>> the case."
>>
>>Its code uses the assumption, that if starting with an empty
>>SortedCollection >>addLast: is called repeatedly for a
>>sequence of already
>>sorted elements, this would result into a correct
>>SortedCollection without
>>calling >>reSort!
>>
>>It uses knowledge of the internal representation of
>>SortedCollection and
>>omits the class invariant ensuring >>reSort in some cases: dangerous!
>>
>>This should be fixed *independently* of my fix suggestions.
>>(Note: I don't
>>use Celeste: this may be a reason for another person to fix it.)
>>
>>But since performance may be *very* important in this client case:
>>What about introducing a very special
>> SortedCollection>>addAllProbablyPreSorted:
>>(or >>addAllCheckForPreSortedFirst:) to give it a hint?
>>
>>Otherwise I hear other people crying...
>>
>>
>>The other one is
>>
>>Dictionary>>keysSortedSafely
>> "Answer a SortedCollection containing the receiver's keys."
>> | sortedKeys |
>> sortedKeys _ SortedCollection new: self size.
>> sortedKeys sortBlock:
>> [:x :y | "Should really be use <obj, string,
>>num> compareSafely..."
>> ((x isString and: [y isString])
>> or: [x isNumber and: [y isNumber]])
>> ifTrue: [x < y]
>> ifFalse: [x class == y class
>> ifTrue: [x printString < y printString]
>> ifFalse: [x class name < y
>>class name]]].
>> self keysDo: [:each | sortedKeys addLast: each].
>> ^ sortedKeys reSort
>>
>>This seems to be safe, since >>reSort is called in every
>>case, but using
>>SortedCollection>>addAll: seems to be better here.
>>
>>But let's take a look at
>>SortedCollection>>
>>addAll: aCollection
>> aCollection size > (self size // 3)
>> ifTrue:
>> [aCollection do: [:each | super addLast: each].
>> self reSort]
>> ifFalse: [aCollection do: [:each | self add: each]].
>> ^ aCollection
>>
>>Now I see a problem: the implementor (sma) of
>>Dictionary>>keysSortedSafely
>>possibly has seen, that in his case each element would be
>>added one by one,
>>instead of adding all unsorted followed by a reSort. And then
>>he has tried
>>to optimize it...
>>
>>How to deal with this case?
>>
>>
>>I see, I've opened up a can of worms...
>>But my conviction stays, that my suggestion is - evolutionary
>>- better than
>>the current status. And an evolutionary change in the right
>>direction -
>>though admittedly not ideal - is better than no change.
>>
>>
>>Greetings,
>>
>>Stephan
>>
>>
>
>
>
>
--
Stephan Rudlof (sr at evolgo.de)
"Genius doesn't work on an assembly line basis.
You can't simply say, 'Today I will be brilliant.'"
-- Kirk, "The Ultimate Computer", stardate 4731.3
More information about the Squeak-dev
mailing list
|