[FIX] SortedCollectionFix-sr

Stephan Rudlof squeak-dev at lists.squeakfoundation.org
Fri Sep 27 13:51:41 UTC 2002


Richard,

your proposal implementing SortedCollection with 'lazy data structures' is
very interesting. It removes the burden of all thinking about adding to a
SortedCollection performantly ( ;-) ).

Any volunteer out there to give it a try?

One question is:
  How is the relation between changes of the SortedCollection and looking
>>at: it, since some time is spend for the checks there. If accessing it
outweights the changes by magnitudes there could be a *small* performance
penalty.

But I guess 'normally' the proposal gives better performance and it is
always possible to convert it into a plain old OrderedCollection for
accessing it very often, if someone worries about this.


Greetings,

Stephan


Richard A. O'Keefe wrote:
<...>

> I think it's time to talk about lazy data structures.
> 
> We could add elements fast to SortedCollections like this.
> 
> 1.  The internal representation is an array partitioned thus:
> 	+----+------------+--------+----+
> 	|1   |a          z|       x|   n|
> 	+----+------------+--------+----+
> 
>     elements 1..a-1 : all nil
>     elements a..z   : sorted according to sortBlock
>     elements z+1..x : added elements not sorted yet
>     elements x+1..n : all nil
> 
>     Where I have written "a" and "x" you should understand the existing
>     instance variables of OrderedCollection (inherited by SortedCollection)
>     being used instead; the "z" instance variable is new.
> 
> 2.  add: anItem
> 	^super addLast: anItem
>     addFirst: anItem
> 	self shouldNotImplement
>     addLast: anItem
> 	self shouldNotImplement
>     addAll: aCollection
> 	^super addAllLast: aCollection
>     addAllFirst: aCollection
> 	self shouldNotImplement
>     addAllLast: aCollection
>         self shouldNotImplement
> 
>     No, I'm not being inconsistent here.  I'm saying "let someone build
>     a sorted collection in linear time if they can provide the elements
>     in sorted order."  With this kind of implementation, #add: _is_
>     adequate for the job.
> 
> 3.  A strictly private method would exist:
> 
>     pvtEnsureAllSorted		   "rather like current addAll:"
> 	x = z ifTrue: [^array].
> 	self pvtSortAddedElements. "sorts z+1..x"
> 	self pvtMerge.             "merges a..z with z+1..x"
> 	z := x.
> 	^array
> 
>     Ideally pvtSortAddedElements would use a method which is linear when
>     the elements are sorted already; Dijkstra's smooth-sort is
>     just one of many algorithms which are good at that, bottom-up
>     natural merge is easier to understand and works very well in practice.
> 
>     Using bottom-up natural merge, if you add M elements to a Sorted
>     Collection having N elements and then look at the result,
>     the adds would cost O(M) time,
>     pvtEnsureAllSorted would cost O(M) for pvtSortAddedElements
>     and O(M+N) for pvtMerge, for a total O(N+M).
>     Building an entire collection from N=0 would thus be O(M).
> 
> 4.  Other methods would call pvtEnsureAllSorted, e.g.
> 
>     at: index
> 	self assert: [index between: 1 and: x-a+1].
> 	^self pvtEnsureAllSorted at: index-a+1
> 
>     do: aBlock
> 	self pvtEnsureAllSorted.
> 	^super do: aBlock
> 
> 5.  Setting the sortblock would simply enlarge the unsorted area:
> 
>     sortBlock: aBlock
>         sortBlock := aBlock ifNil: [nil] ifNotNil: [aBlock fixTemps].
>         z := a-1.
> 
> 
> This is the clean way to do it.  The class invariant is weakened from
> "the elements are always sorted" to "the elements are always sorted
> whenever anyone else looks; when noone else is looking they are in a
> convenient configuration for sorting."  From the outside, it looks as
> though the elements are always sorted.
<...>
-- 
Stephan Rudlof (sr at evolgo.de)
   "Genius doesn't work on an assembly line basis.
    You can't simply say, 'Today I will be brilliant.'"
    -- Kirk, "The Ultimate Computer", stardate 4731.3




More information about the Squeak-dev mailing list