Sorting oddity - feature or ??

Richard A. O'Keefe ok at cs.otago.ac.nz
Thu May 2 01:03:50 UTC 2002


Bob Arning  <arning at charm.net> wrote:
	Is the following a known and accepted feature of our sorting algorithm?
	
	[((1 to: 10000) collect: [ :x | x]) asSortedCollection]
	timeToRun ==> 98
	
	[((1 to: 10000) collect: [ :x | 3]) asSortedCollection]
	timeToRun ==> 20826
	
Array>>asSortedCollection results in
    a new empty SortedCollection is made
    (that) addAll: (the Array) is done
This just dumps all the new elements in then sorts everything.
The method used is one I'm not familiar with, but it's a variant
of quicksort.  Quicksort variants that don't use fat pivots are known
to perform very badly when there are lots of equal elements.

Quicksort was invented for a machine whose main storage capacity was
256 words (each 40 bits).  Not 256 M, not 256 k, just two hundred and
fifty six words of data.  The inventor, Hoare, KNEW it was inferior,
in the abstract, to a well-written merge sort, which to sort N items
needs N/2 item workspace.  He also KNEW that he didn't _have_ any spare
memory, so quicksort was much faster than a method he couldn't use.

Squeak doesn't run on those machines.  Squeak can afford the memory to
use merge sort.  A sorting method with a guaranteed worst case that's
better than the _average_ case of most quicksorts is not to be sniffed at.

(Yes, I have read the papers and books which tell you that quicksort is
blindingly fast.  But Sedgewick's Acta Informatica paper assumed that
the cost of #< was one quarter of the cost of a memory reference.  That's
not true in Squeak, and seldom true in most sorting applications.)

Alternatively, if it's thought that Squeak _must_ use O(lg N) space sorting,
perhaps the algorithm from Bentley & McIlroy "Engineering a Quick Sort",
Software Practice and Experience about 8 years ago, could be adapted.



More information about the Squeak-dev mailing list