Sorting help!

Richard A. O'Keefe ok at atlas.otago.ac.nz
Fri Nov 16 01:04:37 UTC 2001


<input> ::= <line>*
<line> ::= <title> ',' <name> ',' <reference> ',' <number> <line end>
<title>, <name>, <reference>, <number> ::=
    "arbitrary text not containing comma or line end".

Input to be sorted in ascending order of <title> (most significant)
then <name> then <reference> then <number>.

Let's be really sneaky about this.  If we can assume that the character
(Character value: 0) does not occur in the lines, then if we do
    nul := Character value: 0.
    aString replaceAll: $, with: nul.
as we read each line, then just stick the strings into an OrderedCollection,
then use #asSortedCollection, then as we write out each line do
    aString replaceAll: nul with: $,.
we'll get the right answer.

(If this were C, we'd want to use '\1' rather than '\0', but Smalltalk
has no problems with Character value: 0.)

Since this would use the primitive for string comparison, it's
probably about as fast as we can get.

If the information is in a file, it might be better to use an external
sorting program such as sort(1), but Squeak should be fine for quite a
few megabytes of data these days.

Does anyone know why  #sortBy: is in SequenceableCollection but
#sort and #sort: are in ArrayedCollection?  Is there any reason
why we couldn't have

    sort: aSortBlock
	|sorted|
	sorted := sort sortBy: aSortBlock.
	1 to: self size do: [:i | self at: i put: (sorted at: i)].
	^self.

    sort
	^self sort: [:x :y| x <= y]

(or, more precisely, code with the same effect) in SequenceableCollection?

I had expected that #sortBy: would act in place; tell me that there is a
method that returns a sorted copy and I'd expect it to be #sortedBy:.





More information about the Squeak-dev mailing list