Sorting Algorithm

Richard A. O'Keefe ok at atlas.otago.ac.nz
Mon Nov 19 01:47:27 UTC 2001


"Christopher Roach" <chrisroach at btinternet.com> wrote:
	Thanks Richard for the above code, I understand that you have not 
	tested it yet.
	
I didn't test it because I don't have your data to test it with; you do.
I have now made up some test data, and for *my* test data,

    |s c n r|
    n := Character value: 0.
    s := StandardFileStream oldFileNamed: 'data.raw'.
    c := OrderedCollection new.
    [s atEnd] whileFalse: [
        r := s nextLine.
        r replaceAll: $, with: n.
        c add: r].
    s close.
    c := c asSortedCollection.
    s := StandardFileStream newFileNamed: 'data.ord'.
    c do: [:t|
        t replaceAll: n with: $,.
        s nextPutAll: t; cr].
    s close.

wprks just fine.

	I am just thinking here... will smalltalk do a comparison on each 
	letter at a time?

Who cares how comparison is done?  All that matters is that it gets the
right answer.  If you really want to know, fire up a Browser and look at

    String>> < aString
        "Answer whether the receiver sorts before aString.
         The collection order is simple ASCII (with case difference)."
        ^ (self compare: self with: aString collated: AsciiOrder) = 1

    String>> compare: string1 with: string2 collected: order
        "Return 1, 2, or 3 if string1 is <, =, or > string2,
         with the collating order of characters given by the order array."
       |len1 len2 c1 c2|
       <primitive: 'primitiveCompareString' module: 'MiscPrimitivePlugins'>
       ...Slang code deleted...

If you look at the Slang code, you'll see that the order is a standard
lexicographic one, same as you'd get from C's strcmp() or Java's
String.compare(,).

	so lets say we have:
	aaaa
	compared to:
	aaab  does smalltalk go through first a's and compare it, then move 
	onto second a, compare that, until it gets to the fourth THEN it 
	realizes that they are different and moves on?  
	
The string comparison conventions that I am aware of are
    PL/CV: compare first by length, then (if same length) by content,
           left to right.
    COBOL: compare left to right, padding shorter string on right with ' '.
    Usual: compare left to right, padding shorter string on right with
	   minus infinity.

So the *effect* is that 'aaaa' < 'aaab'; to find out how it's really done
I'd have to check the C code in the VM, and I don't particularly want to do
that.  (From the Slang code I have a fair idea...)

	The code that Bert gave:
	
	order := [:a :b | 
	  a title < b title or: [
	    a title = b title and: [a name < b name or: [
	      a name = b name and: [a ref < b ref or: [
	        a number < b number]]]]]]
	
[There is an obvious error in this, which I leave as an exercise for
 the reader.]

	Does this in effect just get the first two characters of the field 
	and do a comparison?

Which field?  It compares (a title) with (b title) using whatever algorithm
is appropriate.  If (a title) and (b title) are strings, no, it WON'T stop
with the first characters, it will look at the others if necessary.

	the words "title" and "name" and all 
	that in this method - does that make a difference?

How should it?  What matters is what #title and the others DO, not what
they are called.  The suggestion was that _you_ would define a class
with access methods title, name, ref, and number.

	how does smalltalk regard that?

As up to you.

	could it just be:
	
	order := [:a :b | 
	  a  < b  or: [
	    a  = b  and: [a  < b  or: [
	      a  = b  and: [a  < b  or: [
	        a  < b ]]]]]]
	
Something seems to be missing here.  Bert was saying how you could
DEFINE comparison for your records; here you are just calling it.

Look, I've tested the code I posted, and have repeated above sans comments.
It works.

	Can you produce a record in smalltalk like in C++, so it could be 
	something like:
	
	Struct
	{
	char[] name,
	char [] title,
	char [] ref,
	int number
	}
	
	
You really need a good textbook, or at least a good tutorial.
In C++, there is no difference between a struct and a class;
there isn't any difference in Smalltalk either.
	
Note that for THIS task the code I have given above is all you need.
For interest, I enclose a basic implementation of a BibEntry class
whose instances can be compared (so there is no need to provide a
special sort block).

'From Squeak2.8 of 13 June 2000 [latest update: #2359] on 19 November 2001 at 2:39:57 pm'!
Magnitude subclass: #BibEntry
	instanceVariableNames: 'title name ref number '
	classVariableNames: ''
	poolDictionaries: ''
	category: 'BibStuff'!
!BibEntry commentStamp: 'raok 11/19/2001 14:23' prior: 0!
I represent a bibliography entry.
Use
	BibEntry from: 'title,name,ref,number'
to make an instance of me, and
	bibentry asString
to convert one of my instances back to a String.
!


!BibEntry methodsFor: 'comparing' stamp: 'raok 11/19/2001 14:26'!
< aBibEntry
	title = aBibEntry title ifFalse: [^title < aBibEntry title].
	name = aBibEntry name ifFalse: [^name < aBibEntry name].
	ref = aBibEntry ref ifFalse: [^ref < aBibEntry ref].
	^number < aBibEntry number! !

!BibEntry methodsFor: 'comparing' stamp: 'raok 11/19/2001 14:28'!
= aBibEntry
	^title = aBibEntry title and: [
	 name = aBibEntry name and: [
	 ref = aBibEntry ref and: [
	 number = aBibEntry number]]]! !

!BibEntry methodsFor: 'comparing' stamp: 'raok 11/19/2001 14:30'!
hash
	^(((title hash)*37 + name hash)*43 + ref hash)*97 + number hash! !


!BibEntry methodsFor: 'converting' stamp: 'raok 11/19/2001 14:36'!
asString
	^title, ',', name, ',', ref, ',', number! !


!BibEntry methodsFor: 'initialising' stamp: 'raok 11/19/2001 14:37'!
from: aString
	|ts|
	ts _ aString findTokens: ','.
	self assert: [ts size = 4].
	title _ ts at: 1.
	name _ ts at: 2.
	ref _ ts at: 3.
	number _ ts at: 4.
	^self
! !


!BibEntry methodsFor: 'accessing' stamp: 'raok 11/19/2001 14:24'!
name
	^name! !

!BibEntry methodsFor: 'accessing' stamp: 'raok 11/19/2001 14:24'!
number
	^number! !

!BibEntry methodsFor: 'accessing' stamp: 'raok 11/19/2001 14:24'!
ref
	^ref! !

!BibEntry methodsFor: 'accessing' stamp: 'raok 11/19/2001 14:24'!
title
	^title
! !

"-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- "!

BibEntry class
	instanceVariableNames: ''!

!BibEntry class methodsFor: 'instance creation' stamp: 'raok 11/19/2001 14:37'!
from: aString
	^self new from: aString! !





More information about the Squeak-dev mailing list