MarkingKey

Richard A. O'Keefe ok at cs.otago.ac.nz
Wed Jun 12 02:24:27 UTC 2002


I wrote:
	> I have never EVER heard the term "marking key" used before, and I've
	> been studying and teaching algorithms for longer than I care to remember.
	> The term that is almost universally used is "sentinel",
to which Niko Schwarz <niko.schwarz at gmx.net> replied:
	Well, I didn't find the word in my dictionary, so I translated it like I found 
	it appropriate. But sentinel sounds nice, too.
	
It's not what "sounds nice".  "Sentinel" is quite simply the standard
English term for this concept.  Check almost any data structures and
algorithms textbook published in English and you are likely to find it.
For example, Sedgewick, Algorithms in C, 1990 edition first uses the
term on page 100.  In a discussion of insertion sort, oddly enough.

	>  - it is quite as common to have a use for "minus infinity" as
	>    for "plus infinity"
	>  - in many cases a sentinel is NOT an extreme value at all, but
	>    something having a particular property (being a blank, not being
	>    a letter, being divisible by something, having an area which is
	>    extreme, being the name of an empty file you are sure to be able
	>    to open, ...)
	
	Ho, that's not my marking key.

Precisely.  The point is, most of the time a sentinel is useful,
your "marking key" is not.

	They're quite useful, each one, like an element 
	z in a linked list.

Well, no, an element z in a linked list is NOT particularly useful,
and if any books are candidates for burning, books that make simple
linked lists look difficult are on the top of that pile.

	But this "sentinel" I'm talking about is a special one, and as 
	it seems to me, it is very common when working with basic data
	structures.
	
As I have demonstrated, your "marking key" didn't work with numbers,
and while that could be fixed, it still didn't work with Strings,
Dates, Times, or any other things that could be compared.  If your
"marking key" isn't any use when sorting an array of strings, what
use is it?

I repeat, I have studied and taught data structures for years,
and I cannot emphasise sufficiently strongly that sentinels really
are NOT all that useful.

The idea made sense when we were concerned about saving every cycle
we could on machines that could only execute a few thousand instructions
per second.  Even then, it only really made sense when the other
comparisons that were being done involved numbers.  Any comparison that
involves a procedure call, for example, is likely to be so much slower
than the integer comparisons that posting a sentinel eliminates that
it just isn't worth bothering.

One day it dawned on me that I COULDN'T use the sentinel technique in
Pascal, because if I had to sort an entire array, there just wasn't
any place to PUT the sentinel.  I then did the experiments that showed
me you very seldom get any worthwhile speedups from sentinels.

	> It is very seldom appropriate in the other languages either.
	> A sentinel should be a value that you KNOW CANNOT POSSIBLY occur as
	> legitimate data.  As such, sentinel values tend to be extremely
	> problem-specific.
	
	How "seldom appropriate"?

I mean that if you use maxInt as a sentinel in an array,
you are setting yourself up for disaster.
One day someone will innocently include maxInt as a value in the array,
and your algorithm will stop working.

What about using the IEEE 754 "+infinity" and "-infinity" values?
Well, they can be legitimate results of computation, so I can't use those
either.

Basically, in most languages, any time you have a predicate less(x,y),
ANY of the values that less(,) accepts as an argument COULD be legitimate
data in some container.  If you "steal" one of those values to use as a
sentinel, then one day you will have to pay for your crime, because one
day that value will appear as data that you have to handle as normal data.

	I need a value that is below or above anything else, 
	that's all.

No, you don't need it.  You just think you do.  Your problem is not
"find a magic value", it's "sort fast".  And the solution is not a
new "MarkingKey" class, it's an existing #sort or #sort: method in
an Array.

	And when numbers are keys (very common) than the biggest or 
	smallest number is the solution. 

Not as common as you might think.  Dates, Times, and/or Timestamps, yes.
But they aren't numbers.  When I sort, it's usually some kind of String,
or something even more complicated, such as (file name, stat(2) record
pairs in C).

You can only use your MarkingKey class when you KNOW you are sorting numbers.
And if you know you are sorting numbers, you should use a radix sort or a
bucket sort, not insertion sort.




More information about the Squeak-dev mailing list