Philosophical questions about collections

Richard A. O'Keefe ok at atlas.otago.ac.nz
Tue Jul 10 00:55:52 UTC 2001


I wrote:
	> Take the numbers.  5 and 5.0 are NOT equivalent in Smalltalk.
	>     [x := 5   * (2 raisedTo: 53). x + 1 = x] value is false
	>     [x := 5.0 * (2 raisedTo: 53). x + x = x] value is true 
	> If they behave differently, it doesn't make a lot of sense to regard them
	> as equal.
	
(x + x should of course be x + 1 in the second example.)
	
	You argue this here, and then argue differently below for collections.
	
Not at all.  I used *exactly* the same criterion in both cases:
  - if they behave differently under queries, they're not equal.

	In my view, it's simply very risky to make equality comparisons
	involving floats.

It's very risky to make equality comparisons involving floats *if* you
don't think about what you are doing.  But then, doing any kind of programming
without adequate background and thinking what you are doing is risky.
For any reasonable floating-point system (and while there have been some
highly UNreasonable ones, Squeak doesn't run on those machines) there is a
wide range of integers for which floating-point calculations are exact.
These days, we even know how to implement sqrt(), exp(), and ln() so that
the answer they give is the within half an ULP of the exact answer, and
in a reasonable amount of time too; unfortunately we cannot yet rely on
being given such implementations.  But for basic operations, we can.

	In general, we both agree that #= needs to be defined case by case, and
	that it's often arbitrary what exactly #= means.
	
Well, that's not quite what I meant.  What I meant is that there are
many *different* application-dependent notions of equality, and in the
absence of a strong mathematical theory to define what #= should mean,
it shouldn't mean *anything*.  For example, should a set and an array
that happens to have the same elements be regarded as #= ?  No, there's
a different relation #hasSameElementsAs: which should be named and used.

	> Let's confine ourselves to messages that do not alter the receiver,
	> because I've always understood = to be about value of current state,
	> so that (1 to: 4) and #(1 2 3 4) are not distinguished by #at:put:.
	
	That's fine with me, though I prefer these to be #~=.
	
	A fine default definition for #= in Collection, IMHO, would be:
	
		= aCollection
			self class == aCollection class and: [
				self asArray = aCollection asArray ].
	
That would be a lousy definition.  Consider this example:

    x := #('a' 'big' 'dog' 'chased' 'a' 'small' 'cat') asSet
    y := #('a' 'small' 'cat' 'chased' 'a' 'big' 'dog') asSet
    x asArray ==> #('big' 'chased' 'a' 'dog' 'small' 'cat')
    y asArray ==> #('big' 'chased' 'a' 'small' 'dog' 'cat')
    x = y ==> true
    x asArray = y asArray ==> true

It basically works only for collections where the order is part of the
abstract value, or where there is a canonical order for the elements so
that the result of asArray is determined by the abstract value, not the
concrete representation.

I can live with #asArray: not being determined by the abstract value;
I can't tolerate #= not being determined by the abstract value.
If #= isn't going to behave itself, it is better for it not to exist.

	Using #species instead of #class would be okay, as well -- I can't
	decide which is better.
	
It's precisely that which currently gets #() -vs- to: in trouble.
	
	Granted, as with comparing floats, it is dangerous in many cases
	to use this method.  Is it really useful to ask whether, say, two Heap's
	are #= ?
	
Basically, a Heap is a Bag (bag union for heaps is called "merging")
plus a total order and an operation for removing an element with extreme
value.  Equality of heaps is the same as equality of bags.  Equality is
thus uniquely defined for Heaps.  Whether it's useful is another matter;
one of my points is that IF #= is defined on heaps it should be defined
RIGHT.  This is one of the cases where there is a strong mathematical
theory that says uniquely what equality of values should mean.  If that's
not easily implementable, then have a definition that signals a
MeaningfulButNotImplemented error.

	Bleck, that's ugly.  The system should at least be consistent,

Hey, we agree!





More information about the Squeak-dev mailing list