Philosophical questions about collections

Tue Jul 10 16:23:18 UTC 2001

"Richard A. O'Keefe" <ok at atlas.otago.ac.nz> wrote:
> I wrote:
> 	> Take the numbers.  5 and 5.0 are NOT equivalent in Smalltalk.
> 	>     [x := 5   * (2 raisedTo: 53). x + 1 = x] value is false
> 	>     [x := 5.0 * (2 raisedTo: 53). x + x = x] value is true 
> 	> If they behave differently, it doesn't make a lot of sense to regard them
> 	> as equal.
> 	
> (x + x should of course be x + 1 in the second example.)
> 	
> 	You argue this here, and then argue differently below for collections.
> 	
> Not at all.  I used *exactly* the same criterion in both cases:
>   - if they behave differently under queries, they're not equal.

It's a fine line.  You say that + and = should behave the same for equal
numbers, but that at:put: doesn't have to for equal collections.  You
need a complicated definition of "query" to make this statement hold for
both disallowing "5 = 5.0" and allowing "#(1 2 3) = (1 to: 3)".  In my
view, the way you define "query" versus "modification" is going to
depend on individual cases.

Change of state isn't a good definition.  For example, toggling a button
object is a change of its state, but it probably isn't a wholesale
modification.  It would be irritating if a regular button and an
OnlyOnButton were treated as #=.  On the other hand, changing the block
that fires when a button is activated, would seem like a wholesale
modification.  I would consider it okay for two blocks to be #=, but for
one of them to have its activation block fixed.

> 	In general, we both agree that #= needs to be defined case by case, and
> 	that it's often arbitrary what exactly #= means.
> 	
> Well, that's not quite what I meant.  What I meant is that there are
> many *different* application-dependent notions of equality, and in the
> absence of a strong mathematical theory to define what #= should mean,
> it shouldn't mean *anything*.  For example, should a set and an array
> that happens to have the same elements be regarded as #= ?  No, there's
> a different relation #hasSameElementsAs: which should be named and used.

Err, yes.  So what do you think of having #= at all??  If you think it
is okay, then how do you define it?

It seems to me that we want some general notion of "same as".  I don't
see how to come up with a "same as" notion that covers all possible
classes, and so it needs to be defined in individual cases.

If one insists on completely identical behavior, then one is back to
#==.

> 	A fine default definition for #= in Collection, IMHO, would be:
> 	
> 		= aCollection
> 			self class == aCollection class and: [
> 				self asArray = aCollection asArray ].
> 	
> That would be a lousy definition.  Consider this example:
> 
>     x := #('a' 'big' 'dog' 'chased' 'a' 'small' 'cat') asSet
>     y := #('a' 'small' 'cat' 'chased' 'a' 'big' 'dog') asSet

Yes, it's a lousy definition for Set's.  But is it bad as a *default*
definition for Collection ?

> 
> I can live with #asArray: not being determined by the abstract value;
> I can't tolerate #= not being determined by the abstract value.
> If #= isn't going to behave itself, it is better for it not to exist.
> 
> 	Using #species instead of #class would be okay, as well -- I can't
> 	decide which is better.
> 	
> It's precisely that which currently gets #() -vs- to: in trouble.

Yes, so I can't decide.  This isn't a killing blow, as there are several
ways out of the problem:

	1. Interval>>species could be Interval instead of Array

	2. Interval>>rangeIncludes: could be removed

	3. You could view that using Interval>>rangeIncludes:, means that you
aren't using an Interval as a Collection, and thus that certain generic
Collection methods are invalidated.

I like #3, myself.

I get the feeling that #species is an overburdened idea, but it does
have a ring of "same as" floating around with it.  That's why I think #=
might want to pay attention to it, even if it does open up possible
problems.

> 	Granted, as with comparing floats, it is dangerous in many cases
> 	to use this method.  Is it really useful to ask whether, say, two Heap's
> 	are #= ?
> 	
> Basically, a Heap is a Bag (bag union for heaps is called "merging")
> plus a total order and an operation for removing an element with extreme
> value.  Equality of heaps is the same as equality of bags.  

... I disagree -- it's not so simple.  Heaps can have different sorting
criteria, for beginners.

But even if they don't, two heaps might have multiple #= elements in
them, and the elements might end up arranged in a different order.  Are
these two heaps still #=?

There's no one right Heap>>=, though maybe we could come up with a
reasonable one.

> Whether it's useful is another matter;
> one of my points is that IF #= is defined on heaps it should be defined
> RIGHT.  This is one of the cases where there is a strong mathematical
> theory that says uniquely what equality of values should mean.  If that's
> not easily implementable, then have a definition that signals a
> MeaningfulButNotImplemented error.

AFAIK, in math there isn't general #= notion except for #==; if you mean
something other than #==, you need a term for the particular case. 
e.g., "these two systems are homomorphic", or "these two expressions
have identical normal forms".  In math, it is quite informal to say
things like "these two items are equal", and it would depend on context
to what is meant.

#= is for programmer convenience, it seems to me.  I don't see a
consistent way to define it system-wide, except with broad philosophical
strokes.  If you use #=, it seems you simply have to know how the
underlying objects will understand it.  For numbers, there are two or
maybe three possible interpretations.  For Arrays there is only one
realistic one.  For collections in general?  Well, that's not so
obvious!

Lex