I wrote:
> Take the numbers. 5 and 5.0 are NOT equivalent in Smalltalk.
>     [x := 5 * (2 raisedTo: 53). x + 1 = x] value is false
>     [x := 5.0 * (2 raisedTo: 53). x + x = x] value is true
> If they behave differently, it doesn't make a lot of sense to regard them
> as equal.

(x + x should of course be x + 1 in the second example.)

You argue this here, and then argue differently below for collections.

Not at all. I used *exactly* the same criterion in both cases:
- if they behave differently under queries, they're not equal.
In my view, it's simply very risky to make equality comparisons involving floats.
It's very risky to make equality comparisons involving floats *if* you don't think about what you are doing. But then, doing any kind of programming without adequate background and without thinking about what you are doing is risky. For any reasonable floating-point system (and while there have been some highly UNreasonable ones, Squeak doesn't run on those machines) there is a wide range of integers for which floating-point calculations are exact. These days, we even know how to implement sqrt(), exp(), and ln() so that the answer they give is within half an ULP of the exact answer, and in a reasonable amount of time too; unfortunately we cannot yet rely on being given such implementations. But for basic operations, we can.
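The "wide range of integers" claim can be checked directly. The sketch below is in Python rather than Smalltalk, on the assumption that Python floats are IEEE 754 doubles like the floats under discussion; `float_add_is_exact` is a hypothetical helper name, not part of any library.

```python
import sys

# Assumption: floats are IEEE 754 doubles, so the significand holds 53 bits.
MANT_BITS = sys.float_info.mant_dig
LIMIT = 2 ** MANT_BITS  # integers up to this magnitude are represented exactly


def float_add_is_exact(a: int, b: int) -> bool:
    """Return True if adding a and b as floats matches exact integer addition."""
    # Python compares a float with an int by exact value, so no rounding
    # is hidden in the comparison itself.
    return float(a) + float(b) == a + b


# Inside the exact range, float arithmetic on integers loses nothing.
print(float_add_is_exact(123456789, 987654321))
# At the edge of the range the sum 2**53 is still exact.
print(float_add_is_exact(LIMIT - 1, 1))
# One step further, 2**53 + 1 is not representable and the sum is rounded.
print(float_add_is_exact(LIMIT, 1))
```

So equality tests on floats that hold integers of moderate size are safe; the risk starts where the values leave that range.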
In general, we both agree that #= needs to be defined case by case, and that it's often arbitrary what exactly #= means. Well, that's not quite what I meant. What I meant is that there are many *different* application-dependent notions of equality, and in the absence of a strong mathematical theory to define what #= should mean, it shouldn't mean *anything*. For example, should a set and an array that happens to have the same elements be regarded as #= ? No, there's a different relation #hasSameElementsAs: which should be named and used.
> Let's confine ourselves to messages that do not alter the receiver,
> because I've always understood = to be about value of current state,
> so that (1 to: 4) and #(1 2 3 4) are not distinguished by #at:put:.

That's fine with me, though I prefer these to be #~=.

A fine default definition for #= in Collection, IMHO, would be:

    = aCollection
        ^ self class == aCollection class
            and: [ self asArray = aCollection asArray ]

That would be a lousy definition. Consider this example:

    x := #('a' 'big' 'dog' 'chased' 'a' 'small' 'cat') asSet.
    y := #('a' 'small' 'cat' 'chased' 'a' 'big' 'dog') asSet.
    x asArray ==> #('big' 'chased' 'a' 'dog' 'small' 'cat')
    y asArray ==> #('big' 'chased' 'a' 'small' 'dog' 'cat')
    x = y ==> true
    x asArray = y asArray ==> false
It basically works only for collections where the order is part of the abstract value, or where there is a canonical order for the elements so that the result of asArray is determined by the abstract value, not the concrete representation.
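A minimal sketch of that failure, in Python rather than Smalltalk. `ToySet` is a hypothetical class invented for illustration: its storage order depends on insertion history, standing in for a hashed Set whose `asArray` order depends on the concrete representation.

```python
class ToySet:
    """Hypothetical unordered collection; storage order follows insertion history."""

    def __init__(self, items):
        self._slots = []
        for item in items:
            if item not in self._slots:  # drop duplicates, like a set
                self._slots.append(item)

    def as_array(self):
        # Exposes the concrete storage order, which is not part of the value.
        return list(self._slots)

    def naive_equals(self, other):
        # The proposed default: same class and equal as_array results.
        return type(self) is type(other) and self.as_array() == other.as_array()

    def abstract_equals(self, other):
        # Equality on the abstract value: same elements, order ignored.
        return type(self) is type(other) and set(self._slots) == set(other._slots)


x = ToySet(['a', 'big', 'dog', 'chased', 'a', 'small', 'cat'])
y = ToySet(['a', 'small', 'cat', 'chased', 'a', 'big', 'dog'])

print(x.abstract_equals(y))  # same elements
print(x.naive_equals(y))     # storage orders differ
```

The two objects hold the same elements, yet the array-based definition calls them unequal because it compares an accident of the representation.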
I can live with #asArray not being determined by the abstract value; I can't tolerate #= not being determined by the abstract value. If #= isn't going to behave itself, it is better for it not to exist.
Using #species instead of #class would be okay, as well -- I can't decide which is better.

It's precisely that which currently gets #() -vs- to: in trouble.

Granted, as with comparing floats, it is dangerous in many cases to use this method. Is it really useful to ask whether, say, two Heap's are #= ?

Basically, a Heap is a Bag (bag union for heaps is called "merging") plus a total order and an operation for removing an element with extreme value. Equality of heaps is the same as equality of bags. Equality is thus uniquely defined for Heaps. Whether it's useful is another matter; one of my points is that IF #= is defined on heaps it should be defined RIGHT. This is one of the cases where there is a strong mathematical theory that says uniquely what equality of values should mean. If that's not easily implementable, then have a definition that signals a MeaningfulButNotImplemented error.
Bleck, that's ugly. The system should at least be consistent,
Hey, we agree!
"Richard A. O'Keefe" <ok@atlas.otago.ac.nz> wrote:
I wrote:
Take the numbers. 5 and 5.0 are NOT equivalent in Smalltalk.
    [x := 5 * (2 raisedTo: 53). x + 1 = x] value is false
    [x := 5.0 * (2 raisedTo: 53). x + x = x] value is true
If they behave differently, it doesn't make a lot of sense to regard them as equal.
(x + x should of course be x + 1 in the second example.)
You argue this here, and then argue differently below for collections.
Not at all. I used *exactly* the same criterion in both cases:
- if they behave differently under queries, they're not equal.
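The corrected pair of expressions can be reproduced outside Smalltalk. This is a sketch in Python, assuming its floats are IEEE 754 doubles and its integers are unbounded, as in the Smalltalk case being discussed; `absorbs_one` is a hypothetical helper name.

```python
# The integer and the float denote the same mathematical value, 5 * 2**53.
x_int = 5 * 2 ** 53
x_float = 5.0 * 2 ** 53


def absorbs_one(x):
    """Return True if adding 1 to x leaves it unchanged."""
    return x + 1 == x


# Exact integer arithmetic: adding 1 always gives a different number.
print(absorbs_one(x_int))
# Adjacent doubles near 5 * 2**53 are 8 apart, so adding 1 rounds back to x.
print(absorbs_one(x_float))
# Yet a direct comparison of the two values reports them as equal.
print(x_int == x_float)
```

The last line is the point of contention: the two values compare equal, yet the same query sent to each gives different answers.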
It's a fine line. You say that + and = should behave the same for equal numbers, but that at:put: doesn't have to for equal collections. You need a complicated definition of "query" to make this statement hold for both disallowing "5 = 5.0" and allowing "#(1 2 3) = (1 to: 3)". In my view, the way you define "query" versus "modification" is going to depend on individual cases.
Change of state isn't a good definition. For example, toggling a button object is a change of its state, but it probably isn't a wholesale modification. It would be irritating if a regular button and an OnlyOnButton were treated as #=. On the other hand, changing the block that fires when a button is activated would seem like a wholesale modification. I would consider it okay for two buttons to be #=, but for one of them to have its activation block fixed.
In general, we both agree that #= needs to be defined case by case, and that it's often arbitrary what exactly #= means.
Well, that's not quite what I meant. What I meant is that there are many *different* application-dependent notions of equality, and in the absence of a strong mathematical theory to define what #= should mean, it shouldn't mean *anything*. For example, should a set and an array that happens to have the same elements be regarded as #= ? No, there's a different relation #hasSameElementsAs: which should be named and used.
Err, yes. So what do you think of having #= at all?? If you think it is okay, then how do you define it?
It seems to me that we want some general notion of "same as". I don't see how to come up with a "same as" notion that covers all possible classes, and so it needs to be defined in individual cases.
If one insists on completely identical behavior, then one is back to #==.
A fine default definition for #= in Collection, IMHO, would be:
    = aCollection
        ^ self class == aCollection class
            and: [ self asArray = aCollection asArray ]
That would be a lousy definition. Consider this example:
    x := #('a' 'big' 'dog' 'chased' 'a' 'small' 'cat') asSet.
    y := #('a' 'small' 'cat' 'chased' 'a' 'big' 'dog') asSet.
Yes, it's a lousy definition for Set's. But is it bad as a *default* definition for Collection ?
I can live with #asArray not being determined by the abstract value; I can't tolerate #= not being determined by the abstract value. If #= isn't going to behave itself, it is better for it not to exist.
Using #species instead of #class would be okay, as well -- I can't decide which is better.
It's precisely that which currently gets #() -vs- to: in trouble.
Yes, so I can't decide. This isn't a killing blow, as there are several ways out of the problem:
1. Interval>>species could be Interval instead of Array
2. Interval>>rangeIncludes: could be removed
3. You could take the view that using Interval>>rangeIncludes: means you aren't using an Interval as a Collection, and thus that certain generic Collection methods are invalidated.
I like #3, myself.
I get the feeling that #species is an overburdened idea, but it does have a ring of "same as" floating around with it. That's why I think #= might want to pay attention to it, even if it does open up possible problems.
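To make the #class versus #species trade-off concrete, here is a rough sketch in Python. The classes `Array` and `Interval`, the `species` hook, and both comparison functions are hypothetical stand-ins written for illustration; they are not Squeak's actual implementation.

```python
class Array:
    def __init__(self, items):
        self.items = list(items)

    def species(self):
        return Array

    def as_array(self):
        return list(self.items)


class Interval:
    def __init__(self, start, stop):
        self.start, self.stop = start, stop

    def species(self):
        # Mirrors the situation described above: an Interval's species is Array.
        return Array

    def as_array(self):
        return list(range(self.start, self.stop + 1))

    def range_includes(self, n):
        # A query that only Interval understands.
        return self.start <= n <= self.stop


def equal_by_class(a, b):
    return type(a) is type(b) and a.as_array() == b.as_array()


def equal_by_species(a, b):
    return a.species() is b.species() and a.as_array() == b.as_array()


interval = Interval(1, 3)
array = Array([1, 2, 3])

print(equal_by_class(interval, array))    # classes differ
print(equal_by_species(interval, array))  # both have species Array
print(hasattr(array, "range_includes"))   # the extra query tells them apart
```

With the species-based test the two compare equal even though one of them answers a query the other does not understand, which is the trouble that options 1 to 3 each try to get around.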
Granted, as with comparing floats, it is dangerous in many cases to use this method. Is it really useful to ask whether, say, two Heap's are #= ?
Basically, a Heap is a Bag (bag union for heaps is called "merging") plus a total order and an operation for removing an element with extreme value. Equality of heaps is the same as equality of bags.
... I disagree -- it's not so simple. Heaps can have different sorting criteria, for starters.
But even if they don't, two heaps might have multiple #= elements in them, and the elements might end up arranged in a different order. Are these two heaps still #=?
There's no one right Heap>>=, though maybe we could come up with a reasonable one.
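For the narrow case where both heaps use the same ordering, the "heaps are bags" reading can be sketched in Python with the standard `heapq` module (a min-heap kept in a list). `build_heap` and `heaps_equal_as_bags` are hypothetical helpers; heaps with different sorting criteria are outside what this sketch covers.

```python
import heapq
from collections import Counter


def build_heap(items):
    """Push items one at a time onto a heapq min-heap and return the list."""
    heap = []
    for item in items:
        heapq.heappush(heap, item)
    return heap


def heaps_equal_as_bags(h1, h2):
    # Compare as multisets: same elements with the same counts,
    # regardless of how they are arranged internally.
    return Counter(h1) == Counter(h2)


h1 = build_heap([1, 2, 3])
h2 = build_heap([1, 3, 2])

print(h1, h2)                       # internal arrangements differ
print(h1 == h2)                     # so a layout comparison says "not equal"
print(heaps_equal_as_bags(h1, h2))  # while the bag comparison says "equal"
```

Whether the bag answer is the right one when the orderings differ, or when #= elements can be told apart in some other way, is the question raised above and is left open here.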
Whether it's useful is another matter; one of my points is that IF #= is defined on heaps it should be defined RIGHT. This is one of the cases where there is a strong mathematical theory that says uniquely what equality of values should mean. If that's not easily implementable, then have a definition that signals a MeaningfulButNotImplemented error.
AFAIK, in math there isn't a general #= notion except for #==; if you mean something other than #==, you need a term for the particular case, e.g., "these two systems are homomorphic", or "these two expressions have identical normal forms". In math, it is quite informal to say things like "these two items are equal", and what is meant would depend on context.
#= is for programmer convenience, it seems to me. I don't see a consistent way to define it system-wide, except with broad philosophical strokes. If you use #=, it seems you simply have to know how the underlying objects will understand it. For numbers, there are two or maybe three possible interpretations. For Arrays there is only one realistic one. For collections in general? Well, that's not so obvious!
Lex
squeak-dev@lists.squeakfoundation.org