[BUG] equivalence between strings and symbols

Allen Wirfs-Brock Allen_Wirfs-Brock at Instantiations.com
Mon Apr 10 19:43:03 UTC 2000


At 10:27 AM 4/10/00 -0700, John W. Sarkela wrote:
 >Originally I thought along your lines, Lex. Then I actually read
 >the ANSI spec. It suggests that collections of differing class
 >should never be equal.

I don't really think that this is the intent of the standard as expressed 
in these words (that I probably wrote) in section 5.7.8.2:

Unless specifically refined, the receiver and operand are equivalent if all 
of the following are true:
    1. The receiver and operand are instances of the same class.
    2. They answer the same value for the #size message.
    3. For all indices of the receiver, the element in the receiver at a 
given index is equivalent to the element in operand at the same index.
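
As a rough illustration only, the three conditions could be written as a
method like the one below. This is a sketch, not text from the standard or
from any particular image; it assumes a hypothetical class whose instances
respond to #size and #at: with indices running from 1 to the size.

```smalltalk
= operand
    "Hypothetical sketch of the default equivalence rule for a class
     conforming to <sequenceReadableCollection>. Not from any real image."

    "Condition 1: receiver and operand are instances of the same class."
    self class == operand class ifFalse: [^false].

    "Condition 2: both answer the same value for #size."
    self size = operand size ifFalse: [^false].

    "Condition 3: elements at the same index are equivalent."
    1 to: self size do: [:index |
        (self at: index) = (operand at: index) ifFalse: [^false]].
    ^true
```

A class that defines #= this way would normally also define #hash
consistently, so that equivalent objects answer the same hash value.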

Note that it doesn't say anything universally about collections; it is only 
speaking about objects that conform to the protocol 
<sequenceReadableCollection>. It explicitly allows for the possibility that 
refinements of that protocol might define different rules for equivalence, 
including rules that do not require identical classes for the receiver and 
the operand.

However, the standard doesn't say anything about <readableString> refining 
#= so the above applies. Hence, if literal strings and literal symbols 
are implemented using different classes (and it's not at all obvious to me 
that this is a requirement) then
   'abc' = #'abc'  "should answer false"
   #'abc' = 'abc'  "should answer false"

One of the reasons that there isn't a refinement of #= for strings and 
symbols is that there has historically been so much variation, both 
between and within implementations, in the definition of string and symbol 
=. This includes the fact that the definition of equality used for #>= and 
#<= of strings is different from that used for #=. In many ways, the real 
equality operation for all types of <readableString> objects is sameAs:. 
In fact, I believe that the intent is that
   'abc' sameAs: #'abc' and #'abc' sameAs: 'abc' should both answer true.
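
To see what a given implementation actually does, a workspace snippet along
these lines could be evaluated. It is only a sketch: the variable names are
made up, and the Squeak-style Transcript is an assumption. The answers are
deliberately not listed here, since they vary between implementations.

```smalltalk
| aString aSymbol |
aString := 'abc'.
aSymbol := #abc.

"Compare with #= in both directions to check for asymmetry."
Transcript show: (aString = aSymbol) printString; cr.
Transcript show: (aSymbol = aString) printString; cr.

"Compare with sameAs: in both directions."
Transcript show: (aString sameAs: aSymbol) printString; cr.
Transcript show: (aSymbol sameAs: aString) printString; cr.
```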

Also, I do not believe that the standard necessarily requires that the same 
class be used to implement mutable string objects and immutable literal 
string objects.  Thus, it isn't necessarily clear that you can assume that
    (String with: $a with: $b) = 'ab' will always answer true.

In fact, it is probably valid (according to the standard) to implement 
string literals using the same class as is used to implement symbols 
because <symbol> fulfills all the requirements of <readableString>.


 > If the behavior must change (I believe it must)
 >there had better be strongly compelling reasons to go against
 >standard behavior.

If we could have found a way to clean up #= for strings (and symbols) without 
breaking all existing implementations and many existing applications, we 
probably would have done so. If the Squeak community doesn't share this 
imperative and has a good solution, have at it!

 >Given the standard I believe that either #= in SequenceableCollection
 >should check for class equality rather than species equality or else
 >redefine #= in String to preclude asymmetry when compared to symbols.

Remember that the standard only defines a protocol 
<sequenceReadableCollection> that specifies certain behaviors. It does not 
specify an implementation. In particular, it does not say that all objects 
that implement this behavior must inherit from some particular abstract 
class or how the implementation of the required behavior is factored among 
classes.

Allen_Wirfs-Brock at Instantiations.com
