[BUG] equivalence between strings and symbols

John W. Sarkela john_sarkela at 4thEstate.com
Thu Apr 6 15:02:26 UTC 2000


If I understand correctly, there is no problem with the current
implementation of #= in SequenceableCollection. The ANSI spec
first asserts that #= should be symmetric and then that equivalent
objects should have equal hash values (my paraphrase).
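The two ANSI requirements paraphrased above can be written as checks over every pair of objects. What follows is a minimal Python sketch, not Squeak code. `equals` and `hash_value` are hypothetical stand-ins for the Smalltalk messages #= and #hash. Explicit methods are used instead of Python's `==` because `==` may try the right operand's reflected `__eq__`, and Smalltalk's single dispatch never does that. `Value` and `BadHash` are toy classes made up for the example.

```python
from itertools import product


def ansi_equality_violations(objects):
    """List every pair in `objects` that breaks one of the two ANSI #= rules.

    Each object must provide equals(other) and hash_value(); these are
    hypothetical stand-ins for the Smalltalk messages #= and #hash.
    """
    found = []
    for a, b in product(objects, repeat=2):
        # Symmetry: a = b must be true exactly when b = a is true.
        if a.equals(b) != b.equals(a):
            found.append(("asymmetric", a, b))
        # Hash consistency: a = b implies a hash = b hash.
        if a.equals(b) and a.hash_value() != b.hash_value():
            found.append(("hash mismatch", a, b))
    return found


class Value:
    """Toy object whose equality and hash both derive from one field."""

    def __init__(self, v):
        self.v = v

    def equals(self, other):
        return isinstance(other, Value) and self.v == other.v

    def hash_value(self):
        return hash(self.v)


class BadHash(Value):
    """Inherits Value's equality but deliberately answers a different hash."""

    def hash_value(self):
        return hash(self.v) + 1


print(ansi_equality_violations([Value(1), Value(2)]))  # []
# Value(1) and BadHash(1) are equal in both directions but hash differently:
print([k for k, _, _ in ansi_equality_violations([Value(1), BadHash(1)])])
```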

The current implementation of SequenceableCollection satisfies
these constraints admirably.

The problem arises because Symbol and String are the same
species, and Symbol redefines #= without a
corresponding redefinition in String. The redefinition
of #= in Symbol violates the invariant that #= be symmetric.
This violation can only occur in comparisons between sequenced
collections of characters of the same species as Symbol.
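The asymmetry can be reproduced in a small Python model. The model follows the thread's description: String's #= compares species and characters, and Symbol shares String's species but overrides #= with identity. The classes are hypothetical stand-ins, not Squeak's actual implementation.

```python
class String:
    """Model of a Squeak String whose #= compares species and characters."""

    def __init__(self, chars):
        self.chars = chars

    def species(self):
        return String

    def equals(self, other):
        # Inherited, SequenceableCollection-style test: same species and
        # the same elements in the same order.
        return (isinstance(other, String)
                and self.species() is other.species()
                and self.chars == other.chars)


class Symbol(String):
    """Model of a Squeak Symbol: species is still String, but #= is identity."""

    def equals(self, other):
        return self is other


string = String("squeak")
symbol = Symbol("squeak")
print(symbol.equals(string))  # False, like  #squeak = 'squeak'
print(string.equals(symbol))  # True,  like  'squeak' = #squeak
```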

conclusion... 
    1. SequenceableCollection as it stands is correct
    2. there are no problems with hashing
    3. sequenced collections of characters that are not
    Symbols but of the same species as Symbol must ensure
    that they never return <true> when compared to a Symbol.

The minimal change to the system is to redefine #= in String
with a guard clause that enforces this invariant.
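Here is a sketch of that guard clause in the same hypothetical Python model, not in Squeak. String's `equals` answers false for any Symbol argument before it falls back to the species-and-characters test. This restores symmetry for the pairs checked.

```python
class String:
    """Model of a Squeak String with the proposed guard clause in #=."""

    def __init__(self, chars):
        self.chars = chars

    def species(self):
        return String

    def equals(self, other):
        # Guard clause: Symbol overrides equals, so this method only runs
        # for non-Symbol receivers, which must never equal a Symbol.
        if isinstance(other, Symbol):
            return False
        # Otherwise keep the inherited test: same species, same characters.
        return (isinstance(other, String)
                and self.species() is other.species()
                and self.chars == other.chars)


class Symbol(String):
    """Model of a Squeak Symbol: same species as String, #= is identity."""

    def equals(self, other):
        return self is other


string, symbol = String("squeak"), Symbol("squeak")
for a, b in [(string, symbol), (string, String("squeak")), (symbol, symbol)]:
    # Symmetry now holds for each pair.
    assert a.equals(b) == b.equals(a)
print(string.equals(symbol), symbol.equals(string))  # False False
```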

John Sarkela TFEI
[ | ] ye olde curly brace face :-}>

> From: "R. A. Harmon" <harmonra at webname.com>
> Reply-To: squeak at cs.uiuc.edu
> Date: Thu, 06 Apr 2000 08:35:34 -0500
> To: squeak at cs.uiuc.edu
> Subject: Re: [BUG] equivalence between strings and symbols
> Resent-From: squeak at cs.uiuc.edu
> Resent-Date: 6 Apr 2000 13:38:12 -0000
> 
> At 01:12 PM 4/5/00 -0700, John W. Sarkela wrote:
>> Working with Squeak 2.7, we discovered the following situation:
>> 
>> #squeak = 'squeak'    "false"
>> 'squeak' = #squeak    "true"
>> 
>> Mathematics defines equivalence as a relation that is
>> 1. reflexive
>> 2. symmetric
>> 3. transitive
>> 
>> Well, the above just ain't symmetric.
>> 
>> The problem arises because Strings and Symbols are the
>> same species of collection. This satisfies the definition
>> of equivalence that String inherits, whilst Symbol redefines
>> equivalence to be identity.
>> 
>> My assertion is that in Smalltalk a String should never
>> be equivalent to a Symbol. (Too much code depends upon
>> Symbol identity being the same as equivalence.)
>> 
>> Given that, what is the correct course of action...
>> 1. Make Strings and Symbols different species.
>> 2. Redefine #= in String to ensure that a string and
>> a symbol are never declared equivalent.
> 
> I'm adding and changing Squeak methods so they conform to the ANSI spec., so
> I selfishly suggest adding an #= method to class SequenceableCollection, as
> it seems to correspond to the <sequencedReadableCollection> protocol.  It
> refines #= as follows:
> 
> 5.7.8 Protocol: <sequencedReadableCollection>
> . . .
> 5.7.8.2 Message Refinement: = comparand
> . . .
> Definition: <Object>
> . . .
> The value of receiver = comparand is true if and only if the value of
> comparand = receiver would also be true.  If the value of receiver =
> comparand is true then the receiver and comparand must have equivalent hash
> values.  Or more formally:
> 
> receiver = comparand =>
> receiver hash = comparand hash
> 
> The equivalence of objects need not be temporally invariant.  Two
> independent invocations of #= with the same receiver and operand objects may
> not always yield the same results.  However, only objects whose
> implementation of #= is temporally invariant can be reliably stored within
> collections that use #= to discriminate objects.
> 
> Refinement: <sequencedReadableCollection>
> Unless specifically refined, the receiver and operand are equivalent if all
> of the following are true:
> 
> 1. The receiver and operand are instances of the same class.
> 2. They answer the same value for the #size message.
> 3. For all indices of the receiver, the element in the receiver at a
> given index is equivalent to the element in operand at the
> same index.
> 
> Element lookup is defined by the #at: message for the receiver and operand.
> 
> This would affect SequenceableCollection subclasses:
> 
>     ArrayedCollection
>         Array
>         ByteArray
>         String
>             Symbol
>     Interval
>     OrderedCollection
>         SortedCollection
> 
> 
> It will take me a while to sort out the effects of this change and determine
> which of the subclasses must have #= added or changed.
> 
> --
> Richard A. Harmon          "The only good zombie is a dead zombie"
> harmonra at webname.com           E. G. McCarthy
> Spencer, Iowa
> 
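Under the ANSI default refinement quoted above, rule 1 alone settles the String/Symbol case. The two are different classes, so the comparison answers false whichever side receives the message. The Python sketch below models only that default refinement. `SeqColl` is a hypothetical stand-in for SequenceableCollection, and elements are plain Python characters compared with `==`.

```python
class SeqColl:
    """Model of a sequenced collection using the ANSI default refinement of #=."""

    def __init__(self, *elements):
        self._elements = list(elements)

    def size(self):
        return len(self._elements)

    def at(self, index):
        # Smalltalk indices start at 1.
        return self._elements[index - 1]

    def equals(self, other):
        # Rule 1: same class (exact class; a subclass does not count).
        if type(self) is not type(other):
            return False
        # Rule 2: same #size.
        if self.size() != other.size():
            return False
        # Rule 3: equivalent elements at every index, looked up with #at:.
        # Elements here are plain Python values, so == stands in for their #=.
        return all(self.at(i) == other.at(i) for i in range(1, self.size() + 1))


class String(SeqColl):
    pass


class Symbol(String):
    pass


print(String(*"squeak").equals(String(*"squeak")))  # True
print(String(*"squeak").equals(Symbol(*"squeak")))  # False
print(Symbol(*"squeak").equals(String(*"squeak")))  # False
```

Symbol is left unrefined here. An identity-based Symbol>>= would also answer false for the cross-class pairs, so symmetry holds either way.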





More information about the Squeak-dev mailing list