To make String and Symbol ANSI compliant regarding #=

Doug Way dway at mat.net
Mon Apr 17 20:54:21 UTC 2000


On Mon, 17 Apr 2000, Lex Spoon wrote:

> My vote is still for #abc = 'abc'.  It's intuitive, if you think of
> Symbols as high-performance Strings.

I guess this is the crux of the matter.  What should the semantics of
Symbol be?  Should it be thought of as a special String?

Personally, I've never thought of a Symbol as being a special kind of
String, or being in any way related to a String.  But this might be
related to my having used other Smalltalks which treated Symbols and
Strings differently; I'm not sure.  When I learned Smalltalk, I guess I
considered Symbols as something you might use in place of enumerated types
from other languages, not as a kind of String.

Given this, having #abc = 'abc' seems totally alien and wrong to me.  But
*if* Symbols are really intended to be kinds of Strings, I guess it makes
some sense.  (Hmm, I notice that evaluating:   #hey, #there   will return
the string 'heythere'.  Ouch.)

I know that Symbol is a subclass of String in Squeak.  Perhaps the
semantic interpretation of Symbols as special Strings only exists because
of this current implementation in Squeak?  (I guess it's sometimes hard to
say which follows the other... implementation and semantics can get
intertwined...)

Apparently a long time ago, Symbol was called UniqueString...?  Certainly
I could see a UniqueString being considered a kind of String, and allowing
an instance of one to be equal to an instance of the other.  Since the
UniqueString class was renamed to Symbol, though, it seems like the
intention is that String and Symbol are not really the same kind of thing,
semantically.  (Just guessing here.)  Also, the fact that Strings and
Symbols are written with different punctuation (single quotes versus the
# sign) adds to the argument that they are different animals.

Most importantly, the default behavior in nearly all of the Smalltalk
hierarchy is that an instance of one class will not = an instance of a
different class, even for subclasses. (e.g.  #(1 2 3) asOrderedCollection
~= #(1 2 3) asSortedCollection )  It seems like you'd want to break this
rule only with very good reason... certainly it's a violation of
expectations from a beginner's point of view to have #joe = 'joe'.
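
The collection comparison above can be tried in a workspace.  This is a
minimal sketch, assuming a Squeak image like the one discussed in this
thread; the temporary variable names are illustrative only and the results
may differ in other dialects or later versions.

```smalltalk
"Evaluate each expression with 'print it'."
| ordered sorted |
ordered := #(1 2 3) asOrderedCollection.
sorted := #(1 2 3) asSortedCollection.

"Same elements, different classes: the message above reports
 that these two do not compare equal."
ordered = sorted.

"Converting both sides to the same class compares the elements alone."
ordered asArray = sorted asArray.
```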

Hmm, after looking at the hierarchy some more, I see that equality between
instances of different classes is mostly governed by the
"species" method.  A handful of classes implement "species" to override
the default behavior, including Symbol>>species which returns String.  I
guess this is just part of the Symbols-and-Strings-can-be-equal
implementation.
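
To see the "species" override at work, the following expressions can be
printed in a workspace.  This is only a sketch, assuming an image in which
Symbol>>species returns String as described above; class names and results
may differ in other images.

```smalltalk
"Evaluate each expression with 'print it'."

"The classes themselves differ..."
#abc class.
'abc' class.
#abc class == 'abc' class.

"...but the species can match, which is what lets a species-based
 equality test treat a Symbol and a String as comparable."
#abc species.
'abc' species.
#abc species == 'abc' species.
```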

Okay, enough rambling.  I guess my main point is that, to me, treating
Symbols as semantically the same as Strings seems strange.  (Hey, that's
some good alliteration there.)  If the argument against changing this is
mostly that it will introduce bugs, that is not a sufficient argument in
my book.  Just make the change early in the 2.9alpha cycle, and let the
bugs get ironed out before 2.9 is done.  (Certainly at this relatively
early stage in Squeak's life, we shouldn't be too obsessed with
backward-compatibility.)

As a bonus, you'd be ANSI compliant, too.  But, even ignoring ANSI, I
think my arguments above make a case for #joe = 'joe' -> false.

- Doug Way
  EAI/Transom Technologies, Ann Arbor, MI
  http://www.transom.com
  dway at mat.net, @eai.com




