[BUG] equivalence between strings and symbols

Stephan Rudlof sr at evolgo.de
Tue Apr 11 17:49:47 UTC 2000


Dan (and others, too, of course),

Dan Ingalls wrote:
> 
> Stephan Rudlof wrote:
> > >What about completely *removing* Symbol>>= ?
> > >
> > >Has anybody tried this (without changing other methods)?
> 
> Now, Stephan, the true Squeaker does not ask such a question ;-).

So it seems that I'm not a true Squeaker... ;-).

> 
> When in doubt, try it out.
> 
> I did, and everything seems reasonably happy.  That is, I removed that one method, then recompiled the whole system, and everything seems to work as before.

I wasn't aware how easy it is to recompile the whole system, but now
I know!
For newbies: MethodFinder is a great tool!

> We should, of course, expect it to.  If two symbols were == before, they should be = after.  If they were not == before, they should not be = afterward.
> 
> I say "reasonably" because I have not timed anything yet.  There can be no doubt that = runs slower than == between large strings.  Next time I will time a recompilation of the system before and after -- this should be a good test, since the compiler works fairly heavily with Symbols.

I have run some benchmarks (Linux, Squeak 2.8alpha, update #1974,
Morphic; times in milliseconds), newest first:

Smalltalk garbageCollect.
[Smalltalk recompileAllFrom: 'Aardvark'.] timeToRun
"without Transcript"
 727570 "sr suggestion in ST"

 735792 "without Symbol>>="
 718127 "with Symbol>>="
 735708 "without Symbol>>="
 718915 "with Symbol>>="

"with Transcript"
 796683 "with Symbol>>="
 810646 "without Symbol>>="
 815783 "with Symbol>>="

The runs with the Transcript open gave odd results, so I repeated them
without it and got reasonable numbers. (I'm not sure it was only the
Transcript that made the results odd, but it is slow.)

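My suggestion in Smalltalk was:

Symbol>>= anObject
        "Symbols are unique, so two Symbols are equal only if they are
        identical. For any other argument, fall back to the inherited
        (String) comparison."
        (anObject isMemberOf: Symbol)
                ifTrue: [^ self == anObject].
        ^ super = anObject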

In terms of performance, it falls roughly midway between the runs with
and without the ordinary Symbol>>=.
A primitive implementation should be somewhat faster, but I think the
priority should be to rework the code as Dan suggests below; a primitive
would then probably be superfluous.

> 
> > >Then
> > >     #abc = 'abc'
> > >     'abc' = #abc
> > >should both be true; and
> > >     #abc == 'abc'
> > >     'abc' == #abc
> > >should both be false.
> 
> I much prefer this approach to that of legislating against =-ity between behaviorally identical objects.
> 
> Allen Wirfs-Brock commented...
> >The ANSI Standard was trying to allow for this possibility. I think it
> >succeeded in allowing this.
> >
> >Java also essentially takes the above approach. There is only a single
> >String class. The intern() method of a String returns a unique "canonical"
> >instance of the String that corresponds to the receiver.  Thus such a
> >canonical string has essentially the same semantics as a Smalltalk Symbol
> >but only a single class is used to represent both canonical and
> >non-canonical strings.
> 
> I have no problem with making Stephan's suggested change (I know others may also have suggested it).  Some things will slow down, but most can be located by searching for the pattern <Symbol literal> = <expression> or <expression> = <Symbol literal>.  Most of these could be replaced by == if speed is an issue, and we would be left with very little downside. (*)

My benchmarks seem to support this view.

Greetings,

Stephan (now a little bit more of a true Squeaker... ;-) )

> 
> It would certainly be nice to take a line item off the Squeak FAQ.
> 
>         - Dan
> 
> (*)  We should also scan all creations of Dictionary, as someone may have used a Dictionary instead of an IdentityDictionary, knowing that Symbol = ran about as fast as the == used in IdentityDictionary.

-- 
Stephan Rudlof (sr at evolgo.de)
   "Genius doesn't work on an assembly line basis.
    You can't simply say, 'Today I will be brilliant.'"
    -- Kirk, "The Ultimate Computer", stardate 4731.3