Smalltalk Design Question

Wed Jan 28 16:16:50 UTC 1998

Mark:

I'm not one of the original guys of whom you asked the question, but I
can't resist commenting anyway.

I've always hated systems that take what I write and change it to something
else. If I write #examineSituationAndTakePositiveAction I will not be happy
trying to read #examinesituationandtakepositiveaction, both because it is
nearly unreadable and because it is not what I wrote. WinDoze drives me
nuts with its insistence that it knows just how file names should be
written, no matter what I wrote.

I suspect that in the overall scheme of things there are relatively few
runtime errors due to typos, and those get caught right quickly. The
#asLowercase versus #asLowerCase difference is trivial to fix when porting:
just add one new method that invokes the other one.
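A minimal sketch of such a forwarding method, assuming the dialect at hand already implements #asLowercase and the ported code sends #asLowerCase. Which class it belongs in (String, Symbol, Character) and how it is filed in depend on the dialect, so treat those details as assumptions:

```smalltalk
asLowerCase
	"Compatibility alias (assumed): the ported code spells the selector
	 #asLowerCase, while this dialect implements #asLowercase."

	^ self asLowercase
```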

I don't see how polymorphism would be easier. As it now works, the method
name is a symbol, and when a real lookup is done it simply has to find a
matching symbol, which is really cheap since symbols have unique object
pointers. The length of the name is not an issue, and you may note that IBM
Smalltalk can inline the comparison of two symbols by just comparing two
32-bit object pointers.
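A small workspace sketch of why the symbol comparison is cheap. The variable names are made up for illustration; the messages used (#copy, #asSymbol, #= and #==) are standard:

```smalltalk
| s1 s2 |
s1 := 'examineSituation' copy.
s2 := 'examineSituation' copy.

s1 = s2.                      "true: same characters, compared one by one"
s1 == s2.                     "false: two distinct String objects"
s1 asSymbol == s2 asSymbol.   "true: both answer the one unique Symbol"
```

The last comparison never looks at the characters. It only asks whether the two object pointers are the same, so the length of the name does not matter.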

If case didn't matter, and various cased names were considered equivalent,
then it would seem that lookup has to do a character-by-character
comparison of each method name.

If all names are mashed to some upper or lower case equivalent, and then
converted to symbols, it won't be any faster; it'll just be harder to read.

If you hide the mashing so that the code still shows the mixed-case version
but the 'real' name is unicase, then what should this answer?

   #aSymbolJustInCase == #asymboljustincase

What if I pass #aSymbolJustInCase and the receiver does a #perform:? Should
perform create a lowercase symbol and then do the lookup? (Creating new
symbols is usually quite expensive).
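To make the question concrete, here is a workspace sketch of how things behave today in a case-sensitive system. The variable name is hypothetical:

```smalltalk
| sel |
sel := #aSymbolJustInCase.

sel == #asymboljustincase.
	"false: these are two different symbols"

sel asLowercase asSymbol == #asymboljustincase.
	"true, but only after building a lowercase string and
	 looking it up in (or adding it to) the symbol table"
```

A case-insensitive #perform: would have to pay for something like the second expression on every call, unless it could first prove the selector was already in canonical form.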

My guess is that your IBM Smalltalk implementation is really slow because
it has to create new symbols. I've just been through tuning a program that
uses a lot of symbols (but not as method selectors) and allows mixed case.
I finally had to preprocess all the symbols rather than convert them as I
ran into them (often many times). I also wrote a special method (IBM
Smalltalk only - it uses private junk) for converting symbols to lower case
that only does the conversion when it has to:

!Symbol publicMethods !

asLowercaseSymbol
	" Answer a symbol converted to lower case.
	  Since this is a very expensive operation the obvious code:
			aSymbol asLowercase asSymbol
	  is not recommended. This method first checks to see if
	  it is necessary to perform the conversion and does so
	  only if an uppercase character is found. "

	[ :element :index |
		| newChar |
		newChar := CurrentLCCType asLowercase: element.
		newChar ~= element ifTrue: [
			^ self asLowercase asSymbol  ] ]
	applyWithIndex: self from: 1 to: self size.

	^ self! !

I just tried a case (in Squeak 1.3) where the doesNotUnderstand: method was:

doesNotUnderstand: aMessage
	aMessage selector == #ASDF ifTrue: [
		^ self perform: #asdf
			withArguments: aMessage arguments ].
	^ self perform: aMessage selector asLowercase asSymbol
		withArguments: aMessage arguments!

This was in a class with an #asdf method but no #ASDF or #aSdF methods. I
then tried two cases, one sending #ASDF and one sending #aSdF. The first
case took 133 microseconds for 10,000 sends and the second took 9833
microseconds. (I did not measure an empty test so the true ratio is
probably even worse.)
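A sketch of how such a measurement could be repeated, this time with the empty loop subtracted. CaseTest is a hypothetical name for the class holding #asdf and the doesNotUnderstand: method above. Time millisecondsToRun: is assumed to be available, as it is in Squeak, and the compiler may ask for confirmation of the unknown selectors:

```smalltalk
| obj n empty fast slow |
obj := CaseTest new.    "hypothetical class, see above"
n := 10000.

empty := Time millisecondsToRun: [ n timesRepeat: [ ] ].
fast := Time millisecondsToRun: [ n timesRepeat: [ obj ASDF ] ].
slow := Time millisecondsToRun: [ n timesRepeat: [ obj aSdF ] ].

Transcript
	show: 'ASDF: ', (fast - empty) printString; cr;
	show: 'aSdF: ', (slow - empty) printString; cr.
```

The #ASDF send takes the identity-comparison shortcut, while #aSdF goes through asLowercase asSymbol on every send, so the second figure shows what the symbol conversion costs.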

I won't guess what The Designer really had in mind; I've guessed wrong
several times. It might be that He'll speak to us.   :-)

Dave

At 16:19 -0500 1/27/98, Mark Wai wrote:
>Hi,
>
>I have a question that I think the "original" Smalltalk designers could help
>me understand:
>
>I always wondering why Smalltalk method signature (not keywords like self,
>super etc.,) are CaSe SENSITIVE.  Was this by design or due to
>implementation constraints?  I can understand why static languages are case
>sensitive(e.g. more efficient compiling) but for dynamic languages like
>Smalltalk, I can think of more advantages of being non-case sensitive
>rather than case sensitive.  Some obvious advantages of having Smalltalk
>non-case sensitive that I can think of are:
>
>-  reduce typo runtime errors
>-  polymorphism is easier due to lesser constraint (especially when the
>method signature is long)
>-  reduce ambiguous methods that have same name but different cases
>etc,
>
>Of course, the Smalltalk compiler must not allow the same method define if
>the names are the same but the case is different.  Implementation wise, I
>think it should not be that hard (I have prototyped that in IBM Smalltalk a
>while ago, e.g. changing >>doesNotUnderstand: for quick hack).  Therefore, I
>think I must have missed something.  I sure hate seeing different Smalltalk
>dialects implements the same method with the same name but different case.
>I think one of those classic example is >>asUppercase versus >>asUpperCase.
>This creates portability problem between multi-vendor dialects.  Maybe this
>should consider to be part of ANSI standard too.
>
>I appreciate any comments.
>
>Thanks.
>
>--
>Mark Wai
>Frontier Systems Architecture Inc.
>mailto: mwai at ibm.net or:[ mwai at frontiersa.com]
>__

_______________________________
David N. Smith
IBM T J Watson Research Center
Hawthorne, NY
_______________________________
Any opinions or recommendations
herein are those of the author
and not of his employer.