Proposal3: Make $_ a valid identifier character

Richard A. O'Keefe ok at atlas.otago.ac.nz
Tue May 30 03:14:39 UTC 2000


	> Let us not lose sight of the fact that Smalltalk (like Pascal) originally
	> didn't include low lines because it simply _couldn't_; the first version
	> of ASCII just plain didn't include them.
	
	Really?
Yes.

	This is historically correct?
Yes.

	With all the weird glyphs, I'm surprised.  Can you give a source
	(or would someone from SqCentral care to comment)?
	
CACM, 1960-something.  Go to your local museum of computing and take
a look at the keyboard of a Model 33 Teletype.  There were quite a few
papers in CACM during the early 60s about the development of ASCII.
We _nearly_ got a logical negation sign (like the one now at 16rAC
in ISO Latin 1).  Some of the weirder characters (like ~, ^, `, and to
a certain extent |) make perfect sense when you realise that it was
*official* in ASCII that you could make accented letters by overstriking
(either using backspaces or carriage returns).  For that, see for example
ECMA-6 "7-Bit Coded Character Set", 6th edition, section 7 (p8):

	While all graphic characters specified in this International
	Standard are spacing characters, it is possible, by using
	BACKSPACE or CARRIAGE RETURN, to image two or more graphic
	characters at the same character position ...

	For example, SOLIDUS and EQUALS SIGN may be combined to image
	"not equals". ...

	Diacritical marks may be allocated to the bit combinations
	specified in 6.4.3 and be available for composing accented
	letters.  For each composition a sequence of three characters,
	the first or last of which is the letter to be accented and
	the second of which is BACKSPACE may be used.  Furthermore,
	QUOTATION MARK, APOSTROPHE or COMMA can be associated with a
	letter by means of BACKSPACE for the composition of an accented
	letter with a diaeresis, an acute accent, or a cedilla respectively.

Strictly speaking, the set of characters which could *standardly* be
encoded in ASCII was rather larger than the set which can be encoded
directly in ISO Latin 1 or any other 8-bit character set.  It's just
that some characters (like underlined lowercase w with acute accent)
might take 5 bytes (_ BS ' BS w).  And characters like " had to be
compromises between good-looking quotation marks (which they aren't)
and good-looking umlauts (which they aren't either).  Still, an imaging
device _could_ have translated entire BS sequences to appropriate images.
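
A minimal sketch, in Python, of how an imaging device might group
such BACKSPACE sequences by print position before choosing a glyph.
It is not part of the original message or of ECMA-6; the name `cells`
is hypothetical, and CARRIAGE RETURN overstriking is deliberately
left out to keep the example short.

```python
BS = "\b"  # ASCII BACKSPACE (code 8)


def cells(text: str) -> list[list[str]]:
    """Group graphic characters by the print position they land on.

    BACKSPACE moves the position back by one, so the next graphic
    character is imaged on top of whatever is already in that cell.
    """
    out: list[list[str]] = []
    pos = 0
    for ch in text:
        if ch == BS:
            pos = max(pos - 1, 0)  # cannot back up past the left margin
        else:
            if pos == len(out):
                out.append([])     # first character in a new cell
            out[pos].append(ch)
            pos += 1
    return out


# The five-byte example from the text: low line, acute accent, w.
underlined_w_acute = "_" + BS + "'" + BS + "w"
# "Not equals" as SOLIDUS overstruck with EQUALS SIGN.
not_equals = "/" + BS + "="
```

Each inner list holds everything overstruck in one cell, so a renderer
could look up the combination (for example `/` with `=`) and draw a
single composed image instead of replaying the backspaces.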

This construction was *not* included in the 8-bit ISO 8859 family, which
is thus *not* an upwards-compatible extension of ECMA-6 = ISO 646 = ASCII.

	[snip]
	> Clinging to BaStudlyCaps because "that's the way it has always been done
	> in Smalltalk" (as another person suggested)
	
	I really presume you mean me as the "another person" 

No.  I wouldn't have written an*other* person if I had meant that.

	FWIW, some means of transparently dealing with underscore-scarred code
	would be welcome. And certainly, in non-code browsers in Squeak, having
	arrows where there should be underscores is annoying. I grant that there
	are problems to be solved. But I'd still rather deal with them than deal
	with loads of underscore-infected identifiers.
	
Perhaps we can both cool down our language.
It is understood that some people have come to love their chains.
It is also understood that some people like identifiers to be readable
and pronounceable and welcome technological advances that permit this.

	So, let's agree that we differ, and that I exist to  make your reading
	life a living hell ;)
	
No, I *shan't* agree to that.

You see, there is a possible middle ground.

We _agree_ that the assignment arrow is a better symbol for
assignment than :=.
I presume that we _agree_ that the up arrow is a really neat
symbol for "return" in Smalltalk.
We _agree_ that it would be useful to be able to view text
that uses (backspaceless) ASCII correctly (better still, ISO Latin 1
or Windows 1252).
We _agree_ that people should be able to read Smalltalk code.
We _agree_ that there is more than one experience of readability.
We only disagree about which convention is better.

As I wrote in a previous message, let's have four distinct characters
(left-arrow, low-line, up-arrow, circumflex-accent),
so that left arrow remains available.

So have a *lexical* rule in Squeak
	<lower case letter> <low line> <upper case letter>
 =>     <lower case letter> <upper case letter>
and a browser Preference for whether
	<lower case letter> <upper case letter>
is displayed as is (for BaStudlyCaps lovers) or as
	<lower case letter> <low line> <upper case letter>
(for phrases_Are_Not_Words thinkers).

Then I could see (and *read*) Sequenceable_Collection while
others who had not enabled that option would see SequenceableCollection.
Also, I could write (and *read*) Multi_Word_Identifiers,
but Bijan Parsia, looking at the *same* code, would see MultiWordIdentifiers.
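
The rule and the preference can be sketched in a few lines. This is an
illustration in Python, not Squeak code and not anything from the
original message: the function names `canonical` and `display` and the
`show_low_lines` flag are hypothetical, and only ASCII letters are
treated as upper or lower case.

```python
import re

# Lexical rule:  <lower> <low line> <upper>  =>  <lower> <upper>
_LOW_LINE_BETWEEN = re.compile(r"(?<=[a-z])_(?=[A-Z])")

# Display rule:  the empty gap between <lower> and <upper>
_WORD_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")


def canonical(identifier: str) -> str:
    """Form the lexer would store: low lines at word joins are dropped."""
    return _LOW_LINE_BETWEEN.sub("", identifier)


def display(identifier: str, show_low_lines: bool) -> str:
    """Form a browser would show, depending on the reader's preference."""
    stored = canonical(identifier)
    if show_low_lines:
        return _WORD_BOUNDARY.sub("_", stored)
    return stored


# One stored name, two views of it.
stored = canonical("Multi_Word_Identifiers")
with_low_lines = display(stored, show_low_lines=True)
without_low_lines = display(stored, show_low_lines=False)
```

Because both spellings reduce to the same stored form, two readers with
different settings are still looking at the same identifier. A low line
that does not sit between a lower case and an upper case letter is left
alone by this sketch.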

That way, everyone gets to decide for themselves what is readable,
but we can still communicate and share code.




