ASCII code

Mike Klein mike at twinsun.com
Sat Jul 25 12:20:33 UTC 1998



On Fri, 24 Jul 1998, Hans-Martin Mosner wrote:

> I'd like to suggest that Squeak switches to Unicode.

I think that this would be a good idea, as well.
There are a couple of Squeak/Smalltalk-specific ramifications I'd like to
point out.

In Unicode U+005F is the LOW LINE or SPACING UNDERSCORE
rather than the left arrow used as the assignment operator in Squeak.
The Unicode standard doesn't even provide a cross-reference to the left
arrow for that code point. (Anybody remember the ASR33 teletypes?)
The LEFTWARDS ARROW is U+2190.

Although I prefer seeing the LEFTWARDS ARROW over COLON + EQUALS,
I prefer COLON + EQUALS over SPACING UNDERSCORE.
Viewing code snippets outside of the Squeak environment is often annoying.

There is a similar, but less annoying, problem with the return operator.
U+005E is the CIRCUMFLEX ACCENT.
U+2191 is the UPWARDS ARROW.
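
To make the two substitutions concrete, here is a minimal sketch in Python
(not Smalltalk, simply so it can be run anywhere) of displaying Squeak-style
source with the real arrows. The name show_with_arrows is made up for
illustration, and the mapping is deliberately naive: it rewrites every
U+005F and U+005E, including any that sit inside string literals or comments.

```python
# Hypothetical display helper: swap the ASCII stand-ins for the Unicode arrows.
ARROWS = str.maketrans({
    "\u005F": "\u2190",  # LOW LINE -> LEFTWARDS ARROW (assignment)
    "\u005E": "\u2191",  # CIRCUMFLEX ACCENT -> UPWARDS ARROW (return)
})


def show_with_arrows(source: str) -> str:
    """Return source with every '_' and '^' shown as an arrow (naive)."""
    return source.translate(ARROWS)


print(show_with_arrows("x _ 3 + 4. ^x"))
```

A real tool would have to tokenize the source first so that literals are
left alone.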

The only ASCII control code that has specified semantics in Unicode
is U+0009 HORIZONTAL TAB. Smalltalk uses CR, by convention, as a line-end
character, and this often causes headaches when moving between OSes with
differing conventions. Unicode provides

U+2028 LINE SEPARATOR
  and
U+2029 PARAGRAPH SEPARATOR

Since Smalltalk treats all white-space equivalently, this is not really
a language issue, but I thought I'd point out their existence as a
stepping-stone to treating line-end-convention issues.
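
As a rough illustration of what treating line-end conventions could look
like, the Python sketch below rewrites CR+LF, lone CR, lone LF, and the two
Unicode separators to a single CR. The function name to_cr is an assumption
for illustration, and collapsing PARAGRAPH SEPARATOR into a plain line end
throws away a distinction that some applications may want to keep.

```python
import re

# The line-end conventions discussed above: CR+LF, lone CR, lone LF, and
# the Unicode LINE SEPARATOR / PARAGRAPH SEPARATOR.
# CR+LF is listed first so it counts as one line end, not two.
LINE_END = re.compile("\r\n|\r|\n|\u2028|\u2029")


def to_cr(text: str) -> str:
    """Rewrite every recognised line end as a single CR."""
    return LINE_END.sub("\r", text)


sample = "one\r\ntwo\nthree\u2028four\rfive"
print(to_cr(sample).split("\r"))
```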

Several of the characters used as binary operators have ambiguous
semantics in ASCII and are disambiguated in Unicode.

For example:

U+002D is the HYPHEN-MINUS
U+2212 is the MINUS SIGN

We also have 
U+00D7 MULTIPLICATION SIGN
not to mention the Zapf dingbat
U+2715 MULTIPLICATION X
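
Code points like these are easy to mistype, so it is worth checking them
against the character database. A small sketch using Python's standard
unicodedata module follows; U+2012 and U+2212 are both listed because the
dash-like characters are easily confused with one another.

```python
import unicodedata

# Look up the official character name for each code point of interest.
CODE_POINTS = (0x002D, 0x2012, 0x2212, 0x00D7, 0x2715)
names = {cp: unicodedata.name(chr(cp)) for cp in CODE_POINTS}

for cp, name in names.items():
    print(f"U+{cp:04X} {name}")
```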

Once we are using multi-byte characters, we would need to choose an
encoding (UTF-8 would be a good choice).
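
One point in UTF-8's favor is that existing ASCII source stays
byte-for-byte unchanged, while the characters mentioned in this message
cost two or three bytes each. The Python sketch below shows the encoded
bytes for a few of them; the helper name utf8_hex is made up for
illustration.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of ch as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


# LOW LINE, MULTIPLICATION SIGN, LEFTWARDS ARROW, ELEMENT OF
for ch in ("\u005F", "\u00D7", "\u2190", "\u2208"):
    print(f"U+{ord(ch):04X} -> {utf8_hex(ch)}")
```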

Heading off into the blue-plane (or, perhaps, an APL-induced hallucination),
we now have hundreds of new characters for binary operators.

U+2208 ELEMENT OF   (looks sorta like an E -- from set theory)
could be implemented as

<U+2208> aCollection
	^aCollection includes: self

Or perhaps we could embed Morphs into source-code that would represent
mathematical expressions in a much more readable fashion (sort of like
the new graphing calculator on the Macintosh -- definitely check this
out if you have never seen it).

But to take off into the blue, we need to build up speed in the pink.
Does anybody know where to get a font that encodes all visible Unicode
characters?  It sure would be handy.

-- Mike Klein




