Unicode support

Michael Klein Mklein at nts.net
Tue Sep 21 20:23:45 UTC 1999


Unicode in Squeak/Smalltalk issues:
These are more design issues than the actual mechanics of
implementing Unicode.
Still, a Unicode implementation in Squeak would be useful even without
resolving any of the issues below.

- The assignment character.

Unicode has a left-pointing arrow. Squeak seems to have an assumed
character encoding that nobody really seems to have thought about.
I'm sure everybody has seen Squeak's left-arrow turn into an underscore
when leaving the environment.

- Return Character.

Similarly, up-arrow and caret are confused.  This is usually not as big a
problem since they look roughly equivalent, but it may cause quite a hassle
for people who are trying to compose with a caret (both foreign languages
and mathematics).
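
Both confusions above come down to code points: Squeak draws 0x5F and 0x5E
as arrows (the 1963 ASCII draft had arrows at those positions), while
everything outside reads them as underscore and caret. Below is a minimal
Python sketch of what a Unicode-aware export might do. The table and the
function name are made up for illustration, and the blanket replacement is
deliberately naive.

```python
import unicodedata

# Hypothetical table: the code points Squeak draws as arrows, mapped to
# the Unicode characters that really are arrows.
SQUEAK_TO_UNICODE = {
    0x5F: 0x2190,  # '_' (LOW LINE) -> LEFTWARDS ARROW, used for assignment
    0x5E: 0x2191,  # '^' (CIRCUMFLEX ACCENT) -> UPWARDS ARROW, used for return
}

def export_source(text):
    """Replace every 0x5F and 0x5E; does not skip string literals or comments."""
    return text.translate(SQUEAK_TO_UNICODE)

exported = export_source("x _ 3. ^ x")
print(exported)

# Show what Unicode calls each of the four characters involved.
for ch in "_^\u2190\u2191":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

A real exporter would have to leave underscores and carets inside string
literals and comments alone, which is exactly the hassle for people who
want to type an actual caret.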

- Line end conventions.

Unicode has a line separator character and a paragraph separator
character to distinguish the semantics that most OSes confuse with
LF/CR/CRLF.
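
To make the distinction concrete, here is a small Python sketch. The
variable names are mine; the two code points (U+2028 and U+2029) are the
ones Unicode defines.

```python
import unicodedata

LINE_SEPARATOR = "\u2028"       # general category Zl
PARAGRAPH_SEPARATOR = "\u2029"  # general category Zp

text = ("first line" + LINE_SEPARATOR + "second line"
        + PARAGRAPH_SEPARATOR + "new paragraph")

# With distinct separators, the paragraph/line structure is unambiguous.
paragraphs = [p.split(LINE_SEPARATOR) for p in text.split(PARAGRAPH_SEPARATOR)]
print(paragraphs)

# The legacy conventions all collapse to "some kind of line end".
legacy = "one\ntwo\rthree\r\nfour"
print(legacy.splitlines())

print(unicodedata.category(LINE_SEPARATOR), unicodedata.category(PARAGRAPH_SEPARATOR))
```

With LF/CR/CRLF there is no way to tell from the text alone whether a break
ends a line or a paragraph; with the two Unicode separators there is.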

- Other characters.

There are other characters that have more specific semantics than ASCII.
For example, the ASCII minus sign: -    can represent a dash or a minus
sign, both of which have unique code points, in addition to the ambiguous
ASCII version.  There are others.
(This issue is pretty similar to the line-end convention.)
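
For the minus/dash case, a short Python sketch that lists the ASCII
character next to some of its more specific Unicode relatives. The choice
of which look-alikes to include is mine.

```python
import unicodedata

# The ambiguous ASCII character followed by more specific look-alikes.
LOOKALIKES = ["\u002D", "\u2010", "\u2013", "\u2014", "\u2212"]

names = {ch: unicodedata.name(ch) for ch in LOOKALIKES}
categories = {ch: unicodedata.category(ch) for ch in LOOKALIKES}

for ch in LOOKALIKES:
    print(f"U+{ord(ch):04X} {categories[ch]} {names[ch]}")
```

Even the official name of the ASCII character, HYPHEN-MINUS, admits that it
is doing two jobs. Only U+2212 is classed as a math symbol (Sm); the rest
are dash punctuation (Pd).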

- Potentially *much* larger list of valid binary selectors.

Depending upon your disposition towards APL, this may be a good or bad
thing.
Personally, I'd love to have a binary select and collect so one could
compose without distracting parentheses.  On the other hand, I've got a
syntactic sweet tooth :-)
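
To get a feel for "much larger", here is a rough Python sketch. Two
assumptions are baked in: the ASCII set is only approximately what
Smalltalk dialects accept in binary selectors, and "any character in the
Unicode math-symbol category (Sm)" is a hypothetical rule, not something
any compiler implements as far as I know.

```python
import sys
import unicodedata

# Assumption: roughly the ASCII characters allowed in binary selectors.
ASCII_SELECTOR_CHARS = set("+-*/\\<>=~@%&?!,|")

def is_candidate(ch):
    # Hypothetical rule: any Unicode math symbol (general category Sm).
    return unicodedata.category(ch) == "Sm"

# Scan every code point; the total depends on the Unicode version
# shipped with the Python interpreter.
candidates = [chr(cp) for cp in range(sys.maxunicode + 1) if is_candidate(chr(cp))]

print(len(ASCII_SELECTOR_CHARS), "ASCII characters vs", len(candidates), "math symbols")
```

Whether a few hundred extra operator characters is a feature or a hazard
is, again, a matter of disposition towards APL.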

- Upper & lower case may not exist in the language you are using.

I once showed a couple of Japanese programmers how easy it is to use
Kanji identifiers (in VW).
They thought it was very nice. With only 6 keywords in the language
(counting thisContext), compiler mods are not too difficult.  Still, there
is the code base... There was a multi-lingual Squeak thread a while back....
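
The case problem is easy to show in a few lines of Python. The helper name
is made up; the point is that the usual Smalltalk convention (capitalized
for globals and classes, lowercase for everything else) has nothing to
hold on to when the first character is Kanji.

```python
import unicodedata

def case_kind(ch):
    """Classify a character as upper, lower, or caseless by general category."""
    category = unicodedata.category(ch)
    return {"Lu": "upper", "Ll": "lower"}.get(category, "caseless (" + category + ")")

for identifier in ["Point", "point", "漢字"]:
    print(identifier, "->", case_kind(identifier[0]))

# Case conversion is a no-op on caseless scripts.
print("漢字".upper() == "漢字")
```

Any compiler or style rule that infers scope from the first letter's case
would need some other signal for identifiers like these.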

- Private Use area

If we are, as a community, going to start creating "special" characters
to represent objects embedded in text, this is the place to do it.  Perhaps
a Smalltalk-community-wide registry of said characters, sort of like IANA.
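
As a sketch of what such a registry might amount to, in Python: the range
U+E000..U+F8FF is the Private Use Area in the Basic Multilingual Plane, but
the registry entries and names below are entirely hypothetical.

```python
import unicodedata

# The Private Use Area in the Basic Multilingual Plane.
BMP_PRIVATE_USE = range(0xE000, 0xF8FF + 1)

# Hypothetical community registry: code point -> agreed meaning.
REGISTRY = {
    0xE000: "embedded-morph",
    0xE001: "embedded-image",
}

def describe(ch):
    """Say what a character means under the (made-up) registry."""
    if unicodedata.category(ch) != "Co":  # Co = private use
        return "standard character"
    return REGISTRY.get(ord(ch), "unregistered private-use character")

for ch in ["a", "\ue000", "\ue123"]:
    print(f"U+{ord(ch):04X}", describe(ch))
```

Unicode itself assigns no meaning to these code points, so any
interpretation holds only among parties who share the same table.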

While on the subject, I think that this is the wrong approach.

Objects should not be embedded in text, text should be embedded in
objects.

-- Mike Klein

P.S.  Anyone still programming Smalltalk with two kinds of colons :-?




