Unicode support
Michael Klein
Mklein at nts.net
Tue Sep 21 20:23:45 UTC 1999
Unicode in Squeak/Smalltalk issues:
These are design issues rather than the actual mechanics of implementing Unicode.
Still, a Unicode implementation in Squeak would be useful without resolving any of the issues below.
- The assignment character.
Unicode has a left-pointing arrow. Squeak seems to have an assumed character encoding that nobody has really thought about.
I'm sure everybody has seen Squeak's left arrow turn into an underscore when it leaves the environment.
- Return character.
Similarly, the up arrow and the caret are confused. This is usually not as big a problem, since they look roughly equivalent, but it may cause quite a hassle for people who are trying to compose with a caret (in both foreign languages and mathematics).
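A minimal sketch of the export mapping these two items describe, assuming the classic Squeak convention in which the font draws ASCII 95 (underscore) as the assignment arrow and ASCII 94 (caret) as the return arrow. The function name squeak_to_unicode is hypothetical. It substitutes U+2190 LEFTWARDS ARROW and U+2191 UPWARDS ARROW while leaving string literals, comments, and $x character literals untouched; it is illustrative and does not cover every corner of Smalltalk syntax.

```python
# Assumed convention: Squeak fonts render "_" as the assignment arrow
# and "^" as the return arrow, so export maps them to real Unicode arrows.
SQUEAK_TO_UNICODE = {"_": "\u2190", "^": "\u2191"}


def squeak_to_unicode(source: str) -> str:
    """Replace _ and ^ with arrows outside strings, comments and $x literals."""
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch == "$" and i + 1 < n:
            # Character literal such as $_ or $': copy both characters as-is.
            out.append(source[i:i + 2])
            i += 2
        elif ch in "'\"":
            # String literal ('...') or comment ("..."): copy up to and
            # including the closing delimiter (or to the end if unclosed).
            j = source.find(ch, i + 1)
            end = n if j == -1 else j + 1
            out.append(source[i:end])
            i = end
        else:
            out.append(SQUEAK_TO_UNICODE.get(ch, ch))
            i += 1
    return "".join(out)
```

For example, squeak_to_unicode("x _ 3. ^x") yields "x ← 3. ↑x", while an underscore inside 'a_b' is left alone.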
- Line end conventions.
Unicode has a line separator character and a paragraph separator character to distinguish the semantics that most OSes conflate in LF/CR/CRLF.
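A rough sketch of converting legacy line ends into the two Unicode separators, U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR. The function name and the heuristic (a blank line ends a paragraph, a single break is a line break within a paragraph) are assumptions for illustration; legacy text does not actually record which meaning was intended, which is exactly the ambiguity described above.

```python
import re

LS = "\u2028"  # LINE SEPARATOR
PS = "\u2029"  # PARAGRAPH SEPARATOR


def to_unicode_separators(text: str) -> str:
    # Normalize CRLF and lone CR to LF first, so CRLF counts as one break.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Assumed heuristic: two or more consecutive breaks end a paragraph...
    text = re.sub(r"\n{2,}", PS, text)
    # ...and any remaining single break is a line break inside a paragraph.
    return text.replace("\n", LS)
```

Note that the mapping is lossy in one direction: three blank lines and one blank line both become a single paragraph separator.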
- Other characters.
There are other characters with more specific semantics than their ASCII counterparts. For example, the ASCII minus sign (-) can represent either a dash or a minus sign; each of these has its own code point, in addition to the ambiguous ASCII version. There are others.
(This issue is pretty similar to the line-end convention.)
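To make the minus/dash point concrete, this small sketch uses Python's standard unicodedata module to print the code point and official name of several characters that a plain ASCII "-" often stands in for. The list DASH_LIKE and the helper describe are hypothetical names chosen for illustration.

```python
import unicodedata

# Characters that an ASCII "-" is commonly used to approximate.
DASH_LIKE = ["\u002d", "\u2010", "\u2212", "\u2013", "\u2014"]


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for ch in DASH_LIKE:
        print(describe(ch))
```

Running it lists HYPHEN-MINUS (U+002D), HYPHEN (U+2010), MINUS SIGN (U+2212), EN DASH (U+2013) and EM DASH (U+2014).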
- Potentially *much* larger list of valid binary selectors.
Depending upon your disposition towards APL, this may be a good or a bad thing. Personally, I'd love to have binary select and collect selectors so one could compose without distracting parentheses. On the other hand, I've got a syntactic sweet tooth :-)
- Upper & lower case may not exist in the language you are using.
I once showed a couple of Japanese programmers how easy it is to use Kanji identifiers (in VW). They thought it was very nice. With only 6 keywords in the language (counting thisContext), compiler mods are not too difficult. Still, there is the code base... There was a multilingual Squeak thread a while back...
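One concrete place where case matters is the common Smalltalk naming convention: globals such as class names start with an uppercase letter, while temporaries and instance variables start with lowercase. The hypothetical helper below shows that a case-based rule has nothing to work with for Kanji, since ideographs are neither upper- nor lowercase; it is a sketch of the problem, not of any actual compiler logic.

```python
def case_class(identifier: str) -> str:
    """Classify an identifier by the case of its first character."""
    first = identifier[0]
    if first.isupper():
        return "upper"    # conventionally a global, e.g. a class name
    if first.islower():
        return "lower"    # conventionally a temporary or instance variable
    return "uncased"      # e.g. Kanji: the case convention cannot apply
```

For example, case_class("Transcript") is "upper", case_class("aString") is "lower", and an identifier written in Kanji comes back as "uncased".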
- Private Use Area.
If we, as a community, are going to start creating "special" characters to represent objects embedded in text, this is the place to do it. Perhaps a Smalltalk-community-wide registry of said characters, sort of like IANA.
While on the subject, I think that this is the wrong approach. Objects should not be embedded in text; text should be embedded in objects.
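For readers unfamiliar with the mechanism, here is a minimal sketch of what such a registry might hand out: sequential code points from the Basic Multilingual Plane's Private Use Area (U+E000 to U+F8FF). The class PrivateUseRegistry, its methods, and the example names are all hypothetical; they only illustrate the allocation idea, not any existing Smalltalk or IANA registry.

```python
import unicodedata

# Basic Multilingual Plane Private Use Area.
PUA_START, PUA_END = 0xE000, 0xF8FF


def is_private_use(ch: str) -> bool:
    """True for any private-use character (general category 'Co'), any plane."""
    return unicodedata.category(ch) == "Co"


class PrivateUseRegistry:
    """Hypothetical registry mapping names to private-use characters."""

    def __init__(self) -> None:
        self._next = PUA_START
        self._by_name: dict[str, str] = {}

    def register(self, name: str) -> str:
        """Return the character for name, allocating the next free one if new."""
        if name in self._by_name:
            return self._by_name[name]
        if self._next > PUA_END:
            raise RuntimeError("BMP Private Use Area exhausted")
        ch = chr(self._next)
        self._next += 1
        self._by_name[name] = ch
        return ch
```

Usage: registering "embeddedMorph" on a fresh registry returns U+E000, registering it again returns the same character, and the next new name gets U+E001.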
-- Mike Klein
P.S. Anyone still programming Smalltalk with two kinds of colons :-?
More information about the Squeak-dev mailing list