Experimental 3.5-1 VM and multilingual support
Ian Piumarta
ian.piumarta at inria.fr
Tue Mar 4 07:52:00 UTC 2003
Folks,
For the intrepid explorers out there, I've just updated the 3.5-1devel
tarballs available in the usual place [1] (GNU/Linux{386,ppc}, Darwin/ppc
and a MacOSX .app bundle). I'm particularly interested to hear about any
problems with 8-bit characters, either in copy/paste between Squeak and
remote applications or with international keyboards. Here's the bulk of
the beef...
3.5-1 can now use arbitrary character set encodings for: the internal
font encoding, the encoding used to copy/paste text from/to remote
applications, and the encoding it expects the filesystem to be using.
The VM is now also capable of supplying the clipboard text to X11
applications that request STRING_UTF8 conversion. Three new
command-line switches and three new environment variables are provided
to control the behaviour, as follows:
-encoding <enc> (or SQUEAK_ENCODING="<enc>")
tells the VM which encoding is being used by the fonts within the
image (and hence the encoding which is used for 8-bit characters
arriving either from the keyboard or from text copied from
elsewhere). The default is still MacRoman, but if you are using
the X11Fonts package then to get the accents back in the right
places in all the X11 fonts simultaneously just set
SQUEAK_ENCODING="ISO-8859-15" (or
SQUEAK_ENCODING="Latin9" which is the same thing)
in your environment. This default will change to ISO-8859-15
when the image drops the Apple fonts.
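To see why the accents end up in the wrong places, it helps to look at how a single 8-bit character reads under each encoding. The following Python sketch is only an illustration, not VM code; it assumes Python's `mac_roman` and `iso8859_15` codecs correspond to the MacRoman and ISO-8859-15 encodings named above, and `show_byte` is a hypothetical helper:

```python
# Illustration only: Python codecs stand in for the VM's encodings.
# show_byte is a hypothetical helper, not part of Squeak or the VM.
def show_byte(byte_value):
    """Return how one 8-bit character reads as (MacRoman, ISO-8859-15)."""
    raw = bytes([byte_value])
    return raw.decode("mac_roman"), raw.decode("iso8859_15")


as_macroman, as_latin9 = show_byte(0xE9)
# The same byte is a different accented letter in each encoding.
print(as_macroman, as_latin9)
```

A font laid out for one encoding, fed characters in the other, will therefore draw the wrong glyph for most accented letters.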
-pathenc <enc> (or SQUEAK_PATHENC="<enc>")
tells the VM what encoding the filesystem is using. Modern FS
(Darwin and RedHat8 and maybe others too) use UTF-8 to encode
8-bit chars in pathnames. (Older Unixes probably either use
Latin1 or simply barf or behave randomly according to how
individual applications are written.) The default is "UTF-8"
(which is where the current Unix FS trend is heading). All
file operations WITHIN THE VM SUPPORT CODE now convert the
pathname character encoding between SQUEAK_ENCODING and
SQUEAK_PATHENC as appropriate. (If you have Latin1 chars in your
paths then just set SQUEAK_PATHENC="ISO-8859-1" and things should
work perfectly.) File operations in other plugins (if there are
such beasts) will probably get things hopelessly wrong on UTF-8
based filesystems. If any plugin writers out there want to know
how to fix this situation (trivially) then send me email.
Note that the above relates only to pathnames. What the image
chooses to do with the _contents_ of files is not the concern
of the VM support code...
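The pathname conversion described above can be pictured as a re-encoding step in each direction. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, Python codecs stand in for whatever the VM actually uses, and the defaults mirror the MacRoman and UTF-8 defaults mentioned in this message:

```python
# Hypothetical helpers sketching the SQUEAK_ENCODING <-> SQUEAK_PATHENC
# conversion; this is not the VM's actual implementation.
def path_to_filesystem(name, image_enc="mac_roman", path_enc="utf-8"):
    """Re-encode pathname bytes from the image encoding to the FS encoding."""
    return name.decode(image_enc).encode(path_enc)


def path_from_filesystem(name, image_enc="mac_roman", path_enc="utf-8"):
    """Re-encode pathname bytes from the FS encoding to the image encoding."""
    return name.decode(path_enc).encode(image_enc)


# "cafe" with an acute accent on the e, as MacRoman bytes inside the image.
in_image = b"caf\x8e"
on_disk = path_to_filesystem(in_image)
assert path_from_filesystem(on_disk) == in_image
```

With a Latin1 filesystem the same call would pass `path_enc="iso8859_1"`, which corresponds to setting SQUEAK_PATHENC="ISO-8859-1".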
-textenc <enc> (or SQUEAK_TEXTENC="<enc>")
tells the VM which encoding to use when asking other applications
for the selection (or pasteboard) contents. The default is
ISO-8859-1 on X11 (since that's the standard 8-bit text encoding
for X applications). The default is UTF-8 on MacOSX (for the same
reason). If you would like Squeak to ask other X11 apps for
selections converted as STRING_UTF8 then set
SQUEAK_TEXTENC="UTF-8"
but be warned that there are still _very_ few X11 apps that
correctly honour such requests; Emacs in particular doesn't know
what to do with them. (Transferring UTF-8 text between two Unix
Squeak VMs in this way [naturally] works just fine. ;)
Note that setting SQUEAK_TEXTENC will not change the way Squeak
_answers_ selection requests: if the requestor gives STRING as the
target conversion type then it will get Latin1 encoded text; if it
asks for STRING_UTF8 then it will (correctly) get UTF-8 encoded
text.
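The rule for answering selection requests can be sketched as a dispatch on the requested target type. The Python below is an illustration only: `answer_selection` is a hypothetical name, the target names are taken as written in this message, and what happens to characters that Latin1 cannot represent is not described here, so the sketch simply lets the encoder raise an error:

```python
# Hypothetical sketch of answering a selection request by target type.
# Target names are copied from the message; this is not the VM's code.
def answer_selection(text, target):
    """Encode the selection text according to the requested target type."""
    if target == "STRING":
        # Plain STRING requests get Latin1 text. Unrepresentable characters
        # raise UnicodeEncodeError here; the real behaviour is not specified.
        return text.encode("iso8859_1")
    if target == "STRING_UTF8":
        return text.encode("utf-8")
    raise ValueError("unsupported target: " + target)


print(answer_selection("d\u00e9j\u00e0 vu", "STRING"))
print(answer_selection("d\u00e9j\u00e0 vu", "STRING_UTF8"))
```

Note that the function takes no text-encoding setting at all, which reflects the point above: SQUEAK_TEXTENC only affects what Squeak asks for, not what it answers.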
Encoding names are not case-sensitive.
Other improvements for MacOSX users include:
- 8-bit chars in HFS+ paths now work correctly (comes for free with
the PATHENC conversion, and was pretty much the itch I scratched
to arrive at all of the above encoding madness ;)
- the final few problems with international keyboards should be
fixed (Squeak should respond to deadkeys exactly like all other
applications)
- a problem with Squeak failing to reactivate correctly when
deminiaturising from the dock (requiring a click away from and
then back in the Squeak window) should no longer occur.
The X11 display driver currently doesn't implement deadkeys or
multikey composition at all. (I think I've figured out enough about
the X input method stuff to make this work, but it would be
significant hassle. If anybody really, _really_ wants this then let
me know and I'll do some experiments.)
Bon courage !
Ian
Note: the encoding names follow the IANA-registered character set
names. The following are recognised on MacOSX (where I have to
provide a table to convert from a string name to an OS constant; names
on the same line are equivalent):
MACROMAN MAC MACINTOSH CSMACINTOSH
UTF8 UTF-8
ISOLATIN9 LATIN9 ISO-8859-15
ISOLATIN1 LATIN1 ISO-8859-1
Adding (lots of) others is trivial (but someone will have to prod me
to do it).
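A name table of this kind, combined with case-insensitive matching, might look like the Python sketch below. It is an assumption-laden illustration: the real table maps names to OS constants, whereas this hypothetical version maps each alias to the first name on its line, and `canonical_encoding` is an invented helper:

```python
# Hypothetical alias table built from the four lines listed above.
# The real VM maps names to OS constants; here the first name on each
# line stands in for that constant.
ALIASES = (
    ("MACROMAN", "MAC", "MACINTOSH", "CSMACINTOSH"),
    ("UTF8", "UTF-8"),
    ("ISOLATIN9", "LATIN9", "ISO-8859-15"),
    ("ISOLATIN1", "LATIN1", "ISO-8859-1"),
)

CANONICAL = {name: group[0] for group in ALIASES for name in group}


def canonical_encoding(name):
    """Look up an encoding name case-insensitively; None if unknown."""
    return CANONICAL.get(name.upper())


print(canonical_encoding("latin9"))
print(canonical_encoding("Utf-8"))
```

Adding another encoding in this scheme is one more line in the table, which is the sense in which extending it is trivial.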
The X11 code uses the iconv(3) function that is built into most modern
versions of libc. (GNU/Linux and BSD users got limited support in
glibc2.2 and much more complete support in glibc2.3.) The VM
therefore recognises all registered coding systems (given a
sufficiently modern libc) including the entire Latin series, all the
MS codepages (to keep our antarctic friends happy) and even EBCDIC on
many systems. If you have the 'iconv' program then the complete (very
long) list of supported encodings can be printed by running 'iconv -l'.