[Vm-dev] Strawman proposal for m17n
K. K. Subramaniam
kksubbu.ml at gmail.com
Sat Oct 16 12:12:44 UTC 2010
I want to revisit the old discussions from 2003, 2005 and 2007 about getting
the Squeak VM to handle multilingual input in an Indic context. Indic keyboard
input can arrive through XIM in multiple languages regardless of the locale
setting; LANG may be set to en_US.UTF-8 or en_IN.UTF-8. XIM input method
engines generate the multilingual keystrokes, so the app sees only UTF-8
encoded characters, not keys. The current design, which passes multiple
encodings into the image, will not work for m17n.
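To make the encoding problem concrete, here is a small sketch in Python (not VM code) of what happens when UTF-8 bytes from an input method are interpreted under a locale-derived Latin-1 assumption. The choice of Latin-1 as the wrong guess is only an illustration.

```python
# Devanagari letter KA (U+0915), as an input method engine would deliver it
# to the application in UTF-8.
utf8_bytes = "\u0915".encode("utf-8")

# Decoded as UTF-8, the three bytes yield the single character that was typed.
as_utf8 = utf8_bytes.decode("utf-8")

# Decoded as Latin-1 (an assumed wrong guess taken from the locale), the same
# three bytes become three unrelated characters.
as_latin1 = utf8_bytes.decode("latin-1")

print(len(utf8_bytes), len(as_utf8), len(as_latin1))
```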
Currently, the logic for the keycode and the keychar (i.e. the character typed,
possibly composed from multiple keycodes) is scattered across the VMs and
images. Tying input encoding to locales complicates Indic support. Composition
is platform-specific and is best handled in the VM.
It looks to me that the complications come from multiplexing two keyboard
input streams: control codes (buttons) and text codes (Characters).
Buttons fire operations while Characters go into text streams.
Button codes need to deal with modifiers and codes < 127 but not with m17n,
AFAIK. Character codes don't need modifiers but do need to deal with m17n
issues.
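The split into two streams can be sketched roughly as follows. This is a Python illustration, not the VM's actual event code: `ButtonEvent`, `CharEvent` and `demultiplex` are hypothetical names, and the rule "control characters and Control-modified keys are buttons, everything else composed is text" is my assumption about where the line would be drawn. The keysym and modifier values are the ones X11 defines.

```python
import unicodedata
from dataclasses import dataclass
from typing import Union

# A few X11 keysym values as defined in keysymdef.h (illustrative subset).
XK_BACKSPACE = 0xFF08
XK_RETURN = 0xFF0D
XK_ESCAPE = 0xFF1B

# X11 modifier masks.
SHIFT_MASK = 0x1
CONTROL_MASK = 0x4


@dataclass(frozen=True)
class ButtonEvent:
    code: int       # keysym-style button code
    modifiers: int  # modifier bitmask; only buttons carry modifiers


@dataclass(frozen=True)
class CharEvent:
    text: str       # composed character or cluster; no modifiers attached


def demultiplex(keysym: int, modifiers: int,
                composed_utf8: bytes = b"") -> Union[ButtonEvent, CharEvent]:
    """Route one raw key event to the button stream or the text stream."""
    text = composed_utf8.decode("utf-8")
    has_control_char = any(unicodedata.category(ch) == "Cc" for ch in text)
    if text and not has_control_char and not (modifiers & CONTROL_MASK):
        return CharEvent(text)
    return ButtonEvent(keysym, modifiers)
```

With this rule, Return and Ctrl+C come out as buttons with their modifiers intact, while a composed Devanagari cluster such as `"कि"` comes out as a single `CharEvent` with no modifier state.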
Here is my proposal to move forward without affecting existing deployments:
1. Map key codes into button codes (e.g. OK, Cancel, Cut, Copy, ...) in the VM
itself and pass only button codes into the image. I like Lex Spoon's proposal
to start with X11 encodings for buttons (keysymdef.h). One of the button
codes can be reserved for a soft keyboard. New images can start using these
codes and be ready to run on handhelds and tablets too.
2. Henceforth all VMs will encode all character input in UTF-8, except in
non-English Latin1 locales. For these locales, Latin1 will be used by default
and UTF-8 if -compositioninput is used. The new VM will pass a dummy button
code on startup to signal the input encoding, or we could introduce a new
primitive to signal this state. New images can use it to unify the clipboard
and input interpreters. Old images can be patched to work with new VMs.
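The encoding rule in point 2 could be sketched like this. Again this is a Python illustration only: `choose_input_encoding` and `decode_input` are hypothetical names, and I am assuming that a locale counts as Latin1 only when LANG names the codeset explicitly (a bare "de_DE" is treated as UTF-8 here, which may not match every platform).

```python
# Codeset spellings treated as Latin1 after lowercasing and dropping '-'/'_'.
LATIN1_CODESETS = {"iso88591", "latin1"}


def choose_input_encoding(lang: str, composition_input: bool = False) -> str:
    """Pick the character input encoding from LANG and -compositioninput."""
    base = lang.split("@", 1)[0]                 # drop any @modifier
    locale_name, _, codeset = base.partition(".")
    language = locale_name.split("_", 1)[0].lower()
    normalized = codeset.lower().replace("-", "").replace("_", "")
    is_latin1 = normalized in LATIN1_CODESETS
    # Latin1 applies only to non-English Latin1 locales without composition.
    if is_latin1 and language != "en" and not composition_input:
        return "latin-1"
    return "utf-8"


def decode_input(raw: bytes, lang: str, composition_input: bool = False) -> str:
    """Decode raw character input using the encoding chosen above."""
    return raw.decode(choose_input_encoding(lang, composition_input))
```

For example, `choose_input_encoding("en_IN.UTF-8")` and `choose_input_encoding("en_US.ISO-8859-1")` both select UTF-8, while `choose_input_encoding("de_DE.ISO-8859-1")` selects Latin1 unless `composition_input` is true.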
I am not fully aware of the history of code page issues in Latin1 locales,
so this is just a strawman. Please do point out any gaps in it.
Thanks .. Subbu