[Vm-dev] Strawman proposal for m17n

K. K. Subramaniam kksubbu.ml at gmail.com
Sat Oct 16 12:12:44 UTC 2010


Hi,

I would like to revisit the earlier discussions (2003, 2005 and 2007) about
getting the Squeak VM to handle multilingual input in an Indic context. Indic
keyboard input can come through XIM in multiple languages regardless of the
locale setting: LANG may be set to en_US.UTF-8 or en_IN.UTF-8. XIM input
method engines generate the multilingual keystrokes, so the application sees
only UTF-8 encoded characters, not keys. The current design, which passes
multiple encodings into the image, will not work for m17n.
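For instance, when an XIM engine composes DEVANAGARI LETTER KA (U+0915), the application receives only the three bytes E0 A4 95, with no key code that could be mapped back to a locale. Below is a minimal C sketch of the decoding step that whoever consumes that stream (VM or image) has to perform. The function name utf8_decode is hypothetical and not part of any existing VM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s, with len
 * bytes available. Stores the code point in *cp and returns the number of
 * bytes consumed, or 0 if the sequence is malformed or truncated.
 * Sketch only: it rejects overlong forms and surrogates but is not a
 * hardened validator. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    assert(s != NULL && cp != NULL);
    if (len == 0)
        return 0;

    unsigned char b0 = s[0];
    size_t n;
    uint32_t c, min;

    if (b0 < 0x80) { *cp = b0; return 1; }                 /* plain ASCII */
    else if ((b0 & 0xE0) == 0xC0) { n = 2; c = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { n = 3; c = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { n = 4; c = b0 & 0x07; min = 0x10000; }
    else return 0;                          /* stray continuation or bad lead byte */

    if (len < n)
        return 0;                           /* truncated sequence */
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                       /* expected a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;                           /* overlong, out of range, or surrogate */
    *cp = c;
    return n;
}
```

Note that the decoder never consults LANG: the same bytes mean the same character whether the locale is en_US.UTF-8 or en_IN.UTF-8.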

Currently, the logic for keycodes and keychars (i.e. the character typed,
possibly composed from multiple keycodes) is scattered across the VMs and
images. Tying input encoding to locales complicates Indic support. Composition
is platform-specific and is best handled in the VM.

It looks to me as if the complications come from multiplexing two keyboard
input streams: control codes (buttons) and text codes (Characters).
Buttons are used to fire operations, while Characters go into text streams.
Button codes need to deal with modifiers and codes < 127, but not with m17n,
AFAIK. Character codes don't need modifiers, but do need to deal with m17n
issues.
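One way to picture the split is a single classification point in the VM. The sketch below assumes the VM already has an X11 keysym; the XK_ values are copied from keysymdef.h, and it relies on the X11 convention that Unicode keysyms are 0x01000000 + code point. The key_event struct, classify_key() and the MOD_CTRL bit are hypothetical and do not correspond to the current Squeak event layout or to the X11 state mask.

```c
#include <stdint.h>

/* Keysym values as defined in X11's keysymdef.h. */
#define XK_Return 0xff0dU
#define XK_Left   0xff51U

/* Hypothetical modifier bit for this sketch (not X11's ControlMask). */
#define MOD_CTRL 0x1U

typedef enum { EV_BUTTON, EV_CHARACTER, EV_IGNORE } event_kind;

typedef struct {
    event_kind kind;
    uint32_t   code;      /* keysym for buttons, Unicode code point for characters */
    unsigned   modifiers; /* only meaningful for buttons */
} key_event;

/* Map a Latin-1 or Unicode keysym to its code point; 0 if it is neither. */
static uint32_t keysym_to_ucs(uint32_t ks)
{
    if ((ks >= 0x20 && ks <= 0x7e) || (ks >= 0xa0 && ks <= 0xff))
        return ks;                      /* Latin-1 keysyms equal their code points */
    if (ks >= 0x01000000U && ks <= 0x0110FFFFU)
        return ks - 0x01000000U;        /* Unicode keysyms: 0x01000000 + code point */
    return 0;
}

/* Hypothetical demultiplexer: decide which stream a key press belongs to. */
key_event classify_key(uint32_t keysym, unsigned modifiers)
{
    key_event ev = { EV_IGNORE, 0, 0 };
    uint32_t ucs = keysym_to_ucs(keysym);

    if (keysym >= 0xff00U && keysym <= 0xffffU) {
        /* Function and editing keys (Return, arrows, F-keys, ...) are buttons. */
        ev.kind = EV_BUTTON; ev.code = keysym; ev.modifiers = modifiers;
    } else if (ucs != 0 && (modifiers & MOD_CTRL)) {
        /* Ctrl plus a printable key is a shortcut (e.g. Ctrl-C), so a button. */
        ev.kind = EV_BUTTON; ev.code = keysym; ev.modifiers = modifiers;
    } else if (ucs != 0) {
        /* Plain text goes into the character stream, without modifiers. */
        ev.kind = EV_CHARACTER; ev.code = ucs;
    }
    return ev;
}
```

Treating Ctrl plus a printable key as a button is a choice made for this sketch: it keeps shortcuts out of the character stream, so the character path never has to look at modifiers.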

Here is my proposal to move forward without affecting existing deployments:

1. Map key codes to button codes (e.g. OK, Cancel, Cut, Copy, ...) in the VM
itself and pass only button codes into the image. I like Lex Spoon's proposal
to start with the X11 encodings for buttons (keysymdef.h). One of the button
codes can be reserved for a soft keyboard. New images can start using these
codes and be ready to run on handhelds and tablets too.

2. Henceforth, all VMs will encode all character input in UTF-8, except in
non-English Latin-1 locales. For these locales, Latin-1 will be used by
default, and UTF-8 if -compositioninput is given. The new VM will pass a dummy
button on startup to signal the input encoding, or we could introduce a new
primitive to report it. New images can use this to unify the clipboard and
input interpreters. Old images can be patched to work with new VMs.
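A rough sketch of how item 2 might look on the VM side, assuming the VM already knows whether it runs in a non-English Latin-1 locale and whether -compositioninput was given. The enum values, the reserved code BUTTON_INPUT_ENCODING and all function names are hypothetical; the proposal leaves the exact signalling mechanism (dummy button or new primitive) open.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encodings the VM could announce to the image at startup. */
typedef enum { ENC_LATIN1 = 1, ENC_UTF8 = 2 } input_encoding;

/* Hypothetical reserved button code carrying the encoding announcement. */
#define BUTTON_INPUT_ENCODING 0xFFFF0001U

typedef struct { uint32_t button; uint32_t arg; } button_event;

/* Latin-1 only for non-English Latin-1 locales without -compositioninput. */
input_encoding choose_input_encoding(int non_english_latin1_locale,
                                     int composition_input_flag)
{
    if (non_english_latin1_locale && !composition_input_flag)
        return ENC_LATIN1;
    return ENC_UTF8;
}

/* The dummy button the VM would queue once, before any real input. */
button_event encoding_announcement(input_encoding enc)
{
    button_event ev = { BUTTON_INPUT_ENCODING, (uint32_t)enc };
    return ev;
}

/* Encode code point cp into out (at least 4 bytes) using enc.
 * Returns the number of bytes written, or 0 if cp cannot be represented. */
size_t encode_char(uint32_t cp, input_encoding enc, unsigned char *out)
{
    if (enc == ENC_LATIN1) {
        if (cp > 0xFF)
            return 0;                        /* not representable in Latin-1 */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                            /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

The reserved code is placed above 0x1FFFFFFF because X11 keysyms always have their top three bits clear. If button codes follow keysymdef.h as suggested in item 1, the announcement therefore cannot collide with a real keysym.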

I am not fully aware of the history of code-page issues in Latin-1 locales,
so this is just a strawman. Please do point out any gaps in it.

Thanks .. Subbu

