On 14.10.2010, at 03:49, K. K. Subramaniam wrote:
Hi Bert,
The LANG check you introduced in etoys.sh will break Indic m17n fixes because most Indic environments leave the locale as en_IN.UTF-8 or en_US.UTF-8 and then choose the most appropriate language for a given app. This is why I patched etoys.sh to use compositioninput if .UTF-8 encoding is specified in $LANG. Input encoding does not depend on language code.
Subbu
But every Linux I know uses utf-8 now. E.g. German uses de_DE.UTF-8 and English uses en_US.UTF-8. Since I don't know what compositioninput does, I thought it better to restrict it to the languages you mentioned.
So, how would it affect the other languages?
I previously thought the encoding part of LANG referred to file names. I was surprised you thought it had anything to do with X11 input.
- Bert -
On Thursday 14 Oct 2010 9:30:00 pm Bert Freudenberg wrote:
But every Linux I know uses utf-8 now. E.g. German uses de_DE.UTF-8 and English uses en_US.UTF-8. Since I don't know what compositioninput does, I thought it better to restrict it to the languages you mentioned.
The compositioninput flag tells vm-display-X11 to use the multi-byte input method instead of the Latin-1 input method. This allows input method engines in Xlib to switch in different encoders for keyboard input without recompiling an app. It is required for m17n input. See XmbLookupString(3) and XLookupString(3) for details.
So, how would it affect the other languages?
It won't. compositioninput uses locale settings and is backward compatible with Latin-1 input methods.
I previously thought the encoding part of LANG referred to file names. I was surprised you thought it had anything to do with X11 input.
Encoding covers characters (LC_CTYPE). Legacy environments don't tack on .UTF-8, so I am using it as a hint that the environment is more recent and will support m17n.
Given that Unicode is now widely accepted and implemented, we could use compositioninput as default and deal with legacy encodings as exceptions.
Subbu
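The check described above (enable composition input whenever $LANG names a UTF-8 codeset, whatever the language code) can be sketched as follows. This is a minimal Python illustration, not the actual etoys.sh test; the function name lang_uses_utf8 is hypothetical, and it assumes the usual language[_territory][.codeset][@modifier] layout of LANG.

```python
def lang_uses_utf8(lang: str) -> bool:
    """Return True if a LANG-style value names a UTF-8 codeset."""
    # Assumed layout: language[_territory][.codeset][@modifier]
    if "." not in lang:
        return False  # no codeset given, e.g. "en_US" or "C"
    codeset = lang.split(".", 1)[1].split("@", 1)[0]
    # Accept the common spellings "UTF-8", "utf8", "utf-8"
    return codeset.lower().replace("-", "") == "utf8"


for value in ("en_IN.UTF-8", "de_DE.UTF-8", "en_US", "de_DE.ISO-8859-1", "C"):
    print(value, lang_uses_utf8(value))
```

Note that the language code plays no part in the result: en_IN.UTF-8 and de_DE.UTF-8 are treated alike, which is exactly the behaviour under discussion.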
On 14.10.2010, at 10:20, K. K. Subramaniam wrote:
So, how would it affect the other languages?
It won't. compositioninput uses locale settings and is backward compatible with Latin-1 input methods.
How so? The latin-1 locales use utf32 now; they do not expect multiple utf-8 bytes.
Given that Unicode is now widely accepted and implemented, we could use compositioninput as default and deal with legacy encodings as exceptions.
Subbu
But that would require more changes on the image side, right? For now your m17n environment is only used by a few locales.
- Bert -
On Thursday 14 Oct 2010 11:17:39 pm Bert Freudenberg wrote:
But that would require more changes on the image side, right? For now your m17n environment is only used by a few locales.
No. Composition input is backward compatible. Images which depend on legacy encodings will not be affected.
M17n is expected to be backward compatible with legacy encodings. After all, multilingual means accepting multiple encodings in a single paragraph. It should not rule out the use of a single encoding across the entire paragraph.
Subbu
On 14.10.2010, at 11:02, K. K. Subramaniam wrote:
No. Composition input is backward compatible. Images which depend on legacy encodings will not be affected.
M17n is expected to be backward compatible with legacy encodings. After all, multilingual means accepting multiple encodings in a single paragraph. It should not rule out the use of a single encoding across the entire paragraph.
Subbu
That has nothing to do with my concern.
What I'm worried about is this: If the locale is "de_DE.UTF-8" then the image will use Latin1Environment and UTF32InputInterpreter. That interpreter expects either a utf32 keycode or MacRoman. But if the VM now sends UTF-8 instead of setting the utf32 field, it would break, no?
- Bert -
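To make the concern concrete, here is a small Python sketch of what would go wrong if a consumer that expects one character per key event received UTF-8 bytes as separate events. Whether the VM actually does this is the open question in this thread; the snippet only shows the encoding arithmetic.

```python
text = "ü"
code_point = ord(text)             # a single UTF-32 value: 0xFC
utf8_bytes = text.encode("utf-8")  # two bytes: 0xC3 0xBC

# A consumer expecting one character per key event would read
# each UTF-8 byte as its own Latin-1 character.
misread = "".join(chr(b) for b in utf8_bytes)

print(hex(code_point), utf8_bytes.hex(), misread)
```

The single keystroke ü would show up as the two characters Ã and ¼, which is the kind of breakage being asked about.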
On Friday 15 Oct 2010 1:23:10 am Bert Freudenberg wrote:
What I'm worried about is this: If the locale is "de_DE.UTF-8" then the image will use Latin1Environment and UTF32InputInterpreter. That interpreter expects either a utf32 keycode or MacRoman. But if the VM now sends UTF-8 instead of setting the utf32 field, it would break, no?
Recent VMs generate keycodes in evtBuf sixth, taking the encoding into account for Latin-1 but not for Indic languages. In M17nInputInterpreter>>nextCharFrom:firstEvt:, I check evtBuf sixth before falling back to UTF-8, so it should not affect Latin-1 input.
In any case, compositionInput uses direct method by default. For instance, I use UIM which defaults to direct method (us_intl layout). When I need to type an Indic character, I press SHIFT-CTRL to switch languages and SHIFT-SPACE to toggle on composite encoding for three-byte Indic characters and toggle it off for accented characters like ü or é.
If you have access to de_DE.UTF-8 systems, please give M17n a try. Just turn on composition input, add 'de' to M17nEnvironment>>supportedLanguages, and do:
    LanguageEnvironment resetKnownEnvironments; clearDefault.
    HandMorph clearInterpreters.
If input breaks, it is a major defect.
Subbu
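The "check evtBuf sixth, then fall back to UTF-8" idea described above can be sketched in Python as follows. This is not the code of M17nInputInterpreter>>nextCharFrom:firstEvt:. The event layout (index 2 for the raw key byte, index 5 for the UTF-32 code), the helper names and the key() constructor are all assumptions made for illustration.

```python
from typing import Iterator, Sequence

# Hypothetical event layout for this sketch (0-based indices):
# index 2 = raw key byte, index 5 = UTF-32 code ("evtBuf sixth"), 0 if unset.
Event = Sequence[int]


def key(byte: int = 0, utf32: int = 0) -> Event:
    """Build a hypothetical 8-field key event."""
    return (2, 0, byte, 0, 0, utf32, 0, 0)


def utf8_length(lead: int) -> int:
    """Number of bytes in a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError(f"not a UTF-8 lead byte: {lead:#x}")


def next_char(first: Event, rest: Iterator[Event]) -> str:
    """Prefer the UTF-32 field; otherwise assemble UTF-8 from several events."""
    if first[5] != 0:
        return chr(first[5])
    raw = [first[2]]
    for _ in range(utf8_length(first[2]) - 1):
        raw.append(next(rest)[2])  # continuation bytes arrive as later events
    return bytes(raw).decode("utf-8")


# Latin-1 case: the VM filled in the UTF-32 field, so one event is enough.
print(next_char(key(utf32=0xFC), iter([])))
# Indic case: a three-byte UTF-8 sequence spread over three events.
print(next_char(key(0xE0), iter([key(0xA4), key(0x85)])))
```

In this sketch the Latin-1 path never touches the UTF-8 fallback, which is the sense in which the check is meant to leave Latin-1 input unaffected. It says nothing about an interpreter that lacks such a fallback.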
On 14.10.2010, at 18:15, K. K. Subramaniam wrote:
Recent VMs generate keycodes in evtBuf sixth, taking the encoding into account for Latin-1 but not for Indic languages. In M17nInputInterpreter>>nextCharFrom:firstEvt:, I check evtBuf sixth before falling back to UTF-8, so it should not affect Latin-1 input.
But currently M17nInputInterpreter is not used for latin-1 languages so this is irrelevant.
In any case, compositionInput uses direct method by default. For instance, I use UIM which defaults to direct method (us_intl layout). When I need to type an Indic character, I press SHIFT-CTRL to switch languages and SHIFT-SPACE to toggle on composite encoding for three-byte Indic characters and toggle it off for accented characters like ü or é.
The question is how the VM passes key events to the image with the "direct method" if compositioninput is enabled. If it uses the utf32 field in a single event, that is fine. If it uses utf-8 via multiple events, it is not.
If you have access to de_DE.UTF-8 systems, please give M17n a try. Just turn on composition input and add 'de' to M17nEnvironment>>supportedLanguages and do: LanguageEnvironment resetKnownEnvironments; clearDefault. HandMorph clearInterpreters.
If input breaks, it is a major defect.
Subbu
We're constantly talking past each other ;)
You were saying that without further changes to the image, it is safe for us to enable composition input, if LANG specifies UTF-8 encoding.
The case I am worried about (see above) is if the VM uses composition input but the image does not use your m17n input interpreter, which would be the case for most European languages.
I am out of the country so cannot easily test this. And x2sqKeyCompositionInput() is too complex to verify just by reading the source.
- Bert -
On Friday 15 Oct 2010 7:20:55 am Bert Freudenberg wrote:
The case I am worried about (see above) is if the VM uses composition input but the image does not use your m17n input interpreter, which would be the case for most European languages.
etoys.sh and the M17n patch go together, so your worry is misplaced. Even if it were true, it would affect only non-en Latin-1 locales, and only on Linux. Multilingual text input depends on compositioninput being present for en locales too. etoys.sh can be patched to skip composition input for these locales for now (de, el, es, fr, it, nl, ps, pt, ro, ru, si, sk, sv, tr).
So here is my plan for handling input code:
1. On Linux, the Etoys VM will use composed input for all locales except non-en Latin-1, and UTF32 for non-en Latin-1.
2. On Mac, it will use MacRoman for all locales.
3. On Win32, it will use UTF32 for all locales.
4. Composed input for the rest.
Subbu
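The plan above can be written down as a small decision function. This is only a Python sketch of the rules as stated, reading item 1 as "UTF32 for the listed non-en locales on Linux". The platform labels ("linux", "mac", "win32") and the function name are hypothetical; the locale list is the one given in the message.

```python
# Locales listed in the message for which composition input is skipped for now.
SKIP_COMPOSITION = {
    "de", "el", "es", "fr", "it", "nl", "ps",
    "pt", "ro", "ru", "si", "sk", "sv", "tr",
}


def input_method(platform: str, lang: str) -> str:
    """Pick an input method from a platform label and a LANG-style value."""
    # Language code is the part before any "_" or "." (e.g. "de" in de_DE.UTF-8).
    language = lang.split("_", 1)[0].split(".", 1)[0].lower()
    if platform == "mac":
        return "MacRoman"
    if platform == "win32":
        return "UTF32"
    if platform == "linux" and language in SKIP_COMPOSITION:
        return "UTF32"
    return "composed"


for platform, lang in [
    ("linux", "en_IN.UTF-8"),
    ("linux", "de_DE.UTF-8"),
    ("mac", "de_DE.UTF-8"),
    ("win32", "hi_IN"),
]:
    print(platform, lang, input_method(platform, lang))
```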
etoys-dev@lists.squeakfoundation.org