UTF8 Squeak
Yoshiki Ohshima
yoshiki at squeakland.org
Sat Jun 9 01:29:48 UTC 2007
Subbu,
> > > Well, UTF8 is just an encoding of Unicode code points. So, Squeak will
> > > have to support Unicode. Its language and tools will need to handle
> > > Unicode code points and UTF8 streams. Internally, whether code points or
> > > UTF8 encoding is used would depend on the context.
> >
> > Why do you get the impression that Squeak doesn't support it?
> Squeak's Unicode/UTF8 support seemed incomplete. I couldn't get Squeak on
> Linux to take in .AN= or .FNp.
It is incomplete in many ways, sure. But that wasn't the issue you
were raising; you were talking about the interface between the image
and the VM, which is not the hard part.
> How about :
> a) Use Unicode chars in literals and text fields.
You can do this already.
> I should be able to write
> math equations in PluggableText.
This is irrelevant to the encoding scheme the system uses.
> b) Use Unicode chars in names (object, method, variable, symbols). Children
> should be able to name their scripts and variables in their language in
> Etoys.
We have been doing this for many years already. What we can't do yet
is display Indic characters (which will be solved very soon).
> c) See fallback glyphs for Unicode. Like four hex digits laid out 2x2 in a
> small box the same height as the current font. It works much better than []
> box.
This would definitely be good. (BTW, Andreas did something similar,
though not the numbers in a box.)
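To make the fallback-glyph idea in (c) concrete, here is a minimal sketch in Python (chosen only for illustration; it is not Squeak code). The function name fallback_box_rows is hypothetical, and the sketch assumes the code point fits in four hex digits. It only computes the two rows of digits that would go inside the box; actual rendering is left out.

```python
from typing import Tuple

def fallback_box_rows(code_point: int) -> Tuple[str, str]:
    """Split the four-hex-digit form of a code point into two rows of two digits.

    Hypothetical helper: assumes the code point is in the range 0..0xFFFF.
    """
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("this sketch only handles four-hex-digit code points")
    digits = f"{code_point:04X}"
    # Top row holds the first two hex digits, bottom row the last two.
    return digits[:2], digits[2:]

top, bottom = fallback_box_rows(0x0905)  # DEVANAGARI LETTER A
print(top)
print(bottom)
```

A glyph renderer would draw the two rows stacked inside a box of the current font height.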
> d) Have Buttons that generate Unicode. This could be used to build soft
> keyboards. (cf. PopUpMenu>>readKeyboard uses asciiValue :-().
For some experiments, it would be good.
> e) Use Modal input - codes coming in from Sensors could be button presses
> (e.g. ESC, hotkeys to switch keyboard layouts) or multilingual text
> sequences.
Not sure what you mean. Japanese input with IME does this already.
> f) See 'current language' indicator in input fields.
What do you mean by "input fields"?
> Handling backspace will be language dependent.
Yes.
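As a rough illustration of why backspace is not simply "drop the last code point", here is a Python sketch (not Squeak code; the function name backspace is hypothetical). It assumes one particular policy: delete the last base character together with any trailing combining marks, using the standard unicodedata.combining lookup. Real policies differ per script; many Indic dependent vowel signs, for instance, have combining class 0 and would need script-specific rules that this sketch does not attempt.

```python
import unicodedata

def backspace(text: str) -> str:
    """Delete the last base character plus any combining marks that follow it.

    One possible policy only; language-specific behaviour is not modelled.
    """
    end = len(text)
    # Skip trailing combining marks (non-zero canonical combining class).
    while end > 0 and unicodedata.combining(text[end - 1]) != 0:
        end -= 1
    # Drop the base character the marks were attached to, if there is one.
    return text[:max(end - 1, 0)]

print(backspace("cafe\u0301"))  # 'e' followed by COMBINING ACUTE ACCENT
```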
> > Using UTF-8 internally throughout the system would be a challenge,
> > especially since overloaded methods like at: and at:put: would all
> > have to be disambiguated as to what they mean.
> at:put: is a random access operation and UTF-8 is not meant for such purposes.
> UTF-8 works well for streams of characters and Unicode for random access and
> lookup. This is what I meant when I said it would depend on context. Then
> there are mixed streams like keyboard input. I could be reading button
> presses (like Enter for OK) or reading in a stream of characters in a text
> field. We may need instream character codes to switch modes and
> language.
One way is to rely on the OS features.
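The at:/at:put: ambiguity discussed above can be seen in a small Python sketch (again for illustration only, not Squeak code). Indexing a sequence of code points is a direct lookup, while indexing the UTF-8 bytes of the same text either returns a byte that may sit in the middle of a character or requires a scan from the start. The helper utf8_code_point_at is hypothetical.

```python
def utf8_code_point_at(data: bytes, index: int) -> str:
    """Return the code point at 0-based `index` in UTF-8 `data` by linear scan."""
    count = -1
    for pos, byte in enumerate(data):
        # Bytes of the form 10xxxxxx are continuation bytes;
        # every other byte starts a new code point.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == index:
                end = pos + 1
                while end < len(data) and data[end] & 0xC0 == 0x80:
                    end += 1
                return data[pos:end].decode("utf-8")
    raise IndexError(index)

text = "a\u00e9\u0905z"          # 'a', 'é', DEVANAGARI LETTER A, 'z'
encoded = text.encode("utf-8")   # 1 + 2 + 3 + 1 bytes

print(len(text), len(encoded))
print(text[2] == utf8_code_point_at(encoded, 2))
print(hex(encoded[2]))           # a raw byte, not a character
```

Here the byte at index 2 is the second byte of 'é', which is why "element at index" has to be defined separately for bytes and for characters.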
> I am still coming up to speed on Squeak multilingual support and these
> observations are based on my explorations so far. It is quite possible that I
> may have missed something.
Even if they come from a misunderstanding, any comments are welcome ^^;
-- Yoshiki