UTF8 Squeak

Yoshiki Ohshima yoshiki at squeakland.org
Sat Jun 9 01:29:48 UTC 2007


  Subbu,

> > > Well, UTF8 is just an encoding of Unicode code points. So Squeak will
> > > have to support Unicode. Its language and tools will need to handle
> > > Unicode code points and UTF8 streams. Internally, whether code points or
> > > UTF8 encoding is used would depend on the context.
> >
> >   Why do you get the impression that Squeak doesn't support it?
> Squeak's Unicode/UTF8 support seemed incomplete. I couldn't get Squeak on 
> Linux to take in .AN= or .FNp.

  It is incomplete in many ways.  Sure.  But that wasn't the issue you
were raising; you were talking about the interface between the image
and the VM, and that is not the hard part.

> How about :
> a) Use Unicode chars in literals and text fields.

  You can do this already.

> I should be able to write 
> math equations in PluggableText.

   This is unrelated to the encoding scheme the system uses.

> b) Use Unicode chars in names (object, method, variable, symbols). Children 
> should be able to name their scripts and variables in their language in 
> Etoys.

  We have been doing this for many years already.  What we can't do
yet is display Indic characters (which will be solved very soon).

> c) See fallback glyphs for Unicode. Like four hex digits laid out 2x2 in a 
> small box the same height as the current font. It works much better than [] 
> box.

  This would definitely be good.  (BTW, Andreas did something similar,
though not the numbers in a box.)
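
  To make the 2x2 layout concrete, here is a minimal sketch.  It is
Python, used only for illustration, and not Squeak code; the function
name fallback_rows is made up for this example, and it assumes a code
point that fits in four hex digits (the BMP).

```python
def fallback_rows(code_point):
    """Split a four-hex-digit code point into the two rows of a 2x2 box."""
    if not 0 <= code_point <= 0xFFFF:
        # Assumption of this sketch: only code points with four hex digits.
        raise ValueError("this sketch only handles code points up to U+FFFF")
    digits = f"{code_point:04X}"
    # Top row holds the first two digits, bottom row the last two.
    return digits[:2], digits[2:]


top, bottom = fallback_rows(0x0915)  # DEVANAGARI LETTER KA
print(top)
print(bottom)
```

  A renderer would then draw the two rows in a small box scaled to the
height of the current font.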

> d) Have Buttons that generate Unicode. This could be used to build soft 
> keyboards. (cf. PopUpMenu>>readKeyboard uses asciiValue :-().

  For some experiments, it would be good.

> e) Use Modal input - codes coming in from Sensors could be button presses 
> (e.g. ESC, hotkeys to switch keyboard layouts, ) or multilingual text 
> sequences.

  Not sure what you mean.  Japanese input with IME does this already.

> f) See 'current language' indicator in input fields.

  What do you mean by "input fields"?

>  Handling backspace will be language dependent.

  Yes.
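
  A rough illustration of why: deleting the last code point and deleting
the last user-visible character are not the same thing.  The sketch
below is Python for illustration only, not Squeak code; both function
names are made up, and treating "a base character plus trailing
combining marks" as one unit is a simplifying assumption.  Real
segmentation rules (for example, Indic conjuncts) are more involved.

```python
import unicodedata


def backspace_code_point(text):
    """Naive backspace: drop exactly one code point."""
    return text[:-1]


def backspace_cluster(text):
    """Approximate backspace: drop trailing marks together with their base."""
    end = len(text)
    # Skip over trailing combining marks (general categories Mn, Mc, Me).
    while end > 0 and unicodedata.category(text[end - 1]).startswith("M"):
        end -= 1
    # Then drop the base character those marks were attached to.
    return text[:max(end - 1, 0)]


syllable = "\u0915\u093f"  # DEVANAGARI LETTER KA + VOWEL SIGN I
print(len(backspace_code_point(syllable)))  # the base letter is left behind
print(len(backspace_cluster(syllable)))     # the whole syllable is removed
```

  Which of the two behaviours users expect differs from script to
script, so the choice cannot be hard-wired.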
  
> > Using UTF-8 internally throughout the system would be a challenge,
> > especially considering that overloaded methods like at: and
> > at:put: would all have to be disambiguated as to what they mean.
> at:put: is a random access operation and UTF-8 is not meant for such purposes. 
> UTF-8 works well for streams of characters and Unicode for random access and 
> lookup. This is what I meant when I said it would depend on context. Then 
> there are mixed streams like keyboard input. I could be reading button 
> presses (like Enter for OK) or reading in a stream of characters in a text 
> field. We may need instream character codes to switch modes and
> language.

  One way is to rely on the OS features.
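
  On the at: question, the ambiguity can be sketched as follows.  This
is Python for illustration only, not Squeak code, and code_point_at is a
made-up helper.  Indexing the UTF-8 bytes and indexing the code points
give different answers, and finding the n-th code point in UTF-8 needs a
scan from the start.

```python
def code_point_at(data, index):
    """Return the index-th code point of UTF-8 bytes by scanning: O(n)."""
    count = -1
    for pos, byte in enumerate(data):
        # Continuation bytes look like 10xxxxxx; anything else starts a code point.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == index:
                end = pos + 1
                # Extend over the continuation bytes of this code point.
                while end < len(data) and data[end] & 0xC0 == 0x80:
                    end += 1
                return data[pos:end].decode("utf-8")
    raise IndexError(index)


text = "a\u0915b"              # 'a', DEVANAGARI LETTER KA, 'b'
encoded = text.encode("utf-8")

print(len(text), len(encoded))            # code points versus bytes
print(hex(encoded[1]))                    # a lead byte, not a character
print(code_point_at(encoded, 1) == text[1])
```

  So an at: on a UTF-8 string has to mean either "byte at" or "code
point at", and at:put: is harder still because the replacement may
change the byte length.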

> I am still coming up to speed on Squeak multilingual support and these 
> observations are based on my explorations so far. It is quite possible that I 
> may have missed something.

  Even if they come from a misunderstanding, any comments are welcome ^^;

-- Yoshiki


