UTF8 Squeak (was Re: New Win32 VM [m17n testers needed])

Yoshiki Ohshima yoshiki at squeakland.org
Thu Jun 7 18:58:05 UTC 2007


  Subbu,

At Thu, 7 Jun 2007 19:26:14 +0530,
subbukk wrote:
> 
> On Wednesday 06 June 2007 5:54 pm, Lex Spoon wrote:
> > Yes, it would seem to simplify matters to use UTF-8 consistently for
> > interfacing between the image and the VM.  Instead of the VM picking
> > an encoding and telling the image which one it picked, it could go
> > ahead and convert it to UTF-8.
> >
> > This applies not just to filenames, but every place where text is
> > exchanged between the Smalltalk world and the VM, for example keyboard
> > events and the clipboard.
> This is not an easy job, as the assumption of ASCII pervades Squeak. The only 
> system that I am aware of that bit the bullet and went the whole hog is Plan 
> 9. The team got the kernel, library and utilities to work with UTF-8 as the 
> basic character unit and wrote about the experience:
>    http://plan9.bell-labs.com/sys/doc/utf.html

  If "this" is the interface between the Smalltalk world and the VM,
it is not that hard a thing.  There are only three paths for such
interfacing, and you just convert there.
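
  To illustrate what "just convert there" could look like, here is a
minimal sketch in Python rather than Squeak or VM code.  It assumes the
three paths are the ones Lex listed (file names, keyboard events and the
clipboard) and that the host uses Shift-JIS natively.  All function
names and the PLATFORM_ENCODING constant are hypothetical; none of this
is the actual Squeak or VM API.

```python
# Hypothetical sketch: normalize text to UTF-8 at the VM/image boundary.
# PLATFORM_ENCODING is an assumption (a host that uses Shift-JIS natively).
PLATFORM_ENCODING = "shift_jis"


def vm_to_image(native_bytes: bytes, encoding: str = PLATFORM_ENCODING) -> bytes:
    """Convert bytes in the host encoding to UTF-8 before the image sees them."""
    return native_bytes.decode(encoding).encode("utf-8")


def image_to_vm(utf8_bytes: bytes, encoding: str = PLATFORM_ENCODING) -> bytes:
    """Convert UTF-8 bytes from the image to the host encoding."""
    return utf8_bytes.decode("utf-8").encode(encoding)


# The three assumed paths; each one converts at exactly one place.
def file_name_from_host(raw: bytes) -> bytes:
    return vm_to_image(raw)


def keyboard_text_from_host(raw: bytes) -> bytes:
    return vm_to_image(raw)


def clipboard_from_host(raw: bytes) -> bytes:
    return vm_to_image(raw)


def clipboard_to_host(utf8: bytes) -> bytes:
    return image_to_vm(utf8)
```

  The point of the sketch is only that everything on the image side of
these functions sees one encoding, whichever encoding the host uses.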

  It might be just a matter of self-defence, but I still think that
the way we did it (i.e., not changing the VM first, and relying on
image-level conversion) was the right thing.

  Back in 1999:
  - we were more concerned about small devices such as the MI-series
    Zaurus.  On those, adding the conversion tables between Shift-JIS
    and Unicode was significant.  We seem to care less about obscure
    platforms these days, and less about the flavors of Unix: if you
    provide the Linux version, it more or less works everywhere.
    Windows, Mac and Linux (and, all right, Acorn, if only Tim
    pretends) are the only platforms people care about.
  - Releasing an image that requires a single version of the VM would
    have been a mistake.  Not all Squeak users were tech savvy.  Some users
    have restrictions in terms of what they can change on their
    computers (at schools and such).  Providing working installers for
    all major platforms was (still is) a large task.

> Is there a kernel image that just contains basic Squeak and VMMaker where one 
> could try building a UTF-8 Squeak? The smaller the better.

  Ian might put his vmm-n.n-n image on squeakvm.org sometime
soon.

-- Yoshiki


