UTF8 Squeak (was Re: New Win32 VM [m17n testers needed])
yoshiki at squeakland.org
Thu Jun 7 18:58:05 UTC 2007
At Thu, 7 Jun 2007 19:26:14 +0530,
> On Wednesday 06 June 2007 5:54 pm, Lex Spoon wrote:
> > Yes, it would seem to simplify matters to use UTF-8 consistently for
> > interfacing between the image and the VM. Instead of the VM picking
> > an encoding and telling the image which one it picked, it could go
> > ahead and convert it to UTF-8.
> > This applies not just to filenames, but every place where text is
> > exchanged between the Smalltalk world and the VM, for example keyboard
> > events and the clipboard.
> This is not an easy job as the assumption of ASCII pervades Squeak. The only
> system that I am aware of that bit the bullet and went the whole hog is Plan
> 9. The team got the kernel, library and utilities to work with UTF8 as basic
> character unit and wrote about experience:
If "this" is the interface between the Smalltalk world and the VM,
it is not that hard. There are only three paths for such
interfacing, and you just convert there.
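
To make "just convert there" concrete, here is a minimal sketch. It is
Python for illustration only, not Squeak or VM code, and every name in it
is hypothetical. It assumes the three paths are the ones named in the
quoted message (filenames, keyboard events, clipboard) and uses Shift-JIS
as an example platform encoding.

```python
# Illustrative sketch only: hypothetical names, not a Squeak or VM API.
# Assumption: the platform side speaks one legacy encoding (Shift-JIS
# here), and the image side always sees UTF-8.

PLATFORM_ENCODING = "shift_jis"

# Assumed list of the boundary paths, taken from the quoted message.
BOUNDARY_PATHS = ("filename", "keyboard", "clipboard")


def vm_to_image(raw: bytes, platform_encoding: str = PLATFORM_ENCODING) -> bytes:
    """Convert platform-encoded bytes to UTF-8 on the way into the image."""
    return raw.decode(platform_encoding).encode("utf-8")


def image_to_vm(utf8: bytes, platform_encoding: str = PLATFORM_ENCODING) -> bytes:
    """Convert UTF-8 bytes from the image back to the platform encoding."""
    return utf8.decode("utf-8").encode(platform_encoding)
```

Whether these two functions live in the VM or in the image, the point is
the same: the conversion happens only at the boundary, and everything on
the image side of it can assume a single encoding.
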
It might be just a matter of self-defence, but I still think that
the way we did it (i.e., not changing the VM first, and relying on the
image-level conversion) was the right thing.
Back in 1999:
- we were more concerned about small devices such as the MI-series
Zaurus. On those, adding the conversion table between Shift-JIS and
Unicode was significant. We seem to care less about obscure
platforms these days, and less about flavors of Unix: if you
provide the Linux version, it more or less works everywhere. And
Windows, Mac and Linux (alright, and Acorn, only if Tim pretends) are
the only platforms people care about.
- Releasing an image that requires one specific version of the VM would
have been a mistake. Not all Squeak users were tech savvy. Some users
have restrictions in terms of what they can change on their
computers (at schools and such). Providing working installers for
all major platforms was (still is) a large task.
> Is there a kernel image that just contains basic Squeak and VMMaker where one
> could try building a UTF-8 Squeak? Smaller the better.
Ian might put his vmm-n.n-n image on squeakvm.org sometime.