Unicode support (File names was Re: Warning: Large Babel translation)

Yoshiki Ohshima Yoshiki.Ohshima at acm.org
Sat Nov 22 14:01:59 UTC 2003


  Andreas,

> >   Actually, Lex and I seem to agree that, for certain platforms,
> > we'll want a way to set the VM's encoding from the image, not only
> > a report of the encoding from the VM.  On Windows, it can still go
> > without it for some time.  But on Mac OS X, which *can* be UTF-8
> > based, the new VM has to behave as if it is the same old MacRoman VM
> > for the old image's sake, or has to behave as UTF-8 aware, based on
> > the image running on it.  As far as I know, Hayashi-san is working on
> > this implementation.
> 
> To be honest, I don't like this particular idea at all. If we assume that
> the VMs must support a variety of encodings then it makes the implementation
> even more complex for no clear gain. While I can see that for some interim
> period the VM may indeed support multiple encodings I don't really see that
> as a long-term viable option. There's just too much work involved with no
> clear benefit.

  Ah, but "multiple" here means "two", and only where necessary.  If
the native encoding of the platform is incompatible with MacRoman,
and the VM has to deal with both, there must be some way to choose
between them.

  And yes, this is for the transition.  In other words, this "two
encodings" solution is for the "old image & new VM" combination.

> >   Yes.  To ease the burden on VM maintainers, who are after all
> > outnumbered by Squeak image level programmers, we want to do some
> > stuff in image.
> 
> Right. So once more let's start this with The Simplest Thing That Could
> Possibly Work. Namely a primitive by which the VM can report the encoding it
> is currently using. From there on, about everything can be done in the
> image.

  If the VM doesn't modify the data irreversibly, as is the case with
the Windows VM, just about everything can be done in the image,
because it has all the bits.  But if the VM does something
irreversible, that no longer holds.  And letting the Mac OS X VM
support UTF-8 is the latter case.
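
  To illustrate the difference, here is a minimal sketch in Python
(not Squeak code, and not any real VM's behavior).  The function names
vm_pass_through and vm_force_macroman are hypothetical; they only model
a VM that hands the raw bytes to the image versus one that squeezes a
UTF-8 file name into MacRoman before the image ever sees it.

```python
# Hypothetical model: two ways a VM could hand a file name to the image.

def vm_pass_through(raw: bytes) -> bytes:
    """Reversible: the image receives every bit the platform produced."""
    return bytes(raw)

def vm_force_macroman(raw: bytes) -> bytes:
    """Irreversible: characters with no MacRoman equivalent become '?'."""
    return raw.decode("utf-8").encode("mac_roman", errors="replace")

name = "日本語.txt"
raw = name.encode("utf-8")

# With all the bits available, the image can decode the name itself.
intact = vm_pass_through(raw).decode("utf-8")

# After a lossy conversion, nothing done in the image can recover the name.
damaged = vm_force_macroman(raw).decode("mac_roman")

print(intact == name)   # the pass-through case round-trips
print(damaged == name)  # the forced-MacRoman case does not
```

  In the first case the image can still do the conversion work; in the
second the original characters are gone before the image gets a chance.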

> If we want to play smarts at the VM level we may even use that
> primitive to distinguish between an "encoding-aware" image (one sending the
> primitive would be assumed to be able to cope with the encoding) as well as
> old images which don't know nothing.

  This means "the image has a way to set the encoding of the VM,"
right?  Just one bit of information from the image to the VM, and the
VM can distinguish the image version.  That's the idea.
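
  A rough sketch of that one-bit handshake, again in Python and purely
illustrative: ToyVM, primitive_encoding and file_name_for_image are
made-up names, not an existing Squeak VM interface.  The assumption is
that an old image never calls the primitive, so the VM keeps presenting
MacRoman, while a new image calls it once and from then on gets the
platform bytes untouched.

```python
class ToyVM:
    """Hypothetical VM model; not an actual Squeak VM interface."""

    LEGACY = "mac_roman"  # what old images expect to see

    def __init__(self, native: str = "utf-8"):
        self.native = native          # the platform's own encoding
        self.image_is_aware = False   # the one bit sent from image to VM

    def primitive_encoding(self) -> str:
        # Calling the primitive at all marks the image as encoding-aware.
        self.image_is_aware = True
        return self.native

    def file_name_for_image(self, raw: bytes) -> bytes:
        if self.image_is_aware:
            return raw  # new image: hand over all the bits
        # Old image: behave like the old MacRoman VM (possibly lossy).
        return raw.decode(self.native).encode(self.LEGACY, errors="replace")


vm = ToyVM()
raw = "café.txt".encode("utf-8")

old_view = vm.file_name_for_image(raw)   # old image never asked
reported = vm.primitive_encoding()       # new image asks once
new_view = vm.file_name_for_image(raw)   # now the raw bytes come through
```

  The VM keeps only two behaviors, and the choice is driven entirely by
whether the image ever sent the primitive.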

-- Yoshiki




More information about the Squeak-dev mailing list