File names was Re: Warning: Large Babel translation

Andreas Raab andreas.raab at gmx.de
Sat Nov 15 14:49:41 UTC 2003


Yoshiki wrote:
> I don't think we should pick UTF-8 as the single (external) encoding
> for the image and let the VM do all the normalization work.  There are
> many systems whose native encoding is not Unicode-based.  In that
> case, the VM would need to carry a big conversion table for CJK
> characters, which I don't think is a good idea, because there is no
> such thing as a single consistent conversion table for this purpose...
> To overcome this problem, we should carry a Squeak-standard Unicode
> conversion table and treat it as the official one.

I agree that we shouldn't pick UTF-8 as the exclusive interface for the VM -
we need the VM to report to us what it expects, partly because for some uses
of Squeak we may want to use a Unicode subset, or may not use wide
characters at all (for example, on cell phones shipped in the US ;). So the
VM should report what it expects, and the image should deal with whatever it
gets. That makes it easy to deal flexibly with situations where the VM
would otherwise have to guess what exactly to do.
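
To make the "VM reports, image adapts" idea concrete, here is a rough
sketch in Python rather than Squeak. Every name in it (FakeVM,
file_name_encoding, name_to_vm, name_from_vm) is made up for illustration
and is not an existing VM primitive or image method; it only assumes that
the VM can answer a single question, namely which encoding it expects.

```python
# Hypothetical sketch: the VM states its file-name encoding, the image adapts.
# None of these names exist in Squeak; they only illustrate the idea.

class FakeVM:
    """Stand-in for a VM that reports which encoding it expects for file names."""

    def __init__(self, encoding):
        # e.g. "ascii" or "latin-1" on a simple port, "utf-8" on a general one
        self.encoding = encoding

    def file_name_encoding(self):
        return self.encoding


def name_to_vm(vm, name):
    """Image side: encode an internal (Unicode) name the way this VM expects."""
    return name.encode(vm.file_name_encoding())


def name_from_vm(vm, raw):
    """Image side: decode bytes handed over by this VM into an internal name."""
    return raw.decode(vm.file_name_encoding())
```

With this shape the VM never has to guess: an ISO-only port and a UTF-8
port differ only in the answer they give, and the conversion work stays in
the image.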

Diego wrote:
> Doesn't the #squeakToIso and #isoToSqueak pair cover this problem?
> 
> What is needed? Something like #currentPlatformEncodeToSqueak?

Something that tells the image in what format to expect data from, and send
data to, the VM. #isoToSqueak and #squeakToIso will work well if and only if
the VM expects ISO encoding (which is probably a good start for any new
port). If it doesn't, the result won't be any better than it is today.
Depending on the VM we may have varying encodings - for example, a bare
hardware platform may want to keep things as simple as possible, whereas
something like Windows, which is used in lots of different settings, may give
you something more general (such as UTF-8) and take on the burden of
translating it appropriately.
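
A small illustration of the "if and only if" part, again in Python and
purely as a sketch: ISO-8859-1 ("latin-1") stands in here for whatever
#squeakToIso produces, which is an assumption of the example and not a
statement about how those methods are implemented.

```python
# Illustration only: latin-1 stands in for the output of an ISO-only conversion.
name = "café"

iso_bytes = name.encode("latin-1")   # what an ISO-only conversion would send
utf8_bytes = name.encode("utf-8")    # what a UTF-8 VM would expect instead

# VM expects ISO: the round trip is lossless.
iso_ok = iso_bytes.decode("latin-1") == name

# VM expects UTF-8 but receives ISO bytes: the lone 0xE9 byte is invalid UTF-8.
try:
    iso_bytes.decode("utf-8")
    utf8_vm_accepts_iso = True
except UnicodeDecodeError:
    utf8_vm_accepts_iso = False

# Image assumes ISO but the VM hands back UTF-8: the name is silently garbled.
garbled = utf8_bytes.decode("latin-1")
```

The first case is the one where the existing pair is enough; the other two
are the mismatches the image can only avoid if the VM says what it expects.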

Cheers,
  - Andreas



