File names was Re: Warning: Large Babel translation

Sat Nov 15 06:41:43 UTC 2003

  Hello,

> > What we really need here is some way to query the VM about what it expects
> > to see for "strings" and use it consistently. This may be different for
> > varying platforms but it would most likely be UTF8 for all Windows platforms
> > (since it's trivial to convert UTF8 back and forth to the underlying code
> > page).
> Trivial is possibly a bit exaggerated but UTF8 is surely a good
> compromise to move ahead while keeping backward compatibility with ASCII.
> So the English speaking world wouldn't have to bother about our
> European, Asian and African encoding problems.

  Well, most of the non-English, non-Unicode encodings are more or
less compatible with ASCII^^;
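
  As a rough illustration of that compatibility, here is a small
sketch in Python (not Squeak; the sample string and the list of
encodings are my own arbitrary choices, using Python's codec names).
It checks that a plain ASCII file name comes out as the same bytes
under several legacy encodings and under UTF-8.  The "more or less"
still matters: some encodings, for example those based on JIS X 0201,
assign a few byte values below 0x80 differently, so the sample avoids
those characters.

```python
# Sample text restricted to ASCII letters, digits, '-' and '.'.
text = "file-01.txt"

# UTF-8 plus a few legacy (non-Unicode) encodings, by Python codec name.
encodings = ["utf-8", "latin-1", "koi8_r", "euc_jp", "euc_kr", "gb2312", "big5"]

# Reference bytes: the plain ASCII encoding of the sample text.
ascii_bytes = text.encode("ascii")

# For each encoding, record whether it yields exactly the ASCII bytes.
same_as_ascii = {name: text.encode(name) == ascii_bytes for name in encodings}

for name, same in same_as_ascii.items():
    print(name, same)
```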

> > If enough people get bitten by the
> > inability to write their umlauts/accents this may trigger a force large
> > enough to get the "in-image issues" solved.
> 
> This is a long-standing problem which needs a good solution at the base.
> UTF-8 is such a solution.

  I don't think we should pick UTF-8 as the single (external) encoding
for the image and let the VM do all the normalization work.  There are
many systems whose native encoding is not Unicode-based.  In that
case, the VM would need to carry a big conversion table for CJK
characters, which I don't think is a good idea, because there is no
such thing as a single consistent conversion table for this purpose...
To overcome this problem, we should carry a Squeak-standard Unicode
conversion table and treat it as the official one.
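
  One concrete case of such an inconsistency, again sketched in
Python rather than Squeak: the same two bytes of Shift_JIS-style text
decode to different Unicode characters depending on whose table is
used.  The two codec names below are Python's; I am assuming its
"shift_jis" codec follows the JIS-based mapping and "cp932" follows
Microsoft's, which is how I understand them to behave.

```python
# The same two bytes (the "wave dash" position), decoded with two tables.
raw = b"\x81\x60"

jis_char = raw.decode("shift_jis")  # JIS-based conversion table
ms_char = raw.decode("cp932")       # Microsoft code page 932 table

# Print the resulting code points and whether the two tables agree.
print(f"shift_jis: U+{ord(jis_char):04X}")
print(f"cp932:     U+{ord(ms_char):04X}")
print("tables agree:", jis_char == ms_char)
```

  A round trip through two such tables does not return the original
bytes, which is why a single agreed-upon table matters.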

-- Yoshiki