UTF-8 (was: [Special Report] 3.6 is out, now what? :-)

Yoshiki Ohshima Yoshiki.Ohshima at acm.org
Sat Oct 11 13:27:56 UTC 2003


  Daniel,

> Yoshiki Ohshima <Yoshiki.Ohshima at acm.org> wrote:
> >   As I wrote somewhere, I still want the Diego's translation stuff to
> > have the chained dictionary.  Otherwise, it should be relatively easy
> > to merge Diego's stuff and rest of m17n stuff.
> You talked a bit about the chained dictionaries stuff, but I didn't get
> the gist of that or why its needed. Does it mean nested environments for
> translations, with inheritance? how would this change the programmer's
> model/api? this might be worth another thread...

  The idea is that we want to avoid a single large table for all of
the applications loaded into an image.  It *should* make it easy to
let each package have its own translation tables, and free
programmers from worrying about conflicts with other packages.

  I think it would be done not through inheritance, but through the
data structure itself.
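
  A rough sketch of what I mean by chaining, written in Python rather
than Squeak just to show the lookup.  The class, the table names and
the sample phrases are all made up for illustration; this is not
Diego's code or any existing API.

```python
class TranslationTable:
    """One package's translations plus an optional fallback table."""

    def __init__(self, name, entries=None, fallback=None):
        self.name = name
        self.entries = dict(entries or {})
        self.fallback = fallback  # next table in the chain, or None

    def translate(self, phrase):
        # Walk the chain: the package's own table first, then its fallbacks.
        table = self
        while table is not None:
            if phrase in table.entries:
                return table.entries[phrase]
            table = table.fallback
        # Nothing in the chain knows the phrase: return it untranslated.
        return phrase


# Hypothetical tables: a system-wide one and a package-specific one.
system_es = TranslationTable("System", {"Save": "Guardar", "Cancel": "Cancelar"})
book_es = TranslationTable(
    "BookMorph",
    {"Next page": "Pagina siguiente", "Save": "Guardar libro"},
    fallback=system_es,
)

own_entry = book_es.translate("Save")        # found in the package table
inherited = book_es.translate("Cancel")      # found via the fallback
untouched = system_es.translate("Save")      # system table is not affected
missing = book_es.translate("Quit")          # not translated anywhere
```

  The package can override "Save" without touching the system table,
and the chain is just a pointer between tables, not a class hierarchy.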

> >   Beware that I'm going to propose, (err, I propose) to switch to
> ;-) 
> > UTF-8 file out and latin-1 internal representation.  To maintain the
> > backward compatibility, we would want to have different fileout
> > suffixes.  (Also, I think that to increment classVersion of
> > ImageSegment and to squeakToIso the strings in the segment is
> > necessary.)
> Does UTF-8 fileout format mean that fileing in old changeset will be
> transparent? I seem to recall that ASCII is more or less legal
> UTF-8...

  The multilingual file stream I have lets you switch the encoding of
the file, or rather the converter object associated with it, so you
can file a MacRoman fileout into the m17n image.  However, the system
has to know in some way which encoding was used in the file it is
loading.
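
  To make the point concrete, here is a small Python illustration (not
the Squeak stream code; the sample byte strings are made up).  Pure
ASCII bytes decode the same way as UTF-8, but a MacRoman byte above
127 is either rejected or silently misread unless the right encoding
is named.

```python
# Plain ASCII source text: every byte is below 128.
ascii_bytes = b"Object subclass: #Foo"

# "Goran" with o-umlaut as MacRoman stores it (0x9A is o-umlaut there).
macroman_bytes = b"G\x9aran"

# ASCII is a subset of UTF-8, so both decoders agree on it.
ascii_is_valid_utf8 = ascii_bytes.decode("utf-8") == ascii_bytes.decode("ascii")

# With the correct encoding named, the text comes out right.
as_macroman = macroman_bytes.decode("mac_roman")

# With a wrong single-byte encoding, decoding "succeeds" but the
# character is wrong (0x9A is a control code in Latin-1).
as_latin1 = macroman_bytes.decode("latin-1")

# As UTF-8 the same bytes are simply invalid.
try:
    macroman_bytes.decode("utf-8")
    macroman_is_valid_utf8 = True
except UnicodeDecodeError:
    macroman_is_valid_utf8 = False
```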

  The other problem is that there are already quite a few places
where characters from the upper (non-ASCII) half of MacRoman are
used: classes written by Göran, the control buttons for BookMorphs,
anything with a middle dot, etc.  Sumi-san and I wrote a method that
converts the .changes file to UTF-8, so it is doable to move to UTF-8
with all changes retained.
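
  The conversion itself is conceptually simple.  The following is only
a Python sketch under my own assumptions (hypothetical function name
and file names, whole file assumed to be MacRoman); it is not the
method Sumi-san and I wrote.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, source_encoding="mac_roman"):
    """Re-encode a text file from source_encoding to UTF-8."""
    # newline="" leaves line endings alone (Squeak sources use CR).
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Demo on a throwaway file holding an o-umlaut (0x9A) and a
# middle dot (0xE1) in MacRoman, terminated by a CR.
workdir = tempfile.mkdtemp()
old_path = os.path.join(workdir, "old.changes")
new_path = os.path.join(workdir, "new.changes")
with open(old_path, "wb") as f:
    f.write(b"'G\x9aran \xe1 test'!\r")

convert_to_utf8(old_path, new_path)

with open(new_path, "rb") as f:
    converted = f.read()
```

  A real .changes file is large, so a real converter would probably
stream it in chunks rather than read it all at once.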

-- Yoshiki


