Unicode support (File names was Re: Warning: Large Babel translation)

Yoshiki Ohshima Yoshiki.Ohshima at acm.org
Mon Nov 17 22:38:21 UTC 2003


  Lex,

> >   Another important point is that we'll need the in-image conversion
> > anyway.  Again, we don't want to use UTF-8 for the internal
> > representation, so the internal string has to be converted before
> > being passed to primitives.  (So, what kind of data structure do you
> > imagine using as the internal representation?)
> 
> Ack!  We do *not* need in-image conversion.   Doesn't it disturb you
> that a minimal language like Smalltalk might end up being *required* to
> carry around translation tables for any encoding a VM might request?  It
> bothers me deeply and is the crux of my disturbance with this idea.  I
> would very much like to have simple images be possible which are not
> fully multinationalized.  Even more, I would like images to not be
> required to dynamically load code because they are running on a new
> VM.

  You can make a minimal image if you like.  Also, you won't be
required to dynamically load code merely because the image is running
on a new VM.  You'll be required to do so only when your code or the
text in your image requires those tables.
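
  The point above, that translation tables only need to be present once
code or text actually asks for them, can be sketched roughly as follows.
This is an illustration in Python, not Squeak code; the names
ConverterRegistry and load_latin1_table are hypothetical, and the
Latin-1 "table" is a stand-in chosen because its bytes map directly to
code points.

```python
class ConverterRegistry:
    """Hypothetical registry: a conversion table is loaded on first use only."""

    def __init__(self):
        self._loaders = {}   # encoding name -> function that builds a converter
        self._loaded = {}    # encoding name -> converter already built

    def register(self, name, loader):
        # Registering is cheap: nothing is loaded yet.
        self._loaders[name] = loader

    def converter(self, name):
        # Build the converter the first time it is requested, then reuse it.
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]

    def loaded_names(self):
        return sorted(self._loaded)


def load_latin1_table():
    # Stand-in for loading a real table: each Latin-1 byte is its code point.
    return lambda data: [b for b in data]


registry = ConverterRegistry()
registry.register("latin-1", load_latin1_table)
```

  An image that never calls registry.converter("latin-1") never pays
for the table; one that does gets it built once and cached.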

> To contrast, we certainly do need translation *in the VM* on some
> platforms.  For example, different filesystems can use different
> encodings for the filename, and so the problem can't simply be ducked to
> the image.

  What is an example of this?  If you mount a Windows file system on
Mac OS, the OS converts the path names so the VM won't have to deal
with the differences between filesystems.

> At best, I can imagine allowing the image and VM to negotiate a
> different encoding under some circumstances, as a performance
> improvement.  But it would be nice if there is a simple interface
> available for images that don't care.

  Just declaring "UTF-8 is the only one" wouldn't be the solution.

> >   Also, if you write a program that accesses a web server (hehe, you
> > did, actually), the code that the server returns can be anything.  You
> > need to convert the response from the server to the internal
> > representation before rendering it.
> 
> Yes.  But we are talking about the interface between the image and the
> VM, not the image and the web.  Not every image needs to have a web
> browser that understands arbitrary encodings.

  Of course...  But do you agree that some applications need
in-image conversion?  That was my point.  Surely, not every image needs
to have a web browser... who said it does?
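
  As a rough sketch of what in-image conversion means for such an
application (Python rather than Squeak, and purely illustrative): bytes
arriving in whatever encoding the server declares are turned into an
internal form, and converted again only at the boundary where a
primitive wants a specific encoding.  The list of code points used as
the internal form here is an assumption for the example, not a proposal
from this thread, and to_internal / to_external are hypothetical names.

```python
def to_internal(raw, declared_encoding):
    """Decode external bytes into the assumed internal form: a list of code points."""
    return [ord(ch) for ch in raw.decode(declared_encoding)]


def to_external(code_points, target_encoding):
    """Encode the internal form into the bytes a primitive or peer expects."""
    return "".join(chr(cp) for cp in code_points).encode(target_encoding)


# A server response in EUC-JP, converted inward and then out again as UTF-8.
response = "\u65e5\u672c\u8a9e".encode("euc_jp")
internal = to_internal(response, "euc_jp")
for_primitive = to_external(internal, "utf-8")
```

  The internal form stays the same whichever encodings sit on either
side; only the two boundary functions need the tables.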

> >   Well, don't worry about it.  Your code won't be affected by the m17n
> > stuff too much.  The ASCII world in Squeak will more or less stay the
> > same.
> 
> It will affect me if I write a primitive that accepts strings as
> arguments.  It will also affect me if my 90 MB type inference image
> stops loading.

  It will, but it will anyway if you choose an internal representation
other than today's String, which you will need to do anyway.

> And anyway, I care a LOT about Squeak.  I want it to be the best system
> it can be.

  So, who on this list doesn't?

  Now, I really would like to know your suggestion on the internal
representation...

-- Yoshiki


