Unicode support (File names was Re: Warning: Large Babel
translation)
Yoshiki Ohshima
Yoshiki.Ohshima at acm.org
Mon Nov 17 22:38:21 UTC 2003
Lex,
> > Another important point is that we'll need the in image conversion
> > anyway. Again, we don't want to use UTF-8 for the internal
> > representation, the internal string has to be converted before passed
> > to primitives. (So, what kind of data structure do you imagine to use
> > as the internal representation?)
>
> Ack! We do *not* need in-image conversion. Doesn't it disturb you
> that a minimal language like Smalltalk might end up being *required* to
> carry around translation tables for any encoding a VM might request? It
> bothers me deeply and is the crux of my disturbance with this idea. I
> would very much like to have simple images be possible which are not
> fully multinationalized. Even more, I would like images to not be
> required to dynamically load code because they are running on a new
> VM.
You can make a minimal image if you like. Also, you won't be
required to dynamically load code merely because the image is running
on a new VM. That will be required only when your code or the text in
your image requires those tables.
> To contrast, we certainly do need translation *in the VM* on some
> platforms. For example, different filesystems can use different
> encodings for the filename, and so the problem can't simply be ducked to
> the image.
What is an example of this? If you mount a Windows file system on
Mac OS, the OS converts the path names, so the VM won't have to deal
with the differences between filesystems.
> At best, I can imagine allowing the image and VM to negotiate a
> different encoding under some circumstances, as a performance
> improvement. But it would be nice if there is a simple interface
> available for images that don't care.
Just declaring "UTF-8 is the only one" wouldn't be the solution.
> > Also, if you write a program that accesses a web server (hehe, you
> > did, actually), the code that the server returns can be anything. You
> > need to convert the response from the server to the internal
> > representation before rendering it.
>
> Yes. But we are talking about the interface between the image and the
> VM, not the image and the web. Not every image needs to have a web
> browser that understands arbitrary encodings.
Of course... But you agree that some applications need in-image
conversion? That was my point. Surely not every image needs to have
a web browser... who said it does?
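As a rough sketch of what such an application has to do (in Python
rather than Squeak; decode_response is a hypothetical name, and falling
back to ISO-8859-1 when no charset is declared is an assumption of this
example, not something settled in this thread):

```python
def decode_response(body, content_type, default="iso-8859-1"):
    """Decode raw response bytes into the internal string representation.

    `content_type` is a header value such as "text/html; charset=EUC-JP".
    The fallback `default` is an assumption made for this sketch.
    """
    charset = default
    # Parameters follow the media type, separated by ';'.
    for param in content_type.split(";")[1:]:
        key, _, value = param.strip().partition("=")
        if key.lower() == "charset" and value:
            charset = value.strip('"')
    return body.decode(charset)
```

The point is only that the encoding is chosen by the server, so the
conversion has to happen inside the application that reads the
response.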
> > Well, don't worry about it. Your code won't be affected by the m17n
> > stuff too much. The ASCII world in Squeak will more or less stay the
> > same.
>
> It will affect me if I write a primitive that accepts strings as
> arguments. It will also affect me if my 90 MB type inference image
> stops loading.
It will, but it will anyway if you choose an internal representation
other than today's String, which you will need to do anyway.
> And anyway, I care a LOT about Squeak. I want it to be the best system
> it can be.
So, who on this list doesn't?
Now, I really would like to know your suggestion on the internal
representation...
-- Yoshiki