m17n Squeak
Yoshiki.Ohshima at acm.org
Tue Feb 25 02:43:22 UTC 2003
Hello,
First of all, I put up a web site based on the draft of our paper
about the multilingualized Squeak. If you are curious, take a look at:
http://www.is.titech.ac.jp/~ohshima/squeak/m17npaper/index.html
The actual image I'm working on is in too much flux to release right
now. However, it looks like I won't have time to tidy it up for the
next few months, so it might make sense to send this version, along
with a "to-do" list, to whoever is really curious.
Below are some comments on this topic.
* I think adopting a "Unicode-based" character representation is
doable and not too big a step.
* Regarding the MacRoman vs. Latin-1 issue, I would vote for
internal Latin-1. I actually bit-edited and re-ordered the
NewYork fonts to make them compatible with Latin-1. It turned out
that I'm not a good font designer, but it wasn't too difficult a
task anyway.
(I did this almost four years ago and proposed moving to Latin-1
as the internal encoding, because I thought it was entirely
reasonable, but it didn't get much attention at the time...)
* The VM doesn't have to be modified. It is quite possible for
InputSensor and/or HandMorph to take care of the encoding
translation based on the version of the VM they are running on.
This is also true for multi-octet character handling. If you did
it in the VM, you would need to add a hard-wired table and logic
to the VM, which is usually not desirable.
I don't have the character tables handy so I might be wrong,
but is there any MacRoman character you can input from the keyboard
that is not in Latin-1? I'm guessing not, and if so, it
wouldn't cause too much trouble if we changed the internal encoding
to Latin-1.
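The codec tables can at least enumerate the candidates: the following sketch lists every MacRoman character that has no Latin-1 counterpart at all. Whether any of them is actually reachable from a Mac keyboard is a separate question that this check cannot answer.

```python
# Enumerate MacRoman characters with no Latin-1 equivalent
# (Unicode code point above 0xFF).
macroman = bytes(range(0x80, 0x100)).decode("mac_roman")
outside = sorted(c for c in macroman if ord(c) > 0xFF)
print(len(outside), "".join(outside))
```

Anything in this list that users can type would need a fallback, while all the accented Latin letters survive the switch unchanged.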
On Japanese Unix, the characters passed from the OS to the Squeak
VM are *usually* in EUC encoding. They can be in SJIS or UTF-8
depending on the user's settings. So this is another complication
you would not want to wire into the keyboard input handling logic
in the VM.
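To illustrate why this matters, the same Japanese string arrives as a different byte sequence under each of those locales, so whatever does the decoding has to know which encoding the platform is configured for (a Python sketch, since the point is about the byte streams, not about Squeak itself):

```python
# One string, three platform encodings, three byte sequences.
text = "日本語"  # "Japanese (the language)"
for enc in ("euc_jp", "shift_jis", "utf-8"):
    data = text.encode(enc)
    print(f"{enc:10s} {data.hex()}")
```

Keeping this dispatch in the image rather than the VM means a new encoding is just a new translation table, not a VM rebuild.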
* I really don't know about the licensing issues around AccuFont.
The font I'm using for my experiments is EFont
(http://openlab.jp/efont/index.html.en). It is a fixed-width font
and not very pretty, but it lets me move forward anyway.
* But, remember, there will never be such a thing as a "complete
glyph set for Unicode".
* While Unicode defines many "algorithms" around text processing,
not all of them have to be implemented. Some of the designs
don't look nice or necessary.
* Existing text in the image can be an issue. But you can convert
it all at once, and there shouldn't be too much trouble...
* From Ed's email...
I can imagine some parts of the character processing routines in
the Nihongo images were fearfully slow, but I would say it is not
a big deal, if I'm asked now. (I'm sorry Ed-san, but I can't quite
remember what exactly I said.) We can make it faster one way or
another.
"having native Unicode might make text-processing, searching,
language parsing and web serving a more easy proposition for
people in multi-byte character environments."
I don't know if this is true...
* CM fonts would be nice, but I have never tried to render them into
something like a 12-pixel-high bitmap. How would they look?
-- Yoshiki
More information about the Squeak-dev mailing list