m17n Squeak

Yoshiki.Ohshima at acm.org Yoshiki.Ohshima at acm.org
Tue Feb 25 02:43:22 UTC 2003


  Hello,

  First of all, I have put up a web site based on the draft of our
paper about the multilingualized Squeak.  If you are curious, take a
look at:

http://www.is.titech.ac.jp/~ohshima/squeak/m17npaper/index.html

  The actual image I'm working on is currently in *too much* flux to
release.  However, it looks like I'm not going to have time to round
it up for the next few months, so it might make sense to send this
version, along with a "to-do" list, to whoever is really curious.

  Below are a few comments on this topic.

  * I think adopting a "Unicode-based" character representation is
    doable and not too big a step.

  * Regarding the MacRoman vs. Latin-1 issue, I would vote for
    internal Latin-1.  I actually bit-edited and re-ordered the
    NewYork fonts to make them compatible with Latin-1.  It turned
    out that I'm not a good font designer, but it wasn't too
    difficult a task anyway.

    (I did this almost four years ago and proposed moving to Latin-1
    as the internal encoding, because I thought it was entirely
    reasonable, but it didn't get much attention at the time...)

  * The VM doesn't have to be modified.  It is quite possible for
    InputSensor and/or HandMorph to take care of the encoding
    translation based on the version of the VM they are running on.
    This is also true for multi-octet character handling.  Doing it
    in the VM would mean adding a hard-wired table and logic to the
    VM, which is not usually desirable.

    I don't have the character tables handy so I might be wrong, but
    is there any MacRoman character you can input from the keyboard
    that is not in Latin-1?  I'm guessing not, and if so, changing
    the internal encoding to Latin-1 wouldn't cause too much trouble.
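    This question can be checked mechanically.  A small sketch using
    Python's codec tables, which carry the same MacRoman and Latin-1
    mappings being discussed here (the script is the editor's
    illustration, not part of the original discussion):

```python
# List the MacRoman characters in the upper half (0x80-0xFF) that
# have no slot in Latin-1, i.e. whose Unicode code point is > 0xFF.
outside = []
for byte in range(0x80, 0x100):
    ch = bytes([byte]).decode('mac_roman')
    if ord(ch) > 0xFF:
        outside.append(ch)

print(len(outside), ''.join(outside))
```

    The characters that fall out are mostly symbols (the dagger,
    trademark sign, math signs, curly quotes) plus a handful of
    letters such as Œ, œ, and Ÿ; the ordinary accented letters you
    would type from a keyboard all survive the move to Latin-1.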

    On Japanese Unix, the characters passed from the OS to the Squeak
    VM are *usually* in EUC encoding, but they can be in SJIS or
    UTF-8 depending on the user's settings.  So this is another
    complication you would not want to wire into the keyboard input
    handling logic in the VM.
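    To illustrate why this is messy: the same text arrives as three
    different byte sequences depending on the user's locale, and only
    the locale setting, not the bytes themselves, reliably tells you
    which one the VM received.  A sketch in Python (the editor's
    illustration, not Squeak code):

```python
# The same Japanese text as it could arrive from a Japanese Unix,
# under three possible locale encodings.
text = '\u3053\u3093\u306b\u3061\u306f'  # "konnichiwa" in hiragana
encodings = ('euc_jp', 'shift_jis', 'utf-8')
encoded = {name: text.encode(name) for name in encodings}
for name, data in encoded.items():
    print(name, data.hex())  # three distinct byte sequences
```

    Keeping this table-driven translation in image-side code means
    supporting a new encoding is a method away, rather than a VM
    rebuild.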

  * I really don't know about the licensing issues around AccuFont.
    The font I'm using for my experiments is EFont
    (http://openlab.jp/efont/index.html.en).  It is a fixed-width
    font and not especially nice, but it lets me move forward anyway.

  * But, remember, there will never be such a thing as a "complete
    glyph set for Unicode".

  * While Unicode defines many "algorithms" around text processing,
    not all of them have to be implemented.  Some of the designs
    don't look nice or necessary.

  * Existing text in the image can be an issue.  But you can convert
    it all at once, and there shouldn't be too much trouble...
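    Such a one-shot conversion of existing MacRoman strings could be
    sketched like this (Python for illustration; the "?" fallback for
    unmappable characters is the editor's assumed policy, not
    something the image would necessarily do):

```python
def macroman_to_latin1(data: bytes) -> bytes:
    """Reinterpret MacRoman-encoded bytes as Latin-1, substituting
    '?' for the few characters that have no Latin-1 code point."""
    return data.decode('mac_roman').encode('latin-1', errors='replace')

# Accented letters keep their identity (though not their byte values):
old = 'caf\xe9'.encode('mac_roman')  # 'cafe'+e-acute, MacRoman bytes
print(macroman_to_latin1(old).decode('latin-1'))
```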

  * From Ed's email...

    I can imagine some parts of the character processing routines in
    the Nihongo images were fearfully slow, but if you asked me now I
    would say it is not a big deal.  (I'm sorry, Ed-san, but I can't
    quite remember exactly what I said.)  We can make it faster one
    way or another.

    "having native Unicode might make text-processing, searching,
    language parsing and web serving a more easy proposition for
    people in multi-byte character environments."

    I don't know if this is true...

  * CM fonts would be nice, but I have never tried rendering them
    into something like a 12-pixel-high bitmap.  How would that look?

-- Yoshiki



More information about the Squeak-dev mailing list