m17n Squeak

Tue Feb 25 22:17:27 UTC 2003

Hey everybody. Yoshiki mentions he's going to be busy. We've been
through this scenario before - someone does some important work that
shows the way to get somewhere we want to be, then becomes busy
(discouraged, distracted, whatever), and since nobody knows how to move
it forward, it gets stuck, incomplete.

If you don't want to see this happen to Yoshiki's
Unicode/multinationalization work, read on.

When that happens, either we never get it (Interval web browser), or
someone has to do it over again, wasting time and effort redoing someone
else's work (modules: Joseph -> Henrik -> ...).

We also have counterexamples, though. When Roel contributed the first
version of the StarBrowser port to Squeak, it wasn't very comfy, and it
wasn't something I'd use daily. Ned took it up, learned the design, added
his expertise, and now it's useful.

Let's face it - we don't generally have patrons paying for us to Squeak.
This means that real life's demands will often interrupt a project you're
doing, even if lots of people would really like to see it succeed.

Well, if you want us to make the most of Yoshiki's work, help him
out. It's important to get involved while he still has a little time to
pass on the knowledge. It would be especially great if someone got
involved who knows at least something about, and is interested in, the
technical side of multilanguage/Unicode support (the standards, the
terms, the algorithms). He's better qualified than I to say what's to be
done, but here are some things you can do that will help most projects:
* Read/play with all the papers/documentation/help/demos made available
by the original contributor.
* Read definitive documentation in the field/play with such ideas/tools
outside the Squeak world to get a better perspective on the project,
what it does do, what it decides not to do, and what is left to be done.
* Communicate what's known to the rest of the world - when the
contributor is running out of time, he doesn't have the time to describe
what it does in detailed emails to the whole world. People coming in to
help move things along should make sure they become technically
knowledgeable about the topic, and then report the status to the
community. Just one guy actively seeking knowledge and asking questions
is a much better use of the original contributor's time than us teeming
masses with varied levels of immediate interest.
* Try to either understand or figure out with the community what's
missing to bring the work to the smallest useful deployment, and try to
get that working. From there, more people will be able to get into the
act.
* Work with the originator and the relevant people (stewards of relevant
packages, VM maintainers, us Guides especially if it affects the core) to
identify what might stall the work, and find solutions.

In short, whenever someone's pushing the edge and the work might be
interrupted, we need to have project-buddies that'll carry it the next
few miles. They'll have the support of the community, of course, but
someone has to focus on it. 

If anyone cares about this work, knows or is willing to seriously study
the topic, and can commit some time to it, pipe up now while Yoshiki
still has some time to pass it on.

Daniel

Yoshiki.Ohshima at acm.org wrote:
>   Hello,
> 
>   First of all, I made up a web site based on the draft of our paper
> about the multilingualized Squeak.  If you are curious, take a look at:
> 
> http://www.is.titech.ac.jp/~ohshima/squeak/m17npaper/index.html
> 
>   The actual image I'm working on is now *too* much in flux to
> release.  However, it looks like I'm not going to have time to round
> it up for the next few months, so it might make sense to send this
> version, along with a "to-do" list, to whoever is really curious.
> 
>   Below is a little comment on this topic.
> 
>   * I think adopting a "Unicode-based" character representation is
>     doable and not too big a step.
> 
>   * Regarding the MacRoman vs. Latin-1 issue, I would vote for
>     internal Latin-1.  I actually have bit-edited and re-ordered the
>     NewYork fonts to make them compatible with Latin-1.  It turned out
>     that I'm not a good font designer, but it wasn't too difficult a
>     task anyway.
> 
>     (I did this almost four years ago and proposed to move to latin-1
>     internal encoding, because I thought this is entirely reasonable,
>     but it didn't get too much attention at that time...)
> 
>   * The VM doesn't have to be modified.  It is quite possible that the
>     InputSensor and/or HandMorph take care of the encoding translation
>     based on the version of VM it is running on.  This is also true
>     for multi-octet character handling.  If you do it in the VM, this
>     would mean that you need to add a hard-wired table and logic to
>     the VM, which is not usually desirable.
> 
>     I don't have the character tables handy so I might be wrong,
>     but is there any MacRoman character you can input from keyboard
>     which is not in the latin-1?  I'm guessing not and if so, it
>     wouldn't cause too much trouble if we change the internal encoding
>     to latin-1.
> 
>     On Japanese Unix, the characters passed from the OS to the Squeak
>     VM are *usually* in EUC encoding.  They can be in SJIS or UTF-8
>     depending on the user setting.  So, this is another complicated
>     thing you would not want to hook up with the keyboard input
>     handling logic in the VM.
> 
>   * I really don't know about the licensing issue around AccuFont.
>     The font I'm using for my experiment is EFont
>     (http://openlab.jp/efont/index.html.en).  It is a fixed-width font
>     and not quite nice, but anyway it lets me go forward.
> 
>   * But, remember, there will never be such a thing as a "complete
>     glyph set for Unicode".
> 
>   * While Unicode defines many "algorithms" around text processing,
>     not all of them have to be implemented.  Some of the
>     design doesn't look nice or necessary.
> 
>   * Existing text in the image can be an issue.  But you can convert
>     it all at once, and there should not be too much trouble...
> 
>   * From Ed's email...
> 
>     I can imagine some parts of the character processing routines in
>     Nihongo images were fearfully slow, but I would say it is not a big deal,
>     if I'm asked now.  (I'm sorry Ed-san, but I can't quite remember
>     what exactly I said.)  We can make it faster in one way or
>     another.
> 
>     "having native Unicode might make text-processing, searching,
>     language parsing and web serving a more easy proposition for
>     people in multi-byte character environments."
> 
>     I don't know if this is true...
> 
>   * CM fonts would be nice, but I have never tried to render them into
>     something like 12 pixel high bitmap.  How would it look?
> 
> -- Yoshiki
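
On the question in the quoted message about whether MacRoman has
characters that are not in Latin-1: one way to check without the tables
at hand is sketched below. This is Python, not Squeak code, and it is
only an illustration; it relies on the "mac_roman" and "latin-1" codecs
that ship with Python, and the function name is made up for this
example. It lists what differs between the two character sets, not what
can be typed, since that depends on the keyboard layout.

```python
def macroman_chars_missing_from_latin1():
    """Return the MacRoman characters (bytes 0x80-0xFF) that Latin-1 lacks.

    Hypothetical helper for illustration; bytes below 0x80 are plain
    ASCII in both encodings, so only the upper half is checked.
    """
    missing = []
    for byte in range(0x80, 0x100):
        # Decode the single MacRoman byte to its Unicode character.
        ch = bytes([byte]).decode("mac_roman")
        try:
            # Latin-1 covers exactly U+0000..U+00FF; anything else fails.
            ch.encode("latin-1")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    missing = macroman_chars_missing_from_latin1()
    print(len(missing), "MacRoman characters have no Latin-1 equivalent:")
    print(" ".join(missing))
```

Characters such as the dagger, the trademark sign and the Greek pi show
up in the result, so a move to an internal Latin-1 encoding would need
some policy for input that falls outside it. Accented letters such as
e-acute exist in both sets and only need their code re-mapped.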