Squeak multilingualization on SqueakMap

Boris Gaertner Boris.Gaertner at gmx.net
Sun Apr 6 19:40:26 UTC 2003


<Yoshiki.Ohshima at acm.org> wrote:
(on Wednesday, April 02, 2003 8:09 PM)

>   Hello,
> 
>   I found it awkward to keep saying "I have no time to do this," so I
> made an SAR package for the multilingualization.
> 
>   I only tested it on a vanilla 3.4 image, and some features are not
> implemented.  Also, I anticipated format changes on the .changes file,
> so I don't know if I can keep going along with this SAR style
> installer.  (Well, it is also true that we can do whatever we want in
> Squeak, so there will always be a workaround for this, though.)
> 
>   The fixes from Boris are included and the workspace that appears at
> the end of installation shows the example code from him.  Thank you
> Boris.
> 
>   As always, any comments and suggestions are welcome,
> 
> -- Yoshiki
> 
The move to 3.4 is very welcome progress: in Squeak 3.2,
the debugger does not work properly in MVC. This is a
serious problem for MVC users, and it was fixed in 3.3.
The SAR installation package works excellently; thank you
for making it available.

Now some words about my plans to experiment with, and
hopefully contribute to, your work:

My short-term interests are additional fonts and additions
to Scamper. At the moment I am trying to adapt a font editor to your
font representation, and I think that I will finish this soon.

Fonts:
As to the fonts, there are some really good free BDF fonts available
on the Internet. It is entirely possible to find all glyphs of the
blocks 'CJK Unified Ideographs' and 'Hangul Syllables' on the
web (in one size only, but that is sufficient for a start).
At http://www.bgaertner.gmxhome.de/UnicodeResources.htm
you will find details and code that can be used to load large BDF fonts
into a 3.4 image.
I loaded the ClearlyU font and the cmex24m.bdf font into a
Squeak 3.4 image. To do that, I used code that splits these
large fonts into many StrikeFonts of 256 glyphs each: glyphs
from U+4E00 to U+4EFF are placed into one StrikeFont, glyphs
from U+4F00 to U+4FFF into a different StrikeFont, and so on.
This is not what we really need, but at least I can use my
font editor to look at the fonts.
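
The splitting scheme described above can be sketched in a few lines.
This is only an illustration in Python, not the Squeak code from the
page mentioned above; the function name split_into_pages and the
dictionary representation of glyphs are assumptions made for the
example.

```python
from collections import defaultdict


def split_into_pages(glyphs):
    """Group glyphs into pages of 256 consecutive code points.

    `glyphs` maps a code point to some glyph object (hypothetical
    representation). The result maps the first code point of each page
    (for example 0x4E00) to a dict from the offset within the page
    (0..255) to the glyph.
    """
    pages = defaultdict(dict)
    for code_point, glyph in glyphs.items():
        page_start = code_point & ~0xFF      # clear the low 8 bits
        pages[page_start][code_point - page_start] = glyph
    return dict(pages)


# Placeholder strings stand in for real glyph bitmaps.
glyphs = {0x4E00: "glyph-a", 0x4EFF: "glyph-b", 0x4F00: "glyph-c"}
pages = split_into_pages(glyphs)
```

Each page would then become one StrikeFont, with the offset within the
page serving as the character index in that font.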

What I want to do next is to load these fonts into
multilingualized Squeak. I think that I will need some
additional subclasses of class Unicode to do this.

Currently the class Unicode does not have subclasses for
these glyph blocks:
  UnicodeHangulSyllables
  UnicodeKangXiRadicals
  UnicodeCJKRadicalsSupplement
  UnicodeBoPoMoFo
  UnicodeHangulJamoCompatibility
  UnicodeCJKUnifiedIdeographs
  UnicodeCJKUnifiedIdeographsExtensionA
  UnicodeCJKUnifiedIdeographsExtensionB
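
For orientation, the code point ranges of these blocks can be written
down as a small lookup table. The sketch below is in Python and is not
part of the multilingualization package; the ranges are the block
boundaries as I know them from the Unicode Standard, and the names
BLOCKS and block_of are made up for the example.

```python
# (first code point, last code point, block name)
BLOCKS = [
    (0x2E80, 0x2EFF, "CJK Radicals Supplement"),
    (0x2F00, 0x2FDF, "Kangxi Radicals"),
    (0x3100, 0x312F, "Bopomofo"),
    (0x3130, 0x318F, "Hangul Compatibility Jamo"),
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
]


def block_of(code_point):
    """Return the name of the listed block containing code_point, or None."""
    for first, last, name in BLOCKS:
        if first <= code_point <= last:
            return name
    return None
```

A subclass per block would presumably need the same information in
order to decide which characters it is responsible for.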

The absence of classes for these blocks is not a surprise,
because your support for these scripts is currently based
on encodings like GB2312 and KSX1001.

A few words about the usefulness of these blocks:
  CJK Unified Ideographs    -  obvious!
  Hangul Syllables       - obvious!
  KangXi Radicals
      useful for support tools that show CJK ideographs
      in a "radical + additional strokes" order. All data for this
      ordering can be found in the Unicode support file
      Unihan.txt
  HangulJamoCompatibility
      useful for support tools that allow the selection of a
      hangul syllable by its choseong, jungseong and jongseong.
      The algorithm that is needed to do this is described in section
      3.11 of the Unicode Standard.
  BoPoMoFo
      useful for support tools that allow the selection of
      an ideograph based on its Mandarin pronunciation.
      Pronunciations can be found in the file Unihan.txt
Now my questions:
1. Will we have subclasses for these Unicode blocks?
2. Which leading chars will be assigned to these blocks?


-- Boris




