Squeak multilingualization on SqueakMap
Boris Gaertner
Boris.Gaertner at gmx.net
Sun Apr 6 19:40:26 UTC 2003
<Yoshiki.Ohshima at acm.org> wrote:
(on Wednesday, April 02, 2003 8:09 PM)
> Hello,
>
> I found it awkward to keep saying "I have no time to do this," so I
> made an SAR package for the multilingualization.
>
> I only test it on vanilla 3.4 image, and some features are not
> implemented. Also, I anticipated format changes on the .changes file,
> so I don't know if I can keep going along with this SAR style
> installer. (Well, it is also true that we can do whatever we want in
> Squeak, so there will be always a workaround this, though.)
>
> The fixes from Boris are included and the workspace that appears at
> the end of installation shows the example code from him. Thank you
> Boris.
>
> As always, any comments and suggestions are welcome,
>
> -- Yoshiki
>
The move to 3.4 is very welcome progress: in Squeak 3.2,
the debugger does not work properly in MVC, which is a
serious problem for MVC users; it was fixed in 3.3.
The SAR installation package works excellently. Thank you
for making it available.
Now some words about my plans to experiment with and
to hopefully contribute to your work:
My short-term interests are additional fonts and additions
to Scamper. At the moment I am trying to adapt a font editor to your
font representations, and I think I will finish this soon.
Fonts:
As to the fonts, there are some really good free bdf-fonts available
on the Internet. It is entirely possible to find all glyphs of the
blocks 'CJK Unified Ideographs' and 'Hangul Syllables' on the
web (in one size only, but for a start that is sufficient).
At http://www.bgaertner.gmxhome.de/UnicodeResources.htm
you will find details and code that can be used to load large bdf-fonts
into a 3.4 image.
I loaded the ClearlyU font and the cmex24m.bdf font into a
Squeak 3.4 image. To do that, I used code that splits these
large fonts into many StrikeFonts. Glyphs from U+4E00 to
U+4EFF are placed into one StrikeFont, glyphs from
U+4F00 to U+4FFF into a different StrikeFont and so on.
This is not what we really need, but at least I can use my
font editor to look at the fonts.
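The splitting scheme described above (one StrikeFont per run of 256 code points) can be sketched roughly as follows. This is only an illustration in Python, not the Squeak code from the page mentioned above; the function name split_into_pages and the page_size parameter are made up for the example.

```python
from collections import defaultdict


def split_into_pages(code_points, page_size=256):
    """Group code points into pages of page_size consecutive values.

    Hypothetical helper: each page stands for one small font that
    holds the glyphs U+xx00 to U+xxFF of a large bdf font.
    """
    pages = defaultdict(list)
    for cp in sorted(code_points):
        # Round down to the first code point of the page.
        page_start = cp // page_size * page_size
        pages[page_start].append(cp)
    return dict(pages)


if __name__ == "__main__":
    sample = [0x4E00, 0x4EFF, 0x4F00, 0x4FFF]
    for start, members in split_into_pages(sample).items():
        end = start + 255
        print(f"U+{start:04X}..U+{end:04X}: {len(members)} glyph(s)")
```

With this grouping, U+4E00 and U+4EFF land in one page and U+4F00 and U+4FFF in the next, which matches the ranges given above.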
What I want to do next is to load these fonts into
multilingualized Squeak. I think that I will need some
additional subclasses of class Unicode to do this.
Currently the class Unicode does not have subclasses for
these glyph blocks:
UnicodeHangulSyllables
UnicodeKangXiRadicals
UnicodeCJKRadicalsSupplement
UnicodeBoPoMoFo
UnicodeHangulJamoCompatibility
UnicodeCJKUnifiedIdeographs
UnicodeCJKUnifiedIdeographsExtensionA
UnicodeCJKUnifiedIdeographsExtensionB
The absence of classes for these blocks is not a surprise,
because your support for these writings is currently based
on encodings like GB2312 and KSX1001.
A few words about the usefulness of these blocks:
CJK Unified Ideographs - obvious!
Hangul Syllables - obvious!
KangXi Radicals
useful for support tools that show CJK ideographs
in a "radical + additional strokes" order. All data for this
ordering can be found in the Unicode support file
UniHan.txt
HangulJamoCompatibility
useful for support tools that allow the selection of a
Hangul syllable by its choseong, its jungseong and its jongseong.
The algorithm that is needed to do this is described in chapter
3.11 of the Unicode Documentation.
BoPoMoFo
useful for support tools that allow the selection of
an ideograph based on its Mandarin pronunciation.
Pronunciations can be found in the file UniHan.txt
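As an illustration of the Hangul syllable selection mentioned above, here is a minimal Python sketch of the arithmetic composition defined by the Unicode Standard. The constants (base U+AC00, 19 leading consonants, 21 vowels, 28 trailing positions including "none") come from the standard; the function names are made up for this example. Note that the indices refer to the conjoining jamo order, so a tool built on the compatibility jamo would still need its own mapping to these indices, which is not shown here.

```python
S_BASE = 0xAC00          # first precomposed Hangul syllable
L_COUNT = 19             # number of choseong (leading consonants)
V_COUNT = 21             # number of jungseong (vowels)
T_COUNT = 28             # number of jongseong, index 0 means "none"
S_COUNT = L_COUNT * V_COUNT * T_COUNT


def compose_hangul(l_index, v_index, t_index=0):
    """Return the syllable for the given jamo indices (hypothetical helper)."""
    if not (0 <= l_index < L_COUNT
            and 0 <= v_index < V_COUNT
            and 0 <= t_index < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


def decompose_hangul(syllable):
    """Return (l_index, v_index, t_index) for a precomposed syllable."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    l_index, rest = divmod(s_index, V_COUNT * T_COUNT)
    v_index, t_index = divmod(rest, T_COUNT)
    return l_index, v_index, t_index


if __name__ == "__main__":
    # Leading index 18, vowel index 0, trailing index 4.
    print(compose_hangul(18, 0, 4))
    print(decompose_hangul("\uD55C"))
```

A selection tool would let the user pick the three components and call something like compose_hangul to obtain the code point of the glyph to display.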
Now my questions:
1. Will we have subclasses for these Unicode blocks?
2. What leading chars will be assigned to these blocks?
-- Boris