[squeak-dev] how to create an UTF-8 character
K. K. Subramaniam
subbukk at gmail.com
Sat Sep 27 11:44:30 UTC 2008
On Saturday 27 Sep 2008 11:45:38 am stephane ducasse wrote:
> Why do I get a visual representation? How the mapping is done from the
> unicode to the glyph.
Unicode codepoints are processed by a shaping engine to generate a graphic.
The term 'glyph' (carving in Greek) is historical since typefaces were carved
from metal. The shaping engine is trivial in the case of the Latin-1 character
set. The first 256 code points are the same as Latin-1 (ISO 8859-1), so the
graphic can be looked up directly in a font table. Rendering "hello" on the
screen involves extracting the box dimensions and graphic of h, e, l, o from a
font table, laying out five boxes and then rendering each graphic into its box.
Other scripts have thousands of such graphics (pictals?) and the rendering
algorithms are complex enough to require a shaping engine with pluggable
rendering algorithms. Search for the works of Dr. Yannis Haralambous for
details.
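
The table lookup described above for "hello" can be sketched roughly as
follows. This is only an illustration in Python: FONT_TABLE, its made-up box
sizes and the layout function are hypothetical, and a real font table would
also hold the graphic itself (bitmap or outline) for each entry.

```python
# Hypothetical font table: character -> (box width, box height) in pixels.
# The numbers are invented for illustration only.
FONT_TABLE = {"h": (7, 12), "e": (6, 12), "l": (3, 12), "o": (7, 12)}


def layout(text, table=FONT_TABLE):
    """Return one (char, x, width, height) box per character, left to right."""
    boxes = []
    x = 0
    for ch in text:
        width, height = table[ch]  # trivial "shaping": one lookup per character
        boxes.append((ch, x, width, height))
        x += width  # the next box starts where this one ends
    return boxes


boxes = layout("hello")
for box in boxes:
    print(box)
```

Only four table entries are needed for the five boxes, since "l" is looked up
twice. Scripts where one character's shape depends on its neighbours cannot be
handled by such a per-character lookup.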
> Should we always passed via a transformation?
UTF-8 is recommended when passing Unicode strings across programs and machines
for the sake of backward compatibility. Within a program, the choice of
encoding depends on the string handling requirements. For instance, if a
program deals with palindromes, then a decomposed encoding for "rés" like:
<r> <e> <combining acute> <s>
will break current algorithms that just reverse the string of codepoints,
because the reversed accent ends up attached to the wrong letter.
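
The reversal problem can be reproduced with a short Python sketch. It assumes
the decomposed spelling of "rés" (base letter followed by a combining acute
accent) and uses NFC normalization from the standard unicodedata module as one
possible workaround. NFC only helps where a precomposed character exists; a
general solution would have to reverse whole grapheme clusters.

```python
import unicodedata

# Decomposed spelling: <r> <e> <combining acute accent U+0301> <s>
decomposed = "re\u0301s"

# Naive reversal of code points moves the accent onto the "s".
naive = decomposed[::-1]

# Composing first (NFC) turns <e> <U+0301> into the single code point U+00E9,
# so reversing code points keeps the accent on the "e".
composed = unicodedata.normalize("NFC", decomposed)
safe = composed[::-1]

print([hex(ord(c)) for c in naive])
print([hex(ord(c)) for c in safe])
```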
> How the encodings schema (UTF-*) associates a code point to its glyph?
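The UTF-* encodings only define how code points are serialized into bytes;
they do not associate a code point with a glyph. The Unicode sequence
"hello world" transformed into UTF-8 is the same as its ASCII encoding. Mapping
code points to glyphs is the job of the font and, for Asian languages where
the process is more involved, of a separate shaping engine. Examples are
Pango, the Qt shaping engine, Uniscribe, etc.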
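
The byte-level claim is easy to check. The following Python sketch, using only
the built-in str.encode, compares the UTF-8 and ASCII encodings of
"hello world" and shows where UTF-8 and Latin-1 diverge for an accented
character. It says nothing about glyphs, which the encoding does not touch.

```python
text = "hello world"

utf8_bytes = text.encode("utf-8")
ascii_bytes = text.encode("ascii")
same_as_ascii = utf8_bytes == ascii_bytes  # pure ASCII text: identical bytes

# U+00E9 (e with acute) is one byte in Latin-1 but two bytes in UTF-8.
accented = "r\u00e9s"
accented_utf8 = accented.encode("utf-8")
accented_latin1 = accented.encode("latin-1")

print(same_as_ascii)
print(accented_utf8, accented_latin1)
```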
Regards .. Subbu