[squeak-dev] how to create an UTF-8 character

Sat Sep 27 11:44:30 UTC 2008

On Saturday 27 Sep 2008 11:45:38 am stephane ducasse wrote:
> Why do I get a visual representation? How the mapping is done from the  
> unicode to the glyph.
Unicode code points are processed by a shaping engine to generate a graphic. 
The term 'glyph' (carving in Greek) is historical, since typefaces were carved 
from metal. The shaping engine is trivial in the case of the Latin-1 character 
set. The first 256 code points are the same as ISO 8859-1 (Latin-1) and the 
graphic can be looked up in a font table. Rendering "hello" on the screen involves 
extracting the box dimensions and graphic of h, e, l, o from a font table, 
laying out five boxes and then rendering appropriately into the five boxes. 
Other languages have thousands of such graphics (pictals?) and the rendering 
algorithms are complex enough to require a shaping engine with pluggable 
rendering algorithms. Search for the works of Dr. Yannis Haralambous for details.
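
The Latin-1 case can be sketched in a few lines of Python. This is only an 
illustration of the lookup-and-layout idea, not how Squeak or any real font 
engine does it: the table name FONT_TABLE, the widths and the glyph names are 
all made up.

```python
# Hypothetical font table: character -> (advance width, glyph name).
# The widths and names are invented for illustration only.
FONT_TABLE = {
    "h": (6, "glyph_h"),
    "e": (6, "glyph_e"),
    "l": (3, "glyph_l"),
    "o": (6, "glyph_o"),
}

def layout(text, table=FONT_TABLE):
    """Return one (x_offset, width, glyph) box per character, left to right."""
    boxes = []
    x = 0
    for ch in text:
        width, glyph = table[ch]      # trivial "shaping": a direct lookup
        boxes.append((x, width, glyph))
        x += width                    # next box starts where this one ends
    return boxes

boxes = layout("hello")               # five boxes, one per character
```

A real shaping engine replaces the one-to-one lookup with rules for ligatures, 
reordering and contextual forms, which is why the simple loop above is not 
enough for many scripts.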

> Should we always passed via a transformation?
UTF-8 is recommended when passing Unicode strings across programs and machines 
for the sake of backward compatibility with ASCII. Within a program, the choice 
of encoding depends on the string handling requirements. For instance, if a 
program deals with palindromes, then a decomposed encoding of "rés" like:
   <r> <e> <combining acute> <s>
will break current algorithms that just reverse the string of code points, 
since the accent ends up attached to the wrong letter.
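
The reversal problem can be reproduced with a small Python sketch. It assumes 
the standard unicodedata module and uses Unicode's NFD form as the decomposed 
representation, where the combining mark follows its base letter. The helper 
name naive_reverse is mine.

```python
import unicodedata

def naive_reverse(s):
    """Reverse a string code point by code point."""
    return s[::-1]

composed = unicodedata.normalize("NFC", "r\u00e9s")    # r, e-acute, s
decomposed = unicodedata.normalize("NFD", composed)    # r, e, U+0301, s

rev_composed = naive_reverse(composed)      # s, e-acute, r
rev_decomposed = naive_reverse(decomposed)  # s, U+0301, e, r

# After reversal the combining acute follows 's' instead of 'e',
# so the two results no longer represent the same text.
same_text = unicodedata.normalize("NFC", rev_decomposed) == rev_composed
```

Here same_text comes out False. Normalizing to a composed form before 
reversing avoids the problem for this word, though not for every combining 
sequence.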

> How the encodings schema (UTF-*) associates a code point to its glyph?
The UTF-* encodings only map code points to bytes; they do not select glyphs. 
The Unicode sequence "hello world" transformed into UTF-8 is the same as its 
ASCII encoding. Mapping code points to glyphs is more involved for Asian 
languages, so a separate shaping engine is required. Examples are Pango, the 
Qt shaping engine, Uniscribe etc.
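
The "hello world" case is easy to check in Python, used here only as a 
convenient illustration. The variable names are arbitrary.

```python
text = "hello world"

utf8_bytes = text.encode("utf-8")
ascii_bytes = text.encode("ascii")
same = utf8_bytes == ascii_bytes      # identical byte sequences

# Outside the ASCII range the encodings differ: U+00E9 (e-acute)
# is one byte in Latin-1 but two bytes in UTF-8.
e_acute = "\u00e9"
e_latin1 = e_acute.encode("latin-1")  # b'\xe9'
e_utf8 = e_acute.encode("utf-8")      # b'\xc3\xa9'
```

In both cases only bytes are produced. Which graphic is drawn for a code point 
is decided later by the font and the shaping engine.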

Regards .. Subbu