Re: [squeak-dev] how to create an UTF-8 character

29 Sep 2008


On 29.09.2008 at 11:11, Norbert Hartl wrote:
...
On Mon, 2008-09-29 at 18:53 +0200, stephane ducasse wrote:
...
Am I the only one using the generic en/decoding functionality in
Squeak in the form of #convertTo/FromEncoding?
Convert from "Squeak" to UTF-8
aString convertToEncoding: 'utf-8'
Do I understand correctly that such an aString is a sequence of
Unicode code points?
...
First of all, UTF-8 is a sequence of bytes. These bytes are a
space-optimized encoding of code points. If you decode those bytes
you get your code points (Unicode). From a sequence of code points
you can derive a character. In most cases (for us westerners) it
will be a single code point, AFAIK.
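
To make the bytes-versus-code-points distinction concrete, here is a
minimal workspace sketch. It assumes a Squeak image with the m17n
classes loaded and uses the #convertToEncoding: / #convertFromEncoding:
pair mentioned earlier in the thread. The variable names are only
illustrative, and the results have not been checked against any
particular image version.

```smalltalk
"Workspace sketch: evaluate line by line with print-it.
 Assumes #convertToEncoding: and #convertFromEncoding: behave as
 described in this thread."
| euro encoded decoded |
euro := String with: (Character value: 16r20AC).  "one code point, U+20AC"
encoded := euro convertToEncoding: 'utf-8'.        "a String holding the UTF-8 bytes"
encoded size.         "byte count; UTF-8 uses three bytes for U+20AC"
encoded asByteArray.  "raw byte values; in UTF-8 these are 16rE2 16r82 16rAC"
decoded := encoded convertFromEncoding: 'utf-8'.   "back to code points"
decoded = euro.       "the round trip should answer the original string"
```

The point of the sketch is that one code point goes in, several bytes
come out, and decoding turns those bytes back into the single code
point.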
I'm trying to really understand this in Squeak. :)
What is it that we call a character, then?
Is it a code point? Or the glyph looked up in a font table?
I don't know. I've never dealt with how Squeak does those things.
A character represents a single code point. A font maps code points
to glyphs.
A character also encodes a language tag (a.k.a. leading char), but we
all seem to agree that's a bad idea. It was done to allow easier
migration of old code (for many eastern languages a code point and a
font are not enough for rendering; you also need to know the language).
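
As a rough illustration of the two parts a character carries, the
following workspace sketch asks a character for each of them. It
assumes the accessors #charCode and #leadingChar are present on
Character in your image; that is an assumption about the image, not
something stated in this thread.

```smalltalk
"Workspace sketch: evaluate line by line with print-it.
 #charCode and #leadingChar are assumed to exist in your image."
| c |
c := Character value: 16r20AC.
c charCode.     "the code point part of the character"
c leadingChar.  "the language tag (leading char) part"
```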
- Bert -