On 29.09.2008 at 11:11, Norbert Hartl wrote:
On Mon, 2008-09-29 at 18:53 +0200, stephane ducasse wrote:
Am I the only one using the generic en/decoding functionality in
Squeak in the form of #convertTo/FromEncoding?
Convert from "Squeak" to UTF-8:
    aString convertToEncoding: 'utf-8'
Do I understand correctly that such an aString is a sequence of Unicode code points?
First of all, UTF-8 is a sequence of bytes. These bytes are a space-optimized encoding of code points. If you decode those bytes you get your code points (Unicode). From a sequence of code points you can derive a character. In most cases (for us westerners) it will be a single code point, AFAIK.
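To make the bytes / code points / character distinction concrete, here is a small sketch. It is Python rather than Squeak, purely for illustration, and says nothing about how Squeak implements this internally. The helper names (`utf8_bytes`, `code_points`) and the sample strings are made up for the example.

```python
import unicodedata


def utf8_bytes(text: str) -> bytes:
    """Encode a string of code points into UTF-8 bytes."""
    return text.encode("utf-8")


def code_points(data: bytes) -> list[int]:
    """Decode UTF-8 bytes back into a list of Unicode code points."""
    return [ord(ch) for ch in data.decode("utf-8")]


# Three code points: U+0061 (a), U+00E9 (e with acute), U+20AC (euro sign).
sample = "a\u00e9\u20ac"

# UTF-8 uses 1, 2 and 3 bytes respectively for these code points.
encoded = utf8_bytes(sample)

# Decoding the bytes gives the original code points back.
decoded = code_points(encoded)

# One user-visible character can also be written as two code points:
# U+0065 (e) followed by U+0301 (combining acute accent).
decomposed = "e\u0301"

# NFC normalization composes that pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)
```

Here `encoded` holds six bytes for three code points, and `decomposed` holds two code points that a reader would perceive as one character, which is the less common case mentioned above.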
I'm trying to really understand this in Squeak. :) What is it that we call a character, then? Is it a code point, or the glyph looked up in a font table?
I don't know. I've never dealt with how Squeak does those things.
A character represents a single code point. A font maps code points to glyphs.
A character also encodes a language tag (a.k.a. leading char), but we all seem to agree that's a bad idea. It was done to allow easier migration of old code (for many eastern languages a code point and a font are not enough for rendering; you also need to know the language).
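A rough sketch of what "a character value that also carries a language tag" can look like, again in Python for illustration only. The bit layout is an assumption chosen for the example (code point in the low 22 bits, tag in the bits above); the actual layout in Squeak may differ. `pack` and `unpack` are hypothetical helpers, not Squeak methods.

```python
# Assumed layout, for illustration only: low 22 bits hold the code point,
# the bits above hold the language tag ("leading char").
CODE_POINT_BITS = 22
CODE_POINT_MASK = (1 << CODE_POINT_BITS) - 1


def pack(language_tag: int, code_point: int) -> int:
    """Combine a language tag and a code point into one integer value."""
    if not 0 <= code_point <= CODE_POINT_MASK:
        raise ValueError("code point does not fit in the assumed field")
    if language_tag < 0:
        raise ValueError("language tag must be non-negative")
    return (language_tag << CODE_POINT_BITS) | code_point


def unpack(value: int) -> tuple[int, int]:
    """Split a packed value back into (language_tag, code_point)."""
    return value >> CODE_POINT_BITS, value & CODE_POINT_MASK


# The same code point (U+4E2D) with two different, made-up tag numbers.
tagged_a = pack(5, 0x4E2D)
tagged_b = pack(6, 0x4E2D)
```

With such a scheme `tagged_a` and `tagged_b` are different values even though both carry the same code point, so any code comparing raw character values has to strip the tag first. That is one way to see why mixing the tag into the character is awkward.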
- Bert -