[squeak-dev] how to create an UTF-8 character

Mon Sep 29 18:24:36 UTC 2008

On 29.09.2008, at 11:11, Norbert Hartl wrote:

> On Mon, 2008-09-29 at 18:53 +0200, stephane ducasse wrote:
>>>>>> Am I the only one using the generic en/decoding functionality in
>>>>> Squeak in the form of #convertTo/FromEncoding?
>>>>>
>>>>> Convert from "Squeak" to UTF-8
>>>>> aString convertToEncoding: 'utf-8'
>>>>
>>>>
>>>> Do I understand correctly that such an aString is a sequence of
>>>> Unicode code points?
>>>>>
>>> First of all, UTF-8 is a sequence of bytes. These bytes are a
>>> space-optimized encoding of a code point. If you decode those bytes
>>> you get your code point (Unicode). From a sequence of code points
>>> you can derive a character. In most cases (for us westerners) it
>>> will be a single code point AFAIK.
>>
>> I'm really trying to understand this in Squeak. :)
>> What is it that we call a character, then?
>> Is it a code point, or the glyph looked up in a font table?
>>
> I don't know. I've never dealt with how Squeak does those things.

A character represents a single code point. A font maps code points
to glyphs.
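
To make the difference between code points and UTF-8 bytes concrete, here is a small workspace sketch. It assumes #convertToEncoding: and #convertFromEncoding: behave as described earlier in this thread. The code point U+00E4 and the variable names are only examples.

```smalltalk
| str utf8 |
"One Character holding the code point U+00E4 (example value only)."
str := String with: (Character value: 16rE4).
str size.                  "1 character"
str first asInteger.       "228, the code point"

"Encoding yields a String whose elements are the UTF-8 bytes."
utf8 := str convertToEncoding: 'utf-8'.
utf8 size.                 "2, since U+00E4 needs two bytes in UTF-8"
utf8 asByteArray.          "the bytes 195 164 (16rC3 16rA4)"

"Decoding turns the two bytes back into the single code point."
(utf8 convertFromEncoding: 'utf-8') = str.
```

So the encoded string is longer than the original: its elements are bytes, not code points.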

A character also encodes a language tag (a.k.a. leading char), but we
all seem to agree that's a bad idea. It was done to allow easier
migration of old code (for many eastern languages a code point and a
font are not enough for rendering; you also need to know the language).
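
A rough sketch of what the leading char looks like from a workspace. It assumes the image provides Character class>>leadingChar:code: and the accessors #leadingChar and #charCode. The tag value 5 and the code point U+4E2D are arbitrary examples, not a recommendation.

```smalltalk
| plain tagged |
"The same code point, once without and once with a language tag."
plain := Character value: 16r4E2D.
tagged := Character leadingChar: 5 code: 16r4E2D.   "5 is an arbitrary example tag"

tagged charCode.       "the code point part"
tagged leadingChar.    "the language tag part"

"The tag is kept in the character's value next to the code point,
 so plain and tagged are expected to be different characters."
plain = tagged.
```

This is part of why the tag is awkward: two characters with the same code point need not compare equal.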

- Bert -