[Newbies] Re: Character #asciiValue vs #charCode

Andreas Raab andreas.raab at gmx.de
Fri Jan 7 21:10:29 UTC 2011


On 1/7/2011 9:51 PM, nicolas cellier wrote:
> So, what the hell does this bitAnd: 16r3FFFFF mean?
> Well, in Squeak's Character encoding, the bits above that mask don't encode
> the character itself but the so-called #leadingChar. This leadingChar holds
> information about the environment and the encoding which should be used to
> interpret the charCode.

The background for this is Han unification
(http://en.wikipedia.org/wiki/Han_unification): Unicode assigns a single
code point to characters shared by Chinese, Japanese and Korean. The
language environment (encoded in the upper bits) disambiguates the
character if necessary.
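
To make the split concrete, here is a minimal sketch in Python (not Squeak
code). It assumes the character value is a plain integer whose low 22 bits
(the 16r3FFFFF mask from the quoted text) hold the charCode and whose bits
above that hold the leadingChar. The function names and the leadingChar
value 5 are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: models a Squeak-style character value as an integer.
CHAR_CODE_MASK = 0x3FFFFF                      # Smalltalk 16r3FFFFF, the low 22 bits
CHAR_CODE_BITS = CHAR_CODE_MASK.bit_length()   # 22


def char_code(value: int) -> int:
    """Keep only the low 22 bits, like `value bitAnd: 16r3FFFFF`."""
    return value & CHAR_CODE_MASK


def leading_char(value: int) -> int:
    """Return the bits above the charCode (the language environment tag)."""
    return value >> CHAR_CODE_BITS


def compose(leading: int, code: int) -> int:
    """Pack a leadingChar and a charCode into one integer value."""
    return (leading << CHAR_CODE_BITS) | (code & CHAR_CODE_MASK)


# 5 is an arbitrary, hypothetical leadingChar; 0x4E2D is a unified Han code point.
value = compose(5, 0x4E2D)
print(hex(char_code(value)), leading_char(value))
```

With a leadingChar of 0 the packed value and the charCode are the same
number, which is the case where charCode is simply the Unicode code point.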

Cheers,
   - Andreas

> In fact, charCode will most likely return a Unicode code point
> (http://en.wikipedia.org/wiki/ISO/CEI_10646), except if leadingChar ~= 0, which
> can be the case for some East Asian language environments.
>
> Note that a previous replacement, #codePoint, appears to have no senders.
> #codePoint does not deal with leadingChar, so I'm not sure it's correct.
>
> Hope it helps.
>
> Nicolas


