[Newbies] Re: Character #asciiValue vs #charCode
andreas.raab at gmx.de
Fri Jan 7 21:10:29 UTC 2011
On 1/7/2011 9:51 PM, nicolas cellier wrote:
> So, what the hell does this bitAnd: 16r3FFFFF mean?
> Well, in Squeak's Character encoding, the bits above that mask don't encode
> the character itself but the so-called #leadingChar. This leadingChar holds
> information about the environment and the encoding which should be used to
> interpret the charCode.
The background for this is Han unification
(http://en.wikipedia.org/wiki/Han_unification). The language environment
(encoded in the upper bits) disambiguates the character if necessary.
> In fact, charCode will most likely return a Unicode code point
> (http://en.wikipedia.org/wiki/ISO/CEI_10646), except if leadingChar ~= 0, which
> can be the case for some East Asian language environments.
> Note that an earlier replacement, #codePoint, appears to be unsent (it has no
> senders). #codePoint does not deal with leadingChar, so I'm not sure it's correct.
> Hope it helps.
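
For anyone who wants to see the bit arithmetic spelled out, here is a minimal sketch in Python (not Squeak source). It assumes the leadingChar sits in the bits directly above the 22 bits selected by 16r3FFFFF. The function names and the leadingChar value 5 are made up for illustration.

```python
# Illustrative sketch of the masking described above; not Squeak's actual code.
CHAR_CODE_MASK = 0x3FFFFF   # Smalltalk 16r3FFFFF: the low 22 bits
CHAR_CODE_BITS = 22         # assumption: leadingChar starts right above the mask


def char_code(value: int) -> int:
    """Keep only the low 22 bits, like `value bitAnd: 16r3FFFFF`."""
    return value & CHAR_CODE_MASK


def leading_char(value: int) -> int:
    """Return whatever is stored in the bits above the char code."""
    return value >> CHAR_CODE_BITS


def compose(leading: int, code: int) -> int:
    """Pack a (hypothetical) leadingChar and a char code into one integer."""
    if not 0 <= code <= CHAR_CODE_MASK:
        raise ValueError("code does not fit in 22 bits")
    return (leading << CHAR_CODE_BITS) | code


# A plain ASCII character: no upper bits set, so the value is the code itself.
plain = ord("A")
print(leading_char(plain), char_code(plain))

# U+4E2D tagged with a made-up leadingChar of 5: the raw value differs from
# the code point, but masking recovers it.
tagged = compose(5, 0x4E2D)
print(leading_char(tagged), hex(char_code(tagged)))
```

With leadingChar = 0 the raw value and the masked charCode are the same number, so the two only differ for characters that carry a language tag in the upper bits.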