[Newbies] Re: Character #asciiValue vs #charCode

nicolas cellier nicolas.cellier.aka.nice at gmail.com
Fri Jan 7 20:51:15 UTC 2011


Sean P. DeNigris <sean <at> clipperadams.com> writes:

> 
> 
> For Character, what is the difference between #asciiValue and #charCode (=
> #asciiValue bitAnd: 16r3FFFFF)?
> 
> Thanks.
> Sean

#asciiValue suggests that the character is encoded in ASCII.
But that is not general: what is the ASCII code of é?
The selector can still be used by legacy code dating from the days
when Smalltalk characters were all in the ASCII set.
All? Well, all but the left and up arrows, maybe ;)

The modern replacement for #asciiValue is #charCode.

So, what does this bitAnd: 16r3FFFFF mean?
In Squeak's Character encoding, the bits above that mask do not encode the
character itself but the so-called #leadingChar. The leadingChar holds
information about the environment and the encoding which should be used to
interpret the charCode.
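
To make the bit layout concrete, here is a minimal workspace sketch using
plain integer arithmetic only. It assumes nothing beyond the mask quoted in
the question: 16r3FFFFF covers the low 22 bits, so whatever sits above them is
taken to be the leadingChar. The values 5 and 16r4E2D are hypothetical, chosen
only for illustration, and the temporaries are not real Character methods.

```smalltalk
"Sketch only: integers standing in for a Character's internal value."
| raw code lead |
"Hypothetical value: leadingChar 5 in the high bits, code 16r4E2D in the low bits."
raw := (5 bitShift: 22) bitOr: 16r4E2D.
"Keep the low 22 bits, as #charCode does with this mask."
code := raw bitAnd: 16r3FFFFF.
"Drop the low 22 bits to see what is stored above the mask."
lead := raw bitShift: -22.
```

Here code evaluates to 16r4E2D and lead to 5. When the high bits are all zero,
the masked value and the raw value are the same number.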

In fact, charCode will most likely return a Unicode code point
(http://en.wikipedia.org/wiki/ISO/CEI_10646), unless leadingChar ~= 0, which
can be the case for some East Asian language environments.

Note that an earlier replacement, #codePoint, appears to be unsent...
#codePoint does not deal with leadingChar, so I'm not sure it's correct.

Hope it helps.

Nicolas


