[Newbies] Re: Character #asciiValue vs #charCode
nicolas.cellier.aka.nice at gmail.com
Fri Jan 7 20:51:15 UTC 2011
Sean P. DeNigris <sean <at> clipperadams.com> writes:
> For Character, what is the difference between #asciiValue and #charCode (=
> #asciiValue bitAnd: 16r3FFFFF)?
#asciiValue suggests the character is encoded in ASCII.
But hey, that's not general! What is the ASCII code of é?
It is still used by legacy code dating from long ago...
...when Smalltalk characters were all in the ASCII set.
All? Well, all but the left and up arrows, maybe ;)
The modern replacement for #asciiValue is #charCode.
So what does this bitAnd: 16r3FFFFF mean?
Well, in Squeak's Character encoding, the bits above the low 22 don't encode
the character itself but the so-called #leadingChar. The leadingChar holds
information about the environment and the encoding that should be used to
interpret the charCode.
In fact, #charCode will most likely return a Unicode code point
(http://en.wikipedia.org/wiki/ISO/CEI_10646), unless leadingChar ~= 0, which
can be the case in some East Asian language environments.
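
To make the masking concrete, here is a minimal sketch of the bit layout
described above. It is written in Python rather than Smalltalk, purely as a
model: the function names are made up for illustration, the leadingChar is
simply taken to be everything above the low 22 bits, and the leadingChar
value 5 is an arbitrary example, not a real environment tag.

```python
# Model of the layout described above: the low 22 bits hold the charCode,
# the bits above them hold the leadingChar. All names here are hypothetical.

CHAR_CODE_MASK = 0x3FFFFF  # 16r3FFFFF in Smalltalk notation (22 one-bits)
CHAR_CODE_BITS = 22


def char_code(value):
    """Model of #charCode: keep only the low 22 bits of the raw value."""
    return value & CHAR_CODE_MASK


def leading_char(value):
    """Model of #leadingChar: the bits above the charCode."""
    return value >> CHAR_CODE_BITS


def make_value(leading, code):
    """Pack a leadingChar and a charCode into one raw value."""
    return (leading << CHAR_CODE_BITS) | (code & CHAR_CODE_MASK)


# With leadingChar = 0 the raw value and the charCode coincide.
plain = make_value(0, 0xE9)      # 0xE9 is the Unicode code point of é

# With a non-zero leadingChar (5 is an arbitrary, made-up tag) they differ.
tagged = make_value(5, 0x3042)
```

In this model the raw value plays the role of #asciiValue: for `plain` it
equals the charCode, while for `tagged` it also carries the leadingChar bits,
which is why masking is needed to get the charCode back.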
Note that an earlier replacement, #codePoint, appears to have no senders...
#codePoint does not deal with leadingChar, so I'm not sure it's correct.
Hope it helps.