[squeak-dev] leadingChar question

Thu Apr 21 23:30:50 UTC 2011

Hi,

I think we found a bug, but I'd like to hear your opinion before "fixing"
it. Some TextConverters (e.g. ISO88592TextConverter) implement
#leadingChar, and that leadingChar is added to every decoded
character. Since character equality takes leadingChar into
account, these decoded characters will never be equal to the plain
Unicode characters with the same code point. The following example
returns false, because the carriage return (13) is decoded as
(Character value: 58720269):

(String cr convertFromWithConverter: ISO88592TextConverter new) = String cr
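To see where that number comes from, assume the usual Squeak layout in which leadingChar sits in the bits above a 22-bit code point (the layout Character class >> #leadingChar:code: builds). The value then splits exactly into a leadingChar of 14 and the unchanged code point 13:

```latex
58720269 = 14 \cdot 2^{22} + 13 = 58720256 + 13
```

So the character still has charCode 13, but its leadingChar of 14 makes it a different Character from Character cr.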

The current system (Collections, Compiler, etc.) assumes that the first 256
characters are unique and ignores the variants of these characters
that have a non-zero leadingChar.

So, I think we should change Character class >> #leadingChar:code: to
ignore its first argument when the second is less than 256.
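A minimal sketch of what that change could look like, in browser notation. The two range checks are assumed to mirror the existing method; only the `code < 256` guard is the proposed change, and this is not a tested patch.

```smalltalk
"Hypothetical sketch of the proposed change (browser notation).
 The range checks are assumed to match the current method;
 the code < 256 guard is the only new part."
Character class >> leadingChar: leadChar code: code
	code >= 16r400000 ifTrue: [^self error: 'code is out of range'].
	leadChar >= 256 ifTrue: [^self error: 'lead is out of range'].
	"Treat the first 256 code points as unique: drop leadChar."
	code < 256 ifTrue: [^self value: code].
	"Everything else keeps its leadingChar in the bits above bit 22."
	^self value: (leadChar bitShift: 22) + code
```

Assuming the converter builds its characters through this method, (Character leadingChar: 14 code: 13) would answer the shared Character cr instance, so the example above would return true. Characters from 256 upward, including unified CJKV ideographs, would still carry their leadingChar.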

Also, I think only the TextConverters of CJKV languages should implement
#leadingChar, because AFAIK only the characters of those languages are
unified (Han unification), so only they need a leadingChar to tell the
language variants apart.

What do you think?

Cheers,
Levente