[squeak-dev] Re: Problem with typing Czech characters in Squeak 3.10 on Ubuntu 9.04
Yoshiki Ohshima
yoshiki at vpri.org
Mon Aug 17 22:36:12 UTC 2009
At Tue, 18 Aug 2009 10:12:21 +1200,
Michael van der Gulik wrote:
>
> Assuming the Unicode characters 97 ("a") followed by 301 (composing ') in a String, should the correct behaviour be to
> consider this one character or two?
>
> Given the String 'xxa'xx' (where "a" is Unicode #97 and the middle ' is Unicode #301), would "String at: 3" return a
> single composed character or uncomposed character?
>
> Or should Unicode-able Strings not be indexable at all to completely circumvent issues like this?
A Unicode string can be indexable, but don't expect to always get back
a useful "character" (displayable, comparable, etc.). What you get
back is a code point, not a character. For comparison and other
purposes, you need to "normalize" the string first, and the result can
be either a single composed character or an uncomposed sequence,
depending on the normalization form.
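
As a rough illustration of the code point vs. character distinction,
here is a small sketch in Python rather than Squeak (I am not assuming
any particular normalization API in the image; Python's standard
unicodedata module is used only because it is easy to check). The
string is the one from the question: "a" (U+0061) followed by the
combining acute accent (U+0301). Note that Python indexes from 0,
whereas #at: indexes from 1.

```python
import unicodedata

# The string from the question: "a" followed by COMBINING ACUTE ACCENT.
decomposed = "xxa\u0301xx"

# Indexing works on code points, so the accent is its own element.
print(len(decomposed))            # 6
print(hex(ord(decomposed[3])))    # 0x301

# NFC normalization composes the pair into one code point (U+00E1).
composed = unicodedata.normalize("NFC", decomposed)
print(len(composed))              # 5
print(composed[2] == "\u00e1")    # True

# Without normalization the two strings do not compare equal,
# even though they would display the same.
print(decomposed == composed)     # False

# NFD normalization goes the other way and decomposes again.
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```

So the same index can give back a base letter, a bare combining mark,
or a precomposed letter, depending on which form the string is in.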
However, when do you need "aString at: 3"? From the Squeak point of
view, as long as certain relationships hold (e.g. #at: agrees with
#size), random-access indexing is rarely needed, and when it is, it
needs closer attention.
-- Yoshiki