[squeak-dev] Re: Problem with typing Czech characters in Squeak 3.10 on Ubuntu 9.04

Yoshiki Ohshima yoshiki at vpri.org
Mon Aug 17 22:36:12 UTC 2009


At Tue, 18 Aug 2009 10:12:21 +1200,
Michael van der Gulik wrote:
> 
> Assuming the Unicode code point U+0061 ("a") followed by U+0301 (combining acute accent) in a String, should the
> correct behaviour be to consider this one character or two?
> 
> Given the String 'xxa´xx' (where "a" is U+0061 and the ´ is U+0301), would "aString at: 3" return a
> single composed character or just the uncomposed "a"?
> 
> Or should Unicode-able Strings not be indexable at all to completely circumvent issues like this?

  A Unicode string can be indexable, but in general don't expect it to
always give you a useful "character" (displayable, comparable, etc.).
What you get back is a code point, not a character.  For comparison
and other purposes, you need to "normalize" the string first, and the
result can be either a single composed character or an uncomposed
sequence.
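To make the code point vs. character distinction concrete, here is a small sketch in Python rather than Squeak. Python's standard unicodedata module is used only as a convenient illustration, and the helper names code_points and same_text are hypothetical. Python indexes from 0, so Squeak's "aString at: 3" corresponds to s[2]:

```python
import unicodedata

def code_points(s: str) -> list[str]:
    """List the code points of s in U+XXXX notation (what indexing actually sees)."""
    return [f"U+{ord(c):04X}" for c in s]

def same_text(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization (compose where possible)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Michael's example: "a" (U+0061) followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "xxa\u0301xx"
# The same visible text using the precomposed U+00E1 LATIN SMALL LETTER A WITH ACUTE.
precomposed = "xx\u00e1xx"

print(len(decomposed), code_points(decomposed))   # 6 elements: the accent is its own slot
print(decomposed == precomposed)                  # raw comparison sees different code points
print(same_text(decomposed, precomposed))         # equal once both are normalized
print(code_points(unicodedata.normalize("NFC", decomposed)))  # "a" + U+0301 became U+00E1
```

The two strings display identically but compare unequal until both are normalized. After NFC normalization, the third position holds the single composed code point U+00E1. NFD would instead split it back into "a" plus U+0301.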

  However, when do you actually need "aString at: 3"?  From the Squeak
point of view, as long as certain invariants hold (such as #at:
agreeing with #size), random-access indexing is rarely needed, and
where it is, it deserves closer attention.
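When code walks a string sequentially instead of indexing into it, it can keep a base code point together with the combining marks that follow it. Below is a rough Python sketch. It assumes that any code point in the Unicode mark categories (Mn, Mc, Me) attaches to the previous one. This is a simplification, not the full grapheme-cluster algorithm of UAX #29, and clusters is a hypothetical helper:

```python
import unicodedata
from typing import Iterator

def clusters(s: str) -> Iterator[str]:
    """Yield each base code point together with any following combining marks.

    Simplified approximation: a code point whose general category starts
    with "M" (Mn, Mc, Me) is appended to the current cluster; anything else
    starts a new one. Not the full UAX #29 grapheme-cluster algorithm.
    """
    current = ""
    for ch in s:
        if current and unicodedata.category(ch).startswith("M"):
            current += ch          # attach the mark to the preceding base
        else:
            if current:
                yield current      # finish the previous cluster
            current = ch           # start a new cluster
    if current:
        yield current

for c in clusters("xxa\u0301xx"):
    print(repr(c))   # "a" and U+0301 come out together as one cluster
```

Sequential access like this never asks "what is at position 3?", so it sidesteps the composed/uncomposed question rather than answering it.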

-- Yoshiki



