[squeak-dev] Re: [Cuis] Sorting Unicode strings (Re: [Unicode]
collation sequences (Re: Unicode Support))
Levente Uzonyi
leves at caesar.elte.hu
Thu Dec 10 00:31:46 UTC 2015
On Wed, 9 Dec 2015, Dale Henrichs wrote:
>
>
> On 12/09/2015 12:44 AM, Stephan Eggermont wrote:
>> On 08-12-15 22:35, Dale Henrichs wrote:
>>> What I meant is that you can't _always_ use the code point for
>>> collation, i.e., sorting based on the value of code points is not always
>>> correct[1].
>>
>> I have given up on universal sorting when I learned that Dutch libraries'
>> sorting of author names depends on the country of origin of the author. So
>> if Jan van Beek is Dutch he will be sorted under B, while if he's Belgian
>> under V. I haven't checked what happens if the author emigrates, or changes
>> nationality...
>>
>> Stephan
>>
> Well, with ICU (and GemStone's implementation) you can choose which collator
> to use (country-specific) at the image level or on a comparison-by-comparison
> basis ... for example, for an indexed collection of (Unicode) Strings, you can
> choose the collator to use for that particular index ... so while it's true
> that a universal sorter is not possible, it is possible to choose a collator
> that will satisfy a particular customer ....
I expect my image to compare strings using the codepoint-based (+language
tags) lexicographical method, because it's simple, deterministic and fast.
Imagine having failing tests just because your image uses different
default comparison methods based on some (external) parameter...
It's also a nightmare to find out why your program is slow on some
machine, while it's fast on another.
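
To make the distinction concrete, here is a minimal sketch in Python (not
Squeak code) of a codepoint-based lexicographical comparison next to an
explicitly chosen sort key. The name codepoint_compare is hypothetical, the
language tags mentioned above are not modeled, and str.casefold is only a
stand-in for a real collator such as one from ICU.

```python
from functools import cmp_to_key


def codepoint_compare(a: str, b: str) -> int:
    """Return -1, 0, or 1 by comparing code points left to right."""
    for ca, cb in zip(a, b):
        if ca != cb:
            return -1 if ord(ca) < ord(cb) else 1
    # All shared positions are equal, so the shorter string sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


words = ["zebra", "Zoo", "éclair", "apple"]

# Python's built-in str ordering is already code point order, so the
# result does not depend on the locale or any other external setting.
by_codepoint = sorted(words)

# The hand-written comparison yields the same order.
by_compare = sorted(words, key=cmp_to_key(codepoint_compare))

# A tailored order has to be requested explicitly, per sort.
# (Assumption: casefold stands in for a locale-specific collator.)
by_casefold = sorted(words, key=str.casefold)

print(by_codepoint)  # ['Zoo', 'apple', 'zebra', 'éclair']
print(by_casefold)   # ['apple', 'zebra', 'Zoo', 'éclair']
```

The default order stays the same on every machine, and any other order is
visible at the call site that asks for it.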
Levente
>
> Dale
>
>
>