[squeak-dev] Re: [Cuis] Sorting Unicode strings (Re: [Unicode] collation sequences (Re: Unicode Support))

Levente Uzonyi leves at caesar.elte.hu
Thu Dec 10 00:31:46 UTC 2015


On Wed, 9 Dec 2015, Dale Henrichs wrote:

>
>
> On 12/09/2015 12:44 AM, Stephan Eggermont wrote:
>> On 08-12-15 22:35, Dale Henrichs wrote:
>>> What I meant is that you can't _always_ use the code point for
>>> collation, i.e., sorting based on the value of code points is not always
>>> correct[1].
>> 
>> I gave up on universal sorting when I learned that Dutch libraries' 
>> sorting of author names depends on the author's country of origin. So 
>> if Jan van Beek is Dutch he will be sorted under B, while if he's Belgian 
>> under V. I haven't checked what happens if the author emigrates, or changes 
>> nationality...
>> 
>> Stephan
>> 
> Well, with ICU (and GemStone's implementation) you can choose which collator 
> to use (country-specific) at the image level or on a comparison-by-comparison 
> basis ... for example, for an indexed collection of (Unicode) Strings, you can 
> choose the collator to use for that particular index ... so while it's true 
> that a universal sorter is not possible, it is possible to choose a collator 
> that will satisfy a particular customer ....

I expect my image to compare strings using the codepoint-based (+language 
tags) lexicographical method, because it's simple, deterministic and fast.
Imagine having failing tests just because your image uses different 
default comparison methods based on some (external) parameter...
It's also a nightmare to find out why your program is slow on some 
machine, while it's fast on another.
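
To illustrate the codepoint-based comparison described above, here is a
minimal sketch in Python (not Squeak code; the names codepoint_compare and
words are made up for the example, and language tags are ignored). It
compares two strings by the integer values of their code points, so the
result depends only on the strings themselves and not on any locale setting:

```python
from functools import cmp_to_key

def codepoint_compare(a: str, b: str) -> int:
    """Return -1, 0 or 1 by comparing code points left to right."""
    for ca, cb in zip(a, b):
        if ca != cb:
            return -1 if ord(ca) < ord(cb) else 1
    # All shared positions are equal: the shorter string sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))

# Hypothetical sample data, chosen to show case and accent effects.
words = ["zebra", "Zebra", "apple", "éclair"]
ordered = sorted(words, key=cmp_to_key(codepoint_compare))
print(ordered)  # ['Zebra', 'apple', 'zebra', 'éclair']
```

The order comes out the same on every machine, although it is not what a
dictionary for any particular language would use ("Zebra" before "apple",
"éclair" last).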

Levente

>
> Dale
>
>
>

