[squeak-dev] Re: [Cuis] Sorting Unicode strings (Re: [Unicode] collation sequences (Re: Unicode Support))

H. Hirzel hannes.hirzel at gmail.com
Wed Dec 9 14:11:35 UTC 2015


Hi Stephan

What you mention is an edge case. A more common case which is not
implemented yet is

language-insensitive sorting

http://www.unicode.org/versions/Unicode8.0.0/UnicodeStandard-8.0.pdf
<citation>
In some circumstances, an application may need to do
language-insensitive sorting—that
is, sorting of textual data without consideration of language-specific
cultural expectations about how strings should be ordered.
</citation>

Language-insensitive sorting is currently not available, as strings
are not normalized before they are compared. Since the Unicode
Character Database [1] is already available in Squeak / Pharo (and
could easily be loaded into Cuis), implementing language-insensitive
sorting seems to be within reach without much effort.

[1] http://wiki.squeak.org/squeak/6244
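
To illustrate why normalization matters here, a minimal sketch in
Python (not Squeak code; the helper name normalized_key and the word
list are made up for this example). It only shows the normalization
step followed by a plain code point comparison, not a full collation
algorithm:

```python
import unicodedata

def normalized_key(s):
    # Hypothetical helper: NFD decomposes precomposed characters, so
    # canonically equivalent strings yield the same sort key.
    return unicodedata.normalize("NFD", s)

# The second and third entries are the same word: once with a
# precomposed U+00E9, once as "e" followed by combining U+0301.
words = ["zebra", "\u00e9clair", "e\u0301clair", "eclair"]

# Raw code point order: the two equivalent spellings end up apart.
by_code_point = sorted(words)

# Order after normalization: the equivalent spellings are adjacent.
by_normalized = sorted(words, key=normalized_key)

print(by_code_point)
print(by_normalized)
```

Without normalization the precomposed spelling sorts after "zebra",
while the decomposed spelling sorts before it. With the normalized key
both spellings compare equal and stay next to each other.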

Hannes

On 12/9/15, Stephan Eggermont <stephan at stack.nl> wrote:
> On 08-12-15 22:35, Dale Henrichs wrote:
>> What I meant is that you can't _always_ use the code point for
>> collation, i.e., sorting based on the value of code points is not always
>> correct[1].
>
> I gave up on universal sorting when I learned that in Dutch libraries
> the sorting of author names depends on the author's country of origin.
> So if Jan van Beek is Dutch he will be sorted under B, while if he's
> Belgian he will be sorted under V. I haven't checked what happens if
> the author emigrates, or changes nationality...
>
> Stephan
>
>
>
>

