[Cuis] Sorting Unicode strings (Re: [Unicode] collation sequences (Re: [squeak-dev] Unicode Support))

EuanM euanmee at gmail.com
Wed Dec 9 02:45:36 UTC 2015


"The ffi_ligature (U+FB03) is not decomposed, because it has a
compatibility mapping, not a canonical mapping."
Standard's text from Table 7

"what the standard seems to say is, that the ffi ligature is not equivalent to
plain ffi because, if i write a string with the ligature, then it is a
different string than plain "ffi",  because both forms are printable.
on the other hand ä and a" (two different encodings for ä) are the same,
because the printed forms are always identical."  --Martin

I agree with that interpretation.

I'm struggling to be clear about the consequences for equality testing.
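
As a rough illustration of the equality question, here is a minimal Python
sketch using the standard library's unicodedata module. The helper names
canonically_equal and compatibility_equal are made up for this example; it
only assumes that "equal" means "identical after normalization", which is
one possible policy, not something the standard mandates.

```python
import unicodedata

LIGATURE = "\uFB03"      # ffi ligature as a single code point
PLAIN = "ffi"            # three separate letters
PRECOMPOSED = "\u00E4"   # a-umlaut as a single code point
DECOMPOSED = "a\u0308"   # "a" followed by COMBINING DIAERESIS


def canonically_equal(a, b):
    """Hypothetical helper: compare after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def compatibility_equal(a, b):
    """Hypothetical helper: compare after compatibility decomposition (NFKD)."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


# The two encodings of a-umlaut differ code point by code point,
# but are canonically equivalent.
print(PRECOMPOSED == DECOMPOSED)
print(canonically_equal(PRECOMPOSED, DECOMPOSED))

# The ligature only has a compatibility mapping, so NFD leaves it alone
# and only the NFKD comparison treats it like plain "ffi".
print(canonically_equal(LIGATURE, PLAIN))
print(compatibility_equal(LIGATURE, PLAIN))
```

Under this policy, an equality test built on NFC/NFD keeps the ligature
distinct from plain "ffi", while one built on NFKC/NFKD does not.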

For sorting - a sort order can be defined over any encoding, so I
suppose the result just depends on which predefined sort order
(collation) is used.
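
To make that concrete, here is a small Python sketch of a sort key under
which the ffi ligature sorts like plain "ffi" and ß sorts like "ss", as
Martin suggests in the quoted message below. The function naive_sort_key
is a hypothetical name, and this is NOT the Unicode Collation Algorithm or
any locale's real rules; it is just case folding followed by compatibility
decomposition, compared by code point.

```python
import unicodedata


def naive_sort_key(s):
    """Hypothetical key: case-fold, then apply compatibility decomposition.

    str.casefold() maps sharp s to "ss"; NFKD expands compatibility
    characters such as the ffi ligature (U+FB03) into plain letters.
    """
    return unicodedata.normalize("NFKD", s.casefold())


words = ["offset", "o\uFB03ce"]   # second word is "office" spelled with the ligature

# Plain code point order puts the ligature word last, because U+FB03
# is a much larger code point than "f".
by_code_point = sorted(words)

# With the key, the ligature word compares as "office" and sorts first.
by_key = sorted(words, key=naive_sort_key)

print(by_code_point)
print(by_key)
print(naive_sort_key("Stra\u00DFe") == naive_sort_key("Strasse"))
```

A real implementation would more likely use a collation library with
language-specific tailoring, since the appropriate order differs by locale.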



On 9 December 2015 at 02:36, Martin Bähr
<mbaehr at email.archlab.tuwien.ac.at> wrote:
> Excerpts from Martin Bähr's message of 2015-12-09 02:43:35 +0100:
>> Excerpts from EuanM's message of 2015-12-09 01:59:43 +0100:
>> > http://www.unicode.org/reports/tr15/#Stable_Code_Points
>> > Table 7, the discussion of Ligatures, (which uses the ligature of
>> > "ffi" as its example)
>>
>> ß is not a ligature of ss, but is a different character.
>
> rereading this, i think i am wrong, in that this has nothing to do with ß vs ss.
>
> looking at the standard i also don't understand your conclusion.
>
> what the standard seems to say is, that the ffi ligature is not equivalent to
> plain ffi because, if i write a string with the ligature, then it is a
> different string than plain "ffi",  because both forms are printable.
> on the other hand ä and a" (two different encodings for ä) are the same,
> because the printed forms are always identical.
>
> however that doesn't mean that they are sorted differently.
> german sorting rules for example explicitly state that ß and ss are sorted the same.
> (at least according to wikipedia :-)
> and surely, ffi ligature and ffi are sorted the same too.
>
> greetings, martin.
>
> --
> eKita                   -   the online platform for your entire academic life
> --
> chief engineer                                                       eKita.co
> pike programmer      pike.lysator.liu.se    caudium.net     societyserver.org
> secretary                                                      beijinglug.org
> mentor                                                           fossasia.org
> foresight developer  foresightlinux.org                            realss.com
> unix sysadmin
> Martin Bähr          working in china        http://societyserver.org/mbaehr/

