[Cuis] Sorting Unicode strings (Re: [Unicode] collation sequences
(Re: [squeak-dev] Unicode Support))
EuanM
euanmee at gmail.com
Wed Dec 9 02:45:36 UTC 2015
"The ffi_ligature (U+FB03) is not decomposed, because it has a
compatibility mapping, not a canonical mapping."
-- the standard's text, from Table 7 of UAX #15
"what the standard seems to say is, that the ffi ligature is not equivalent to
plain ffi because, if i write a string with the ligature, then it is a
different string than plain "ffi", because both forms are printable.
on the other hand ä and a" (two different encodings for ä) are the same,
because the printed forms are always identical." --Martin
I agree with that interpretation.
I'm struggling to be clear about the consequences for equality testing.
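
To make the canonical vs. compatibility distinction concrete for equality testing, here is a minimal sketch in Python (not Squeak or Cuis code), using the standard library's unicodedata module. The two helper names are made up for this illustration. It assumes that "equal" means "identical after normalization", which is only one possible definition.

```python
import unicodedata

LIGATURE = "\uFB03"        # ffi ligature, U+FB03
PLAIN = "ffi"              # three separate letters
A_PRECOMPOSED = "\u00E4"   # a-umlaut as a single code point
A_COMBINING = "a\u0308"    # a followed by combining diaeresis

def canonically_equal(s, t):
    # Hypothetical helper: compare after canonical decomposition (NFD).
    return unicodedata.normalize("NFD", s) == unicodedata.normalize("NFD", t)

def compatibility_equal(s, t):
    # Hypothetical helper: compare after compatibility decomposition (NFKD).
    return unicodedata.normalize("NFKD", s) == unicodedata.normalize("NFKD", t)

print(canonically_equal(A_PRECOMPOSED, A_COMBINING))  # the two encodings of a-umlaut
print(canonically_equal(LIGATURE, PLAIN))             # ligature vs. plain ffi
print(compatibility_equal(LIGATURE, PLAIN))           # same pair, compatibility mapping
```

Under canonical normalization the two a-umlaut encodings compare equal and the ligature stays distinct from plain "ffi"; only the compatibility forms fold the ligature. So the answer for equality testing depends on which normalization form the comparison is defined over.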
For sorting, a sort order can be defined to go along with any
encoding, so I suppose it just depends on which pre-assembled sort
order is used.
On 9 December 2015 at 02:36, Martin Bähr
<mbaehr at email.archlab.tuwien.ac.at> wrote:
> Excerpts from Martin Bähr's message of 2015-12-09 02:43:35 +0100:
>> Excerpts from EuanM's message of 2015-12-09 01:59:43 +0100:
>> > http://www.unicode.org/reports/tr15/#Stable_Code_Points
>> > Table 7, the discussion of Ligatures, (which uses the ligature of
>> > "ffi" as its example)
>>
>> ß is not a ligature of ss, but is a different character.
>
> rereading this, i think i am wrong, in that this has nothing to do with ß vs ss.
>
> looking at the standard i also don't understand your conclusion.
>
> what the standard seems to say is, that the ffi ligature is not equivalent to
> plain ffi because, if i write a string with the ligature, then it is a
> different string than plain "ffi", because both forms are printable.
> on the other hand ä and a" (two different encodings for ä) are the same,
> because the printed forms are always identical.
>
> however that doesn't mean that they are sorted differently.
> german sorting rules for example explicitly state that ß and ss are sorted the same.
> (at least according to wikipedia :-)
> and surely, ffi ligature and ffi are sorted the same too.
>
> greetings, martin.
>
> --
> eKita - the online platform for your entire academic life
> --
> chief engineer eKita.co
> pike programmer pike.lysator.liu.se caudium.net societyserver.org
> secretary beijinglug.org
> mentor fossasia.org
> foresight developer foresightlinux.org realss.com
> unix sysadmin
> Martin Bähr working in china http://societyserver.org/mbaehr/