[Vm-dev] Re: [squeak-dev] Re: [Cuis] Sorting Unicode strings (Re: [Unicode] collation sequences (Re: Unicode Support))

Tue Dec 15 21:46:24 UTC 2015

Hi Dale,

On Thu, Dec 10, 2015 at 12:27 PM, Dale Henrichs <
dale.henrichs at gemtalksystems.com> wrote:

>
>
> On 12/09/2015 04:31 PM, Levente Uzonyi wrote:
>
>> On Wed, 9 Dec 2015, Dale Henrichs wrote:
>>
>>
>>>
>>> On 12/09/2015 12:44 AM, Stephan Eggermont wrote:
>>>
>>>> On 08-12-15 22:35, Dale Henrichs wrote:
>>>>
>>>>> What I meant is that you can't _always_ use the code point for
>>>>> collation, i.e., sorting based on the value of code points is not
>>>>> always correct[1].
>>>>>
>>>>
>>>> I have given up on universal sorting when I learned that Dutch
>>>> libraries' sorting of author names depends on the country of origin of
>>>> the author. So if Jan van Beek is Dutch he will be sorted under B, while
>>>> if he's Belgian under V. I haven't checked what happens if the author
>>>> emigrates, or changes nationality...
>>>>
>>>> Stephan
>>>>
>>> Well, with ICU (and GemStone's implementation) you can choose which
>>> collator to use (country-specific) at the image level or on a
>>> comparison-by-comparison basis ... for example, for an indexed collection
>>> of (Unicode) Strings, you can choose the collator to use for that
>>> particular index ... so while it's true that a universal sorter is not
>>> possible, it is possible to choose a collator that will satisfy a
>>> particular customer ....
>>>
>>
>> I expect my image to compare strings using the codepoint-based (+language
>> tags) lexicographical method, because it's simple, deterministic and fast.
>> Imagine having failing tests just because your image uses different
>> default comparison methods based on some (external) parameter...
>> It's also a nightmare to find out why your program is slow on some
>> machine, while it's fast on another.
>>
>
> When we implemented the Unicode support in GemStone we preserved the
> legacy string classes and their legacy behavior ... We added new Unicode*
> classes with the new collator-based behavior for sorting and comparison ...
> That way legacy applications (and legacy tests) were not impacted by the
> choice of collator ... And folks could choose whether or not their
> application would benefit from the use of the new Unicode* classes....
>
> The ICU library performance is actually comparable to our original
> implementations, so there isn't a noticeable performance difference. We
> built the support into our VM, and if folks are interested in some of the
> gory technical details, we'd be willing to share our experience, as there
> are several things that we did to minimize potential performance impacts ...
>
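To make the code point versus collator distinction from the quoted messages concrete, here is a minimal Python sketch. It is not ICU and not GemStone's implementation: approx_collation_key is a hypothetical helper that only strips accents and case, a rough stand-in for a real locale-aware collator. The word list is made up for illustration.

```python
import unicodedata


def codepoint_sorted(words):
    # Python compares str values code point by code point, so this is
    # the simple, deterministic ordering described in the thread.
    return sorted(words)


def approx_collation_key(word):
    # Hypothetical helper (an assumption, not a real collator):
    # decompose to NFD, drop combining marks, then casefold.
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


def approx_collated(words):
    # sorted() is stable, so words with equal keys keep their input order.
    return sorted(words, key=approx_collation_key)


# "\u00c9mile" is "Emile" with an acute accent on the capital E.
words = ["zebra", "Zebra", "apple", "\u00c9mile", "eagle"]

by_codepoint = codepoint_sorted(words)
by_key = approx_collated(words)

print(by_codepoint)
print(by_key)
```

Code point order puts the uppercase and accented words at the two ends of the list, while the key-based order interleaves them with the other words. A real collator goes further, with locale-specific tailoring that this sketch does not attempt.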

Just so you know, I will dig my heels in as deeply as I am able to prevent
the use of C++ libraries in the VM.  It destroys the simulator, which is
the most important thing we have for VM development productivity.  As far
as I'm concerned any use of external libraries to implement core
functionality kills the VM-in-Smalltalk concept that Squeak (and Pharo) are
built upon.  So for me it's a non-starter.  I hope others agree.

_,,,^..^,,,_
best, Eliot