[squeak-dev] Re: [Pharo-dev] String >> #=

Eliot Miranda eliot.miranda at gmail.com
Tue May 27 18:59:00 UTC 2014


On Tue, May 27, 2014 at 6:54 AM, J. Vuletich (mail lists) <
juanlists at jvuletich.org> wrote:

>  Quoting Eliot Miranda <eliot.miranda at gmail.com>:
>
> Hi Phillipe,
>
>
> On Mon, May 26, 2014 at 12:51 AM, Philippe Marschall <
> philippe.marschall at netcetera.ch> wrote:
>
>> Hi
>>
>> I have been investigating why Dictionary look up performance with String
>> keys is not as good as I would expected. Something I noted is that String
>> >> #= is implemented in terms of #compare:with:collated:. There is no short
>> circuit if Strings are not the same size. In my case some Strings have the
>> same prefix but a different length eg 'Content-Type' and 'Content-Length'.
>> In that case a #compare:with:collated: is performed even though we know in
>> advance the answer will be false because they have different sizes.
>>
>
> Why not rewrite
>
> String>>= aString
> "Answer whether the receiver sorts equally as aString.
> The collation order is simple ascii (with case differences)."
> aString isString ifFalse: [ ^ false ].
> ^ (self compare: self with: aString collated: AsciiOrder) = 2
>
> as
>
>  String>>= aString
> "Answer whether the receiver sorts equally as aString.
> The collation order is simple ascii (with case differences)."
>
> (aString isString
> and: [self size = aString size]) ifFalse: [^false].
> ^ (self compare: self withSize: with: aString collated: AsciiOrder) = 2
>
> ?
>
>
This makes a huge difference, over 3 times faster:

| bs t1 t2 |
bs := ByteString allInstances first: 10000.
t1 := [bs do: [:a| bs do: [:b| a = b]]] timeToRun.
(FileStream fileNamed: '/Users/eliot/Squeak/Squeak4.5/String-=.st') fileIn.
t2 := [bs do: [:a| bs do: [:b| a = b]]] timeToRun.
{ t1. t2 } #(13726 4467)
4467 - 13726 / 137.26 -67.46%

>
> One /could/ add a replacement compare:with:collated:
> primitive primitiveCompareString which took the sizes as arguments to avoid
> asking twice.  But it wouldn't be safe.  One could abuse the primitive and
> lie about the size.  So I suspect it is best to add the size check to
> String>>#= and accept the duplication of the primitive finding the sizes of
> the two strings.  The cost in the primitive is minimal.  A WideString
> version of the primitive might pay its way, but if Spur and Sista arrive
> soon the primitive shouldn't be faster than the optimised Smalltalk code.
> --
> best,
> Eliot
>
> BTW, any good reason for not prefixing all the implementors of #= with
> this?
>
> "Any object is equal to itself"
> self == argument ifTrue: [ ^ true ].
>

It doesn't make much difference:

| bs t1 t2 |
bs := ByteString allInstances first: 10000.
t1 := [bs do: [:a| bs do: [:b| a = b]]] timeToRun.
(FileStream fileNamed: '/Users/eliot/Squeak/Squeak4.5/String-=.st') fileIn.
t2 := [bs do: [:a| bs do: [:b| a = b]]] timeToRun.
{ t1. t2 } #(4628 4560)

4560 - 4628 / 46.28 -1.47%

So is it worth it?  If you feel it is I've no objection other than it feels
a little kludgey for such little benefit.  And there are the Symbols if one
needs quick comparison and can bear the cost of slow interning.
-- 
best,
Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20140527/63971d01/attachment.htm


More information about the Squeak-dev mailing list