[Vm-dev] Re: [squeak-dev] Re: [Pharo-dev] String >> #=

Chris Muller asqueaker at gmail.com
Wed May 28 01:49:41 UTC 2014


> On Tue, May 27, 2014 at 6:54 AM, J. Vuletich (mail lists) <juanlists at jvuletich.org> wrote:
>>
>> Quoting Eliot Miranda <eliot.miranda at gmail.com>:
>>
>> Hi Phillipe,
>>
>>
>> On Mon, May 26, 2014 at 12:51 AM, Philippe Marschall <philippe.marschall at netcetera.ch> wrote:
>>>
>>> Hi
>>>
>>> I have been investigating why Dictionary look up performance with String keys is not as good as I would expected. Something I noted is that String >> #= is implemented in terms of #compare:with:collated:. There is no short circuit if Strings are not the same size. In my case some Strings have the same prefix but a different length eg 'Content-Type' and 'Content-Length'. In that case a #compare:with:collated: is performed even though we know in advance the answer will be false because they have different sizes.
>>
>>
>> Why not rewrite
>>
>> String>>= aString
>> "Answer whether the receiver sorts equally as aString.
>> The collation order is simple ascii (with case differences)."
>> aString isString ifFalse: [ ^ false ].
>> ^ (self compare: self with: aString collated: AsciiOrder) = 2
>>
>> as
>>
>> String>>= aString
>> "Answer whether the receiver sorts equally as aString.
>> The collation order is simple ascii (with case differences)."
>>
>> (aString isString
>> and: [self size = aString size]) ifFalse: [^false].
>> ^ (self compare: self withSize: with: aString collated: AsciiOrder) = 2
>>
>> ?
>
>
> This makes a huge difference, over 3 times faster:
>
> | bs t1 t2 |
> bs := ByteString allInstances first: 10000.
> t1 := [bs do: [:a| bs do: [:b| a = b]]] timeToRun.
> (FileStream fileNamed: '/Users/eliot/Squeak/Squeak4.5/String-=.st') fileIn.
> t2 := [bs do: [:a| bs do: [:b| a = b]]] timeToRun.
> { t1. t2 } #(13726 4467)
> 4467 - 13726 / 137.26 -67.46%
>>
>>
>> One /could/ add a replacement compare:with:collated: primitive primitiveCompareString which took the sizes as arguments to avoid asking twice.  But it wouldn't be safe.  One could abuse the primitive and lie about the size.  So I suspect it is best to add the size check to String>>#= and accept the duplication of the primitive finding the sizes of the two strings.  The cost in the primitive is minimal.  A WideString version of the primitive might pay its way, but if Spur and Sista arrive soon the primitive shouldn't be faster than the optimised Smalltalk code.
>> --
>> best,
>> Eliot
>>
>> BTW, any good reason for not prefixing all the implementors of #= with this?
>>
>> "Any object is equal to itself"
>> self == argument ifTrue: [ ^ true ].
>
>
> It doesn't make much difference:
>
> | bs t1 t2 |
> bs := ByteString allInstances first: 10000.
> t1 := [bs do: [:a| bs do: [:b| a = b]]] timeToRun.
> (FileStream fileNamed: '/Users/eliot/Squeak/Squeak4.5/String-=.st') fileIn.
> t2 := [bs do: [:a| bs do: [:b| a = b]]] timeToRun.
> { t1. t2 } #(4628 4560)
>
> 4560 - 4628 / 46.28 -1.47%
>
> So is it worth it?  If you feel it is I've no objection other than it feels a little kludgey for such little benefit.

That is a very common convention in the image, not kludgey.
String>>#= should advertise, by its implementation, its desire and
intent for maximum performance.  Not covering this check conveys a
lack of thoroghness and/or dedication to performance.  Future readers
of the method will wonder why such a simple avoidance of unnecessary
processing for that case wasn't taken.  By the time they wonder
whether it was an oversight they've already expended too much thought
about it.

We should really consider Andres' question before popping this into
trunk though..

> And there are the Symbols if one needs quick comparison and can bear the cost of slow interning.



> --
> best,
> Eliot
>


More information about the Vm-dev mailing list