[squeak-dev] WideString hash is way slower than ByteString hash.

Levente Uzonyi leves at elte.hu
Thu Sep 30 00:22:10 UTC 2010


On Thu, 30 Sep 2010, Igor Stasenko wrote:

> On 15 May 2010 02:43, Igor Stasenko <siguctua at gmail.com> wrote:
>> On 15 May 2010 02:35, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>>> The cardinal rule of running benchmarks is to compare apples to apples.
>>>  You've compared apples to oranges, i.e. an optimized reimplementation of
>>> WideString>>hash that eliminates the mapping of codes to characters, against
>>> the vanilla Squeak implementation.  You need to at least compare the NB
>>> implementation against
>>> WideString methods for comparison
>>> fastHash
>>> 	| stringSize hash low |
>>> 	stringSize := self size.
>>> 	hash := ByteString identityHash bitAnd: 16rFFFFFFF.
>>> 	1 to: stringSize do: [:pos |
>>> 		hash := hash + (self wordAt: pos).
>>> 		"Begin hashMultiply"
>>> 		low := hash bitAnd: 16383.
>>> 		hash := (16r260D * low + ((16r260D * (hash bitShift: -14) + (16r0065 * low)
>>> 			bitAnd: 16383) * 16384)) bitAnd: 16r0FFFFFFF.
>>> 	].
>>> 	^ hash
>>> | s n |
>>> s := (WideString with: (Character value: 16r55E4)) , 'abcdefghijklmno'.
>>> n := 100000.
>>> { [1 to: n do: [:i | s fastHash. s fastHash. s fastHash. s fastHash. s fastHash.
>>> 	s fastHash. s fastHash. s fastHash. s fastHash. s fastHash]] timeToRun.
>>>   [1 to: n do: [:i | s hash. s hash. s hash. s hash. s hash. s hash. s hash.
>>> 	s hash. s hash. s hash]] timeToRun. }
>>>      #(829 1254)
>>> So your measurements tell us nothing about a general comparison of NB
>>> against the Squeak VM or Cog.  They only demonstrate (unsurprisingly) that a
>>> loop summing integers in an array goes PDQ.  On the other hand my numbers
>>> showed Cog 10x faster than the Squeak interpreter when executing exactly the
>>> same bytecode.
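
For reference: the two constants in the hashMultiply step of fastHash above are the
two pieces of 16r19660D (1664525), since 16r0065 * 16384 + 16r260D = 16r19660D.
Splitting the multiplication this way avoids the LargeInteger arithmetic that the
full product would need, while still computing hash * 16r19660D modulo 2^28. A small
workspace sketch (not taken from any image) to check the equivalence:

| hash low split direct |
hash := 16r0ABCDEF.	"any sample 28-bit value"
low := hash bitAnd: 16383.
split := (16r260D * low + ((16r260D * (hash bitShift: -14) + (16r0065 * low)
	bitAnd: 16383) * 16384)) bitAnd: 16r0FFFFFFF.
direct := hash * 16r19660D bitAnd: 16r0FFFFFFF.
self assert: split = direct
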
>>
>> Yes, of course you're right.
>> But I didn't compare it with Cog, because I can't.
>> And NB is not for translating bytecodes into native code;
>> it is for authoring primitives using native code.
>> So, my comparison shows how much faster it could be if implemented as a primitive.
>>
>> I can only imagine how much faster the whole thing would be if we
>> combined NB and Cog, where Cog would serve as a Smalltalk optimizer,
>> and NB would serve for manual optimizations of heavy numerical crunching :)
>>
>
> It's now time to verify things, since I can now run native code under Cog:
>
> |ss s t1 t2 t3 |
> s := (WideString with: (Character value: 16r55E4)) , 'abcdefghijklmno'.
> ss := (String with: $z) , 'abcdefghijklmno'.
>
> 10 timesRepeat: [ s := s , s. ss := ss , ss ].
> self assert: (s hash = s nbHash).
> t1 := [1000 timesRepeat: [s hash]] timeToRun.
> t2 := [1000 timesRepeat: [s nbHash]] timeToRun.
> t3 := [1000 timesRepeat: [ss hash]] timeToRun.
> { t1. t2. t3 }
>
> Squeak VM:
>
> #(7929 89 125)

You should update your image, because WideString's hash is now ~2x faster 
than before with SqueakVM. I get #(3822 119) for t1 and t3.
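
For anyone who wants to reproduce these numbers, here is Igor's script from above
with the NB-specific parts dropped, so it also runs on an image without NativeBoost:

| ss s t1 t3 |
s := (WideString with: (Character value: 16r55E4)) , 'abcdefghijklmno'.
ss := (String with: $z) , 'abcdefghijklmno'.
10 timesRepeat: [ s := s , s. ss := ss , ss ].
t1 := [1000 timesRepeat: [s hash]] timeToRun.
t3 := [1000 timesRepeat: [ss hash]] timeToRun.
{ t1. t3 }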

>
> Cog VM:
>
> #(1256 88 99)
>
>
> However, in tight loops, where the loop and send overhead is comparable to
> the payload of the function itself, Cog is the winner:
>
> |s|
> s := WideString with: (Character value: 16r55E4).
> [10000000 timesRepeat: [s hash]] timeToRun.
> 1307
>
> |s|
> s := WideString with: (Character value: 16r55E4).
> [ s hash ] bench
> '2,840,000 per second.'
>
> |s|
> s := WideString with: (Character value: 16r55E4).
> [10000000 timesRepeat: [s nbHash]] timeToRun.
>
> 1438
>
> |s|
> s := WideString with: (Character value: 16r55E4).
> [ s nbHash ] bench
> '2,710,000 per second.'
>
> obviously, because Cog can't inline the native code generated by NB :)

Can Cog inline message sends? I thought it couldn't ATM.
Btw, can you send a message from NB code?


Levente

>
> -- 
> Best regards,
> Igor Stasenko AKA sig.
>
>

