[squeak-dev] WideString hash is way slower than ByteString hash.

Fri May 14 23:35:05 UTC 2010

The cardinal rule of running benchmarks is to compare apples to apples.
You've compared apples to oranges, i.e. an optimized reimplementation of
WideString>>hash that eliminates the mapping of codes to characters, against
the vanilla Squeak implementation.  You need to at least compare the NB
implementation against the following:

WideString methods for comparison
fastHash
    | stringSize hash low |
    stringSize := self size.
    hash := ByteString identityHash bitAnd: 16rFFFFFFF.
    1 to: stringSize do: [:pos |
        hash := hash + (self wordAt: pos).
        "Begin hashMultiply"
        low := hash bitAnd: 16383.
        hash := (16r260D * low + ((16r260D * (hash bitShift: -14) + (16r0065 * low) bitAnd: 16383) * 16384)) bitAnd: 16r0FFFFFFF].
    ^ hash
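
As an aside on the inlined hashMultiply step in fastHash: splitting the
value into 14-bit halves keeps the intermediate products small, and the
result should be the same as multiplying by 1664525 (16r0065 * 16384 +
16r260D) and keeping the low 28 bits. A minimal workspace sketch to check
that, not part of the benchmark; the temporaries multiplier, low, split and
direct are illustrative names only, and the sample values are arbitrary:

```smalltalk
"Workspace sketch: compare the split multiply with a direct multiply."
| multiplier |
multiplier := 16r0065 * 16384 + 16r260D.    "1664525"
#(0 1 16383 16384 16r1234567 16r0FFFFFFF) allSatisfy: [:h |
    | low split direct |
    low := h bitAnd: 16383.
    "Same expression as the hashMultiply step in fastHash."
    split := (16r260D * low + ((16r260D * (h bitShift: -14) + (16r0065 * low) bitAnd: 16383) * 16384)) bitAnd: 16r0FFFFFFF.
    "The direct product may leave SmallInteger range on a 32-bit image."
    direct := (h * multiplier) bitAnd: 16r0FFFFFFF.
    split = direct]
```

Printing this expression should answer true for the values listed. The split
form is presumably preferred in the hash loop because it avoids large-integer
arithmetic on every character.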

| s n |
s := (WideString with: (Character value: 16r55E4)) , 'abcdefghijklmno'.
n := 100000.
{ [1 to: n do: [:i |
      s fastHash. s fastHash. s fastHash. s fastHash. s fastHash.
      s fastHash. s fastHash. s fastHash. s fastHash. s fastHash]] timeToRun.
  [1 to: n do: [:i |
      s hash. s hash. s hash. s hash. s hash.
      s hash. s hash. s hash. s hash. s hash]] timeToRun }

     #(829 1254)
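
For the two timings to be comparable, fastHash should answer the same value
as the stock hash for the same receiver. A quick check one might run before
timing, assuming fastHash has been added to WideString as shown above.
Whether it answers true depends on how the image's WideString>>hash seeds
and combines the hash, so treat it as a sanity check rather than a
guaranteed result:

```smalltalk
"Workspace sketch: check that both selectors agree before timing them."
| s |
s := (WideString with: (Character value: 16r55E4)) , 'abcdefghijklmno'.
s fastHash = s hash
```

If the two values differ, the timings above are still measuring the same
amount of work per character, but the methods would not be interchangeable.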

So your measurements tell us nothing about a general comparison of NB
against the Squeak VM or Cog.  They only demonstrate (unsurprisingly) that a
loop summing integers in an array goes PDQ.  On the other hand my numbers
showed Cog 10x faster than the Squeak interpreter when executing exactly the
same bytecode.

best
Eliot

On Fri, May 14, 2010 at 4:13 PM, Igor Stasenko <siguctua at gmail.com> wrote:

> And besides, if someone would bother implementing a primitive
> for hashing the WideString, I doubt that he will go and create
> an instance of Character for each index, and then read its value and
> only then use
> it for hashing.
> So, I don't think this is cheating. It's just an optimization :)
> Of course, Cog, even if it optimizes things cleverly, still has to
> follow the code, and should create real instances of Character,
> simply because it is written so, and it should honor the language
> semantics.
>
>
> On 15 May 2010 01:53, Igor Stasenko <siguctua at gmail.com> wrote:
> > On 15 May 2010 01:38, Eliot Miranda <eliot.miranda at gmail.com> wrote:
> >> Hi Igor,
> >>     is the NB implementation mapping the character codes in the wide strings
> >> into Character objects and taking the character hashes?  If so, very cool.
> >>  The NB code is very fast.  If on the other hand you're just
> >> short-circuiting the character code lookup then you're cheating :)
> >>
> > What mapping do you have in mind?
> >
> > WideString>>at: index
> >        "Answer the Character stored in the field of the receiver indexed by
> >        the argument."
> >        ^ Character value: (self wordAt: index).
> >
> > Character class>>value: anInteger
> >        "Answer the Character whose value is anInteger."
> >
> >        anInteger > 255 ifTrue: [^self basicNew setValue: anInteger].
> >        ^ CharacterTable at: anInteger + 1.
> >
> > (Character classPool at: #CharacterTable) withIndexDo: [:ch :i | self
> > assert: (ch asInteger = (i-1))]
> >
> > So, there is a 1:1 correspondence between the word stored in the wide
> > string (self wordAt: index)
> > and the Character value used for hashing. So, no mapping is required.
> >
> >
> > --
> > Best regards,
> > Igor Stasenko AKA sig.
> >
>
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>
>