[ENH][plugin] hashSpeedup-sr

Stephan Rudlof sr at evolgo.de
Fri Mar 2 21:52:27 UTC 2001


Mark,

Mark van Gulik wrote:
> 
> on 2/28/01 5:18 PM, Stephan Rudlof at sr at evolgo.de wrote:
> ...
> > History:
> > First I have tried just to make a primitive for SmallInteger>>hashMultiply,
> > but this hasn't speeded up things! Reason: no message send occurs inside
> > this method in real...
> 
> Thanks, that's the intention.  I glanced at the generated C code and it's
> horrible.

Which one? I haven't posted any plugin code for >>hashMultiply and the
posted one from
	String>>primStringHashFor: string startHash: startHash 
results in
	EXPORT(int) primStringHashForStartHash(void)
which has in its core loop (looping over the string) at the C side:
---
        for (ix = 1; ix <= lastIx; ix += 1) {
                hash += asciiValue(string[ix]);
                low = hash & 16383;
                hash = ((9741 * low) + ((((9741 * (((unsigned) hash >> 14)))
+ (101 * low)) & 16383) * 16384)) & 268435455;
        }
---

I think this C code isn't so bad.

And there are advantages:
- the Smalltalk method from which it is generated is executable - as it is -
if the plugin is missing! This makes it clear that the semantics are the
same;
- if there should be a change at the ST side, you'd just have to regenerate
the plugin/VM and the semantics are in sync (in spite of this you have to be
careful here...);
- last but not least: I was able to realize it in reasonable time without
thinking about numerics, I'm not an expert here.

> I'm not really "up" on the latest Squeak stuff, but isn't there a
> way of specifying C translations directly for some methods?

Do you mean the TestInterpreterPlugin mechanism?
First I have made a version with it, but for this case I've thought the
older variant fits better.

> If so, there is
> a method (or maybe it's already been manually inlined) whose sole purpose is
> to multiply by 1664525, modulo 2^24.  It was coded in a roundabout way, to
> avoid large integer math.  In C this is not necessary.  Instaed just "*" the
> receiver by 1664525 and "&" it with 0x0FFFFFFF.  Actually, we only need to
> mask the *final* result of the hash, as upper bits don't affect lower bits.

This probably would be an improvement (assumed the compiler isn't very
intelligent), but also had the drawback to keep two ST methods in sync (the
C specific plugin method and the ST method working if the plugin is missing,
the latter beeing there then to avoid LargeIntegers arithmetics!).

But feel free to post your version!

> 
> You may want to avoid the invocation of String>>at:

which is just a C array accesss,

> and
> Character>>asciiValue

which is
	#define asciiValue(c) c
(effectively a noop) in the plugin.

> in your primitive, and make sure you're using C
> integers throughout.

There are only C integers in the core loop, isn't it?

> I'd guess this should drop the time by *at least* a
> factor of two.  Ask your C compiler to generate assembly code to get a feel
> for what it's really doing down deep.
> 
> In addition, because it's just a polynomial evaluation with a known constant
> for the "variable", we can allow better processor parallelism by
> accumulating all the even terms while in parallel accumulating all the odd
> terms.  You'll need to multiply by the square of 1664525 mod 2^24 between
> successive even terms (and between successive odd terms).  The multiplier
> squared mod 2^24 is 121134249.  When you're done, just multiply the even
> accumulation by 1664525 and add the odd accumulation.  You might even want
> to do four simultaneous streams instead of two to really keep the CPU
> humming.  If Squeak guarantees that the "pad" bytes of a
> non-multiple-of-four-byte String are all zero, you can simply overrun the
> actual String characters, merge the streams, and then fix up for 0-3 zero
> bytes that aren't part of the String proper.  Hint:  The inverse of 1664525
> mod 2^24 is 249583813, as can be shown by the identity "(1664525 * 249583813
> bitAnd: 16r0FFFFFFF) = 1".  A table of this inverse raised to 0, 1, 2, and 3
> should make this pretty quick.

It may be that there are further improvements possible, but I don't think
that this core loop is a bottleneck.
And: There is an about factor 10 improvement for long strings *now*!


Again: Feel free to post an improved version, but I don't expect dramatical
speedups seen from the ST side.


Greetings,

Stephan

> 
> > "
> >
> > For me it is (Linux (without jitter)):
> >
> > Time millisecondsToRun: [100000 timesRepeat: [#n hash]].
> > 1611 1603 1607 "filed in without plugin"
> > 1061 1060 1067 "filed in with plugin"
> > 1453 1465 1449 "before"
> > Time millisecondsToRun: [100000 timesRepeat: [#namenamenamenamenamename
> > hash]].
> > 8928 8888 8901 "filed in without plugin"
> > 1273 1290 1281 "filed in with plugin" "!"
> > 10685 10684 10687 "before"
> >
> > To compile evaluate
> > MiscPrimitivePlugin translateDoInlining: true
> > .
> >
> > Greetings,
> >
> > Stephan

-- 
Stephan Rudlof (sr at evolgo.de)
   "Genius doesn't work on an assembly line basis.
    You can't simply say, 'Today I will be brilliant.'"
    -- Kirk, "The Ultimate Computer", stardate 4731.3





More information about the Squeak-dev mailing list