[ENH][plugin] hashSpeedup-sr

Mark van Gulik ghoul6 at home.net
Sat Mar 3 08:00:37 UTC 2001


on 3/2/01 3:52 PM, Stephan Rudlof at sr at evolgo.de wrote:
> Mark van Gulik wrote:
>> on 2/28/01 5:18 PM, Stephan Rudlof at sr at evolgo.de wrote:
>> ...
>>> History:
>>> First I have tried just to make a primitive for SmallInteger>>hashMultiply,
>>> but this hasn't speeded up things! Reason: no message send occurs inside
>>> this method in real...
>> 
>> Thanks, that's the intention.  I glanced at the generated C code and it's
>> horrible.
>
> Which one? I haven't posted any plugin code for >>hashMultiply and the
> posted one from
> String>>primStringHashFor: string startHash: startHash
> results in
> EXPORT(int) primStringHashForStartHash(void)
> which has in its core loop (looping over the string) at the C side:
> ---
> for (ix = 1; ix <= lastIx; ix += 1) {
> hash += asciiValue(string[ix]);
> low = hash & 16383;
> hash = ((9741 * low) + ((((9741 * (((unsigned) hash >> 14)))
> + (101 * low)) & 16383) * 16384)) & 268435455;
> }
> ---

Whoops.  I must have been pretty tired.  The best I can reconstruct is that
I was looking at the Smalltalk code you attached, and figuring out what a
direct translation to C would be doing.  Still, the C code you just posted
is probably slower than:

for (ix = 1; ix <= lastlx; ix += 1) {
    hash += asciiValue(string[ix]);
    hash *= 1664525;
}
hash &= 0x0FFFFFFF;


> I think this C code isn't so bad.

Yeah, if speed isn't paramount the generated C code should be adequate.
Pardon my over-reaction.


> And there are advantages:
> - the Smalltalk method from which it is generated is executable - as it is -
> if the plugin is missing! This makes it clear that the semantics are the
> same;
> - if there should be a change at the ST side, you'd just have to regenerate
> the plugin/VM and the semantics are in sync (in spite of this you have to be
> careful here...);
> - last but not least: I was able to realize it in reasonable time without
> thinking about numerics, I'm not an expert here.

Agreed.  Unless there's a way of stashing the C translation in a comment
within the original method(s) and getting the translator to use it instead,
it's probably best to just let the Smalltalk code be translated literally.

>> I'm not really "up" on the latest Squeak stuff, but isn't there a
>> way of specifying C translations directly for some methods?
> 
> Do you mean the TestInterpreterPlugin mechanism?
> First I have made a version with it, but for this case I've thought the
> older variant fits better.

I mean Squeak itself.  I haven't done any development in it yet, and I
haven't even really played with it in at least six months.

>> If so, there is
>> a method (or maybe it's already been manually inlined) whose sole purpose is
>> to multiply by 1664525, modulo 2^24.  It was coded in a roundabout way, to
>> avoid large integer math.  In C this is not necessary.  Instaed just "*" the
>> receiver by 1664525 and "&" it with 0x0FFFFFFF.  Actually, we only need to
>> mask the *final* result of the hash, as upper bits don't affect lower bits.
> 
> This probably would be an improvement (assumed the compiler isn't very
> intelligent), but also had the drawback to keep two ST methods in sync (the
> C specific plugin method and the ST method working if the plugin is missing,
> the latter beeing there then to avoid LargeIntegers arithmetics!).
> 
> But feel free to post your version!

I don't have a recent Squeak set up, and I haven't produced C from Slang in
a least a year.  I'm more of a lurker on this mailing list.


>> You may want to avoid the invocation of String>>at:
> 
> which is just a C array accesss,

My bad.

> It may be that there are further improvements possible, but I don't think
> that this core loop is a bottleneck.
> And: There is an about factor 10 improvement for long strings *now*!

Ah.  Yeah, don't bother speeding it up, unless you think the operation
should be an actual honest-to-goodness "built-in primitive" (on par with
identityHash or SmallInteger #+).

>>> Time millisecondsToRun: [100000 timesRepeat: [#n hash]].
>>> 1611 1603 1607 "filed in without plugin"
>>> 1061 1060 1067 "filed in with plugin"
>>> 1453 1465 1449 "before"
>>> Time millisecondsToRun: [100000 timesRepeat: [#namenamenamenamenamename
>>> hash]].
>>> 8928 8888 8901 "filed in without plugin"
>>> 1273 1290 1281 "filed in with plugin" "!"
>>> 10685 10684 10687 "before"

Nice.  The 23 * 100000 =2.3x10^6 extra characters took an additional ~200ms.
That's about 2x10^-1s/(2x10^6c) = 10^-7 seconds per character.  What's a
hundred nanoseconds between friends, eh?  Of course, if Squeak had, say, a
30-bit hash slot, you could compute the hash of a Symbol when it was
interned and cache it there forever.  That's basically what I do in my
language Avail (http://www.ericsworld.com/Mark/HTML/Avail.html), but I get
to do it lazily, and for arbitrary strings and tuples (because they're all
immutable).





More information about the Squeak-dev mailing list