String>>hash performance

Richard A. O'Keefe ok at cs.otago.ac.nz
Fri Aug 30 01:33:23 UTC 2002


I was doing some work in Squeak 3.0 at home, and found that
(for the particular thing I was doing) most of the time was
going in String>>hash.

Looking in Squeak 3.2, I see that String>>hash has been
replaced by code that calls a primitive.  It has made an
improvement to the time, no two ways about that.

It hasn't made as much of an improvement as I expected.

Number of instances of String : 37,074 
The sum of #size of each:      996,905 

Time to hash all of them:        3.648 sec 
Average time to hash one:       98.5   usec

Time to hash all of them in C:   0.078 sec
Average time to hash one in C:   2.1   usec

Measuring the time in Squeak:

    s := String allInstances.
    t0 := [s do: [:each | each]] timeToRun.
    t1 := [s do: [:each | each hash]] timeToRun.
    (t1 - t0) * 1.0e-3 "Print It"
    (t1 - t0) * 1.0e3 / s size "Print It"

Measuring the time in C:

    f := StandardFileStream newFileNamed: 'strpool.dat'.
    s do: [:each | f print: each size; space; nextPutAll: each].
    f close.

    Then a C program loads the strings, hashes them all 400 times,
    and writes out the times.

Controlling for hardware:

    Squeak and the C program were both run on the same machine.

How do I know the C code was doing the same thing as the Squeak hash code?

    String>>hash
        ^String stringHash: self initialHash: self species hash

    String class>>stringHash: aString initialHash: speciesHash
	|stringSize hash low|
	<primitive: 'primitiveStringHash' module: 'MiscPrimitivePlugin'>
	
	self var: #aHash declareC: 'int speciesHash'.
	self var: #aString declareC: 'unsigned char *aString'.
	stringSize := aString size.
	hash := speciesHash bitAnd: 16rFFFFFFF.
	1 to: stringSize do: [:pos |
	    hash := hash + (aString at: pos) asciiValue.
	    "Begin hashMultiply"
	    low := hash bitAnd: 16383.
	    hash := (16r260D * low + ((16r260D * (hash bitShift: -14) +
		(16r0065 * low) bitAnd: 16383)) * 16384)) bitAnd: 16r0FFFFFFF.
	].
	^hash

I turned that into C, that's how.  I don't actually understand the rules
for Slang; I presume that (aString at: pos) asciiValue
turns into aString[pos] (because at any rate that's the number that always
used to be used).

So if #stringHash:initialHash: is compiled into C, how come C that does
the same thing is running about 47 times faster?

Where is the time going?

What have I misunderstood?




More information about the Squeak-dev mailing list