Richard A. O'Keefe
ok at cs.otago.ac.nz
Fri Aug 30 01:33:23 UTC 2002
I was doing some work in Squeak 3.0 at home, and found that
(for the particular thing I was doing) most of the time was
going in String>>hash.
Looking in Squeak 3.2, I see that String>>hash has been
replaced by code that calls a primitive. It has made an
improvement to the time, no two ways about that.
It hasn't made as much of an improvement as I expected.
Number of instances of String : 37,074
The sum of #size of each: 996,905
Time to hash all of them: 3.648 sec
Average time to hash one: 98.5 usec
Time to hash all of them in C: 0.078 sec
Average time to hash one in C: 2.1 usec
Measuring the time in Squeak:
s := String allInstances.
t0 := [s do: [:each | each]] timeToRun.
t1 := [s do: [:each | each hash]] timeToRun.
(t1 - t0) * 1.0e-3 "Print It"
(t1 - t0) * 1.0e3 / s size "Print It"
Measuring the time in C:
f := StandardFileStream newFileNamed: 'strpool.dat'.
s do: [:each | f print: each size; space; nextPutAll: each].
Then a C program loads the strings, hashes them all 400 times,
and writes out the times.
Controlling for hardware:
Squeak and the C program were both run on the same machine.
How do I know the C code was doing the same thing as the Squeak hash code?
^String stringHash: self initialHash: self species hash
String class>>stringHash: aString initialHash: speciesHash
|stringSize hash low|
<primitive: 'primitiveStringHash' module: 'MiscPrimitivePlugin'>
self var: #aHash declareC: 'int speciesHash'.
self var: #aString declareC: 'unsigned char *aString'.
stringSize := aString size.
hash := speciesHash bitAnd: 16rFFFFFFF.
1 to: stringSize do: [:pos |
hash := hash + (aString at: pos) asciiValue.
low := hash bitAnd: 16383.
hash := (16r260D * low + ((16r260D * (hash bitShift: -14) +
(16r0065 * low) bitAnd: 16383)) * 16384)) bitAnd: 16r0FFFFFFF.
I turned that into C, that's how. I don't actually understand the rules
for Slang; I presume that (aString at: pos) asciiValue
turns into aString[pos] (because at any rate that's the number that always
used to be used).
So if #stringHash:initialHash: is compiled into C, how come C that does
the same thing is running about 47 times faster?
Where is the time going?
What have I misunderstood?
More information about the Squeak-dev