Adding loop primitives/optimizations (was Making Set/Dictionary etc. loops more robust)
Yoshiki Ohshima
Yoshiki.Ohshima at acm.org
Thu Dec 2 05:47:32 UTC 2004
John,
> When you are talking this type of stuff, then really you are talking
> about doing a plugin.
> A few years back I and others looked at Altivec stuff for Squeak. What
> I found was that, well, we do have a set of primitives to do vector
> math already. If the data is in Arrays we can make the prim call and
> say add two vectors together. The problem was that adding a million
> elements takes just a few milliseconds, but the overhead to set up the
> prim call and get at the data took many ms. So changing the cost from
> 2 milliseconds to less than 1 millisecond didn't make any difference
> to the bottom line, since it took 10 ms to get us to where we could do
> the math.
This "10ms" number doesn't agree with my experience. On my computer:
f := FloatArray new: 1.
[1000000 timesRepeat: [f += f]] timeToRun.
"=> 1094 (milliseconds)"
and
f := FloatArray new: 1000.
[1000000 timesRepeat: [f += f]] timeToRun.
"=> 3730 (milliseconds)"
or even
f := FloatArray new: 10000.
[1000000 timesRepeat: [f += f]] timeToRun.
"=> 31595 (milliseconds)"
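The following is a rough back-of-envelope reading of these three timings. It assumes that each `f += f` call costs a fixed overhead c plus a per-element cost s times the array size n. The linear model is an assumption, and cache effects are ignored. Each run makes 10^6 calls, so milliseconds per run equal nanoseconds per call:

$$
\begin{aligned}
t(n) &\approx c + s\,n \\
t(1) &\approx 1094\ \text{ns}, \quad t(1000) \approx 3730\ \text{ns}, \quad t(10000) \approx 31595\ \text{ns} \\
s &\approx \frac{3730 - 1094}{1000 - 1}\ \text{ns} \approx 2.6\ \text{ns per element} \\
s &\approx \frac{31595 - 3730}{10000 - 1000}\ \text{ns} \approx 3.1\ \text{ns per element} \\
c &\approx t(1) - s \approx 1.09\ \mu\text{s}
\end{aligned}
$$

Under this model, the fixed cost per call, including the timesRepeat: loop, is roughly the cost of adding 350 to 400 elements. For a million-element array, the arithmetic itself (about 10^6 x 3 ns, or about 3 ms) dwarfs a callout of about 1 µs.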
So, basically, the primitive callout overhead isn't that terrible, and it
makes some sense to try to optimize the array arithmetic. In Kedama,
the plugin supports some additional array arithmetic primitives, and
they definitely give a performance boost. In a typical example,
primitive execution occupies 70% or so of the total execution time,
and we can still imagine cutting the primitive execution time in half
or so.
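Amdahl's law puts the 70% figure in perspective. It is applied here as an illustration, not a measurement. It gives the overall speedup S when a fraction p of the run time is made k times faster:

$$
S = \frac{1}{(1 - p) + p/k}, \qquad p = 0.7,\ k = 2 \;\Rightarrow\; S = \frac{1}{0.3 + 0.35} = \frac{1}{0.65} \approx 1.54
$$

So halving the primitive time would cut the total run time by roughly 35%.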
-- Yoshiki
By the way, this unrolled version:
f := FloatArray new: 1.
"10000 passes x 100 unrolled sends = 1,000,000 calls of +="
[10000 timesRepeat: [
f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f.
f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f.
f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f.
f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f.
f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f.
f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f.
f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f.
f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f.
f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f.
f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f.
]] timeToRun.
"=> 830 (milliseconds)"
So the loop overhead is somewhat in the same ballpark as the primitive
call overhead (less than a factor of 10 apart).
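The comparison can be made explicit with a little algebra. In the notation below, p is the cost of one `+=` call and ℓ is the cost of one `timesRepeat:` pass; both labels are introduced here for illustration. Both one-element runs make 10^6 calls. The first also makes 10^6 loop passes, while the unrolled one makes only 10^4:

$$
\begin{aligned}
10^6\,(p + \ell) &\approx 1094\ \text{ms} \quad \text{(one += per loop pass)} \\
10^6\,p + 10^4\,\ell &\approx 830\ \text{ms} \quad \text{(100 unrolled += per pass)} \\
\ell &\approx \frac{1094 - 830}{10^6 - 10^4}\ \text{ms} \approx 0.27\ \mu\text{s per loop pass} \\
p &\approx 0.83\ \mu\text{s per += call}, \qquad p/\ell \approx 3.1
\end{aligned}
$$

By this estimate, a one-element primitive call costs roughly three loop passes.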
More information about the Squeak-dev mailing list