Adding loop primitives/optimizations (was Making Set/Dictionary etc. loops more robust)

Yoshiki Ohshima Yoshiki.Ohshima at acm.org
Thu Dec 2 05:47:32 UTC 2004


  John,

> When you are talking this type of stuff, then really you are talking
> about doing a plugin.
> A few years back I and others looked at Altivec stuff for Squeak. What
> I found was that, well, we do have a set of primitives to do vector
> math already. If the data is in Arrays we can make the prim call and
> say add two vectors together. The problem was that adding a million
> elements takes just a few milliseconds, but the overhead to set up the
> prim call and get at the data took many ms.
> So changing the cost from 2 milliseconds to less than 1 millisecond
> didn't make any difference to the bottom line, since it took 10 ms to
> get us to where we could do the math.

  This "10 ms" figure doesn't agree with my experience.  On my
computer:

	f _ FloatArray new: 1.
	[1000000 timesRepeat: [f+=f]] timeToRun.

"=> 1094"

and 

	f _ FloatArray new: 1000.
	[1000000 timesRepeat: [f+=f]] timeToRun.
"=> 3730"

or even

	f _ FloatArray new: 10000.
	[1000000 timesRepeat: [f+=f]] timeToRun.
"=>  31595"

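As a rough sketch, the three timings can be split into a fixed per-call cost c and a per-element cost e. This assumes `timeToRun` answers milliseconds, that run time grows linearly with array size, and that the 1-element run is dominated by call overhead; c and e are names introduced here for illustration, not part of the original measurements.

```latex
\begin{aligned}
T(n) &\approx N\,(c + e\,n), \qquad N = 10^{6}\ \text{calls} \\
c + e &\approx \frac{1094\ \text{ms}}{10^{6}} \approx 1.1\ \mu\text{s} \\
e &\approx \frac{(3730 - 1094)\ \text{ms}}{10^{6} \times 999} \approx 2.6\ \text{ns},
\qquad
e \approx \frac{(31595 - 1094)\ \text{ms}}{10^{6} \times 9999} \approx 3.05\ \text{ns}
\end{aligned}
```

The two estimates of e differ somewhat, so the linear model is only approximate. Still, on these numbers a one-million-element add would spend roughly 10^6 × 3 ns ≈ 3 ms on the elements themselves, against a per-call overhead of at most about 1.1 µs.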
So, basically, the primitive call-out overhead isn't that terrible, and
it makes some sense to try to optimize the array arithmetic.  In
Kedama, the plugin supports a few more array arithmetic primitives, and
they definitely give a performance boost.  In a typical example, the
primitive execution time occupies 70% or so of the total execution
time, and we can still imagine cutting the primitive execution time in
half or so.
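A back-of-the-envelope estimate of what halving the primitive time would buy, using Amdahl's law with the primitive share p ≈ 0.7 quoted above and assuming the other 30% of the run stays unchanged:

```latex
S = \frac{1}{(1 - p) + p/2} = \frac{1}{0.3 + 0.35} = \frac{1}{0.65} \approx 1.54
```

That is, roughly a 1.5 times overall speedup under these assumptions.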

-- Yoshiki

By the way, this version:

	f _ FloatArray new: 1.
	[10000 timesRepeat: [
	"100 unrolled adds per iteration: 10000 * 100 = 1,000,000 calls in all"
	f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. 
	f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. 
	f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. 
	f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. 
	f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. 
	f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. 
	f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. 
	f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. 
	f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. 
	f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. f+=f. 
	]] timeToRun
"=> 830"

So, the loop overhead is somewhat in the same ballpark (less than a
factor of 10).
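A rough reading of the unrolled timing, assuming both versions make the same 1,000,000 `f+=f` calls on a 1-element array and differ only in the number of `timesRepeat:` iterations (1,000,000 versus 10,000). The symbol ℓ for the per-iteration loop cost is introduced here for illustration.

```latex
\begin{aligned}
\ell &\approx \frac{(1094 - 830)\ \text{ms}}{10^{6} - 10^{4}\ \text{iterations}}
      = \frac{264\ \text{ms}}{990\,000} \approx 0.27\ \mu\text{s} \\
\text{cost per call} &\approx \frac{830\ \text{ms}}{10^{6}} = 0.83\ \mu\text{s}
\end{aligned}
```

The 830 ms still includes 10,000 loop iterations (about 2.7 ms at 0.27 µs each), which is negligible. On this reading the loop cost is about a third of the per-call cost, consistent with the "less than a factor of 10" remark.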


