OQO [Re: Yet another interesting bit of hardware in the Dynapad vein...]

Tue May 21 08:58:43 UTC 2002

I suspect loop overhead is the main influence on the ratio: it is the same for integer and FP on a given platform, so it pulls the measured ratio toward 1.

I also get a ratio of roughly 2.1 for my Pentium for this:
	[100000 timesRepeat: [0+0]] timeToRun.
	[100000 timesRepeat: [0.0+0.0]] timeToRun.

If I unroll the computation to reduce the overhead of the 100000-iteration
loop, I get a ratio more like 7.7:
	[100000 timesRepeat: [0+0+0+0+0+0+0+0+0+0]] timeToRun.
	[100000 timesRepeat: [0.0+0.0+0.0+0.0+0.0+0.0+0.0+0.0+0.0+0.0]] timeToRun.

This might be a better test for comparing the FP/Int performance ratio on
Pentium vs. ARM.
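
Another way to take the loop overhead out of the picture, as a rough sketch only: time an empty loop and subtract it from both measurements. The variable names (n, base, int, fp) are just for illustration, and this assumes timeToRun answers whole milliseconds, so the corrected numbers will be noisy when the loops run only a few ms.

```smalltalk
| n base int fp |
n := 100000.
"Empty loop: approximates the per-iteration cost of timesRepeat: and the block call."
base := [n timesRepeat: []] timeToRun.
int := [n timesRepeat: [0+0]] timeToRun.
fp := [n timesRepeat: [0.0+0.0]] timeToRun.
"Overhead-corrected FP/Int ratio; max: 1 guards against a zero or negative divisor."
((fp - base) / ((int - base) max: 1)) asFloat
```

Evaluate it with "print it" in a workspace. If the result is much larger than the raw ratio, that would support the idea that loop overhead is hiding most of the FP/Int difference.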

Mike

"Ohshima, Yoshiki" <Yoshiki.Ohshima at disney.com> wrote:
>   Hello,
> 
> > >  An XScale based PocketPC is coming out in a few weeks from
> > >Toshiba.  I would assume that XScale at 400MHz (max) is
> > >faster than 300MHz Geode:-)
> > 
> > Well, that depends....  If you mean the PXA210, then I
> > would say that the 300 MHz Geode definitely beats the 400
> > MHz XScale.  The Geode is essentially a Pentium class CPU
> > with floating point and MMX.  I'd say that it is probably
> > still faster than even a 400 MHz PXA250, especially for
> > running Squeak. 
> 
>   Hmm.  Interesting.
> 
> > Squeak likes hardware floating point.  Compare Squeak on a
> > 200 MHz Celeron against Squeak on an iPaq.  I know for
> > sure that Squeak is much more responsive and the
> > benchmarks are better on my old Pentium 133 Sharp Widenote
> > than on my Cassiopeia E-105 (a 131 MHz MIPS 3 CPU with no
> > floating point hardware).
> 
>   Do you think this is due to the floating point hardware?
> I've been thinking this is more because of the memory
> bandwidth.
> 
>   On my Pentium III 800MHz laptop, the ratio of results from the
> following two lines is around 2.4 (43ms vs. 103ms).
> 
> [100000 timesRepeat: [0+0]] timeToRun.
> [100000 timesRepeat: [0.0+0.0]] timeToRun.
> 
> On my iPAQ, the ratio is around 2.3.  I think the primitive
> callout is so slow that the actual computation is pretty
> much overshadowed by that overhead.  The #+ primitive first
> tries the SmallInteger version and then falls back to the Float
> version.  This would explain the factor of two difference.
> 
> > I don't think Squeak would readily take advantage of the
> > dual multiply-accumulate pipelines or SIMD on the PXA250,
> > just like it doesn't really benefit from MMX.
> 
>   Yes.  Some bitblt rules, such as rule 24, can be much
> faster if we bind them to MMX (or the Intel IPP stuff).
> 
> -- Yoshiki