Odd

sqrmax at cvtci.com.ar sqrmax at cvtci.com.ar
Tue Apr 21 06:38:45 UTC 1998


Hi.

>>"Multiplication is a one cycle operation, the same as shift"
>Not quite yet.  The fastest integer multiplies are still around 3 to 5 
>cycles of latency, but can be pipelined (and proceed in parallel with 
>other instructions).  Many implementations are slower than this, e.g. 
>radix-4 modified Booth that would take 16 to 18 processor cycles for a 
>32-bit multiply.

Well... Intel chips used to take up to 38 processor cycles to multiply 32 
bits x 32 bits of data... although ok, they had fast out cases cutting the 
execution at 9 cycles. I don't have cycle consumption tables for newer 
processors. They should be slower than 3 to 5... and still, that is slower than the 
one cycle shift operation. Hint: most cpus also have somewhat dubbed double 
precision shift...

>>#bitShift: as well as #<< require a message lookup in the class.
>Yes, that's the real reason: 

No wonder it's slower than multiplying... ouch!

>SmallInteger arithmetic is important and 
>frequent enough that the system is heavily optimized for it.  The 
>Compiler generates special bytecodes that attempt to perform SmallInteger 
>arithmetic (and sometimes Float arithmetic, if that fails) directly 
>without going through the standard method lookup that message sends 
>normally do.  Only if those fail is the standard lookup performed.  

Say, it checks if the receiver of the #* message is an immediate 
smallinteger or float, doesn't it?

>This means that if you implement #* for one of your own objects, it will 
be 
>slightly slower than a message named something else, say #times: -- but 
>that is a good tradeoff to make for the increased performance of 
>SmallInteger arithmetic.

Yes, anyway that kind of check is almost instantaneous. Could it be 
possible to copy that neat behaviour for my old good friend bitShift:? I usually do 
a lot of calculations and they tend to take a lot of time (last one: 12 
hours... it would have taken even more if I had only 1 cpu on the motherboard) and 
it would be really nice to be able to skip "trivial" multiplications by 
shifting in an efficient way... what should be the first thing to look at? What 
would be the "tidy" way to implement it?

By the way, bit manipulations can require LOTS of shifting (hint: faster 
shifting would be nice)... and also bitOring, Anding and sometimes Xoring. I 
recently rewrote those methods for LargePositiveIntegers... they came up (in 
order) 40, 31 and 32 times faster than the originals with the numbers I was 
working on. The worst case I found produced at least 25% or so more crunching. 
It gets better as numbers grow. We will be sending these after beta testing.

The #stopWatch:times: message below eliminates the milliseconds taken to 
iterate the block (if any milliseconds).

bigX := 
6218505580556377054933696188038227463266226353204360987582520552117399354806157645981735539822213809735356061910761586985957250252328245503917403
76566216694760169500212006371597832501463679413278366716698804808782186224327
23589692532766871407282927254679931663812652874626942022387218424934147886663
52003287505024373742927521323823902292247788953822427577726350168781848187839
09510937505877800813851738986507251794246440515528901819698778916278071811583
510400023716260942011160576

smallX := 1 << 30 - 1
smallX class SmallInteger

Time stopWatch: 100 times: [bigX bitAnd: smallX] 203ms 
Time stopWatch: 100 times: [bigX fasterBitAnd: smallX] 0ms
Time stopWatch: 500 times: [bigX bitAnd: smallX] 1s 15ms 
Time stopWatch: 14500 times: [bigX fasterBitAnd: smallX] 1s 31ms
Time stopWatch: 14500 times: [bigX fasterBitAnd: smallX] 952ms
Time stopWatch: 15500 times: [bigX fasterBitAnd: smallX] 1s 15ms 
Time stopWatch: 15500 times: [bigX fasterBitAnd: smallX] 999ms


Time stopWatch: 500 times: [bigX bitOr: smallX] 812ms 
Time stopWatch: 770 times: [bigX fasterBitOr: smallX] 812ms
Time stopWatch: 770 times: [bigX fasterBitOr: smallX] 47ms 
(bigX bitOr: smallX) = (bigX fasterBitOr: smallX) true

Time stopWatch: 19500 times: [bigX fasterBitOr: smallX] 797ms 
Time stopWatch: 20000 times: [bigX fasterBitOr: smallX] 828ms


Time stopWatch: 500 times: [bigX bitXor: smallX] 875ms 
Time stopWatch: 16000 times: [bigX fasterBitXor: smallX] 844ms 


Time stopWatch: 500 times: [bigX bitAnd: bigX] 719ms 
Time stopWatch: 500 times: [bigX fasterBitAnd: bigX] 407ms 

Time stopWatch: 500 times: [bigX bitOr: bigX] 750ms 
Time stopWatch: 500 times: [bigX fasterBitOr: bigX] 437ms

bigXp1 := bigX + 1

Time stopWatch: 500 times: [bigX bitXor: bigXp1] 938ms 
Time stopWatch: 500 times: [bigX fasterBitXor: bigXp1] 703ms

bigXs := bigX << 1

Time stopWatch: 500 times: [bigX bitXor: bigXs] 704ms 
Time stopWatch: 500 times: [bigX fasterBitXor: bigXs] 468ms


---
By the way, have you noticed that all these timings have a precision of 
around 16ms?

Andres.





More information about the Squeak-dev mailing list