Odd
sqrmax at cvtci.com.ar
sqrmax at cvtci.com.ar
Tue Apr 21 06:38:45 UTC 1998
Hi.
>>"Multiplication is a one cycle operation, the same as shift"
>Not quite yet. The fastest integer multiplies are still around 3 to 5
>cycles of latency, but can be pipelined (and proceed in parallel with
>other instructions). Many implementations are slower than this, e.g.
>radix-4 modified Booth that would take 16 to 18 processor cycles for a
>32-bit multiply.
Well... Intel chips used to take up to 38 processor cycles to multiply 32
bits x 32 bits of data... although ok, they had fast out cases cutting the
execution at 9 cycles. I don't have cycle consumption tables for newer
processors. They should be slower than 3 to 5... and still, that is slower than the
one cycle shift operation. Hint: most cpus also have somewhat dubbed double
precision shift...
>>#bitShift: as well as #<< require a message lookup in the class.
>Yes, that's the real reason:
No wonder it's slower than multiplying... ouch!
>SmallInteger arithmetic is important and
>frequent enough that the system is heavily optimized for it. The
>Compiler generates special bytecodes that attempt to perform SmallInteger
>arithmetic (and sometimes Float arithmetic, if that fails) directly
>without going through the standard method lookup that message sends
>normally do. Only if those fail is the standard lookup performed.
Say, it checks if the receiver of the #* message is an immediate
smallinteger or float, doesn't it?
>This means that if you implement #* for one of your own objects, it will
be
>slightly slower than a message named something else, say #times: -- but
>that is a good tradeoff to make for the increased performance of
>SmallInteger arithmetic.
Yes, anyway that kind of check is almost instantaneous. Could it be
possible to copy that neat behaviour for my old good friend bitShift:? I usually do
a lot of calculations and they tend to take a lot of time (last one: 12
hours... it would have taken even more if I had only 1 cpu on the motherboard) and
it would be really nice to be able to skip "trivial" multiplications by
shifting in an efficient way... what should be the first thing to look at? What
would be the "tidy" way to implement it?
By the way, bit manipulations can require LOTS of shifting (hint: faster
shifting would be nice)... and also bitOring, Anding and sometimes Xoring. I
recently rewrote those methods for LargePositiveIntegers... they came up (in
order) 40, 31 and 32 times faster than the originals with the numbers I was
working on. The worst case I found produced at least 25% or so more crunching.
It gets better as numbers grow. We will be sending these after beta testing.
The #stopWatch:times: message below eliminates the milliseconds taken to
iterate the block (if any milliseconds).
bigX :=
6218505580556377054933696188038227463266226353204360987582520552117399354806157645981735539822213809735356061910761586985957250252328245503917403
76566216694760169500212006371597832501463679413278366716698804808782186224327
23589692532766871407282927254679931663812652874626942022387218424934147886663
52003287505024373742927521323823902292247788953822427577726350168781848187839
09510937505877800813851738986507251794246440515528901819698778916278071811583
510400023716260942011160576
smallX := 1 << 30 - 1
smallX class SmallInteger
Time stopWatch: 100 times: [bigX bitAnd: smallX] 203ms
Time stopWatch: 100 times: [bigX fasterBitAnd: smallX] 0ms
Time stopWatch: 500 times: [bigX bitAnd: smallX] 1s 15ms
Time stopWatch: 14500 times: [bigX fasterBitAnd: smallX] 1s 31ms
Time stopWatch: 14500 times: [bigX fasterBitAnd: smallX] 952ms
Time stopWatch: 15500 times: [bigX fasterBitAnd: smallX] 1s 15ms
Time stopWatch: 15500 times: [bigX fasterBitAnd: smallX] 999ms
Time stopWatch: 500 times: [bigX bitOr: smallX] 812ms
Time stopWatch: 770 times: [bigX fasterBitOr: smallX] 812ms
Time stopWatch: 770 times: [bigX fasterBitOr: smallX] 47ms
(bigX bitOr: smallX) = (bigX fasterBitOr: smallX) true
Time stopWatch: 19500 times: [bigX fasterBitOr: smallX] 797ms
Time stopWatch: 20000 times: [bigX fasterBitOr: smallX] 828ms
Time stopWatch: 500 times: [bigX bitXor: smallX] 875ms
Time stopWatch: 16000 times: [bigX fasterBitXor: smallX] 844ms
Time stopWatch: 500 times: [bigX bitAnd: bigX] 719ms
Time stopWatch: 500 times: [bigX fasterBitAnd: bigX] 407ms
Time stopWatch: 500 times: [bigX bitOr: bigX] 750ms
Time stopWatch: 500 times: [bigX fasterBitOr: bigX] 437ms
bigXp1 := bigX + 1
Time stopWatch: 500 times: [bigX bitXor: bigXp1] 938ms
Time stopWatch: 500 times: [bigX fasterBitXor: bigXp1] 703ms
bigXs := bigX << 1
Time stopWatch: 500 times: [bigX bitXor: bigXs] 704ms
Time stopWatch: 500 times: [bigX fasterBitXor: bigXs] 468ms
---
By the way, have you noticed that all these timings have a precision of
around 16ms?
Andres.
More information about the Squeak-dev
mailing list
|