2013/12/24 Nicolas Cellier <nicolas.cellier.aka.nice@gmail.com>
We could do a similar thing when adding components in BitBlt
partitionedAdd: word1 to: word2 nBits: nBits nPartitions: nParts

For a simple example, say I have 2 parts of 4 bits
I will start reasonning with extended precision (thus on 16 bits here)

"The sum with carry is" sumWithCarry:=word1+word2.
"But carry must not be propagated past each component."
"So we must care of what happens at" carryOverflowMask := 2r100010000.
"The sum without any carry is" sumWithoutCarry:=word1 bitXor: word2.
"If the sum without carry differ from sum with carry then an overflow occured."
"We can thus detect presence of a carry overflow:"
carryOverflow := (sumWithCarry bitXor: sumWithoutCarry) bitAnd: carryOverflowMask.
"If an undue carry occured, we just removet it:"
sumWithoutUndueCarry := sumWithCarry - carryOverflow.
"But in this case, previous component did overflow, we must saturate it at 2r1111, that is:" componentMask:=1<<nBits-1.
"we just have to multiply each carryOverflow bit with this componentMask:"
result := sumWithoutUndueCarry bitOr: carryOverflow >> nBit * componentMask.

-----------

"Generalization: note that 2r00010001 * componentMask = 2r11111111"
"Correlatively, for arbitrary nBits and nParts parameters:"
carryOverflowMask := 1<<(nParts*nBits)-1//componentMask<<nBits.

-----------

In BitBlt, we want to operate on 32bits words.
We can use #usqLong (64 bits at least) all the way, and obtain branch free replacement for above method
  
    <var: #word1 type: 'unsigned int'>
    <var: #word2 type: 'unsigned int'>
    <var: #one type: #usqLong>
    <var: #componentMask type: #usqLong>
    <var: #carryOverflowMask type: #usqLong>
    <var: #carryOverflow type: #usqLong>
    <var: #sum type: #usqLong>
    one := 1.
    componentMask := one<<nBits-1.
    carryOverflowMask := one<<(nBits*nParts)-1//componentMask<<nBits.
    sum := word1.
    sum := sum + word2.
    carryOverflow := ((word1 bitXor: word2) bitXor: sum) bitAnd: carryOverflowMask.
    ^sum-carryOverflow bitOr: carryOverflow>>nBits * componentMask

-----------

But maybe we can do better and express all with native unsigned int operations.
We must then look if bits at 2r10001000 could produce a carry.
We have a carry in next bit if at least 2 out of the 3 are true among word1 word2 carry...
That is (word1 bitAnd: word2) bitOr: ((word1 bitOr: word2) bitAnd: carry).

    <var: #word1 type: 'unsigned int'>
    <var: #word2 type: 'unsigned int'>
    <var: #one type: #usqLong> "because we cannot shift <<32 a usqInt in C..."
    <var: #componentMask type: 'unsigned int'>
    <var: #carryOverflowMask type: 'unsigned int'>
    <var: #carryOverflow type: 'unsigned int'>
    <var: #carry type: 'unsigned int'>
    <var: #sum type: 'unsigned int'>
    one := 1.
    componentMask := one<<nBits-1.
    carryOverflowMask := one<<(nBits*nParts)-1.
    carryOverflowMask := carryOverflowMask//componentMask<<(nBits-1).
    sum := word1.
    sum := sum + word2.
    carry := (word1 bitXor: word2) bitXor: sum.
    carryOverflow := ((word1 bitAnd: word2) bitOr: ((word1 bitOr: word2) bitAnd: carry)) bitAnd: carryOverflowMask.
    ^sum-(carryOverflow<<1) bitOr: carryOverflow>>(nBits-1) * componentMask

Maybe some good soul may help reducing # ops further, but we already have a branch free sum.
I did no performance test.
Does one have good BitBlt benchmark?


Attached is what I could come with minimal ops...
The carry overflow can be written with this table

0 0 0 0 1 1 1 1 word1
0 0 1 1 0 0 1 1 word2
0 1 0 1 0 1 0 1 carry
-------------------
0 0 0 1 0 1 0 1 next carry if at least 2 out of 3
0 1 1 0 1 0 0 1 sum=word1+word2 (with carry)
0 0 0 1 0 1 x x (word1 bitOr: word2) bitAnd: sum bitInvert
0 0 0 0 0 0 1 1 (word1 bitAnd: word2)

The bit masks are not computed when depths are known constants