I got an idea while reading this nice blending trick: http://stackoverflow.com/questions/20681590/how-does-this-color-blending-tri...
We could do a similar thing when adding components in BitBlt partitionedAdd: word1 to: word2 nBits: nBits nPartitions: nParts
For a simple example, say I have 2 parts of 4 bits I will start reasonning with extended precision (thus on 16 bits here)
"The sum with carry is" sumWithCarry:=word1+word2. "But carry must not be propagated past each component." "So we must care of what happens at" carryOverflowMask := 2r100010000. "The sum without any carry is" sumWithoutCarry:=word1 bitXor: word2. "If the sum without carry differ from sum with carry then an overflow occured." "We can thus detect presence of a carry overflow:" carryOverflow := (sumWithCarry bitXor: sumWithoutCarry) bitAnd: carryOverflowMask. "If an undue carry occured, we just removet it:" sumWithoutUndueCarry := sumWithCarry - carryOverflow. "But in this case, previous component did overflow, we must saturate it at 2r1111, that is:" componentMask:=1<<nBits-1. "we just have to multiply each carryOverflow bit with this componentMask:" result := sumWithoutUndueCarry bitOr: carryOverflow >> nBit * componentMask.
-----------
"Generalization: note that 2r00010001 * componentMask = 2r11111111" "Correlatively, for arbitrary nBits and nParts parameters:" carryOverflowMask := 1<<(nParts*nBits)-1//componentMask<<nBits.
-----------
In BitBlt, we want to operate on 32bits words. We can use #usqLong (64 bits at least) all the way, and obtain branch free replacement for above method
<var: #word1 type: 'unsigned int'> <var: #word2 type: 'unsigned int'> <var: #one type: #usqLong> <var: #componentMask type: #usqLong> <var: #carryOverflowMask type: #usqLong> <var: #carryOverflow type: #usqLong> <var: #sum type: #usqLong> one := 1. componentMask := one<<nBits-1. carryOverflowMask := one<<(nBits*nParts)-1//componentMask<<nBits. sum := word1. sum := sum + word2. carryOverflow := ((word1 bitXor: word2) bitXor: sum) bitAnd: carryOverflowMask. ^sum-carryOverflow bitOr: carryOverflow>>nBits * componentMask
-----------
But maybe we can do better and express all with native unsigned int operations. We must then look if bits at 2r10001000 could produce a carry. We have a carry in next bit if at least 2 out of the 3 are true among word1 word2 carry... That is (word1 bitAnd: word2) bitOr: ((word1 bitOr: word2) bitAnd: carry).
<var: #word1 type: 'unsigned int'> <var: #word2 type: 'unsigned int'> <var: #one type: #usqLong> "because we cannot shift <<32 a usqInt in C..." <var: #componentMask type: 'unsigned int'> <var: #carryOverflowMask type: 'unsigned int'> <var: #carryOverflow type: 'unsigned int'> <var: #carry type: 'unsigned int'> <var: #sum type: 'unsigned int'> one := 1. componentMask := one<<nBits-1. carryOverflowMask := one<<(nBits*nParts)-1. carryOverflowMask := carryOverflowMask//componentMask<<(nBits-1). sum := word1. sum := sum + word2. carry := (word1 bitXor: word2) bitXor: sum. carryOverflow := ((word1 bitAnd: word2) bitOr: ((word1 bitOr: word2) bitAnd: carry)) bitAnd: carryOverflowMask. ^sum-(carryOverflow<<1) bitOr: carryOverflow>>(nBits-1) * componentMask
Maybe some good soul may help reducing # ops further, but we already have a branch free sum. I did no performance test. Does one have good BitBlt benchmark?