"So we must care of what happens at" carryOverflowMask := 2r100010000."But carry must not be propagated past each component.""The sum with carry is" sumWithCarry:=word1+word2.For a simple example, say I have 2 parts of 4 bitsI got an idea while reading this nice blending trick:We could do a similar thing when adding components in BitBlt
http://stackoverflow.com/questions/20681590/how-does-this-color-blending-trick-work
partitionedAdd: word1 to: word2 nBits: nBits nPartitions: nPartsI will start reasonning with extended precision (thus on 16 bits here)"The sum without any carry is" sumWithoutCarry:=word1 bitXor: word2."If the sum without carry differ from sum with carry then an overflow occured.""We can thus detect presence of a carry overflow:"
carryOverflow := (sumWithCarry bitXor: sumWithoutCarry) bitAnd: carryOverflowMask."If an undue carry occured, we just removet it:"sumWithoutUndueCarry := sumWithCarry - carryOverflow.
"But in this case, previous component did overflow, we must saturate it at 2r1111, that is:" componentMask:=1<<nBits-1."we just have to multiply each carryOverflow bit with this componentMask:"
result := sumWithoutUndueCarry bitOr: carryOverflow >> nBit * componentMask.
-----------"Generalization: note that 2r00010001 * componentMask = 2r11111111""Correlatively, for arbitrary nBits and nParts parameters:"
carryOverflowMask := 1<<(nParts*nBits)-1//componentMask<<nBits.
-----------In BitBlt, we want to operate on 32bits words.We can use #usqLong (64 bits at least) all the way, and obtain branch free replacement for above method
<var: #word1 type: 'unsigned int'>
<var: #word2 type: 'unsigned int'>
<var: #one type: #usqLong>
<var: #componentMask type: #usqLong>
<var: #carryOverflowMask type: #usqLong>
<var: #carryOverflow type: #usqLong>
<var: #sum type: #usqLong>
one := 1.
componentMask := one<<nBits-1.
carryOverflowMask := one<<(nBits*nParts)-1//componentMask<<nBits.
sum := word1.
sum := sum + word2.
carryOverflow := ((word1 bitXor: word2) bitXor: sum) bitAnd: carryOverflowMask.
^sum-carryOverflow bitOr: carryOverflow>>nBits * componentMask
-----------But maybe we can do better and express all with native unsigned int operations.
We must then look if bits at 2r10001000 could produce a carry.We have a carry in next bit if at least 2 out of the 3 are true among word1 word2 carry...That is (word1 bitAnd: word2) bitOr: ((word1 bitOr: word2) bitAnd: carry).
<var: #word1 type: 'unsigned int'>
<var: #word2 type: 'unsigned int'>
<var: #one type: #usqLong> "because we cannot shift <<32 a usqInt in C..."
<var: #componentMask type: 'unsigned int'>
<var: #carryOverflowMask type: 'unsigned int'>
<var: #carryOverflow type: 'unsigned int'>
<var: #carry type: 'unsigned int'>
<var: #sum type: 'unsigned int'>
one := 1.
componentMask := one<<nBits-1.
carryOverflowMask := one<<(nBits*nParts)-1.
carryOverflowMask := carryOverflowMask//componentMask<<(nBits-1).
sum := word1.
sum := sum + word2.carry := (word1 bitXor: word2) bitXor: sum.carryOverflow := ((word1 bitAnd: word2) bitOr: ((word1 bitOr: word2) bitAnd: carry)) bitAnd: carryOverflowMask.
^sum-(carryOverflow<<1) bitOr: carryOverflow>>(nBits-1) * componentMask
Maybe some good soul may help reducing # ops further, but we already have a branch free sum.I did no performance test.Does one have good BitBlt benchmark?