2013/12/24 Nicolas Cellier <nicolas.cellier.aka.nice@gmail.com>
2013/12/24 Nicolas Cellier <nicolas.cellier.aka.nice@gmail.com>
"So we must care of what happens at" carryOverflowMask := 2r100010000."But carry must not be propagated past each component.""The sum with carry is" sumWithCarry:=word1+word2.For a simple example, say I have 2 parts of 4 bitsI got an idea while reading this nice blending trick:We could do a similar thing when adding components in BitBlt
http://stackoverflow.com/questions/20681590/how-does-this-color-blending-trick-work
partitionedAdd: word1 to: word2 nBits: nBits nPartitions: nPartsI will start reasonning with extended precision (thus on 16 bits here)"The sum without any carry is" sumWithoutCarry:=word1 bitXor: word2."If the sum without carry differ from sum with carry then an overflow occured.""We can thus detect presence of a carry overflow:"
carryOverflow := (sumWithCarry bitXor: sumWithoutCarry) bitAnd: carryOverflowMask."If an undue carry occured, we just removet it:"sumWithoutUndueCarry := sumWithCarry - carryOverflow.
"But in this case, previous component did overflow, we must saturate it at 2r1111, that is:" componentMask:=1<<nBits-1."we just have to multiply each carryOverflow bit with this componentMask:"
result := sumWithoutUndueCarry bitOr: carryOverflow >> nBit * componentMask.
-----------"Generalization: note that 2r00010001 * componentMask = 2r11111111""Correlatively, for arbitrary nBits and nParts parameters:"
carryOverflowMask := 1<<(nParts*nBits)-1//componentMask<<nBits.
-----------In BitBlt, we want to operate on 32bits words.We can use #usqLong (64 bits at least) all the way, and obtain branch free replacement for above method
<var: #word1 type: 'unsigned int'>
<var: #word2 type: 'unsigned int'>
<var: #one type: #usqLong>
<var: #componentMask type: #usqLong>
<var: #carryOverflowMask type: #usqLong>
<var: #carryOverflow type: #usqLong>
<var: #sum type: #usqLong>
one := 1.
componentMask := one<<nBits-1.
carryOverflowMask := one<<(nBits*nParts)-1//componentMask<<nBits.
sum := word1.
sum := sum + word2.
carryOverflow := ((word1 bitXor: word2) bitXor: sum) bitAnd: carryOverflowMask.
^sum-carryOverflow bitOr: carryOverflow>>nBits * componentMask
-----------But maybe we can do better and express all with native unsigned int operations.
We must then look if bits at 2r10001000 could produce a carry.We have a carry in next bit if at least 2 out of the 3 are true among word1 word2 carry...That is (word1 bitAnd: word2) bitOr: ((word1 bitOr: word2) bitAnd: carry).
<var: #word1 type: 'unsigned int'>
<var: #word2 type: 'unsigned int'>
<var: #one type: #usqLong> "because we cannot shift <<32 a usqInt in C..."
<var: #componentMask type: 'unsigned int'>
<var: #carryOverflowMask type: 'unsigned int'>
<var: #carryOverflow type: 'unsigned int'>
<var: #carry type: 'unsigned int'>
<var: #sum type: 'unsigned int'>
one := 1.
componentMask := one<<nBits-1.
carryOverflowMask := one<<(nBits*nParts)-1.
carryOverflowMask := carryOverflowMask//componentMask<<(nBits-1).
sum := word1.
sum := sum + word2.carry := (word1 bitXor: word2) bitXor: sum.carryOverflow := ((word1 bitAnd: word2) bitOr: ((word1 bitOr: word2) bitAnd: carry)) bitAnd: carryOverflowMask.
^sum-(carryOverflow<<1) bitOr: carryOverflow>>(nBits-1) * componentMask
Maybe some good soul may help reducing # ops further, but we already have a branch free sum.I did no performance test.Does one have good BitBlt benchmark?Attached is what I could come with minimal ops...The carry overflow can be written with this table0 0 0 0 1 1 1 1 word10 0 1 1 0 0 1 1 word20 1 0 1 0 1 0 1 carry
-------------------
0 0 0 1 0 1 0 1 next carry if at least 2 out of 30 1 1 0 1 0 0 1 sum=word1+word2 (with carry)0 0 0 1 0 1 x x (word1 bitOr: word2) bitAnd: sum bitInvert0 0 0 0 0 0 1 1 (word1 bitAnd: word2)The bit masks are not computed when depths are known constantsArghh, I posted too fast, it was bitInvert32 because that's the only thing that the CCodeGenerator understand by now.
Hmm this is wrong, ~x works for other int types than int32/uint32, but well...