<div dir="ltr"><div><div><div><div><div><div>I got an idea while reading this nice blending trick:<br><a href="http://stackoverflow.com/questions/20681590/how-does-this-color-blending-trick-work" target="_blank">http://stackoverflow.com/questions/20681590/how-does-this-color-blending-trick-work</a><br>
<br></div>We could do a similar thing when adding components in BitBlt<br>partitionedAdd: word1 to: word2 nBits: nBits nPartitions: nParts<br><br></div>For a simple example, say I have 2 parts of 4 bits<br></div><div>I will start reasonning with extended precision (thus on 16 bits here)</div>
<br></div>"The sum with carry is" sumWithCarry:=word1+word2.<br></div>"But carry must not be propagated past each component."<br></div>"So we must care of what happens at" carryOverflowMask := 2r100010000.<div>
<div><div><div>"The sum without any carry is" sumWithoutCarry:=word1 bitXor: word2.<br><div></div></div><div>"If the sum without carry differ from sum with carry then an overflow occured."<br></div><div>
"We can thus detect presence of a carry overflow:"<br>
</div><div>carryOverflow := (sumWithCarry bitXor: sumWithoutCarry) bitAnd: carryOverflowMask.<br></div><div>"If an undue carry occured, we just removet it:"<br></div><div>sumWithoutUndueCarry := sumWithCarry - carryOverflow.<br>
</div><div>"But in this case, previous component did overflow, we must saturate it at 2r1111, that is:" componentMask:=1<<nBits-1.<br></div><div>"we just have to multiply each carryOverflow bit with this componentMask:"<br>
</div><div>result := sumWithoutUndueCarry bitOr: carryOverflow >> nBit * componentMask.<br><br>-----------<br><br></div><div>"Generalization: note that 2r00010001 * componentMask = 2r11111111"<br></div><div>
"Correlatively, for arbitrary nBits and nParts parameters:" <br>
</div><div>carryOverflowMask := 1<<(nParts*nBits)-1//componentMask<<nBits.<br><br>-----------<br><br></div><div>In BitBlt, we want to operate on 32bits words.<br></div><div>We can use #usqLong (64 bits at least) all the way, and obtain branch free replacement for above method<br>
<br>
<var: #word1 type: 'unsigned int'><br> <var: #word2 type: 'unsigned int'><br> <var: #one type: #usqLong><br> <var: #componentMask type: #usqLong><br> <var: #carryOverflowMask type: #usqLong><br>
<var: #carryOverflow type: #usqLong><br>
<var: #sum type: #usqLong><br> one := 1.<br> componentMask := one<<nBits-1.<br> carryOverflowMask := one<<(nBits*nParts)-1//componentMask<<nBits.<br> sum := word1.<br> sum := sum + word2.<br>
carryOverflow := ((word1 bitXor: word2) bitXor: sum) bitAnd: carryOverflowMask.<br> ^sum-carryOverflow bitOr: carryOverflow>>nBits * componentMask<br><br>-----------<br><br></div><div>But maybe we can do better and express all with native unsigned int operations.<br>
</div><div>We must then look if bits at 2r10001000 could produce a carry.<br></div><div>We have a carry in next bit if at least 2 out of the 3 are true among word1 word2 carry...<br></div><div>That is (word1 bitAnd: word2) bitOr: ((word1 bitOr: word2) bitAnd: carry).<br>
<br>
<var: #word1 type: 'unsigned int'><br> <var: #word2 type: 'unsigned int'><br> <var: #one type: #usqLong> "because we cannot shift <<32 a usqInt in C..."<br> <var: #componentMask type: 'unsigned int'><br>
<var: #carryOverflowMask type: 'unsigned int'><br> <var: #carryOverflow type: 'unsigned int'><br> <var: #carry type: 'unsigned int'><br>
<var: #sum type: 'unsigned int'><br> one := 1.<br> componentMask := one<<nBits-1.<br> carryOverflowMask := one<<(nBits*nParts)-1.<br> carryOverflowMask := carryOverflowMask//componentMask<<(nBits-1).<br>
<div> sum := word1.<br> sum := sum + word2.<br></div><div> carry := (word1 bitXor: word2) bitXor: sum.</div><div>
carryOverflow := ((word1 bitAnd: word2) bitOr: ((word1 bitOr: word2) bitAnd: carry)) bitAnd: carryOverflowMask.<br> ^sum-(carryOverflow<<1) bitOr: carryOverflow>>(nBits-1) * componentMask<br></div><br>
</div><div>Maybe some good soul may help reducing # ops further, but we already have a branch free sum.<br></div><div>I did no performance test.<br></div><div>Does one have good BitBlt benchmark?<br></div><div><br></div>
</div></div></div></div>