[Vm-dev] Nice bit trick for component add

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Tue Dec 24 00:50:44 UTC 2013


I got an idea while reading this nice blending trick:
http://stackoverflow.com/questions/20681590/how-does-this-color-blending-trick-work

We could do a similar thing when adding components in BitBlt
partitionedAdd: word1 to: word2 nBits: nBits nPartitions: nParts

For a simple example, say I have 2 parts of 4 bits
I will start reasonning with extended precision (thus on 16 bits here)

"The sum with carry is" sumWithCarry:=word1+word2.
"But carry must not be propagated past each component."
"So we must care of what happens at" carryOverflowMask := 2r100010000.
"The sum without any carry is" sumWithoutCarry:=word1 bitXor: word2.
"If the sum without carry differ from sum with carry then an overflow
occured."
"We can thus detect presence of a carry overflow:"
carryOverflow := (sumWithCarry bitXor: sumWithoutCarry) bitAnd:
carryOverflowMask.
"If an undue carry occured, we just removet it:"
sumWithoutUndueCarry := sumWithCarry - carryOverflow.
"But in this case, previous component did overflow, we must saturate it at
2r1111, that is:" componentMask:=1<<nBits-1.
"we just have to multiply each carryOverflow bit with this componentMask:"
result := sumWithoutUndueCarry bitOr: carryOverflow >> nBit * componentMask.

-----------

"Generalization: note that 2r00010001 * componentMask = 2r11111111"
"Correlatively, for arbitrary nBits and nParts parameters:"
carryOverflowMask := 1<<(nParts*nBits)-1//componentMask<<nBits.

-----------

In BitBlt, we want to operate on 32bits words.
We can use #usqLong (64 bits at least) all the way, and obtain branch free
replacement for above method

    <var: #word1 type: 'unsigned int'>
    <var: #word2 type: 'unsigned int'>
    <var: #one type: #usqLong>
    <var: #componentMask type: #usqLong>
    <var: #carryOverflowMask type: #usqLong>
    <var: #carryOverflow type: #usqLong>
    <var: #sum type: #usqLong>
    one := 1.
    componentMask := one<<nBits-1.
    carryOverflowMask := one<<(nBits*nParts)-1//componentMask<<nBits.
    sum := word1.
    sum := sum + word2.
    carryOverflow := ((word1 bitXor: word2) bitXor: sum) bitAnd:
carryOverflowMask.
    ^sum-carryOverflow bitOr: carryOverflow>>nBits * componentMask

-----------

But maybe we can do better and express all with native unsigned int
operations.
We must then look if bits at 2r10001000 could produce a carry.
We have a carry in next bit if at least 2 out of the 3 are true among word1
word2 carry...
That is (word1 bitAnd: word2) bitOr: ((word1 bitOr: word2) bitAnd: carry).

    <var: #word1 type: 'unsigned int'>
    <var: #word2 type: 'unsigned int'>
    <var: #one type: #usqLong> "because we cannot shift <<32 a usqInt in
C..."
    <var: #componentMask type: 'unsigned int'>
    <var: #carryOverflowMask type: 'unsigned int'>
    <var: #carryOverflow type: 'unsigned int'>
    <var: #carry type: 'unsigned int'>
    <var: #sum type: 'unsigned int'>
    one := 1.
    componentMask := one<<nBits-1.
    carryOverflowMask := one<<(nBits*nParts)-1.
    carryOverflowMask := carryOverflowMask//componentMask<<(nBits-1).
    sum := word1.
    sum := sum + word2.
    carry := (word1 bitXor: word2) bitXor: sum.
    carryOverflow := ((word1 bitAnd: word2) bitOr: ((word1 bitOr: word2)
bitAnd: carry)) bitAnd: carryOverflowMask.
    ^sum-(carryOverflow<<1) bitOr: carryOverflow>>(nBits-1) * componentMask

Maybe some good soul may help reducing # ops further, but we already have a
branch free sum.
I did no performance test.
Does one have good BitBlt benchmark?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20131224/469be4e8/attachment-0001.htm


More information about the Vm-dev mailing list