[Vm-dev] Re: Nice bit trick for component add

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Tue Dec 24 05:05:22 UTC 2013


I have yet another version published at
http://smalltalkhub.com/mc/nice/NiceVMExperiments/main to handle a 2x16bits
word in a single pass
Of course, speed up is spectacular on low depth (around x5 at depth 1, x2
at depth 4, only 40% at depth 16 and 30% at depth 32).


2013/12/24 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>

>
> 2013/12/24 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>
>
>>
>> 2013/12/24 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>
>>
>>> I got an idea while reading this nice blending trick:
>>>
>>> http://stackoverflow.com/questions/20681590/how-does-this-color-blending-trick-work
>>>
>>> We could do a similar thing when adding components in BitBlt
>>> partitionedAdd: word1 to: word2 nBits: nBits nPartitions: nParts
>>>
>>> For a simple example, say I have 2 parts of 4 bits
>>> I will start reasonning with extended precision (thus on 16 bits here)
>>>
>>> "The sum with carry is" sumWithCarry:=word1+word2.
>>> "But carry must not be propagated past each component."
>>> "So we must care of what happens at" carryOverflowMask := 2r100010000.
>>> "The sum without any carry is" sumWithoutCarry:=word1 bitXor: word2.
>>> "If the sum without carry differ from sum with carry then an overflow
>>> occured."
>>> "We can thus detect presence of a carry overflow:"
>>> carryOverflow := (sumWithCarry bitXor: sumWithoutCarry) bitAnd:
>>> carryOverflowMask.
>>> "If an undue carry occured, we just removet it:"
>>> sumWithoutUndueCarry := sumWithCarry - carryOverflow.
>>> "But in this case, previous component did overflow, we must saturate it
>>> at 2r1111, that is:" componentMask:=1<<nBits-1.
>>> "we just have to multiply each carryOverflow bit with this
>>> componentMask:"
>>> result := sumWithoutUndueCarry bitOr: carryOverflow >> nBit *
>>> componentMask.
>>>
>>> -----------
>>>
>>> "Generalization: note that 2r00010001 * componentMask = 2r11111111"
>>> "Correlatively, for arbitrary nBits and nParts parameters:"
>>> carryOverflowMask := 1<<(nParts*nBits)-1//componentMask<<nBits.
>>>
>>> -----------
>>>
>>> In BitBlt, we want to operate on 32bits words.
>>> We can use #usqLong (64 bits at least) all the way, and obtain branch
>>> free replacement for above method
>>>
>>>     <var: #word1 type: 'unsigned int'>
>>>     <var: #word2 type: 'unsigned int'>
>>>     <var: #one type: #usqLong>
>>>     <var: #componentMask type: #usqLong>
>>>     <var: #carryOverflowMask type: #usqLong>
>>>     <var: #carryOverflow type: #usqLong>
>>>     <var: #sum type: #usqLong>
>>>     one := 1.
>>>     componentMask := one<<nBits-1.
>>>     carryOverflowMask := one<<(nBits*nParts)-1//componentMask<<nBits.
>>>     sum := word1.
>>>     sum := sum + word2.
>>>     carryOverflow := ((word1 bitXor: word2) bitXor: sum) bitAnd:
>>> carryOverflowMask.
>>>     ^sum-carryOverflow bitOr: carryOverflow>>nBits * componentMask
>>>
>>> -----------
>>>
>>> But maybe we can do better and express all with native unsigned int
>>> operations.
>>> We must then look if bits at 2r10001000 could produce a carry.
>>> We have a carry in next bit if at least 2 out of the 3 are true among
>>> word1 word2 carry...
>>> That is (word1 bitAnd: word2) bitOr: ((word1 bitOr: word2) bitAnd:
>>> carry).
>>>
>>>     <var: #word1 type: 'unsigned int'>
>>>     <var: #word2 type: 'unsigned int'>
>>>     <var: #one type: #usqLong> "because we cannot shift <<32 a usqInt in
>>> C..."
>>>     <var: #componentMask type: 'unsigned int'>
>>>     <var: #carryOverflowMask type: 'unsigned int'>
>>>     <var: #carryOverflow type: 'unsigned int'>
>>>     <var: #carry type: 'unsigned int'>
>>>     <var: #sum type: 'unsigned int'>
>>>     one := 1.
>>>     componentMask := one<<nBits-1.
>>>     carryOverflowMask := one<<(nBits*nParts)-1.
>>>     carryOverflowMask := carryOverflowMask//componentMask<<(nBits-1).
>>>     sum := word1.
>>>     sum := sum + word2.
>>>     carry := (word1 bitXor: word2) bitXor: sum.
>>>     carryOverflow := ((word1 bitAnd: word2) bitOr: ((word1 bitOr: word2)
>>> bitAnd: carry)) bitAnd: carryOverflowMask.
>>>     ^sum-(carryOverflow<<1) bitOr: carryOverflow>>(nBits-1) *
>>> componentMask
>>>
>>> Maybe some good soul may help reducing # ops further, but we already
>>> have a branch free sum.
>>> I did no performance test.
>>> Does one have good BitBlt benchmark?
>>>
>>>
>> Attached is what I could come with minimal ops...
>> The carry overflow can be written with this table
>>
>> 0 0 0 0 1 1 1 1 word1
>> 0 0 1 1 0 0 1 1 word2
>> 0 1 0 1 0 1 0 1 carry
>> -------------------
>> 0 0 0 1 0 1 0 1 next carry if at least 2 out of 3
>> 0 1 1 0 1 0 0 1 sum=word1+word2 (with carry)
>> 0 0 0 1 0 1 x x (word1 bitOr: word2) bitAnd: sum bitInvert
>> 0 0 0 0 0 0 1 1 (word1 bitAnd: word2)
>>
>> The bit masks are not computed when depths are known constants
>>
>
> Arghh, I posted too fast, it was bitInvert32 because that's the only thing
> that the CCodeGenerator understand by now.
> Hmm this is wrong, ~x works for other int types than int32/uint32, but
> well...
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20131224/84cf25e4/attachment-0001.htm


More information about the Vm-dev mailing list