[Vm-dev] Re: Little speed up of BitBlt alpha-blending

Tue Dec 24 04:59:21 UTC 2013

I only measured gain of 25%, not 50%, maybe the division is a bit complex...

2013/12/24 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>

>
> 2013/12/23 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>
>
>>
>> 2013/12/23 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>
>>
>>> Currently we use a very clear but naive algorithm
>>>
>>>     alpha := sourceWord >> 24.  "High 8 bits of source pixel"
>>>     alpha = 0 ifTrue: [ ^ destinationWord ].
>>>     alpha = 255 ifTrue: [ ^ sourceWord ].
>>>     unAlpha := 255 - alpha.
>>>     colorMask := 16rFF.
>>>     result := 0.
>>>
>>>     "red"
>>>     shift := 0.
>>>     blend := ((sourceWord >> shift bitAnd: colorMask) * alpha) +
>>>                 ((destinationWord>>shift bitAnd: colorMask) * unAlpha)
>>>                  + 254 // 255 bitAnd: colorMask.
>>>     result := result bitOr: blend << shift.
>>>     "green"
>>>     shift := 8.
>>>     blend := ((sourceWord >> shift bitAnd: colorMask) * alpha) +
>>>                 ((destinationWord>>shift bitAnd: colorMask) * unAlpha)
>>>                  + 254 // 255 bitAnd: colorMask.
>>>     result := result bitOr: blend << shift.
>>>     "blue"
>>>     shift := 16.
>>>     blend := ((sourceWord >> shift bitAnd: colorMask) * alpha) +
>>>                 ((destinationWord>>shift bitAnd: colorMask) * unAlpha)
>>>                  + 254 // 255 bitAnd: colorMask.
>>>     result := result bitOr: blend << shift.
>>>     "alpha (pre-multiplied)"
>>>     shift := 24.
>>>     blend := (alpha * 255) +
>>>                 ((destinationWord>>shift bitAnd: colorMask) * unAlpha)
>>>                  + 254 // 255 bitAnd: colorMask.
>>>     result := result bitOr: blend << shift.
>>>     ^ result
>>>
>>>
>>> Of course, the best we could do to improve it is using a native OS
>>> library when it exists on the whole bitmap. I let this path apart, it can
>>> be handled at platform specific source like tim did for Pi.
>>> But still, with our own crafted bits, we could do better than current
>>> implementation.
>>> See
>>> http://stackoverflow.com/questions/1102692/how-to-do-alpha-blend-fast
>>>
>>> Using specific hardware instructions by ourselves is not really an
>>> option for a portable VM, it's better to call a native library if we cant
>>> to have specific optimizations, so i let SSE instructions apart.
>>>
>>> But there are two simple ideas we can recycle from above SO reference:
>>>
>>> 1) multiplex Red+Blue and Alpha+Green computations
>>> 2) avoid division by 255
>>>
>>> Here it is:
>>>
>>>     "red and blue"
>>>     blend := ((sourceWord bitAnd: 16rFF00FF) * alpha) +
>>>                 ((destinationWord bitAnd: 16rFF00FF) * unAlpha) +
>>> 16rFE00FE.
>>>     "divide by 255"
>>>     blend := blend + 16r10001 + (blend >> 8 bitAnd: 16rFF00FF) >> 8.
>>>
>> I forgot to protect  bitAnd: 16rFF00FF but you get the idea...
>>
>>
>>>      result := blend.
>>>
>>>     "alpha and green"
>>>     blend := (((sourceWord>> 8 bitOr: 16rFF0000) bitAnd: 16rFF00FF) *
>>> alpha) +
>>>                 ((destinationWord>>8 bitAnd: 16rFF00FF) * unAlpha) +
>>> 16rFE00FE.
>>>     "divide by 255"
>>>     blend := blend + 16r10001 + (blend >> 8 bitAnd: 16rFF00FF) >> 8.
>>>
>>
>> bitAnd: 16rFF00FF too of course...
>>
>>
>>>     result := result bitOr: blend<<8.
>>>     ^ result
>>>
>>> For bytes B1 and B2 in (0..255), alpha*B1+unAlpha*B2 is in (0..16rFE01)
>>> alpha*B1+unAlpha*B2+254 is in (0..16rFEFF)
>>> So when we multiplex non adjacent components, we're safe from overflow.
>>>
>>> Now for division by 255 we are also safe: when adding 1 -> (1..16rFF00)
>>> And when adding blend>>8 bitAnd 16rFF -> (1..16rFFFF)
>>> We are still free of overflow and can extend the //255 division trick to
>>> 32bit word (the formula given on SO is for 16bit only).
>>>
>>> I expect roughly a x2 factor in throughput, but it's hard to measure.
>>> What do you think? Is this interesting?
>>>
>>
>> Find corresponding code attached
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20131224/7ab24807/attachment.htm