[Vm-dev] Re: Little speed up of BitBlt alpha-blending
Nicolas Cellier
nicolas.cellier.aka.nice at gmail.com
Mon Dec 23 15:56:07 UTC 2013
2013/12/23 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>
> Currently we use a very clear but naive algorithm
>
> alpha := sourceWord >> 24. "High 8 bits of source pixel"
> alpha = 0 ifTrue: [ ^ destinationWord ].
> alpha = 255 ifTrue: [ ^ sourceWord ].
> unAlpha := 255 - alpha.
> colorMask := 16rFF.
> result := 0.
>
> "red"
> shift := 0.
> blend := ((sourceWord >> shift bitAnd: colorMask) * alpha) +
> ((destinationWord>>shift bitAnd: colorMask) * unAlpha)
> + 254 // 255 bitAnd: colorMask.
> result := result bitOr: blend << shift.
> "green"
> shift := 8.
> blend := ((sourceWord >> shift bitAnd: colorMask) * alpha) +
> ((destinationWord>>shift bitAnd: colorMask) * unAlpha)
> + 254 // 255 bitAnd: colorMask.
> result := result bitOr: blend << shift.
> "blue"
> shift := 16.
> blend := ((sourceWord >> shift bitAnd: colorMask) * alpha) +
> ((destinationWord>>shift bitAnd: colorMask) * unAlpha)
> + 254 // 255 bitAnd: colorMask.
> result := result bitOr: blend << shift.
> "alpha (pre-multiplied)"
> shift := 24.
> blend := (alpha * 255) +
> ((destinationWord>>shift bitAnd: colorMask) * unAlpha)
> + 254 // 255 bitAnd: colorMask.
> result := result bitOr: blend << shift.
> ^ result
>
>
> Of course, the best we could do to improve it is using a native OS library
> when it exists on the whole bitmap. I let this path apart, it can be
> handled at platform specific source like tim did for Pi.
> But still, with our own crafted bits, we could do better than current
> implementation.
> See http://stackoverflow.com/questions/1102692/how-to-do-alpha-blend-fast
>
> Using specific hardware instructions by ourselves is not really an option
> for a portable VM, it's better to call a native library if we cant to have
> specific optimizations, so i let SSE instructions apart.
>
> But there are two simple ideas we can recycle from above SO reference:
>
> 1) multiplex Red+Blue and Alpha+Green computations
> 2) avoid division by 255
>
> Here it is:
>
> "red and blue"
> blend := ((sourceWord bitAnd: 16rFF00FF) * alpha) +
> ((destinationWord bitAnd: 16rFF00FF) * unAlpha) +
> 16rFE00FE.
> "divide by 255"
> blend := blend + 16r10001 + (blend >> 8 bitAnd: 16rFF00FF) >> 8.
>
I forgot to protect bitAnd: 16rFF00FF but you get the idea...
> result := blend.
>
> "alpha and green"
> blend := (((sourceWord>> 8 bitOr: 16rFF0000) bitAnd: 16rFF00FF) *
> alpha) +
> ((destinationWord>>8 bitAnd: 16rFF00FF) * unAlpha) +
> 16rFE00FE.
> "divide by 255"
> blend := blend + 16r10001 + (blend >> 8 bitAnd: 16rFF00FF) >> 8.
>
bitAnd: 16rFF00FF too of course...
> result := result bitOr: blend<<8.
> ^ result
>
> For bytes B1 and B2 in (0..255), alpha*B1+unAlpha*B2 is in (0..16rFE01)
> alpha*B1+unAlpha*B2+254 is in (0..16rFEFF)
> So when we multiplex non adjacent components, we're safe from overflow.
>
> Now for division by 255 we are also safe: when adding 1 -> (1..16rFF00)
> And when adding blend>>8 bitAnd 16rFF -> (1..16rFFFF)
> We are still free of overflow and can extend the //255 division trick to
> 32bit word (the formula given on SO is for 16bit only).
>
> I expect roughly a x2 factor in throughput, but it's hard to measure.
> What do you think? Is this interesting?
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20131223/e4091a01/attachment.htm
More information about the Vm-dev
mailing list