
<p dir="auto">At depth 1, the resulting bits will always be 0.<br><br>

It's not a big problem, because rgbMul is just a bitAnd operation at this depth.<br><br>

So a quick workaround would be to detect that case in BitBltSimulation:</p>

<pre class="notranslate"><code class="notranslate">destDepth = 1 ifTrue: [^self bitAnd: sourceWord with: destinationWord].

</code></pre>

<p dir="auto">That would also accelerate the Bit BLock Transfer operation, so it's a good hack.</p>
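<p dir="auto">To see why bitAnd is enough at depth 1, here is a small workspace check (a sketch, not code from BitBltSimulation) over the four possible pairs of 1-bit channel values:</p>

```smalltalk
"Exhaustive check: for 1-bit values, multiplication and bitAnd: agree."
#(0 1) allSatisfy: [:a |
    #(0 1) allSatisfy: [:b | a * b = (a bitAnd: b)]].
```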

<p dir="auto">But there is more. What we want is to multiply ratios in the interval [0,1]:</p>

<p dir="auto">dstRatio * srcRatio</p>

<p dir="auto">Our implementation works on scaled ratios (scaled by <code class="notranslate">1 << nBits - 1</code>, called <code class="notranslate">scale</code> below):</p>

<pre class="notranslate"><code class="notranslate">src := (srcRatio * scale) rounded.

dst := (dstRatio * scale) rounded.

</code></pre>

<p dir="auto">So what we want is:</p>

<pre class="notranslate"><code class="notranslate">((dst/scale) * (src/scale) * scale) rounded

</code></pre>

<p dir="auto">that is:</p>

<pre class="notranslate"><code class="notranslate">(dst*src / (1<<nBits-1)) rounded

</code></pre>

<p dir="auto">Unfortunately, that is the other problem: the current implementation rounds with:</p>

<pre class="notranslate"><code class="notranslate">(dst+1)*(src+1) - 1 >> nBits

</code></pre>

<p dir="auto">It only equals correctly rounded operation for depths 2 and 4.</p>

<p dir="auto">For rounding we might use:</p>

<pre class="notranslate"><code class="notranslate">((dst/scale) * (src/scale) * scale + 0.5) truncated.

</code></pre>

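<p dir="auto">that is, expressed with truncated division (scale is odd, so <code class="notranslate">scale // 2</code> is the largest integer below scale/2):</p>

<pre class="notranslate"><code class="notranslate">dst*src + (scale // 2) // scale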

</code></pre>

<p dir="auto">So here is a nicer formulation that does the job at any depth (including 5-bit RGB channels for 16-bit depth) with correctly rounded division:</p>

<pre class="notranslate"><code class="notranslate">aux := src * dst + (1 << (nBits - 1)). "add mid-scale for rounding"

result := aux >> nBits + aux >> nBits. "divide by scale"

</code></pre>

<p dir="auto">This works because, instead of dividing by scale, we can multiply by the inverse shifted left by 2 * nBits (a sort of double precision), then shift right.</p>

<pre class="notranslate"><code class="notranslate">(2 to: 32) allSatisfy: [:nBits | (1 << (nBits * 2) / (1 << nBits - 1)) rounded = (1 << nBits + 1)].

</code></pre>

<p dir="auto">Multiplying by this inverse is easy and cheap:</p>

<pre class="notranslate"><code class="notranslate">x * (1 << nBits + 1) = (x << nBits + x).

</code></pre>

<p dir="auto">And then applying the right shift <code class="notranslate">>> (2 * nBits)</code> is equivalent to:</p>

<pre class="notranslate"><code class="notranslate">x >> nBits + x >> nBits.

</code></pre>

<p dir="auto">We must first add 0.5 (scaled), that is <code class="notranslate">src * dst + (1 << (nBits -1))</code> - our formulation of aux, and we're done.</p>
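<p dir="auto">As a worked instance of this formulation, take nBits = 8, src = 200 and dst = 100. The block name <code class="notranslate">scaledMul</code> is only a hypothetical helper for illustration, not something that exists in BitBltSimulation:</p>

```smalltalk
"Hypothetical helper for illustration only."
| scaledMul |
scaledMul := [:src :dst :nBits | | aux |
    aux := src * dst + (1 << (nBits - 1)).    "add mid-scale for rounding"
    aux >> nBits + aux >> nBits].             "divide by scale, evaluated left to right"
scaledMul value: 200 value: 100 value: 8.     "78"
(200 * 100 / 255) rounded.                    "78"
```

<p dir="auto">Here aux = 20128, aux >> 8 = 78, and (78 + 20128) >> 8 = 78, which matches 20000/255 rounded.</p>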

<p dir="auto">We verify:</p>

<pre class="notranslate"><code class="notranslate">     {

        (0 to: 1<<20-1) allSatisfy: [:i | (1<<9+i)>>10+ (1<<9+i)>>10 = (i/1023) rounded].

        (0 to: 1<<18-1) allSatisfy: [:i | (1<<8+i)>>9+ (1<<8+i)>>9 = (i/511) rounded].

        (0 to: 1<<16-1) allSatisfy: [:i | (1<<7+i)>>8+ (1<<7+i)>>8 = (i/255) rounded].

        (0 to: 1<<14-1) allSatisfy: [:i | (1<<6+i)>>7+ (1<<6+i)>>7 = (i/127) rounded].

        (0 to: 1<<12-1) allSatisfy: [:i | (1<<5+i)>>6+ (1<<5+i)>>6 = (i/63) rounded].

        (0 to: 1<<10-1) allSatisfy: [:i | (1<<4+i)>>5+ (1<<4+i)>>5 = (i/31) rounded].

        (0 to: 1<<8-1) allSatisfy: [:i |  (1<<3+i)>>4+ (1<<3+i)>>4 = (i/15) rounded].

        (0 to: 1<<6-1) allSatisfy: [:i |  (1<<2+i)>>3+ (1<<2+i)>>3 = (i/7) rounded].

        (0 to: 1<<4-1) allSatisfy: [:i |  (1<<1+i)>>2+ (1<<1+i)>>2 = (i/3) rounded].

} allSatisfy: #yourself.

</code></pre>
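<p dir="auto">The same check can also be written once for all the channel widths listed above. This is only a sketch: it restricts i to the values a product of two channels can actually take (0 to scale squared), which is a subset of the ranges tested above:</p>

```smalltalk
"Compare the shift-based division with exact rounding, for nBits = 2 to 10."
(2 to: 10) allSatisfy: [:nBits | | half scale |
    half := 1 << (nBits - 1).
    scale := 1 << nBits - 1.
    (0 to: scale * scale) allSatisfy: [:i | | aux |
        aux := i + half.
        (aux >> nBits + aux >> nBits) = (i / scale) rounded]].
```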

<p dir="auto">The nice thing is that the above down-scaling operation can be multiplexed.<br>

Suppose that we have p groups of 2*nBits bits, each noted <code class="notranslate">M</code> and holding the square-scale product of one channel, concatenated in a double word <code class="notranslate">doubleWordMul</code>.</p><b>

<pre class="notranslate"><code class="notranslate">doubleWordMul = Mp .... M5 M3 M1

</code></pre>

<p dir="auto">Note we arrange to have odd channels in low word, and even channels in high word.</p>

<p dir="auto">We first form a <code class="notranslate">groupMask</code> on a word with (p+1)/2 groups of nBits alternating all one <code class="notranslate">i</code> and all zero <code class="notranslate">o</code>, <code class="notranslate">oioi...ioi</code>.<br></p>

<pre class="notranslate"><code class="notranslate">channelMask := 1 << nBits - 1.

groupMask := 0.

"one group for every other channel that fits in a word"

1 to: wordBits // nBits + 1 // 2 do: [:i |

    groupMask := groupMask << (2 * nBits) + channelMask].

</code></pre>

<p dir="auto">Where wordBits is the number of bits in a word (usually we want to operate on 32 bits words in BitBlt).</p>
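<p dir="auto">For instance, assuming wordBits = 32 and nBits = 8 (32 bits depth with 8-bit channels), the mask can be built and checked as follows. In this sketch the number of groups is taken as the number of odd-numbered channels that fit in a word:</p>

```smalltalk
| nBits wordBits channelMask groupMask |
nBits := 8.
wordBits := 32.
channelMask := 1 << nBits - 1.
groupMask := 0.
"two groups here: (32 // 8 + 1) // 2 = 2"
(wordBits // nBits + 1 // 2) timesRepeat: [
    groupMask := groupMask << (2 * nBits) + channelMask].
groupMask = 16r00FF00FF.    "true: pattern oioi"
```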

<p dir="auto">We form the <code class="notranslate">doubleGroupMask</code> on a double-word with p groups of 2*nBits <code class="notranslate">oi</code>:</p>

<pre class="notranslate"><code class="notranslate">doubleGroupMask := groupMask << wordBits + groupMask.

</code></pre>

<p dir="auto">And we perform the division by scale:</p>

<pre class="notranslate"><code class="notranslate">doubleWordMul := (doubleWordMul >> nBits bitAnd: doubleGroupMask) + doubleWordMul >> nBits bitAnd: doubleGroupMask.

</code></pre>

<p dir="auto">At this stage we obtain a double word containing the scaled products interleaved with groups of nBits zeros:</p>

<p dir="auto">o mp ... o m3 o m1</p>

<p dir="auto">Now the final result can be obtained by shifting back:</p>

<pre class="notranslate"><code class="notranslate">doubleWordMul >> (wordBits - nBits) + (doubleWordMul bitAnd: groupMask)

</code></pre>

<p dir="auto">The only remaining problem is how to obtain the squared-scale products. It is easy to form the alternate even-odd channels for each of the src and dst operands:</p>

<pre class="notranslate"><code class="notranslate">doubleWordSrc := src >> nBits bitAnd: groupMask.
doubleWordSrc := doubleWordSrc << wordBits + (src bitAnd: groupMask).
doubleWordDst := dst >> nBits bitAnd: groupMask.
doubleWordDst := doubleWordDst << wordBits + (dst bitAnd: groupMask).
</code></pre>
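<p dir="auto">As a concrete instance of this split, assuming a 32 bits word with 8-bit channels and an arbitrary sample value for src:</p>

```smalltalk
| nBits wordBits groupMask src doubleWordSrc |
nBits := 8.
wordBits := 32.
groupMask := 16r00FF00FF.
src := 16rFF80C040.    "channels s4 s3 s2 s1 = FF 80 C0 40"
doubleWordSrc := src >> nBits bitAnd: groupMask.
doubleWordSrc := doubleWordSrc << wordBits + (src bitAnd: groupMask).
doubleWordSrc = 16r00FF00C000800040.    "true: o s4 o s2 in high word, o s3 o s1 in low word"
```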

<p dir="auto">We now get <code class="notranslate">o sp ...  o s3 o s1</code> and <code class="notranslate">o dp ... o d3 o d1</code>, but we would then need a SIMD integer multiplication operating on groups of 2*nBits in parallel... We don't have that, at least in portable C code. So we still have to emulate it with a loop.</p>

<pre class="notranslate"><code class="notranslate">half := 1 << (nBits - 1).
shift := 0.
doubleWordMul := 0.
"nChannels is p, the number of channels in the word"
0 to: nChannels - 1 do: [:i |
    doubleWordMul := doubleWordMul + (((doubleWordSrc >> shift bitAnd: channelMask) * (doubleWordDst >> shift bitAnd: channelMask) + half) << shift).
    shift := shift + nBits + nBits].
</code></pre>

<p dir="auto">We know that each operation cannot overflow on upper neighbour group of 2*nBits, because the maximum value is:</p>

<p dir="auto">(1<<nBits-1) squared + (1 << (nBits-1)) = 1 << (2*nBits) - (2*(1<<nBits)) + (1 << (nBits-1)) + 1<br>

< (1 << (2*nBits) - 1)</p>
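<p dir="auto">The bound is easy to check numerically. Here is a sketch over channel widths 1 to 16 (the range is an arbitrary choice), testing both the expansion of the maximum value and the strict inequality:</p>

```smalltalk
(1 to: 16) allSatisfy: [:nBits | | maxProduct |
    "largest channel value squared, plus the rounding offset"
    maxProduct := (1 << nBits - 1) squared + (1 << (nBits - 1)).
    maxProduct = ((1 << (2 * nBits)) - (2 * (1 << nBits)) + (1 << (nBits - 1)) + 1)
        and: [maxProduct < (1 << (2 * nBits) - 1)]].
```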

<p dir="auto">There remains the odd case of 16-bit depth, where each pixel has 3 groups of 5 bits and a leading zero.<br>

I believe that the above algorithm works without splitting the word into two half-words...<br>

To be tested.</p>

<p dir="auto">We have gathered the pieces for a correctly rounded almost-multiplexed rgbMul.<br><br>

Somehow have our cake and eat it too.</p></b>
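<p dir="auto">Putting the pieces together, here is a minimal workspace sketch of the whole sequence. It is fixed to wordBits = 32 and nBits = 8, uses arbitrary sample words for src and dst, and builds the double-word mask as <code class="notranslate">groupMask << wordBits + groupMask</code>. It does not cover the 16-bit depth case, and it is not the BitBltSimulation code. The result is compared with a per-channel, correctly rounded reference:</p>

```smalltalk
"Sketch only: fixed to wordBits = 32 and nBits = 8, with arbitrary sample words."
| nBits wordBits nChannels channelMask half groupMask doubleGroupMask
  src dst doubleWordSrc doubleWordDst doubleWordMul shift result expected |
nBits := 8.
wordBits := 32.
nChannels := wordBits // nBits.
channelMask := 1 << nBits - 1.
half := 1 << (nBits - 1).
groupMask := 16r00FF00FF.
doubleGroupMask := groupMask << wordBits + groupMask.
src := 16rFF80C040.
dst := 16r80FF4020.

"Odd channels go to the low word, even channels to the high word."
doubleWordSrc := (src >> nBits bitAnd: groupMask) << wordBits + (src bitAnd: groupMask).
doubleWordDst := (dst >> nBits bitAnd: groupMask) << wordBits + (dst bitAnd: groupMask).

"Emulated SIMD multiplication, one group of 2*nBits at a time."
doubleWordMul := 0.
shift := 0.
nChannels timesRepeat: [
    doubleWordMul := doubleWordMul
        + (((doubleWordSrc >> shift bitAnd: channelMask)
            * (doubleWordDst >> shift bitAnd: channelMask) + half) << shift).
    shift := shift + (2 * nBits)].

"Multiplexed division by scale, then shift the even channels back in place."
doubleWordMul := (doubleWordMul >> nBits bitAnd: doubleGroupMask) + doubleWordMul >> nBits bitAnd: doubleGroupMask.
result := doubleWordMul >> (wordBits - nBits) + (doubleWordMul bitAnd: groupMask).

"Reference: correctly rounded product of each channel taken separately."
expected := 0.
0 to: nChannels - 1 do: [:k | | s d |
    s := src >> (k * nBits) bitAnd: channelMask.
    d := dst >> (k * nBits) bitAnd: channelMask.
    expected := expected + ((s * d / channelMask) rounded << (k * nBits))].

result = expected.    "true, both are 16r80803008"
```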

