[Vm-dev] Re: Nice bit trick for component add

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Fri Dec 27 17:06:02 UTC 2013


This has been recorded at http://bugs.squeak.org/view.php?id=7802
I am using the old Squeak Mantis right now rather than the Cog issue tracker
(abandoned?) or the newer Pharo FogBugz (restricted?)
What do the VM maintainers prefer?
I don't really like the balkanisation in progress, and hope we can converge
on some better common infrastructure again.


2013/12/26 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>

>
> 2013/12/25 Eliot Miranda <eliot.miranda at gmail.com>
>
>>
>>
>>
>>
>> On Tue, Dec 24, 2013 at 1:35 PM, Nicolas Cellier <
>> nicolas.cellier.aka.nice at gmail.com> wrote:
>>
>>>
>>> Sure, I agree, but this would require platform-specific code...
>>> As I explained in the other thread, an alternative is to use native
>>> libraries where possible - that is, when they perform an equivalent job (it
>>> would be Quartz or something like that for me).
>>> Such platform-specific support will be harder to obtain, because it
>>> requires knowledge of the VM plugin plus an external library, or of
>>> low-level assembler... Plus the pain of diving into the historical VM
>>> architecture (specifically in BitBlt, it's not that easy to get the full
>>> picture when you did not participate in the development - like me).
>>> I note that there is plenty of work required in this area, a sort of
>>> technical debt. For example, my Mac OS X VMs do not yet support the little
>>> endian image formats (those with negative depth), which could be surprising
>>> if we didn't know the historical roots...
>>> ^Display supportedDisplayDepths
>>>
>>> So, in a word, such platform-specific support can certainly provide
>>> great rewards, but requires a significant investment.
>>> On the other hand, these little tricks are like harvesting low-hanging
>>> fruit.
>>> They are written in pure C^H Slang, and thus valid on every platform which
>>> does not yet provide specific accelerated support.
>>> I'm lazy and just musing, so the fastest ROI is appealing.
>>>
>>> Remember my little text composition/display benchmarks?
>>>
>>>     | text canvas m1 |
>>>     canvas := FormCanvas extent: 569@399 depth: 32.
>>>     text := Compiler evaluate: (FileStream fileNamed: 'text.st') contentsOfEntireFile.
>>>     m1 := TextMorph new.
>>>     m1 text: text textStyle: TextStyle default.
>>>     m1 wrapFlag: true.
>>>     m1 extent: 569@9999999.
>>>     MessageTally spyOn: [Time millisecondsToRun: [100 timesRepeat: [
>>>         m1 drawOn: canvas]]]
>>>  Interpreter VM 4.10.10, before bit hacks: 1425 1411 1403 ms
>>>  Interpreter VM, after bit hacks: 1152 1173 ms
>>>
>>> 15 to 20% less; it's not really impressive, but the hurdle is low: just
>>> two bit hacks applied to rules 20, 24, 30 & 31.
>>>
>>
>> On the contrary.  That kind of speed-up in a mature tuned numerical
>> algorithm is impressive.  Thanks!
>>
>
> Ahem, except I messed up the partitionedAdd: a bit ...
> Here is a correct version:
>
>     w1 := word1 bitAnd: carryOverflowMask. "mask to remove the high bit of each component"
>     w2 := word2 bitAnd: carryOverflowMask.
>     sum := (word1 bitXor: w1) + (word2 bitXor: w2). "sum without the high bits, to avoid overflowing into the next component"
>     carryOverflow := (w1 bitAnd: w2) bitOr: ((w1 bitOr: w2) bitAnd: sum). "detect the overflow condition for saturating"
>     ^((sum bitXor: w1) bitXor: w2) "high bit of the sum, without overflow"
>         bitOr: carryOverflow >> (nBits - 1) * componentMask "saturate in case of overflow"
>
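> To make this concrete, here is a small worked example (with made-up values:
> 8-bit words holding four 2-bit components each); evaluating it should answer
> '11111101', i.e. the three overflowing components saturate to 2r11 while
> 0 + 1 stays 2r01:
>
>     | word1 word2 carryOverflowMask componentMask nBits w1 w2 sum carryOverflow |
>     word1 := 2r11100100. "components 3 2 1 0"
>     word2 := 2r01101101. "components 1 2 3 1"
>     carryOverflowMask := 2r10101010.
>     componentMask := 2r11.
>     nBits := 2.
>     w1 := word1 bitAnd: carryOverflowMask.
>     w2 := word2 bitAnd: carryOverflowMask.
>     sum := (word1 bitXor: w1) + (word2 bitXor: w2).
>     carryOverflow := (w1 bitAnd: w2) bitOr: ((w1 bitOr: w2) bitAnd: sum).
>     ^(((sum bitXor: w1) bitXor: w2)
>         bitOr: carryOverflow >> (nBits - 1) * componentMask) printStringBase: 2
>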
> And here is a simple test:
>
> pb := OrderedCollection new.
> carryOverflowMask := 2r10101010.
> componentMask := 2r11.
> nBits := 2.
> (0 to: 16rFF) do: [:word1 |
>     (0 to: 16rFF) do: [:word2 |
>         w1 := word1 bitAnd: carryOverflowMask.
>         w2 := word2 bitAnd: carryOverflowMask.
>         sum := (word1 bitXor: w1) + (word2 bitXor: w2).
>         carryOverflow := (w1 bitAnd: w2) bitOr: ((w1 bitOr: w2) bitAnd: sum).
>         res1 := ((sum bitXor: w1) bitXor: w2)
>             bitOr: carryOverflow >> (nBits - 1) * componentMask.
>         res2 :=
>             ((word1 bitAnd: 2r11) + (word2 bitAnd: 2r11) min: 2r11) +
>             ((word1 bitAnd: 2r1100) + (word2 bitAnd: 2r1100) min: 2r1100) +
>             ((word1 bitAnd: 2r110000) + (word2 bitAnd: 2r110000) min: 2r110000) +
>             ((word1 bitAnd: 2r11000000) + (word2 bitAnd: 2r11000000) min: 2r11000000).
>         (res1 = res2) ifFalse: [
>             pb add: {word1 printStringBase: 2. word2 printStringBase: 2.
>                 res1 printStringBase: 2. res2 printStringBase: 2}]]].
> ^pb
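>
> If the trick is correct, this should answer an empty collection: the loops
> compare the bit-hack result against a straightforward per-component
> saturating add for all 65536 pairs of 8-bit words, collecting any mismatch
> into pb.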
>
> Fortunately, the performance gain is preserved :)
>
>
>
>>>
>>> 2013/12/24 tim Rowledge <tim at rowledge.org>
>>>
>>>>
>>>> If you (or anyone else, of course) are interested in really speeding up
>>>> BitBlt, it would likely be worth looking at the ARM-specific speedups Ben
>>>> Avison did for the Pi (look in platforms/Cross/plugins/BitBltPlugin), and
>>>> seeing if similar tricks could be done with the assorted media instructions
>>>> in current i7 etc. CPUs.
>>>>
>>>> tim
>>>> --
>>>> tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
>>>> Experience is something you don't get until just after you need it.
>>>>
>>>
>> --
>> best,
>> Eliot
>>
>>
>

