[squeak-dev] Rounding in MatrixTransform2x3

Stephan Lutz dev at stlutz.net
Wed Aug 12 10:11:49 UTC 2020


Ah, yes. I hadn't even thought about that. Probably because I've only 
ever used Squeak on 64bits with immediate floats ^^.

I followed your suggestion and implemented a Matrix2x3 in pure Smalltalk.
It's actually really fast and surprisingly even beats the existing 
implementation in quite a few cases. :D
Most notably, the plugin-supported point transformation 
(/localPointToGlobal:/) is slightly slower on my machine (85.6 ns vs. 
80.3 ns per run).
Only the transformation of multiple points at a time, as done in 
/localBoundsToGlobal:/, is significantly (~3x) faster using the plugin.
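
For readers who want a feel for what a pure-Smalltalk point transformation 
involves, here is a minimal workspace sketch. It is not the code from the 
attached change set: the variable names a11 ... a23 and the block 
`transform` are hypothetical, and the rotation sign convention is an 
assumption (a standard counter-clockwise rotation matrix).

```smalltalk
"Hypothetical sketch: apply a 2x3 affine matrix to a point in plain Smalltalk.
 The names a11 ... a23 are illustrative, not taken from the attached change set."
| a11 a12 a13 a21 a22 a23 transform result |

"Rotation by 70 degrees combined with a translation of 500@600 (assumed convention)."
a11 := 70 degreesToRadians cos.
a12 := 70 degreesToRadians sin negated.
a13 := 500.0.
a21 := 70 degreesToRadians sin.
a22 := 70 degreesToRadians cos.
a23 := 600.0.

"x' = a11*x + a12*y + a13,  y' = a21*x + a22*y + a23. The result stays a Float point."
transform := [:p |
	((a11 * p x) + (a12 * p y) + a13) @ ((a21 * p x) + (a22 * p y) + a23)].

result := transform value: -10 @ 10.
```

Nothing is rounded here, so any rounding is left to the caller.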

Below are some of the measurements I have taken in a 64-bit trunk image 
using squeak.cog.spur_linux64x64:

    "--------------------------------"
    " localPointToGlobal: "
    "--------------------------------"
    [mat2x3Old localPointToGlobal: -10 @ 10] bench.
    " '11,700,000 per second. 85.6 nanoseconds per run. 1.43971 % GC time.'"

    [mat2x3Old transformPoint: -10 @ 10] bench.
    " '2,330,000 per second. 429 nanoseconds per run. 0.37985 % GC time.'"

    [mat2x3New localPointToGlobal: -10 @ 10] bench.
    " '12,500,000 per second. 80.3 nanoseconds per run. 1.89962 % GC time.'"

    [morphic localPointToGlobal: -10 @ 10] bench.
    " '2,710,000 per second. 370 nanoseconds per run. 1.16 % GC time.'"

    "--------------------------------"
    " localBoundsToGlobal: "
    "--------------------------------"
    [mat2x3Old localBoundsToGlobal: rect] bench.
    " '6,770,000 per second. 148 nanoseconds per run. 1.67966 % GC time.'"

    [mat2x3New localBoundsToGlobal: rect] bench.
    " '2,090,000 per second. 480 nanoseconds per run. 1.55969 % GC time.'"

    [morphic localBoundsToGlobal: rect] bench.
    " '505,000 per second. 1.98 microseconds per run. 1.95922 % GC time.'"

    "--------------------------------"
    " localBoundsToGlobal: (pure translation) "
    "--------------------------------"
    [mat2x3OldTranslation localBoundsToGlobal: rect] bench.
    " '6,780,000 per second. 147 nanoseconds per run. 1.7 % GC time.'"

    [mat2x3NewTranslation localBoundsToGlobal: rect] bench.
    " '5,860,000 per second. 171 nanoseconds per run. 1.48 % GC time.'"

    [morphicTranslation localBoundsToGlobal: rect] bench.
    " '1,580,000 per second. 631 nanoseconds per run. 4.12 % GC time.'"

    "--------------------------------"
    "composedWithLocal:"
    "--------------------------------"
    [mat2x3Old composedWithLocal: mat2x3OldRotation] bench.
    " '9,670,000 per second. 103 nanoseconds per run. 1.19976 % GC time.'"

    [mat2x3New composedWithLocal: mat2x3NewRotation] bench.
    " '6,920,000 per second. 144 nanoseconds per run. 1.4997 % GC time.'"

    [morphic composedWithLocal: morphicRotation] bench.
    " '11,100,000 per second. 89.8 nanoseconds per run. 1.09978 % GC time.'"

    "--------------------------------"
    " instance creation "
    "--------------------------------"
    [MatrixTransform2x3 withOffset: offset] bench.
    " '3,320,000 per second. 301 nanoseconds per run. 1.91962 % GC time.'"

    [Matrix2x3 withOffset: offset] bench.
    " '24,800,000 per second. 40.3 nanoseconds per run. 10.63787 % GC time.'"

    [MorphicTransform offset: offset] bench.
    " '43,600,000 per second. 22.9 nanoseconds per run. 7.63847 % GC time.'"


There are quite a few more benchmarks in the attached file.
I have also attached a change set of the implementation I used, so you 
can try it out for yourselves if you'd like :)

Cheers
Stephan

On 28.07.20 19:33, Vanessa Freudenberg wrote:
> On Tue, Jul 28, 2020 at 4:34 AM Stephan Lutz <dev at stlutz.net> wrote:
>
>     While transforming points using MatrixTransform2x3 we noticed some
>     strange rounding behavior:
>
>         "with plugin"
>         (MatrixTransform2x3 withOffset: 5 @ 10) localPointToGlobal: 0@0. "5@10"
>         (MatrixTransform2x3 withOffset: -5 @ -10) localPointToGlobal: 0@0. "-4@ -9"
>
>         "without plugin"
>         ((MatrixTransform2x3 withOffset: 5 @ 10) transformPoint: 0@0) rounded. "5@10"
>         ((MatrixTransform2x3 withOffset: -5 @ -10) transformPoint: 0@0) rounded. "-5@ -10"
>
>     It appears the code used to round in the plugin simply adds 0.5
>     and truncates the result, which does not work correctly for
>     negative numbers.
>     This code can be found in Matrix2x3Plugin >>
>     #roundAndStoreResultPoint: and Matrix2x3Plugin >>
>     #roundAndStoreResultRect:x0:y0:x1:y1: .
>
>     ----
>
>     On a kind of related note: Is there even a reason to round the
>     resulting floats?
>
>     While the class comment of MatrixTransform2x3 notes that this
>     behavior is intentional, glancing quickly over its uses we could
>     not find anything taking advantage or benefiting from it. It's
>     also not a limitation of the DisplayTransform interface, since
>     MorphicTransform does produce floating point values. Wouldn't it
>     be much more versatile and easier to leave rounding to users if
>     they actually need it?
>
> No. Having a float result means that the primitive would need to 
> allocate two Float objects. Any allocation can fail due to memory 
> exhaustion. So the primitive would have to be made to retry the 
> allocation after running a garbage collection.
>
> Secondly, its results are primarily used to set up a WarpBlt IIRC, for 
> drawing rotated user objects in Etoys. WarpBlt fails if the coords are 
> not integers. The failure code rounds the numbers and retries. Doing 
> the rounding in the matrix primitives ensured a fast path to rendering 
> - that's why it was done that way.
>
> So, there are very good reasons why the plugin returns integers. And 
> there are Squeak VMs where this still is a very reasonable behavior. 
> It also would be a good idea to document the reasoning in the class 
> comment of MatrixTransform2x3.
>
> That being said, there is virtually no reason to use it when running 
> on Cog, much less Sista, especially on 64 bits where we have immediate 
> floats. An interesting thing would be to compare a pure Smalltalk 
> implementation to the performance of the plugin. If you need floating 
> point transform results, just write it in Smalltalk, would be my 
> suggestion.
>
> - Vanessa -
>
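
The rounding behavior described in the quoted message can be reproduced in 
a workspace without the plugin. The sketch below only mimics the "add 0.5 
and truncate" strategy as described above; it is not the plugin's actual 
code, and the block names are hypothetical.

```smalltalk
"Sketch: compare 'add 0.5 and truncate' with alternatives that handle negative values."
| addHalfAndTruncate addHalfAndFloor |
addHalfAndTruncate := [:x | (x + 0.5) truncated].
addHalfAndFloor := [:x | (x + 0.5) floor].

addHalfAndTruncate value: 5.0.    "5"
addHalfAndTruncate value: -5.0.   "-4, because truncation moves toward zero"
addHalfAndFloor value: -5.0.      "-5"
-5.0 rounded.                     "-5"
```

The second expression yields the same off-by-one as the x coordinate in 
the "-4@ -9" result above.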
-------------- next part --------------
rect := 100@200 extent: 300@400.
offset := 500@600.
angle := 42.
mat2x3Old := (MatrixTransform2x3 withRotation: 70) offset: offset.
mat2x3New := (Matrix2x3 withRotation: 70) setOffset: offset.
morphic := (MorphicTransform offset: offset negated angle: 70 scale: 1).

mat2x3OldTranslation := MatrixTransform2x3 withOffset: offset.
mat2x3NewTranslation := Matrix2x3 withOffset: offset.
morphicTranslation := MorphicTransform offset: offset negated.

mat2x3OldRotation := MatrixTransform2x3 withRotation: angle.
mat2x3NewRotation := Matrix2x3 withRotation: angle.
morphicRotation := MorphicTransform offset: 0@0 angle: angle negated degreesToRadians scale: 1.0.

mat2x3OldIdentity := MatrixTransform2x3 identity.
mat2x3NewIdentity := Matrix2x3 identity.
morphicIdentity := MorphicTransform identity.


"--------------------------------"
" localPointToGlobal: "
"--------------------------------"
[mat2x3Old localPointToGlobal: -10 @ 10] bench.
" '11,700,000 per second. 85.6 nanoseconds per run. 1.43971 % GC time.'"

[mat2x3Old transformPoint: -10 @ 10] bench.
" '2,330,000 per second. 429 nanoseconds per run. 0.37985 % GC time.'"

[mat2x3New localPointToGlobal: -10 @ 10] bench.
" '12,500,000 per second. 80.3 nanoseconds per run. 1.89962 % GC time.'"

[morphic localPointToGlobal: -10 @ 10] bench.
" '2,710,000 per second. 370 nanoseconds per run. 1.16 % GC time.'"

"--------------------------------"
" localBoundsToGlobal: "
"--------------------------------"
[mat2x3Old localBoundsToGlobal: rect] bench.
" '6,770,000 per second. 148 nanoseconds per run. 1.67966 % GC time.'"

[mat2x3New localBoundsToGlobal: rect] bench.
" '2,090,000 per second. 480 nanoseconds per run. 1.55969 % GC time.'"

[morphic localBoundsToGlobal: rect] bench.
" '505,000 per second. 1.98 microseconds per run. 1.95922 % GC time.'"

"--------------------------------"
" localBoundsToGlobal: (pure translation) "
"--------------------------------"
[mat2x3OldTranslation localBoundsToGlobal: rect] bench.
" '6,780,000 per second. 147 nanoseconds per run. 1.7 % GC time.'"

[mat2x3NewTranslation localBoundsToGlobal: rect] bench.
" '5,860,000 per second. 171 nanoseconds per run. 1.48 % GC time.'"

[morphicTranslation localBoundsToGlobal: rect] bench.
" '1,580,000 per second. 631 nanoseconds per run. 4.12 % GC time.'"

"--------------------------------"
" globalPointToLocal: "
"--------------------------------"
[mat2x3Old globalPointToLocal: -10 @ 10] bench.
" '11,300,000 per second. 88.3 nanoseconds per run. 1.79964 % GC time.'"

[mat2x3Old invertPoint: -10 @ 10] bench.
" '2,470,000 per second. 405 nanoseconds per run. 0.45991 % GC time.'"

[mat2x3New globalPointToLocal: -10 @ 10] bench.
" '8,950,000 per second. 112 nanoseconds per run. 1.5197 % GC time.'"

[morphic globalPointToLocal: -10 @ 10] bench.
" '2,720,000 per second. 368 nanoseconds per run. 1.19976 % GC time.'"

"--------------------------------"
"globalBoundsToLocal:"
"--------------------------------"
[mat2x3Old globalBoundsToLocal: rect] bench.
" '6,210,000 per second. 161 nanoseconds per run. 1.69966 % GC time.'"

[mat2x3New globalBoundsToLocal: rect] bench.
" '1,720,000 per second. 581 nanoseconds per run. 1.41972 % GC time.'"

[morphic globalBoundsToLocal: rect] bench.
" '465,000 per second. 2.15 microseconds per run. 2.08 % GC time.'"

"--------------------------------"
"globalBoundsToLocal: (pure translation)"
"--------------------------------"
[mat2x3OldTranslation globalBoundsToLocal: rect] bench.
" '6,280,000 per second. 159 nanoseconds per run. 1.73965 % GC time.'"

[mat2x3NewTranslation globalBoundsToLocal: rect] bench.
" '4,060,000 per second. 246 nanoseconds per run. 1.13977 % GC time.'"

[morphicTranslation globalBoundsToLocal: rect] bench.
" '1,500,000 per second. 668 nanoseconds per run. 4.36 % GC time.'"

"--------------------------------"
"composedWithLocal:"
"--------------------------------"
[mat2x3Old composedWithLocal: mat2x3OldRotation] bench.
" '9,670,000 per second. 103 nanoseconds per run. 1.19976 % GC time.'"

[mat2x3New composedWithLocal: mat2x3NewRotation] bench.
" '6,920,000 per second. 144 nanoseconds per run. 1.4997 % GC time.'"

[morphic composedWithLocal: morphicRotation] bench.
" '11,100,000 per second. 89.8 nanoseconds per run. 1.09978 % GC time.'"

"--------------------------------"
" inverseTransformation "
"--------------------------------"
[mat2x3Old inverseTransformation] bench.
" '584,000 per second. 1.71 microseconds per run. 0.55989 % GC time.'"

[mat2x3New inverseTransformation] bench.
" '2,450,000 per second. 408 nanoseconds per run. 2.27954 % GC time.'"

[morphic inverseTransformation] bench.
" '1,230,000 per second. 813 nanoseconds per run. 1.19976 % GC time.'"

"--------------------------------"
" = "
"--------------------------------"
mx := mat2x3Old copy.
[mat2x3Old = mx] bench.
" '16,900,000 per second. 59.1 nanoseconds per run. 0 % GC time.'"

mx := mat2x3New copy.
[mat2x3New = mx] bench.
" '17,500,000 per second. 57.1 nanoseconds per run. 0 % GC time.'"

mx := morphic copy.
[morphic = mx] bench. "identity check"
" '97,600,000 per second. 10.3 nanoseconds per run. 0 % GC time.'"

"--------------------------------"
" instance creation "
"--------------------------------"
[MatrixTransform2x3 withOffset: offset] bench.
" '3,320,000 per second. 301 nanoseconds per run. 1.91962 % GC time.'"

[Matrix2x3 withOffset: offset] bench.
" '24,800,000 per second. 40.3 nanoseconds per run. 10.63787 % GC time.'"

[MorphicTransform offset: offset] bench.
" '43,600,000 per second. 22.9 nanoseconds per run. 7.63847 % GC time.'"

"--------------------------------"
" isPureTranslation "
"--------------------------------"
[mat2x3Old isPureTranslation] bench.
" '22,000,000 per second. 45.4 nanoseconds per run. 0 % GC time.'"

[mat2x3New isPureTranslation] bench.
" '77,000,000 per second. 13 nanoseconds per run. 0 % GC time.'"

[morphic isPureTranslation] bench.
" '76,600,000 per second. 13.1 nanoseconds per run. 0 % GC time.'"

"--------------------------------"
" isPureTranslation (pure translation)"
"--------------------------------"
[mat2x3OldTranslation isPureTranslation] bench.
" '21,000,000 per second. 47.7 nanoseconds per run. 0 % GC time.'"

[mat2x3NewTranslation isPureTranslation] bench.
" '38,700,000 per second. 25.8 nanoseconds per run. 0 % GC time.'"

[morphicTranslation isPureTranslation] bench.
" '60,400,000 per second. 16.6 nanoseconds per run. 0 % GC time.'"

"--------------------------------"
" isIdentity "
"--------------------------------"
[mat2x3Old isIdentity] bench.
" '23,700,000 per second. 42.1 nanoseconds per run. 0 % GC time.'"

[mat2x3New isIdentity] bench.
" '76,500,000 per second. 13.1 nanoseconds per run. 0 % GC time.'"

[morphic isIdentity] bench.
" '58,400,000 per second. 17.1 nanoseconds per run. 0 % GC time.'"

"--------------------------------"
" isIdentity (identity) "
"--------------------------------"
[mat2x3OldIdentity isIdentity] bench.
" '22,100,000 per second. 45.1 nanoseconds per run. 0 % GC time.'"

[mat2x3NewIdentity isIdentity] bench.
" '28,000,000 per second. 35.8 nanoseconds per run. 0 % GC time.'"

[morphicIdentity isIdentity] bench.
" '19,500,000 per second. 51.2 nanoseconds per run. 0.89982 % GC time.'"

"--------------------------------"
"--------------------------------"
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Matrix2x3.1.cs
Type: text/x-csharp
Size: 8534 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20200812/4b5ea283/attachment-0001.bin>

