Is it worth delaying the release until after the 3.9 upgrade and the VMMaker upgrade? I've just started moving to a 3.9 image. There are 24 failing tests; most fail because 3.9 produces different bytecodes than 3.8 for some things. This seems to be due to block processing, but I haven't fully investigated.
There are a few other failures including one due to a bug in 3.9.
Is it more useful for me to release as is, making a solid release on 3.8 before moving to 3.9, or to upgrade everything now? I'm going to upgrade next anyway. I'm not going to maintain different tests for both releases, so once I fix them for 3.9 they will fail in 3.8.
Bryce
On Nov 12, 2006, at 12:28 PM, bryce@kampjes.demon.co.uk wrote:
Is it more useful for me to release as is, making a solid release on 3.8 before moving to 3.9, or to upgrade everything now? I'm going to upgrade next anyway. I'm not going to maintain different tests for both releases, so once I fix them for 3.9 they will fail in 3.8.
Better to make a solid release for 3.8. That has value regardless of the existence of 3.9.
Colin
Hi Bryce, I think it is a good idea to release the solid 3.8 version.
Having said that, I am looking forward to the 3.9 release because I really want to try using Exupery on my sub-pixel font filtering algorithm to see if it can speed it up. Currently this is in 3.9, and I don't want to port it all back to an earlier image/vm, especially since you are moving forward to 3.9.
This is probably a topic for another thread, but could you tell from looking at the attached method if it is a good candidate for speed-up? It has nested loops, does lots of at: and integerAt:put: (prim 166), and SmallInteger bitShift:, bitAnd:, *, +, //, and some Float calcs.
Cheers, Andy
[Attachment (uuencoded): GlyphForm-asBalancedGlyphFormWithDepth32ItalicFIR.st]
Hi Andy, Any chance you could build a Win32 version of the VM for the release? VMs for other platforms would be nice too. It would be really great to release on two platforms at once.
The versions to use are: Exupery-wbk.219 VMMAker-wbk.42
The big decision was really between releasing on VMMaker 3.8b3 based VMs or upgrading to VMMaker 3.8b6. We're hoping that upgrading will solve the problems that the Mac x86 port is having. Hopefully a Mac port should appear during 0.11 development. Upgrading VMMaker risks destabilizing this release, and also makes it harder for the ports that exist to build VMs to go with the release.
I've now got a working 3.9 development image based on the squeak-dev images. I'll include that image along with the release. The exupery39 versions are the port to 3.9. The only problems have been with tests. 23 tests were failing because the bytecodes are all 8 bytes further down in the MethodContexts in 3.9. One test was failing due to a bug fix in 3.9. Exupery currently works in both 3.8 and 3.9 images but not all the tests will pass in both images.
I've moved all the VM code into the VMMaker package to make it easier to see when a new VM may be needed. If the VMMaker package hasn't changed, then none of the VM code has changed and all the changes were in image-side code. Previously there was no easy way to see if a new VM is required between different versions of Exupery.
The release will be built on 3.8, with the old well tested VMMaker 3.8b3 VMs and will include a 3.9 developer image with the tests fixed. So the release image will be slightly ahead of the release but only tests will have changed.
Bryce
bryce@kampjes.demon.co.uk wrote in message news:17752.56853.300993.790762@gargle.gargle.HOWL...
Hi Andy, Any chance you could build a Win32 version of the VM for the release? VMs for other platforms would be nice too. It would be really great to release on two platforms at once.
The versions to use are: Exupery-wbk.219 VMMAker-wbk.42
No problem. Any particular version of the SVN vm sources? Are you building the vm from a 3.8 basic image, 3.8 full, or 3.9? It may not make any difference, but I would like to build from the exact-same setup that you use.
The big decision was really between releasing on VMMaker 3.8b3 based VMs or upgrading to VMMaker 3.8b6. We're hoping that upgrading will solve the problems that the Mac x86 port is having. Hopefully a Mac port should appear during 0.11 development. Upgrading VMMaker risks destabilizing this release, and also makes it harder for the ports that exist to build VMs to go with the release.
I've now got a working 3.9 development image based on the squeak-dev images. I'll include that image along with the release. The exupery39 versions are the port to 3.9. The only problems have been with tests. 23 tests were failing because the bytecodes are all 8 bytes further down in the MethodContexts in 3.9. One test was failing due to a bug fix in 3.9. Exupery currently works in both 3.8 and 3.9 images but not all the tests will pass in both images.
I've moved all the VM code into the VMMaker package to make it easier to see when a new VM may be needed. If the VMMaker package hasn't changed, then none of the VM code has changed and all the changes were in image-side code. Previously there was no easy way to see if a new VM is required between different versions of Exupery.
The release will be built on 3.8, with the old well tested VMMaker 3.8b3 VMs and will include a 3.9 developer image with the tests fixed. So the release image will be slightly ahead of the release but only tests will have changed.
Bryce
Andrew Tween writes:
bryce@kampjes.demon.co.uk wrote in message news:17752.56853.300993.790762@gargle.gargle.HOWL...
Hi Andy, Any chance you could build a Win32 version of the VM for the release? VMs for other platforms would be nice too. It would be really great to release on two platforms at once.
The versions to use are: Exupery-wbk.219 VMMAker-wbk.42
No problem. Any particular version of the SVN vm sources? Are you building the vm from a 3.8 basic image, 3.8 full, or 3.9? It may not make any difference, but I would like to build from the exact-same setup that you use.
Thanks,
I built the VM in a 3.8 image using my normal build environment.
So probably:
svn export http://squeakvm.org/svn/squeak/tags/unix-3.7-7
Bryce
Hi Bryce, I have built a Win32 vm. Firstly, I tried building from a Squeak3.8.1-6747-full image, but that gave MNU errors when generating :(. So then I tried again with Squeak3.8-6665-full, and this time it generated, and compiled ok.
4 tests are failing in the ExuperyStoryTests...
    #testBlockBug3
    #testBlockNonLocalReturnsRecycleContexts
    #testBlocksAndProcessesBug
    #testDelayWaitStressTest
The benchmarks are...
    arithmaticLoopBenchmark 2487 compiled  285 ratio: 8.726
    bytecodeBenchmark       4271 compiled 1255 ratio: 3.403
    sendBenchmark           3482 compiled 1772 ratio: 1.965
    doLoopsBenchmark        2078 compiled 1663 ratio: 1.250
    largeExplorers          2224 compiled 1683 ratio: 1.321
    compilerBenchmark       2093 compiled 1712 ratio: 1.223
    Cumulative Time         12903.774 compiled 4971.489 ratio 2.596
Let me know if the above indicates a fully-functioning VM and I'll let you have it. Cheers, Andy
Hi Andrew, The VM looks fine to me. More detail below.
Andrew Tween writes:
Hi Bryce, I have built a Win32 vm. Firstly, I tried building from a Squeak3.8.1-6747-full image, but that gave MNU errors when generating :(. So then I tried again with Squeak3.8-6665-full, and this time it generated, and compiled ok.
4 tests are failing in the ExuperyStoryTests... #testBlockBug3
Relies on a test from the refactoring browser.
#testBlockNonLocalReturnsRecycleContexts
Another refactoring browser test is used here.
#testBlocksAndProcessesBug
This one uses CommandShell which is built on top of OSProcess.
#testDelayWaitStressTest
This test uses GraphViz which is used to lay out graphical inspectors for intermediate code. This is also why I've loaded OSProcess and CommandShell into my standard image.
I need to figure out a decent way of handling dependencies on other packages. Exupery itself should be dependency free, but the tests are not. I re-use tests from other packages if they catch a crash in Exupery. This can wait though.
The benchmarks are...
    arithmaticLoopBenchmark 2487 compiled  285 ratio: 8.726
    bytecodeBenchmark       4271 compiled 1255 ratio: 3.403
    sendBenchmark           3482 compiled 1772 ratio: 1.965
    doLoopsBenchmark        2078 compiled 1663 ratio: 1.250
    largeExplorers          2224 compiled 1683 ratio: 1.321
    compilerBenchmark       2093 compiled 1712 ratio: 1.223
    Cumulative Time         12903.774 compiled 4971.489 ratio 2.596
The numbers look very good to me. The micro benchmarks are worse than I get here and the macro benchmarks are much better.
Here's the benchmarks I get:
    arithmaticLoopBenchmark 1397 compiled  92 ratio: 15.185
    bytecodeBenchmark       2135 compiled 463 ratio: 4.611
    sendBenchmark           1576 compiled 699 ratio: 2.255
    doLoopsBenchmark        1083 compiled 841 ratio: 1.288
    largeExplorers           356 compiled 366 ratio: 0.973
    compilerBenchmark        733 compiled 708 ratio: 1.035
    Cumulative Time         4213.729 compiled 1453.554 ratio 2.899
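The ratio column is consistent with interpreted time divided by compiled time (1397 / 92 is roughly 15.185), so a ratio below 1, as for largeExplorers here, means the compiled code ran slower. A minimal workspace sketch of that arithmetic, assuming a Squeak image; the variable name timings is only illustrative, and the printed floats may carry some rounding noise:

```smalltalk
"Recompute two speed-up ratios from the table: interpreted time / compiled time."
| timings |
timings := {
	'arithmaticLoopBenchmark' -> (1397 -> 92).
	'largeExplorers' -> (356 -> 366) }.
timings do: [:each |
	Transcript
		show: each key;
		show: ' ratio: ';
		show: ((each value key / each value value) asFloat roundTo: 0.001) printString;
		cr]
```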
I'm running an Athlon 64 3500+ 2.2GHz. What CPU did you use for those benchmarks?
Bryce
bryce@kampjes.demon.co.uk wrote in message news:17754.12298.506847.526669@gargle.gargle.HOWL...
Hi Andrew, The VM looks fine to me. More detail below.
Good. I'll email it to you.
Andrew Tween writes:
Hi Bryce, I have built a Win32 vm. Firstly, I tried building from a Squeak3.8.1-6747-full image, but that gave MNU errors when generating :(. So then I tried again with Squeak3.8-6665-full, and this time it generated, and compiled ok.
4 tests are failing in the ExuperyStoryTests... #testBlockBug3
Relies on a test from the refactoring browser.
#testBlockNonLocalReturnsRecycleContexts
Another refactoring browser test is used here.
#testBlocksAndProcessesBug
This one uses CommandShell which is built on top of OSProcess.
#testDelayWaitStressTest
This test uses GraphViz which is used to lay out graphical inspectors for intermediate code. This is also why I've loaded OSProcess and CommandShell into my standard image.
I need to figure out a decent way of handling dependencies on other packages. Exupery itself should be dependency free, but the tests are not. I re-use tests from other packages if they catch a crash in Exupery. This can wait though.
The benchmarks are...
    arithmaticLoopBenchmark 2487 compiled  285 ratio: 8.726
    bytecodeBenchmark       4271 compiled 1255 ratio: 3.403
    sendBenchmark           3482 compiled 1772 ratio: 1.965
    doLoopsBenchmark        2078 compiled 1663 ratio: 1.250
    largeExplorers          2224 compiled 1683 ratio: 1.321
    compilerBenchmark       2093 compiled 1712 ratio: 1.223
    Cumulative Time         12903.774 compiled 4971.489 ratio 2.596
The numbers look very good to me. The micro benchmarks are worse than I get here and the macro benchmarks are much better.
Here's the benchmarks I get:
    arithmaticLoopBenchmark 1397 compiled  92 ratio: 15.185
    bytecodeBenchmark       2135 compiled 463 ratio: 4.611
    sendBenchmark           1576 compiled 699 ratio: 2.255
    doLoopsBenchmark        1083 compiled 841 ratio: 1.288
    largeExplorers           356 compiled 366 ratio: 0.973
    compilerBenchmark        733 compiled 708 ratio: 1.035
    Cumulative Time         4213.729 compiled 1453.554 ratio 2.899
I'm running an Athlon 64 3500+ 2.2GHz. What CPU did you use for those benchmarks?
Pentium 3 Mobile. 1133MHz. I need a new PC - then I'd get twice as much work done ;) Cheers, Andy
Bryce
Andrew Tween writes:
bryce@kampjes.demon.co.uk wrote in message news:17754.12298.506847.526669@gargle.gargle.HOWL...
Hi Andrew, The VM looks fine to me. More detail below.
Good. I'll email it to you.
Thanks, I've got the email, I'll upload it tomorrow night.
Bryce
Hello again, This time about sub-pixel aliasing.
Andrew Tween writes:
Hi Bryce, I think it is a good idea to release the solid 3.8 version.
Having said that, I am looking forward to the 3.9 release because I really want to try using Exupery on my sub-pixel font filtering algorithm to see if it can speed it up. Currently this is in 3.9, and I don't want to port it all back to an earlier image/vm, especially since you are moving forward to 3.9.
Exupery runs fine on 3.9, the tests just needed to be fixed.
The best way to find out how it performs for your example would be to load Exupery into your 3.9 image and try it.
This is probably a topic for another thread, but could you tell from looking at the attached method if it is a good candidate for speed-up? It has nested loops, does lots of at: and integerAt:put: (prim 166), and SmallInteger bitShift:, bitAnd:, *, +, //, and some Float calcs.
I'm not sure how well it would run. The code is definitely a promising candidate to compile; however, Exupery doesn't yet compile Floats, large integers, or primitive 166. I don't think the interpreter does any special optimisations for them either, so chances are those operations will run at the same speed. Exupery will be able to optimise the SmallInteger calculations and looping overhead.
The method could definitely be optimised much more. Adding integerAt:put: and ByteArray>>at: primitives would help. So would basic floating point optimisations. Going further, adding support for machine word (32 bit integer) and byte objects should allow us to compile to near C speeds.
The optimisations for machine words, byte objects, and floating point are all very similar. The game is to remove all the intermediate objects so the calculations are done directly in registers without any conversion and deconversion overhead.
    luminance := (0.299*balR)+(0.587*balG)+(0.114*balB).
    balR := balR + ((luminance - balR)*correctionFactor).
    balG := balG + ((luminance - balG)*correctionFactor).
    balB := balB + ((luminance - balB)*correctionFactor).
    balR := balR truncated.
    balR < 0 ifTrue:[balR := 0] ifFalse:[balR > 255 ifTrue:[balR := 255]].
    balG := balG truncated.
    balG < 0 ifTrue:[balG := 0] ifFalse:[balG > 255 ifTrue:[balG := 255]].
    balB := balB truncated.
    balB < 0 ifTrue:[balB := 0] ifFalse:[balB > 255 ifTrue:[balB := 255]].
    a := balR + balG + balB > 0 ifTrue:[16rFF] ifFalse:[0].
    colorVal := balB + (balG bitShift: 8) + (balR bitShift: 16) + (a bitShift: 24).
    answer bits integerAt: (y*answer width)+(x//3+1) put: colorVal.
Is a nice example to show what dynamically inlined primitives could do. The major overhead with floats is allocating memory (1). In this example, using the current optimisation engine, it should be possible to create only 4 floats rather than the 19 needed by the interpreter. One more allocation will be needed to form colorVal if it overflows into a LargeInteger. SSA should allow all the floating point intermediate values to be removed by allowing program analysis over more than one statement.
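The point about colorVal overflowing can be checked in a workspace. A small sketch, assuming a 32-bit image of this era where SmallInteger maxVal is 16r3FFFFFFF (on a 64-bit image the result would stay a SmallInteger); the channel values are made up for illustration:

```smalltalk
| balR balG balB a colorVal |
balR := 16r30. balG := 16r20. balB := 16r10.
a := 16rFF.
colorVal := balB + (balG bitShift: 8) + (balR bitShift: 16) + (a bitShift: 24).
"With a = 16rFF the top byte pushes the value past SmallInteger maxVal,
 so on a 32-bit image the sum has to be allocated as a large integer."
Transcript show: colorVal class name; cr.
Transcript show: (colorVal > SmallInteger maxVal) printString; cr.
```

Since a is 16rFF for every pixel with any coverage, that extra allocation would be the common case rather than the exception.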
balR := balR truncated. balR < 0 ifTrue:[balR := 0] ifFalse:[balR > 255 ifTrue:[balR := 255]].
Should probably be handled via a primitive that truncates a floating point value down to an unsigned 8 bit value. For this example such a primitive may be overkill however converting floating point values to. But with Exupery 3.0 and SSA it would be really nice to be able to optimise to vectors. With vector optimisation we will have a level playing field with C: they will need at least as much compiler machinery as we will, and they will probably write their compilers in C, requiring much more work than writing in Smalltalk.
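What such a primitive would compute can already be written in plain Smalltalk. This is a sketch only: clampToByte is a hypothetical name, not an existing Exupery or Squeak primitive, and this version still goes through ordinary message sends:

```smalltalk
| clampToByte |
"Truncate toward zero, then clamp into 0..255; this gives the same result
 as the truncated / ifTrue:ifFalse: sequence in the method."
clampToByte := [:aFloat | (aFloat truncated max: 0) min: 255].
Transcript show: (clampToByte value: -12.7) printString; cr.	"0"
Transcript show: (clampToByte value: 131.9) printString; cr.	"131"
Transcript show: (clampToByte value: 300.2) printString; cr.	"255"
```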
In summary, I think there may be some speed improvement now. Adding the array access primitives will help. Floating point is likely to be the next biggest win. Without SSA I doubt that other optimisations will provide enough gain to be worthwhile. With SSA and a few extra object types it should be possible to fully optimise it.
Bryce
(1) After upgrading the VM I'm going to implement fast compiled primitives for #new and #@. This is driven by the largeExplorers benchmark. #@ is inlined into the main interpret loop in the interpreter but Exupery executes it as a normal primitive. This means that compiling largeExplorers can lead to an 8% speed loss.
bryce@kampjes.demon.co.uk wrote in message news:17752.60448.36947.789918@gargle.gargle.HOWL...
Hello again, This time about sub-pixel aliasing.
Andrew Tween writes:
Hi Bryce, I think it is a good idea to release the solid 3.8 version.
Having said that, I am looking forward to the 3.9 release because I really want to try using Exupery on my sub-pixel font filtering algorithm to see if it can speed it up. Currently this is in 3.9, and I don't want to port it all back to an earlier image/vm, especially since you are moving forward to 3.9.
Exupery runs fine on 3.9, the tests just needed to be fixed.
The best way to find out how it performs for your example would be to load Exupery into your 3.9 image and try it.
The subpixel rendering needs a modified vm (for BitBlt stuff), and Exupery needs a modified vm. Currently these are built from different versions of vmmaker, svn sources, etc. So, I am keen for them to be synchronised, and I am sure it will all come together eventually.
In the meantime, I guess I could create a standalone benchmark, which would be interesting in its own right.
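A standalone benchmark could start as small as timing the inner SmallInteger work with Time millisecondsToRun:. A rough sketch, assuming a plain Squeak image; the loop body and iteration count are illustrative stand-ins, not taken from the real method:

```smalltalk
| word shift v sum elapsed |
word := 16r12345678.	"small enough to stay a SmallInteger on a 32-bit image"
sum := 0.
elapsed := Time millisecondsToRun: [
	1 to: 1000000 do: [:i |
		"Extract one byte of the word, as the filter does per sub-pixel."
		shift := -8 * (i bitAnd: 3).
		v := (word bitShift: shift) bitAnd: 16rFF.
		sum := sum + v]].
Transcript show: 'elapsed ms: ', elapsed printString; cr.
Transcript show: 'checksum: ', sum printString; cr.
```

Timing the same code interpreted and compiled would give a ratio comparable to the ones in the benchmark tables earlier in the thread.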
This is probably a topic for another thread, but could you tell from looking at the attached method if it is a good candidate for speed-up? It has nested loops, does lots of at: and integerAt:put: (prim 166), and SmallInteger bitShift:, bitAnd:, *, +, //, and some Float calcs.
I'm not sure how well it would run. The code is definitely a promising candidate to compile; however, Exupery doesn't yet compile Floats, large integers, or primitive 166. I don't think the interpreter does any special optimisations for them either, so chances are those operations will run at the same speed. Exupery will be able to optimise the SmallInteger calculations and looping overhead.
Is the primitive compilation something that I, or others, could help with? What is involved in adding a primitive to Exupery?
The method could definitely be optimised much more. Adding integerAt:put: and ByteArray>>at: primitives would help. So would basic floating point optimisations. Going further, adding support for machine word (32 bit integer) and byte objects should allow us to compile to near C speeds.
The optimisations for machine words, byte objects, and floating point are all very similar. The game is to remove all the intermediate objects so the calculations are done directly in registers without any conversion and deconversion overhead.
    luminance := (0.299*balR)+(0.587*balG)+(0.114*balB).
    balR := balR + ((luminance - balR)*correctionFactor).
    balG := balG + ((luminance - balG)*correctionFactor).
    balB := balB + ((luminance - balB)*correctionFactor).
    balR := balR truncated.
    balR < 0 ifTrue:[balR := 0] ifFalse:[balR > 255 ifTrue:[balR := 255]].
    balG := balG truncated.
    balG < 0 ifTrue:[balG := 0] ifFalse:[balG > 255 ifTrue:[balG := 255]].
    balB := balB truncated.
    balB < 0 ifTrue:[balB := 0] ifFalse:[balB > 255 ifTrue:[balB := 255]].
    a := balR + balG + balB > 0 ifTrue:[16rFF] ifFalse:[0].
    colorVal := balB + (balG bitShift: 8) + (balR bitShift: 16) + (a bitShift: 24).
    answer bits integerAt: (y*answer width)+(x//3+1) put: colorVal.
Is a nice example to show what dynamically inlined primitives could do. The major overhead with floats is allocating memory (1). In this example, using the current optimisation engine, it should be possible to create only 4 floats rather than the 19 needed by the interpreter. One more allocation will be needed to form colorVal if it overflows into a LargeInteger. SSA should allow all the floating point intermediate values to be removed by allowing program analysis over more than one statement.
balR := balR truncated. balR < 0 ifTrue:[balR := 0] ifFalse:[balR > 255 ifTrue:[balR := 255]].
Should probably be handled via a primitive that truncates a floating point value down to an unsigned 8 bit value. For this example such a primitive may be overkill however converting floating point values to. But with Exupery 3.0 and SSA it would be really nice to be able to optimise to vectors. With vector optimisation we will have a level playing field with C: they will need at least as much compiler machinery as we will, and they will probably write their compilers in C, requiring much more work than writing in Smalltalk.
In summary, I think there may be some speed improvement now. Adding the array access primitives will help. Floating point is likely to be the next biggest win. Without SSA I doubt that other optimisations will provide enough gain to be worthwhile. With SSA and a few extra object types it should be possible to fully optimise it.
Thanks for your comments. I had intended to re-write the method in C and add it to the plugin, but the advantages of being able to easily play with it in Smalltalk outweigh the speed-up of porting to C, at least while I am still experimenting.
Cheers, Andy
Bryce
(1) After upgrading the VM I'm going to implement fast compiled primitives for #new and #@. This is driven by the largeExplorers benchmark. #@ is inlined into the main interpret loop in the interpreter but Exupery executes it as a normal primitive. This means that compiling largeExplorers can lead to an 8% speed loss.
Andrew Tween writes:
Is the primitive compilation something that I, or others, could help with? What is involved in adding a primitive to Exupery?
Primitives vary. Simple primitives may only be two lines with two different tests covering them. Exupery is tested both by compiling methods then testing they work correctly and also by unit tests that test the individual components.
    primitiveLoadInstVar: aMedPrimitive
        ^emitter
            fetchAddress: (MedLiteral literal: aMedPrimitive primitiveNumber - 264)
            ofObject: (aMedPrimitive arguments first visitWith: self)
Is the simplest primitive in Exupery at the moment. It's the quick return primitive to return an instance variable. This is a primitive to save the cost of creating a context.
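The 264 offset matches the numbering Squeak uses for quick instance-variable returns, where the primitive number is 264 plus the zero-based index of the variable. A sketch for looking at this from a workspace, assuming Point>>x and Point>>y are still the plain accessors in your image (if they have been changed, the numbers will differ):

```smalltalk
| mx my |
mx := Point compiledMethodAt: #x.
my := Point compiledMethodAt: #y.
"For a quick-return accessor, primitive - 264 is the zero-based
 index of the instance variable being returned."
Transcript show: mx primitive printString; cr.
Transcript show: (my primitive - 264) printString; cr.
```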
The primitives are in the category primitive in IntermediateSimplifier. The end to end tests are in the category "Test - Primitives" in ExuperyStoryTests. The unit tests are in "Tests - Primitives" in IntermediateSimplifierTests.
#at: and #at:put: primitives will be much easier to write than floating point primitives. #new involves calling C and also saving state around a potential GC call.
Decently optimised floating point primitives will require some cleaning up of the front end to generalise the inlining of integer code. Ideally, it should be possible to guess the type of an operation then to forget the guess if type feedback shows it was wrong. This would generalize the inlining done manually for arithmetic primitives with the dynamic primitive inlining done for #at: etc.
Thanks for your comments. I had intended to re-write the method in C and add it to the plugin, but the advantages of being able to easily play with it in Smalltalk outweigh the speed-up of porting to C, at least while I am still experimenting.
Writing a single primitive in C would be less work than optimizing Exupery to handle everything needed for near C speeds.
Bryce
exupery@lists.squeakfoundation.org