Report from a novice VM h4x0r.

Thu Apr 1 02:14:59 UTC 2004

On Mar 31, 2004, at 5:02 PM, Andreas Raab wrote:

>> This one I looked at a few years ago. You need to see what assembler
>> your compiler is producing (intel, powerpc, 68K) to understand the
>> implications. I had altered the more important ones to be optimal for
>> powerpc and 68K.
>
> You have? If I'd known it, I would have stood on my head to prevent it.
> The trouble with those optimizations is that while they *may* be
> optimal for a specific compiler/platform they *never* scale across
> platforms (and often not even within a single processor family). So
> what happens is that you unfairly optimize the VM for a platform which
> (in the case of 68k) no one even uses any longer! And are you sure that
> the generated code is still optimal on G4/G5?

See, I can get VM developers to stand on their heads.

The issue here was that the original convoluted code would not compile
to code that used pre/post increment/decrement of the indexing register.
Instead, the compiler moved data from/to the locations and then
added/subtracted 4 bytes from each of the index registers. Rearranging
the way the loops worked let the C compiler see that it could use the
CPU-specific instructions for this.

Interestingly enough, this issue applied to both 68K and PowerPC.
The biggest gain was on 68K, because you actually got rid of real
clock cycles for the add.

Now I can't recall the details, but it involved something silly like
using two memory pointers and two different index pointers and moving
backwards or something to move memory from A to B.

With a bit of tweaking that became an innocent-looking (and
processor-agnostic, I MIGHT ADD) for loop, or was that a while loop?
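
A minimal sketch of the kind of rearrangement described above, assuming
a simple word-at-a-time copy. This is not the actual VM code (the
details are not given here), and the names copy_indexed and copy_postinc
are hypothetical. Whether a given compiler really emits auto-increment
or load/store-with-update instructions for the second form has to be
checked in its assembler output.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" shape: two base pointers plus two separate
   indices, walking backwards. A compiler may turn this into a plain
   load and store followed by explicit subtract instructions on each
   index register. */
void copy_indexed(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    size_t si = nwords;
    size_t di = nwords;
    while (si > 0) {
        si = si - 1;
        di = di - 1;
        dst[di] = src[si];
    }
}

/* Hypothetical "after" shape: an ordinary loop over post-incremented
   pointers. This form can map onto postincrement addressing on 68K or
   load/store-with-update on PowerPC, if the compiler chooses to. */
void copy_postinc(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    while (nwords-- > 0) {
        *dst++ = *src++;
    }
}
```

Both functions copy the same nwords 32-bit words for non-overlapping
buffers; only the loop shape handed to the compiler differs.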

> I mean it's not like we haven't (had and have) enough of that in the VM
> already - the number of temps that are inlined in the main interpreter
> loop were for years *precisely* matched so that a specific Mac compiler
> would generate the fastest code even though it could be proven that by
> NOT sharing the temps in interpret() compilers make a much better
> life-time analysis and therefore improve the overall speed for *every*
> other platform (even on Macs if you didn't use that specific version of
> the compiler - I still have the memos somewhere because I couldn't
> possibly imagine how the better life-time analysis could possibly "slow
> down" a Mac VM; it turned out that we had optimized the VM so that this
> severely screwed up compiler would generate the "right" assembly code).

Mmm, you must mean some old version of CodeWarrior for OS 7.x, no?

I believe I switched quite a while ago, first to AH's lots of temp
variables for the interp.c loop, then to your localized temp variables
within blocks for interp.c, because I found that CW would give up on
register optimization when faced with 30ish temp variables in a really
long method, whereas the localized temps allowed for the optimization.
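
To illustrate the difference between the two temp styles, here is a
small sketch, assuming a toy two-case dispatch rather than the real
interp.c loop. The function names step_shared and step_local and the
operations they perform are made up for this example.

```c
#include <stdint.h>

/* Hypothetical "shared temps" style: every temporary is declared at the
   top of the function and reused by all cases, so each one stays in
   scope for the whole (in the real case, very long) body. */
int32_t step_shared(int32_t op, int32_t a, int32_t b)
{
    int32_t t1, t2, result = 0;
    switch (op) {
    case 0:
        t1 = a + b;
        result = t1 * 2;
        break;
    case 1:
        t1 = a - b;
        t2 = t1 * t1;
        result = t2 + 1;
        break;
    }
    return result;
}

/* Hypothetical "localized temps" style: each case declares its own
   temporaries inside a block, so their lifetimes are visibly short and
   the compiler's lifetime analysis has less to untangle. */
int32_t step_local(int32_t op, int32_t a, int32_t b)
{
    int32_t result = 0;
    switch (op) {
    case 0: {
        int32_t sum = a + b;
        result = sum * 2;
        break;
    }
    case 1: {
        int32_t diff = a - b;
        int32_t sq = diff * diff;
        result = sq + 1;
        break;
    }
    }
    return result;
}
```

The two functions compute the same results; with only a handful of
temps a modern compiler will likely treat them identically, so the
effect described above should only be expected at the scale of 30ish
temps in one long function.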

Speaking of which, I'm reconsidering what to do about the CW-compiled
version for classic Macs. Right now I think this is the only platform
that requires $$$ to purchase a compiler in order to compile the source
code. At some point I'm not going to renew my CW compiler license, which
does cost $. If anyone has compiled Squeak under MPW lately (or in the
distant past) and has the MPW worksheet for doing that, that would be
great to see.

> --
===========================================================================
John M. McIntosh <johnmci at smalltalkconsulting.com> 1-800-477-2659
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================