Report from a novice VM h4x0r.
John M McIntosh
johnmci at mac.com
Thu Apr 1 02:14:59 UTC 2004
On Mar 31, 2004, at 5:02 PM, Andreas Raab wrote:
>> This one I looked at a few years ago. You need to see what
>> assembler your compiler is producing (intel, powerpc, 68K) to
>> understand the implications. I had altered the more important ones to
>> be optimal for powerpc and 68K.
>
> You have? If I'd known it, I would have stood on my head to prevent it.
> The trouble with those optimizations is that while they *may* be optimal
> for a specific compiler/platform they *never* scale across platforms (and
> often not even within a single processor family). So what happens is that
> you unfairly optimize the VM for a platform which (in the case of 68k)
> no one even uses any longer! And are you sure that the generated code is
> still optimal on G4/G5?

See, I can get VM developers to stand on their heads.

The issue here was that the original convoluted code would not compile to
code which did pre/post increment/decrement of the indexing register;
rather, it wanted to move data from/to locations and then add/sub 4 bytes
from each of the index registers. Rearranging the way the loops worked made
the C compiler understand that it could use the CPU-specific instructions
to do this. Interestingly enough, this issue applied to both 68K and
powerpc. The most gain was for 68K, because you actually got rid of real
clock cycles for the add.

Now I can't recall the details, but it involved something silly like using
two memory pointers and two different index pointers and moving backwards
or something to move memory from A to B. With a bit of tweaking that became
an innocent-looking, processor-agnostic I MIGHT ADD, for loop, or was that
a while loop.
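
To illustrate the kind of rearrangement described above, here is a minimal sketch in C. It is not the actual VM code (the details are not given in this message); the function names copy_words_indexed and copy_words_postinc are hypothetical, and whether a particular compiler emits post-increment addressing (68K) or update-form loads/stores (PowerPC) for the second version is an assumption that has to be checked in the generated assembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Indexed form: base pointers plus a separate index. Some older
   compilers turned this into a load, a store, and explicit adds
   on the index registers. */
void copy_words_indexed(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}

/* Pointer-walking form: each pointer is dereferenced and advanced in
   the same expression, which gives the compiler a direct chance to use
   an auto-increment addressing mode if the CPU has one. */
void copy_words_postinc(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    while (n-- > 0) {
        *dst++ = *src++;
    }
}
```

Both functions copy the same n words; the only difference is how the loop is phrased for the compiler. The second form is still plain portable C, which is the sense in which it is processor agnostic.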

> I mean it's not like we haven't (had and have) enough of that in the VM
> already - the number of temps that are inlined in the main interpreter
> loop were for years *precisely* matched so that a specific Mac compiler
> would generate the fastest code even though it could be proven that by
> NOT sharing the temps in interpret() compilers make a much better
> life-time analysis and therefore improve the overall speed for *every*
> other platform (even on Macs if you didn't use that specific version of
> the compiler - I still have the memos somewhere because I couldn't
> possibly imagine how the better life-time analysis could possibly "slow
> down" a Mac VM; it turned out that we had optimized the VM so that this
> severely screwed up compiler would generate the "right" assembly code).

Mmm, you must mean some old version of CodeWarrior for OS 7.x? No?
I believe I switched quite a while ago, first to AH's lots of temp
variables for the interp.c loop, then to your localized temp variables
within blocks for interp.c, because I found that CW would give up on
register optimization when faced with 30ish temp variables in a really
long method, whereas the localized temps allowed for the optimization.
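
As a rough illustration of what "localized temp variables within blocks" means, here is a toy bytecode loop in C. This is a sketch only, not Squeak's interp.c: the opcodes, the run_localized function, and the 16-slot stack are all made up for the example. The point is that each temp is declared inside the case block that uses it, so its lifetime is visibly short and the compiler does not have to track dozens of function-wide temps at once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Runs the program and returns the top of stack (0 if empty). */
long run_localized(const long *code)
{
    long stack[16];
    size_t sp = 0;   /* number of values currently on the stack */
    size_t pc = 0;   /* index of the next word in code */

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH: {
            long value = code[pc++];   /* temp lives only in this block */
            assert(sp < 16);
            stack[sp++] = value;
            break;
        }
        case OP_ADD: {
            long rhs, lhs;             /* temps local to OP_ADD */
            assert(sp >= 2);
            rhs = stack[--sp];
            lhs = stack[--sp];
            stack[sp++] = lhs + rhs;
            break;
        }
        case OP_MUL: {
            long rhs, lhs;             /* separate temps, not shared */
            assert(sp >= 2);
            rhs = stack[--sp];
            lhs = stack[--sp];
            stack[sp++] = lhs * rhs;
            break;
        }
        case OP_HALT:
        default:
            return sp > 0 ? stack[sp - 1] : 0;
        }
    }
}
```

The alternative style would hoist value, rhs, and lhs (and many more) to the top of the function and share them across cases; whether that helps or hurts depends on the compiler, which is exactly the disagreement in this thread.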

Speaking of which, I'm reconsidering what to do about the CW-compiled
version for classic Macs. Right now I think this is the only platform that
requires $$$ to purchase a compiler in order to compile the source code.
At some point I'm not going to renew my CW compiler license, which does
cost $. If anyone has compiled Squeak under MPW lately (or in the distant
past) and has the MPW worksheet for doing that, that would be great to see.
> --
===========================================================================
John M. McIntosh <johnmci at smalltalkconsulting.com> 1-800-477-2659
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
===========================================================================