Report from a novice VM h4x0r.

Thu Apr 1 01:02:11 UTC 2004

> This one I looked at a few years ago. You need to see what the
> assembler your compiler is producing, (intel,powerpc,68K) to
> understand the implications. I had altered the more important ones to
> be optimal for powerpc and 68K.

You have? If I'd known it, I would have stood on my head to prevent it. The
trouble with those optimizations is that while they *may* be optimal for a
specific compiler/platform they *never* scale across platforms (and often
not even within a single processor family). So what happens is that you
unfairly optimize the VM for a platform which (in the case of 68k) no one
even uses any longer! And are you sure that the generated code is still
optimal on G4/G5?

Really, those kinds of optimizations MUST be put into platform-specific
macros. That gives other people a way of understanding that these are areas
that may matter for them as well (see Alan's note about "rep movsb", which
is fairly fast on some x86 versions although it's not the fastest way across
x86 - that's still using the FPU ;-), and at least it avoids needless
discussions about whether it is "better" to write the code this or that way
because on "my platform using my compiler" it results in better assembly
code.

Guys, if we're optimizing towards a specific platform (and I'm all in favour
of that - competition is good for overall progress) let's make sure we give
each other advance warning and a fair "battle ground" rather than changing
Smalltalk code in a way that after being translated three times produces
optimal assembly code on one compiler and a single processor version.

I mean it's not like we haven't had (and don't still have) enough of that in
the VM already. The number of temps inlined in the main interpreter loop was
for years *precisely* matched so that a specific Mac compiler would generate
the fastest code, even though it could be proven that by NOT sharing the
temps in interpret() compilers do a much better lifetime analysis and
therefore improve the overall speed on *every* other platform (even on Macs,
if you didn't use that specific version of the compiler). I still have the
memos somewhere, because I couldn't possibly imagine how better lifetime
analysis could "slow down" a Mac VM; it turned out that we had optimized the
VM so that this severely screwed up compiler would generate the "right"
assembly code. The same goes for primitive dispatch: it's because of Apple
not getting their act together with defining a decent ABI that the VM uses a
case statement, although it can be shown that on *every* platform with a
reasonable ABI a function-through-pointer call just about *doubles* the
primitive dispatch speed.
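
To make the two dispatch styles concrete, here is a minimal sketch in C. It
is not the VM's actual code: the primitive names, their int arguments and the
table layout are all made up for illustration (real primitives work on the
interpreter stack). Which variant is faster depends on the compiler and the
ABI, so measure before picking one.

```c
#include <assert.h>

/* Hypothetical primitives, for illustration only. */
static int primAdd(int a, int b) { return a + b; }
static int primSub(int a, int b) { return a - b; }
static int primMul(int a, int b) { return a * b; }

enum { PRIM_ADD, PRIM_SUB, PRIM_MUL, PRIM_COUNT };

/* Variant 1: dispatch through a case statement. */
static int dispatchBySwitch(int index, int a, int b)
{
    switch (index) {
    case PRIM_ADD: return primAdd(a, b);
    case PRIM_SUB: return primSub(a, b);
    case PRIM_MUL: return primMul(a, b);
    default:
        assert(0 && "unknown primitive index");
        return 0;
    }
}

/* Variant 2: dispatch through a table of function pointers. */
typedef int (*PrimFn)(int, int);

static const PrimFn primTable[PRIM_COUNT] = { primAdd, primSub, primMul };

static int dispatchByTable(int index, int a, int b)
{
    assert(index >= 0 && index < PRIM_COUNT);
    return primTable[index](a, b);
}
```

Both functions return the same result for the same index; only the way the
call is reached differs.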

Let's play this fair, shall we? If we do optimizations that are specific to
a single platform, let's move them where they belong - into macros living
inside the platform-specific code, with the default being something that we
can expect to work reasonably across all platforms, not just our
pet platform.
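
A rough sketch of what I mean, in C. The macro name VM_COPY_BYTES and the
function copyObjectBody are hypothetical, not existing VM code; the idea is
just that the cross-platform source only ever sees the macro, and a platform
header may define it before this point to whatever is fastest for its own
compiler and processor.

```c
#include <assert.h>
#include <string.h>

/* A platform-specific header (hypothetical) may define VM_COPY_BYTES
   before this point. If it doesn't, fall back to a portable default
   that should behave reasonably everywhere. */
#ifndef VM_COPY_BYTES
#define VM_COPY_BYTES(dst, src, n) memcpy((dst), (src), (n))
#endif

/* Cross-platform code uses only the macro, never a tuned variant. */
static void copyObjectBody(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    VM_COPY_BYTES(dst, src, n);
}
```

A port that knows better overrides the macro in its own header, and everyone
else can see at a glance that this is a spot worth tuning.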

Cheers,
  - Andreas