Code Generation (was VM improvement: speeding up ...)

Raab, Andreas Andreas.Raab at disney.com
Mon Feb 14 09:55:00 UTC 2000


Marcel,

> The only way it would make sense is if Squeak simply doesn't use the  
> type of code that C compilers are good at optimizing.  The example  
> you give sort of falls into that category, but does this have wider  
> applicability?

Ah ... that requires either a very long or a very short answer. Because it's
late here I'll make it short (perhaps if there's enough discussion I'll post
a longer one tomorrow).

<short rant on>
As you know, my background is computer graphics, where speed is almost always
critical. And yet, first at the University of Magdeburg and now in Squeak, we
have been doing real-time 3D graphics in Smalltalk (which is supposed to be
slow). How come? It is because, in my experience, there are only a
few critical parts of large systems that really need to be tweaked to death.
Examples are: Send and bytecode speed in an interpreter, speed of BitBlt for
display updates, rasterization and transformation speed in a 3D graphics
engine. Those parts are system-critical. It hardly matters for speed whether,
say, Wonderland had been written in C or Smalltalk. It's just not the
bottleneck. [ObBesides: This point will be proven once I get the HW
acceleration running.] And the same goes for many parts of the interpreter.
Take out all the support code for networking, sound, files, etc., and what remains is
the critical part. And if you compare *that* to the overall size of Squeak
you are likely to find that this is in the range of 5% (that's my usual
threshold). These five percent *really* matter. The rest usually doesn't.

Here's an interesting experiment: Take the inner loop of interpret(),
compile it highly optimized and the rest without optimizations. Run a couple of
benchmarks and then start moving critical functions over to the optimized
version. Measure the relative improvements and post the results. I'm willing
to bet a beer or two that you get 90% of the speed by moving 5% of code into
the optimized version.

Now to the point of the C compiler. C compilers are usually very good at
optimizing dumb code. Problem #1 that comes to mind is how tremendously hard
it is to profile a large system to actually find bottlenecks. The larger the
system, the harder it is to do in C because you can't just put a
MessageTally on a block and run it a zillion times on *exactly* the data
that was used when things seemingly slowed down. So, from my experience with
C programming, the usual thing people do is just to leave it up to the
compiler to make their badly written code fast. Another major point is the
de facto non-existence of reusability in C. For Squeak, many people have
posted enhancements to fundamental classes on this list which speed up every
part of the system just because they're so widely used. Try *that* in C ;-) 

[ObBesides: Remember what Ian mentioned about the speed of the STL?! That's
why C compilers must be good at optimizing dumb code]

Finally to the point of how good a compiler needs to be at optimizing
things. Did it ever occur to anyone that the Smalltalk compiler is actually
the most stupid compiler there is?! It doesn't do *any* sort of
optimization. Does it matter?! Sometimes yes. However, you can always write
your code so that it's as fast as possible on the system. Is it hard to do?!
Yes it is. And this is very good, because it means you are forced to
actually rethink your design if something is too slow. Writing optimal
methods with a non-optimizing compiler is very hard, so you have to pick
your battles. That encourages reusing methods of other classes that are
known to be fast, and it encourages you to look for another solution to the
problem that might be faster. All Very Good Things IM[not so]HO.

<short rant off>

[I could go on for days here - just keep in mind that the above was the
short answer ;-)]

> While it is almost always possible to hand-tune a single  
> particular function so it performs better than compiler output, this  
> issue seems to be entirely unrelated to comparing the efficiency of  
> two automatic code generators, one driven by Squeak and one 
> driven by native C compilers.

We're not talking about a general code generator. We are talking about a
code generator that is primarily intended to produce an efficient
interpreter and this is quite a specific task.

> The binary for egcs/gcc is 2.5 MB all by its lonesome, and that's  
> just one platform.  While I am sure that there is some bloat, that's  
> a lot of code generation know-how (also shown in the sources).  Is  
> all of that irrelevant?  It may be, I don't know, but I have a  
> difficult time with the idea.

It's not irrelevant - it's just that the RTOS generator would have a
specific goal for which it should be very well suited. There may be others
where egcs/gcc are *much* better suited. But then, as I said, I do believe
that tweaking the RTOS generator for those 5% that really matter for an
interpreter is likely to be enough to get quite close to what we have
now using common C compilers. I'm not saying that you could run Microsoft
Office at its mind-blowing speed using this code generator, but then, it
seems that even MSVC has problems optimizing this code... ;-)

  Andreas
