[Vm-dev] Strongtalk and Exupery

David Griswold David.Griswold at acm.org
Thu Sep 21 04:01:57 UTC 2006


Hi Bryce,

I applaud what you are trying to do, and it sounds very interesting.  If you
can make it work with the compiler written in Smalltalk, that would be great;
that is certainly the long-term goal for me too.  And you are more than
welcome to pick my brain about Strongtalk, if it would help you.  My only
goal here is to help speed up Smalltalk, however that happens.

Since you may be interested, I have responded in detail below:

> -----Original Message-----
> From: vm-dev-bounces at lists.squeakfoundation.org
> [mailto:vm-dev-bounces at lists.squeakfoundation.org]On Behalf Of Bryce
> Kampjes
> Sent: Wednesday, September 20, 2006 2:42 PM
> To: vm-dev at lists.squeakfoundation.org;
> exupery at lists.squeakfoundation.org
> Subject: [Vm-dev] Strongtalk and Exupery
> [...]
> Does the Strongtalk mailing list have a publicly available archive? One
> that doesn't require a yahoo sign-on? It would make it much easier for
> interested outsiders to follow what's going on.

Sorry!  I didn't realize the archives weren't public.  They are now.  The
list is moving to Google in the next day anyhow.

> How does Strongtalk compare with current Java Hotspot VMs? They are
> also available with source for study (though not open sourced).

Certainly the Java HotSpot VMs are descendants of the Strongtalk VM, but
they have been basically rewritten, and are definitely not just tweaked
Smalltalk under the covers.  For one thing, the languages are different:
Java has untagged immediates of various sizes, and it has guaranteed
implementation type information available, unlike Strongtalk.  Inlining
choices are also made differently; in some ways I actually like the way
Strongtalk does it more, but unfortunately I can't talk about the exact
differences.  In any case, the Java VM is not what I would call a
type-feedback VM anymore, and Strongtalk is.
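
To make the term concrete, here is a rough sketch of the idea behind type
feedback (hypothetical names and shapes, not Strongtalk's actual code): the
running code records which receiver classes a send site actually sees, and
the optimizer later inlines the dominant case behind a cheap class test,
falling back to a real send for anything else.

    "Profiling: remember the receiver classes seen at a send site.
     sendSiteProfile and all the names here are hypothetical."
    recordReceiver: rcvr atSite: aSendSite
        (sendSiteProfile at: aSendSite ifAbsentPut: [Bag new])
            add: rcvr class

    "What the optimizer conceptually emits for a site dominated by one
     receiver class (here Point, for the message #x): a class test
     guarding an inlined copy of Point>>x, with a real send as the
     uncommon-case fallback."
    rcvr class == Point
        ifTrue: ["... inlined copy of Point>>x ..."]
        ifFalse: [rcvr x]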

For another thing, the Java VMs are fully internally multi-threaded, which
is a lot of work (and a *huge* amount of testing) that hasn't been done for
Strongtalk.

Another issue is the downside of having all that implementation type
information in Java: it has to be validated before you can trust it, so
class loading becomes a gigantic nightmare.  Strongtalk doesn't have to
deal with any of that, since it doesn't assume anything about static
implementation types at all (other than for the hardcoded boolean messages).

Another issue is that the Java VMs have on-stack replacement so that
compiled methods are used immediately even for active contexts.  That isn't
there yet in Strongtalk.

And of course Smalltalk is a smaller, simpler, better language :-).

> I'm the primary author of Exupery, another attempt at fast execution
> technology for Smalltalk. Exupery is written in Smalltalk. The
> original design was to combine Self's dynamic inlining with a strong
> optimising compiler. For that goal, I don't think we can afford to
> write in anything less productive than Smalltalk.  That is still the
> goal, but it's a long way off; Exupery is currently moving towards a
> 1.0 without full method inlining and without a strong optimiser. All
> the needed high-risk features are there.
> Compile time is not the key issue for a dynamic compiler; pauses
> are. Compile time only becomes critical if you are stopping execution
> to compile, and Exupery doesn't. Being normal Smalltalk like everything
> else, pausing execution to compile would be tricky. The trade-offs that
> allow Exupery to be easily written in Smalltalk are the same as those
> required to allow long compile times for high-grade optimisations.

It was for a similar reason that I forked off the Java Server VM at Sun.
Good inlining and a good code generator are synergistic, so I wanted a
really good code generator.  But I got my #ss handed to me because of the
difficulty of making it work.

Part of the problem is that it is more important than you might think for
the compiler to be fast.  A compiler that does really good register
allocation is likely to be more than a factor of 2 slower than a fast JIT
when you do inlining.  Here is the important point: once you do inlining,
the average size of the methods you compile becomes much larger, and
register allocation time is highly non-linear in method size.
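
To see this in rough numbers (my own illustration with an assumed cost
model, not measured data from either VM): if a colouring allocator's work
grows roughly quadratically with the number of live ranges, then inlining
that merely doubles the average compiled method size roughly quadruples
the allocator's work per method.

    "Workspace snippet: relative allocator cost under an assumed
     quadratic cost model; the doubling from inlining is illustrative."
    | cost |
    cost := [:liveRanges | liveRanges * liveRanges].
    (cost value: 40) / (cost value: 20)    "==> 4"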

Like you, we moved to background compilation, which gets rid of pauses, but
the time it takes for the program to get up to speed is still significantly
affected by having a slower compiler.  The problem isn't just that the
optimized code becomes available later; it is also that the compiler is
chewing up CPU in the meantime, so until that code is available you are
running much slower code *and* getting fewer time slices.  Now that
multiprocessors are really here on the desktop, though, this might become
less of an issue.

Another factor that interacts with the above issue is that if you don't
compile the method eagerly, you end up getting other spurious compiles later,
because the unoptimized code is still running and setting off invocation
counters for called methods that are already scheduled to be inlined, etc.
So a background compiler ends up compiling more methods.  Theoretically this
is still happening a bit in Strongtalk, because the lack of on-stack
replacement has a similar effect, but it certainly isn't noticeable.
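
For anyone who hasn't looked inside one of these VMs: the trigger being
described is the usual invocation-counter mechanism.  A minimal sketch of
the idea, with hypothetical names (this is neither Strongtalk's nor
Exupery's actual code):

    "Bump a per-method counter on each activation; once it crosses a
     threshold, queue the method for the background optimizing compiler."
    countInvocationOf: aMethod
        aMethod invocationCount: aMethod invocationCount + 1.
        aMethod invocationCount > self compileThreshold
            ifTrue: [backgroundCompileQueue nextPut: aMethod]

The spurious compiles above are exactly this trigger firing for callees
that an already-queued caller would have inlined anyway.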

But the constraints in our case were that it had to work well in *all*
situations, especially for short-lived Java programs, which can end before
the compiler ever finishes.  That is why there are two Java HotSpot VMs.

For your case, the constraints aren't nearly so strict, since your audience
can select itself for applications where the startup speed doesn't matter,
and you probably won't be running things like tiny dynamically-loaded
applets.  So hopefully it won't be a problem for you.

> If you, or other Strongtalkers are interested in talking about
> compiler design please feel free to join Exupery's mailing list.
> Don't worry if you don't have time to study the source or play with
> it. Sharing experience would be valuable. Exupery is now about 4
> years old, revisiting the design decisions with knowledgeable people
> would be useful, especially in an archived list. Exupery is another
> chance to keep the ideas and vision alive, if not the C++.
> The Exupery mailing list is here:
>
>   http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery
>
> Exupery's tiny benchmarks are:
>
>   1,176,335,439 bytecodes/sec; 16,838,438 sends/sec
>
> and with the interpreter:
>
>     228,367,528 bytecodes/sec;  7,241,760 sends/sec
>
> That makes it currently much slower than Strongtalk for sends but the
> same speed for bytecodes. That's comparing against the numbers
> provided by Gilad via Dan's post to squeak-dev. Such a comparison is
> not fair, as relative performance varies greatly with
> architecture. Exupery is best on P4s, OK on Athlons, and least
> impressive on Pentium-Ms.
>
> The bytecode performance is the most interesting to me. Exupery does
> not yet do dynamic method inlining, which explains Strongtalk's strong
> send performance. Message inlining is not necessary for a 1.0. That
> the bytecode numbers are so close is interesting, given that I know
> Exupery's weaknesses. Exupery uses a colouring, coalescing register
> allocator, but it also lives with Squeak's object memory and could do
> with a bit more tuning. I'm guessing, based on reading the Self papers,
> that Strongtalk's object memory is much cleaner and better designed
> for speed. Did the Strongtalk team stop tuning for bytecode
> performance after they passed VisualWorks?

I'm not sure what those bytecode performance #s mean; I don't know how Gilad
did those measurements.  The bytecodes in Strongtalk are not one-to-one with
other Smalltalks'.  It doesn't sound like an apples-to-apples comparison,
since you quote the ratio of bytecodes-to-sends under the Squeak interpreter
as 32 and Dan quoted 44; they should be the same.  We should do some proper
benchmarks.  There are lots of benchmarks in Strongtalk if you want to try
them; look for classes matching *Benchmark*.
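
For what it's worth, the 32 comes straight from the interpreter numbers
you quote above:

    228367528 / 7241760    "roughly 31.5 bytecodes per send under the
                            Squeak interpreter"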

The notion of sends/second performance in Strongtalk does not make sense.
An inlined send takes 0 time, so depending on how the code is written, an
arbitrarily high sends/sec number can result.  For example, when you really
thoroughly factor your Smalltalk code, always use instance-variable access
methods, and use lots of non-pure blocks, you can get really massive
speedups in Strongtalk.  My Dictionary implementation is written that way,
and when I ported it to VisualWorks (a while ago), Strongtalk was 35 *times*
as fast, and the code uses only SmallIntegers, Associations, and Arrays.
Almost all the sends and blocks are optimized completely away.
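
To show the kind of factoring I mean, here is a hypothetical lookup method
in the same spirit (not the actual source of my Dictionary): every piece of
state is reached through a helper send and the miss case is a block, and a
type-feedback compiler can inline essentially all of it away.

    at: key ifAbsent: aBlock
        "Linear-probe lookup in the fully factored style; indexFor:,
         slotAt:, and nextIndexAfter: are accessor-style helpers
         (hypothetical names)."
        | index assoc |
        index := self indexFor: key.
        [(assoc := self slotAt: index) notNil] whileTrue: [
            assoc key = key ifTrue: [^assoc value].
            index := self nextIndexAfter: index].
        ^aBlock value

Under Strongtalk-style inlining, the helper sends and the 'aBlock value'
send collapse into straight-line probing code, so the factoring costs
essentially nothing.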

So for me, Strongtalk isn't so much about absolute bytecode performance as
it is about being able to write all the control structures and blocks and
sends that I want, and be confident that I pay basically no price for
factoring overhead.  It is a really cool feeling!

> Exupery has also recently been ported to Win 32 and Solaris 10 x86.
> Both ports were done by other people. Pre-built VMs will be available
> for both platforms in a few days.
>
> Bryce
>

That sounds great!  Hopefully there will be technology transfer both ways!
Cheers,
Dave



