Welcome and where we're at - Exupery

3 Nov 2005


Exupery is an attempt to make Smalltalk fast, as fast as C in many
cases. We're nowhere near that at the moment. For now, the short-term
goal of making the current system practical is much more important.
The current release is slowly getting ready; I've got two bugs left to
fix. This release has mostly been debugging, and it's much better: the
stress test runs to completion.
The stress test runs all the test classes in the system, then compiles
the top ten methods from each class, then reruns the tests. That should
be a reasonable test to show that Exupery is reliable enough to play
with. This is the SqueakSource version, not the released SqueakMap
version, which is buggy.
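A minimal Python sketch of the stress test's shape. It assumes "top ten" means the ten most frequently called methods in a profile of each class's test run, and that every class is run once, the hot methods are compiled, and then every class is rerun. `run_class` and `compile_method` are hypothetical stand-ins for running a Squeak test class under a profiler and for asking Exupery to compile a method; they are not Exupery's real API.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical hooks, not Exupery's API:
#   run_class(name) -> (did all tests pass?, call counts per method seen in the run)
#   compile_method(method) -> None
RunClass = Callable[[str], Tuple[bool, Counter]]
CompileMethod = Callable[[str], None]


def stress_test(test_classes: List[str], run_class: RunClass,
                compile_method: CompileMethod, top_n: int = 10) -> Dict[str, bool]:
    # Pass 1: run every test class once and keep its profile.
    profiles = {name: run_class(name)[1] for name in test_classes}
    # Compile the top_n most frequently called methods seen in each class's run.
    for name in test_classes:
        for method, _count in profiles[name].most_common(top_n):
            compile_method(method)
    # Pass 2: rerun everything, now exercising the compiled code.
    return {name: run_class(name)[0] for name in test_classes}
```

A real harness would also have to survive crashes, not just failing assertions, since the point is to show that the compiled code is reliable.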
Currently, I think the next release should have block support and
super sends. That will mean that Exupery can compile most methods.
There's a long list of things that could be done now; however, the
trick is figuring out what's needed to make Exupery useful. The list I
can think of is:
 * Block support
 * Super sends
 * Specialised inlined #new
 * Full method inlining
 * A 64-bit port
 * Floating point support (faster, not fast)
 * 32-bit integer support (like the floating point support)
 * Instruction scheduling (for P3s, including Pentium Ms)
 * Ports (that's up to you)
Block creation is not currently compiled. This means that any method
that creates a real block isn't compiled, and that covers a
surprisingly large number of loop methods. Exupery only speeds up calls
from compiled code to compiled code, so compiling a full loop is
important.
Super sends are also fairly common. Compiling both blocks and super
sends should mean that Exupery can compile everything required by the
compiler, excluding primitives.
Profiling the opening of explorers on large lists (suggested by Eddie)
showed that a lot of time was being spent either in methods with blocks
or in #new. The #news were often indirect, for example hidden inside a
#@ method. As far as I could see, about 40-50% of the time was in #new.
Better analysis would help, including some C-level profiling with
oprofile. Squeak's #new is very slow: it spends a lot of time figuring
out what the object's shape is while creating a new object. By
compiling a specialised version of #new for each class, this can be
heavily optimised.
The above three should provide most of the "easy" gains for normal
Smalltalk code. The rest of the "easy" gains will come from compiling
primitives directly to customised machine code. That's best driven by
profiling: there are a lot of primitives and only a few will matter for
each hotspot.
The next big architectural addition is likely to be full method
inlining. This provides a few benefits. First, it'll make common
message sends very quick. Second, adding it will change the costs and
benefits of other optimisations. If enough sends are inlined then
optimising the less common cases becomes less important, and inlining
creates large methods with more opportunities for other optimisations.
The main reason to consider an x86-64 port is to make Exupery more
portable. So far I've been focussing on making it useful on one
platform, and a few portability details have been ignored. An x86-64
port would be a nice small port but would still require cleaning up
the portability issues.
Exupery almost has a decent architecture to optimise floating point
expressions. What's missing is combining primitive inlining with type
feedback. Exupery currently does both; I just haven't glued them
together. The key to fast floating point without a full SSA optimiser
is eliminating the boxing and unboxing of floats inside an expression
and speeding up object creation (the same as #new above).
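A small Python model of what boxing costs inside one expression, `a*b + c*d`. `BoxedFloat` is a hypothetical stand-in for a heap-allocated Smalltalk Float, and the type feedback step is assumed to have already established that all four operands are floats. Calling the float primitives one at a time boxes every intermediate result; keeping intermediates unboxed allocates only the final result.

```python
class BoxedFloat:
    """Hypothetical heap-allocated float, standing in for a Smalltalk Float."""

    allocations = 0  # counts every box created

    __slots__ = ("value",)

    def __init__(self, value: float):
        BoxedFloat.allocations += 1
        self.value = value


def boxed_mul(x: BoxedFloat, y: BoxedFloat) -> BoxedFloat:
    return BoxedFloat(x.value * y.value)  # unbox args, box result


def boxed_add(x: BoxedFloat, y: BoxedFloat) -> BoxedFloat:
    return BoxedFloat(x.value + y.value)  # unbox args, box result


def dot2_boxed(a, b, c, d) -> BoxedFloat:
    # One primitive per operation: both products are boxed, then discarded.
    return boxed_add(boxed_mul(a, b), boxed_mul(c, d))


def dot2_unboxed(a, b, c, d) -> BoxedFloat:
    # Unbox the inputs once, keep intermediates as raw floats
    # (registers, in compiled code), and box only the final result.
    return BoxedFloat(a.value * b.value + c.value * d.value)
```

Evaluated on four floats, the boxed version allocates three objects and the unboxed one allocates one. Longer expressions widen the gap, and the allocation that remains is why fast object creation still matters.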
32-bit integers have exactly the same optimisation problems as floats.
This assumes that 32-bit integers are stored in their own objects, with
primitives that do 32-bit math, rather than being represented as two
SmallIntegers.
Exupery performs badly on P3 cores, although it's still faster than the
interpreter. This is because they have an asymmetric instruction
decoder: the chip decodes up to three instructions at once, but only
the first one can be a complex instruction. Exupery will often
generate several complex instructions followed by a sequence of simple
instructions. Instruction scheduling could both reduce register
pressure, by moving instructions closer to those that create the
values they use, and deal with the P3's decoder problems.
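The decoder problem can be seen with a simplified cycle-counting model. It assumes an in-order, three-wide decoder where slot 0 accepts any instruction and slots 1 and 2 accept only simple ones, so a cycle ends early whenever a complex instruction is not in slot 0. It ignores instruction fetch, micro-op limits and data dependencies, so it only illustrates why interleaving helps; a real scheduler can only reorder instructions whose dependencies allow it.

```python
def decode_cycles(instrs: str) -> int:
    """Cycles to decode a sequence of 'C' (complex) and 'S' (simple) instructions.

    Simplified P3-style model: slot 0 takes any instruction,
    slots 1 and 2 take simple instructions only, decoding in order.
    """
    cycles = 0
    i = 0
    while i < len(instrs):
        cycles += 1
        i += 1  # slot 0 decodes whatever comes next
        for _ in range(2):  # slots 1 and 2: simple instructions only
            if i < len(instrs) and instrs[i] == "S":
                i += 1
            else:
                break  # a complex instruction must wait for the next cycle
    return cycles
```

In this model, three complex instructions followed by six simple ones (`"CCCSSSSSS"`) take five cycles, while the same nine instructions interleaved as complex, simple, simple (`"CSSCSSCSS"`) take three.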
Then there's ports. Rick has started a PPC port.
OK, so those are the things that might be worth starting soon. The
strongest arguments are either significant benchmarks or code.
Benchmarks should be things that I can add to ExuperyBenchmark, so no
licencing issues, and preferably they should use only code in the base
image or standard (whatever that means) packages.
Bryce
P.S. The outline above is too brief, but hopefully it'll give you a
flavour of where Exupery is currently and where it could go in the
near future.