[squeak-dev] C++ parser in Smalltalk?

bryce at kampjes.demon.co.uk bryce at kampjes.demon.co.uk
Sun Jul 6 13:57:16 UTC 2008


Eliot Miranda writes:
 > On Thu, Jul 3, 2008 at 2:56 PM, <bryce at kampjes.demon.co.uk> wrote:
 > 
 > > Eliot Miranda writes:
 > >  > > Write protection could be implemented using similar tricks to the
 > >  > > write barrier. Then send optimisation will help reduce the costs when
 > >  > > it's used. When it's not used, there's no cost.
 > >  >
 > >  >
 > >  > I don't understand this.  Can you explain the write-barrier tricks
 > >  > and the send optimization that eliminates them?
 > >
 > > Automatically create a new hidden subclass of the class that acts
 > > appropriately for every write. The write protection can then be
 > > encoded in the class. The only overhead is that required by dynamic
 > > dispatch which we're already paying for.
 > 
 > 
 > Ah, ok.  This doesn't work in general.  E.g. one can't turn on immutability
 > in the middle of a method that assigns to inst vars.  The method in the
 > subclass would need to be different, and mapping pcs at runtime is, uh,
 > decidedly nontrivial.  It works for access to arrays but not to objects in
 > general.
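The hidden-subclass trick can be sketched outside Smalltalk. A minimal Python illustration (names invented, used only as an analogue): reassigning an object's `__class__` to a generated subclass that overrides `__setattr__` plays the role of the hidden subclass, so the protection is encoded in the class and unprotected objects pay nothing beyond ordinary dispatch.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

def write_protect(obj):
    # Generate a hidden subclass on demand; every write is then trapped
    # by dynamic dispatch on the swapped-in class, as described above.
    cls = type(obj)
    def trapped(self, name, value):
        raise AttributeError("%s is write-protected" % cls.__name__)
    hidden = type(cls.__name__ + "WriteProtected", (cls,),
                  {"__setattr__": trapped})
    obj.__class__ = hidden

p = Point(1, 2)
write_protect(p)
p.x                # reads are unaffected
try:
    p.x = 3        # writes now fail in the hidden subclass
except AttributeError:
    pass
```

In Python every attribute store is dispatched, which is exactly why the sketch works; Eliot's objection above is that compiled Smalltalk methods store into inst vars directly, so a class swap alone can't retroactively protect a method that is already executing.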

A more complicated variant would be to use compiler-created accessors
for all variable access, then rely on inlining to remove the overhead.

The catch is that this would force some deoptimisation when switching
write protection on and off. In the worst case, deoptimisation of
inlined code can require a full object memory scan to find all the
contexts that require deoptimisation.
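The accessor variant can be sketched the same way (again illustrative, not Exupery code): every instance-variable store funnels through one compiler-created accessor, so protection is a single check there. An inlining compiler would copy that check into each call site, and flipping the flag is precisely what would force those inlined copies to be deoptimised.

```python
class Guarded:
    write_protected = False   # class-side flag consulted by every store

    def set_slot(self, name, value):
        # Stand-in for a compiler-created store accessor.  An optimiser
        # would inline this test into every caller; toggling the flag is
        # then what forces those inlined copies to be deoptimised.
        if type(self).write_protected:
            raise AttributeError("instance is write-protected")
        object.__setattr__(self, name, value)

class GuardedPoint(Guarded):
    def __init__(self, x, y):
        self.set_slot("x", x)
        self.set_slot("y", y)

p = GuardedPoint(1, 2)
GuardedPoint.write_protected = True   # flips protection for the class
```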

Fast deoptimisation is one of the strongest reasons I can see for a
context cache in a system that does inlining. For pure speed, the
value of the context cache will be reduced because inlining should
remove the most frequent sends.
 
 > >  > I think per-object write-protection is very useful.  It's very useful
 > >  > for read-only literals, OODBs, proxies (distributed objects),
 > >  > debugging, etc.  Amongst Smalltalks I think VisualAge had it first
 > >  > and I did it for VW round about 2002.  I did it again for Squeak at
 > >  > Cadence.  In both the VW and Squeak cases the performance degradation
 > >  > was less than 5% for standard benchmarks.  It's cheap enough not to
 > >  > be noticed and there's lots more fat in the Squeak VM one can cut to
 > >  > more than regain performance.
 > >  >
 > >  > So unlike, say, named primitives for the core primitives, this is
 > >  > something I am in favour of.  It is a cost well worth paying for
 > >  > the added functionality.
 > >
 > > And when we're twice as fast as VisualWorks is now it'll be a 10%
 > > overhead. Twice as fast as VisualWorks is the original goal for
 > > Exupery.
 > >
 > 
 > Where are you on that?

Progress is good, though it will be a little slow over the summer.

I'm working towards the 1.0 release, and that's going well. The last
release looks reasonably reliable, and the current version compiles
much more quickly than before. I've fixed the performance of the
register allocator so that it doesn't blow out as it used to; it now
always takes about 50% of the compile time. Cascade support has also
been added, so all major language features are now supported. There's
a bug in the current development version that will need fixing before
the next minor release.

The work towards 1.0 now involves adding the primitives that the
interpreter inlines, and tuning. The current engine should be able to
provide a nice performance improvement for Squeak. 1.0 will be slower
than VisualWorks in overall performance because send performance is
worse, though sends are still twice as fast as in Squeak's
interpreter. 1.0 should be a little faster than VisualWorks for
bytecode performance, though I'd guess it's a bit slower right now
because I haven't done any bytecode tuning in the last few years.

The current releases are good enough to play with. The next one will
be much nicer to play with than the current released version due to
faster compilation.

Compilation is still much slower than it needs to be; so far I've
favoured simplicity, debuggability, and testability over compile-time
performance. For instance, every compiler stage copies its input to
create its output, even stages that only do a few optimisations.
Besides the register allocator, all the optimisations are simple
linear-time tree traversals.
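A compiler stage in that style can be sketched briefly (the node shape and the constant-folding rule are invented for illustration, not Exupery's actual classes): a single linear-time traversal that copies its input tree to build the output, applying one small optimisation along the way.

```python
class Node:
    # Minimal tree node: 'op' is an operator name or an integer literal.
    def __init__(self, op, *children):
        self.op, self.children = op, children

def fold(node):
    # Copying pass: rebuild the tree bottom-up, folding constant '+'.
    if not node.children:                      # leaf: copy as-is
        return Node(node.op)
    kids = [fold(c) for c in node.children]    # copy children first
    if (node.op == "+" and len(kids) == 2 and
            all(isinstance(k.op, int) for k in kids)):
        return Node(kids[0].op + kids[1].op)   # fold constant add
    return Node(node.op, *kids)

tree = Node("+", Node(2), Node("+", Node(3), Node(4)))
out = fold(tree)                               # out.op is 9
```

Each node is visited once and copied once, which is what keeps such stages linear in the size of the input even though they allocate a fresh output tree.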

After 1.0, the plan is to add full dynamic message inlining in 2.0,
then an SSA optimiser in 3.0. Exupery's goals are similar to AoSTa's;
the major differences are that the code generator is in the image and
that Exupery doesn't stop execution to optimise. It compiles in the
background, then registers the compiled method, which will be used by
later calls. Exupery relies on the interpreter to execute code that
isn't used frequently enough to be worth compiling, or that isn't
currently compiled.
 
Bryce

P.S. Being able to control precisely what's in the code cache makes
debugging crashes much easier. A common trick when debugging Exupery
bugs is to recompile everything that was compiled when it crashed,
then try to reproduce the crash. Once it's reproduced, it's possible
to do a binary chop over the compiled methods to get down to the few
that are required to reproduce the problem.
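The binary chop itself is generic. A sketch, where the `crashes` predicate is hypothetical shorthand for "recompile exactly this subset of methods and see whether the crash reproduces"; it assumes a single compiled method is enough to trigger the crash, and a delta-debugging style search would be needed for interacting methods.

```python
def chop(methods, crashes):
    # Repeatedly keep the half of the compiled methods that still
    # reproduces the crash.  Assumes one method suffices on its own.
    suspects = list(methods)
    while len(suspects) > 1:
        half = suspects[:len(suspects) // 2]
        suspects = half if crashes(half) else suspects[len(half):]
    return suspects
```

Each round halves the suspect set, so isolating one method out of n compiled methods takes about log2(n) recompile-and-crash cycles.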

That would be much more difficult to do with an HPS-style system,
which compiles the method before execution. Of course, HPS's style has
its own advantages.


