Eliot Miranda writes:
On Thu, Jul 3, 2008 at 2:56 PM, bryce@kampjes.demon.co.uk wrote:
Eliot Miranda writes:
Write protection could be implemented using similar tricks to the write barrier. Then send optimisation will help reduce the costs when it's used. When it's not used, there's no cost.
I don't understand this. Can you explain the write-barrier tricks and the send optimization that eliminates them?
Automatically create a new hidden subclass of the class that acts appropriately for every write. The write protection can then be encoded in the class. The only overhead is that required by dynamic dispatch which we're already paying for.
Ah, ok. This doesn't work in general. e.g. one can't turn on immutability in the middle of a method that assigns to inst vars. The method in the subclass would need to be different, and mapping pcs at runtime is, uh, decidedly nontrivial. It works for access to arrays but not to objects in general.
A more complicated variant would be to use compiler created accessors for all variable access then rely on inlining to remove the overhead.
The catch here is this would force some de-optimisation when switching write protection on and off. Worst case, de-optimisation of inlined code can require a full object memory scan to find all the contexts that require deoptimisation.
Fast deoptimisation is one of the strongest reasons I can see for a context cache in a system that does inlining. For pure speed, the value of the context cache will be reduced because inlining should remove the most frequent sends.
I think per-object write-protection is very useful: for read-only literals, OODBs, proxies (distributed objects), debugging, etc.
Amongst Smalltalks I think VisualAge had it first, and I did it for VW round about 2002. I did it again for Squeak at Cadence. In both the VW and Squeak cases the performance degradation was less than 5% for standard benchmarks. It's cheap enough not to be noticed, and there's lots more fat in the Squeak VM one can cut to more than regain performance.
So unlike, say, named primitives for the core primitives, this is something I am in favour of. It is a cost well worth paying for the added functionality.
And when we're twice as fast as VisualWorks is now, it'll be a 10% overhead. Twice as fast as VisualWorks is the original goal for Exupery.
Where are you on that?
Progress is good though will be a little slow over the summer.
I'm working towards the 1.0 release. That's going well. The last release looks reasonably reliable and the current release compiles much quicker than previously. I've fixed the performance of the register allocator so that it doesn't blow out like it used to. Now it always takes about 50% of the compile time. Cascade support has also been added so all major language features are now supported. There's a bug in the current development version that'll need fixing before the next minor release.
The work towards 1.0 now involves adding primitives which the interpreter inlines and tuning. The current engine should be able to provide a nice performance improvement for Squeak. 1.0 will be worse than VisualWorks for overall performance as send performance is worse though still twice as good as Squeak's interpreter. 1.0 should be a little faster than VisualWorks for bytecode performance though I'd guess it's a bit slower now because I haven't done any bytecode tuning in the last few years.
The current releases are good enough to play with. The next one will be much nicer to play with than the current released version due to faster compilation.
Compilation is still much slower than it needs to be; so far I've favoured simplicity, debuggability, and testability over compile-time performance. For instance, every compiler stage copies its input to create its output, even for stages that only perform a few optimisations. Apart from the register allocator, all optimisations are simple linear-time tree traversals.
After 1.0, the plan is to add full dynamic message inlining in 2.0, then an SSA optimiser in 3.0. Exupery's goals are similar to AoSTa's, the major differences are the code generator is in the image, and Exupery doesn't stop execution to optimise. It compiles in the background then registers the compiled method which will be used by later calls. Exupery relies on the interpreter to execute code that isn't used frequently enough to be worth compiling or that isn't currently compiled.
Bryce
P.S. Being able to control precisely what's in the code cache makes debugging crashes much easier. A common trick when debugging Exupery bugs is to recompile everything that was compiled when it crashed, then try to reproduce the crash. Once it's reproduced, it's possible to do a binary chop of the compiled methods to get down to the few that are required to reproduce the problem.
That would be much more difficult to do with a HPS style system which compiles the method before execution. Of course HPS's style has its own advantages.
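The binary chop described above can be sketched as a small search procedure. This is a hypothetical illustration, not Exupery's actual tooling; `crashes` stands in for "recompile exactly this candidate subset and re-run the failing test":

```python
# Sketch of binary-chopping the set of compiled methods down to a small
# subset that still reproduces a crash. Assumes `crashes(subset)` rebuilds
# the code cache with exactly `subset` compiled and reports whether the
# crash recurs.
def binary_chop(methods, crashes):
    """Return a small subset of `methods` that still crashes."""
    assert crashes(methods), "the full set must reproduce the crash first"
    suspects = list(methods)
    while len(suspects) > 1:
        half = len(suspects) // 2
        first, second = suspects[:half], suspects[half:]
        if crashes(first):
            suspects = first
        elif crashes(second):
            suspects = second
        else:
            break  # the crash needs methods from both halves; stop chopping
    return suspects

# Illustrative use: one compiled method (a made-up name) causes the crash.
culprit = "Array>>replaceFrom:to:with:"
all_compiled = ["method%d" % i for i in range(10)] + [culprit]
found = binary_chop(all_compiled, lambda subset: culprit in subset)
# found == [culprit]
```

When the crash depends on an interaction between methods in both halves, the simple chop stops early; in that case one falls back to testing subsets by hand, but in practice a single culprit is the common case.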