Exupery is an attempt to make Smalltalk fast, as fast as C in many cases. We're nowhere near that at the moment. The short-term goal of making the current system practical is much more important right now.
The current release is slowly getting ready; I've got two bugs left to fix. This release has been mostly debugging, and it's much better for it. The stress test now runs to completion.
The stress test runs all the test classes in the system, then compiles the top ten methods from each class, then reruns the tests. That should be a reasonable test to show that Exupery is reliable enough to play with. This is the SqueakSource version, not the released SqueakMap version, which is buggy.
Currently, I think the next release should have block support and super sends. That will mean that Exupery will compile most methods. There's a long list of things that could be done now; the trick is figuring out what's needed to make Exupery useful. The list I can think of is:
* Block support
* Super sends
* Specialised inlined #new
* Full method inlining
* A 64 bit port
* Floating point support (faster, not fast)
* 32 bit integer support (like the floating point support)
* Instruction scheduling (for P3s including Pentium Ms)
* Ports (that's up to you)
Block creation is not currently compiled. This means that any method that creates a real block isn't compiled, which covers a surprisingly large number of loop methods. Exupery only speeds up calls from compiled code to compiled code, so compiling a full loop is important.
Super sends are also fairly common. Compiling both blocks and super sends should mean that Exupery can compile everything required by the compiler, excluding primitives.
Profiling opening explorers on large lists (suggested by Eddie) showed that a lot of time was being spent either in methods with blocks or in #new. The #news were often indirect, say hidden in an @ method. About 40-50% of the time was in #news as far as I could see. Better analysis would help, including some C/oprofile-based profiling. Squeak's #new is very slow: it spends a lot of time figuring out what the object's shape is while creating a new object. By compiling a specialised version of #new for each kind of object, this can be heavily optimised.
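To make the idea of a specialised #new concrete, here is a minimal sketch in Python rather than Smalltalk. It is not Exupery's or Squeak's code: `Shape`, `generic_new` and `specialise_new` are hypothetical names, and the dictionary standing in for an object is only an illustration. The point is that the generic path re-inspects the shape on every allocation, while the specialised path does that work once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    """Hypothetical stand-in for a class's instance format."""
    name: str
    fixed_slots: int
    indexable: bool = False


def generic_new(shape, size=0):
    # Generic path: the shape is examined again on every allocation.
    if size and not shape.indexable:
        raise ValueError(f"{shape.name} is not indexable")
    slots = shape.fixed_slots + (size if shape.indexable else 0)
    return {"class": shape.name, "slots": [None] * slots}


def specialise_new(shape):
    # Specialised path: shape checks happen once, when the allocator is built.
    if shape.indexable:
        # This sketch only specialises fixed-size objects.
        return lambda size=0: generic_new(shape, size)
    name, count = shape.name, shape.fixed_slots

    def new():
        # No per-call shape inspection; the slot count is already known.
        return {"class": name, "slots": [None] * count}

    return new


point_shape = Shape("Point", fixed_slots=2)
new_point = specialise_new(point_shape)
print(new_point() == generic_new(point_shape))
```

In a real compiler the specialised allocator would be machine code with the object size and header baked in, which is where the saving would come from.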
The above three should provide most of the "easy" gains for normal Smalltalk code. The rest of the "easy" gains will be compiling primitives directly to customised machine code. That's best driven by profiling. There are a lot of primitives and only a few will matter for each hotspot.
The next big architectural addition is likely to be full method inlining. This provides a few benefits. First, it'll make common message sends very quick. Second, adding it will change the cost/benefit of other optimisations. If enough sends are inlined, then optimising the less common cases becomes less important, and inlining creates large methods with more opportunities for other optimisations.
The main reason to consider an x86 64 bit port is to make Exupery more portable. So far I've been focussing on making it useful on one platform, and a few portability details have been ignored. A 64 bit x86 port would be a nice small port but would still require cleaning up the portability issues.
Exupery almost has a decent architecture to optimise floating point expressions. What's missing is combining primitive inlining with type feedback. Exupery currently does both; I just haven't glued them together. The key to fast floating point without a full SSA optimiser is removing the boxing and unboxing of floats inside an expression and speeding up object creation (the same as #new above).
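As a rough illustration of why removing boxing inside an expression matters, the sketch below (Python, not Exupery code) counts how many float objects get allocated when evaluating `a * b + c`. `BoxedFloat`, `eval_boxed` and `eval_unboxed` are hypothetical names made up for this example.

```python
class BoxedFloat:
    """Hypothetical boxed float; counts every allocation."""
    allocations = 0

    def __init__(self, value):
        BoxedFloat.allocations += 1
        self.value = value


def boxed_mul(a, b):
    # Each primitive unboxes its arguments and boxes its result.
    return BoxedFloat(a.value * b.value)


def boxed_add(a, b):
    return BoxedFloat(a.value + b.value)


def eval_boxed(a, b, c):
    # a * b + c with a boxed intermediate result.
    return boxed_add(boxed_mul(a, b), c)


def eval_unboxed(a, b, c):
    # Unbox once, compute on raw values, box only the final result.
    x, y, z = a.value, b.value, c.value
    return BoxedFloat(x * y + z)


def count_allocations(fn, *args):
    # Returns (result value, number of boxes allocated during the call).
    before = BoxedFloat.allocations
    result = fn(*args)
    return result.value, BoxedFloat.allocations - before


a, b, c = BoxedFloat(2.0), BoxedFloat(3.0), BoxedFloat(4.0)
print(count_allocations(eval_boxed, a, b, c))    # (10.0, 2)
print(count_allocations(eval_unboxed, a, b, c))  # (10.0, 1)
```

The saving grows with the size of the expression: every intermediate result that stays unboxed is one allocation avoided, and the one remaining allocation is exactly the object creation that a faster #new would help with.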
32 bit integers have exactly the same optimisation problems as floats. This is assuming that the 32 bit integers are stored in their own objects, with primitives that do 32 bit math, rather than using two SmallIntegers.
Exupery performs badly on P3 cores, although it's still faster than the interpreter. This is because they have an asymmetric instruction decoder. The chip decodes up to three instructions at once, but only the first one can be a complex instruction. Exupery will often generate several complex instructions followed by a sequence of simple instructions. Instruction scheduling could both reduce register pressure, by moving instructions closer to those that create the values they use, and deal with the P3's problems.
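The decoder restriction can be shown with a small simulation. This is a deliberately simplified model written in Python for illustration, not a description of Exupery's code generator: each cycle decodes up to three instructions, and only the first slot accepts a complex one. It ignores data dependencies, which a real scheduler would have to respect. The function name and the "C"/"S" encoding are assumptions made for this sketch.

```python
def decode_cycles(instructions):
    """Count decode cycles for a sequence of "C" (complex) and
    "S" (simple) instructions under the simplified model."""
    cycles = 0
    i = 0
    n = len(instructions)
    while i < n:
        cycles += 1
        i += 1  # the first slot accepts any instruction
        # the second and third slots accept only simple instructions
        for _ in range(2):
            if i < n and instructions[i] == "S":
                i += 1
            else:
                break
    return cycles


clustered = list("CCCSSSSSS")    # complex instructions bunched together
interleaved = list("CSSCSSCSS")  # same mix, one complex per decode group

print(decode_cycles(clustered))    # 5
print(decode_cycles(interleaved))  # 3
```

Under this model the same nine instructions take five decode cycles when the complex ones are bunched up and three when they are spread out, which is the kind of gain a scheduler could go after.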
Then there's ports. Rick has started a PPC port.
OK, so those are the things that might be worthwhile starting soon. The strongest arguments are either significant benchmarks or code. Benchmarks should be things that I can add to ExuperyBenchmark, so no licensing issues, and preferably ones that only use code in the base image or standard (whatever that means) packages.
Bryce
P.S. The outline above is too brief but hopefully it'll give you a flavor of where Exupery is currently and where it could go in the near future.
I would very much like to see (and take advantage of) Squeak with this kind of improvement. My best wishes for this project.
Thank you, and great job.
Sebastián Sastre
ssastre@seaswork.com.ar Seaswork Special Software Solutions www.seaswork.com.ar
-----Original message-----
From: exupery-bounces@lists.squeakfoundation.org [mailto:exupery-bounces@lists.squeakfoundation.org] On behalf of Bryce Kampjes
Sent: Wednesday, 2 November 2005 20:00
To: exupery@lists.squeakfoundation.org
Subject: Welcome and where we're at