Hi All,
On Mon, Sep 15, 2014 at 6:01 AM, Thierry Goubier thierry.goubier@gmail.com wrote:
2014-09-15 14:39 GMT+02:00 Clément Bera bera.clement@gmail.com:
Hello,
Note that Slang is a subset of Smalltalk. The Slang compiler does not compile arbitrary Smalltalk to C; it compiles Smalltalk restricted to a limited set of message sends and classes to C.
Yes, I am aware of that. I remember that from the very beginnings of Squeak.
Wasn't Smalltalk/X the one which had a more complete version of that C translation? I did an internship at a French company that had a Smalltalk-to-C translator written for them a long time ago.
2014-09-15 13:28 GMT+02:00 Thierry Goubier thierry.goubier@gmail.com:
Hi Phil,
thanks for the update on Slang to C. It's always good to have that.
Two open questions:
- would a slang to x86 asm via NativeBoost be doable / a nice target?
Yes, it would be interesting. However, with a Slang-to-C compiler we are platform-independent: we can compile the C code for x86, x86_64 and ARM quite easily (some parts of the VM are already processor-dependent, but not many). Targeting machine code directly implies evolving the Slang compiler for each new instruction set we support. That sounds like a lot of engineering work relative to our resources and the gain.
It would allow JIT-style compilation experiments that a Slang-to-C chain isn't designed for :) With a lot more work doing the various NB ports, of course.
- would targeting LLVM IR be of interest?
If you compile the C code with Clang instead of gcc, which is increasingly the case because recent Mac OS X releases no longer ship gcc, you are already going through LLVM IR, since Clang uses it internally. As the VM uses GNU C extensions to improve performance, I do not think that targeting LLVM IR directly would improve performance much. So it sounds like quite some engineering work for no gain.
I would not suggest replacing C with LLVM IR for VM work, in part because LLVM IR is not what I would call a readable source code format... But I do know that even when doing C-to-C rewriting for embedded compilation, there is some low-level code that you can't write in C.
I find this whole discussion depressing. It seems people would rather put their energy into chasing quick fixes or other technologies instead of contributing to the work being done in the existing VM. People discuss using LLVM as if the code-generation capabilities inside Cog were somehow poor or had no chance of competing. Spur is around twice as fast as the current memory manager and has much better support for the FFI. Clément and I, now with help from Ronie, are making excellent progress towards an adaptive optimizer/speculative inliner that will give us performance similar to V8 (the Google JavaScript VM, led by Lars Bak, who implemented the HotSpot VM (Smalltalk and Java)) et al. We are trying to get person-power for a high-quality FFI and have a prototype for a non-blocking VM. When we succeed, C won't be any better, and so it won't be an interesting target. One will be able to program entirely in Smalltalk and get excellent performance. But we need effort. Collaboration.
Personally I feel so discouraged when people talk about using LLVM or libffi or whatever instead of having the courage and energy to make our system world-class. I have confidence in our ability to compete with the best and am saddened that people in the community don't value the technology we already have and don't show faith in our ability to improve it further. Show some confidence, express support, and above all get involved.
Collaborators: http://www.mirandabanda.org/cogblog/collaborators/
Cog Projects: http://www.mirandabanda.org/cogblog/cog-projects/
Spur 1/3: https://www.youtube.com/watch?v=k0nBNS1aHZ4&index=49&list=PLJ5nSnWzQXi_6yyRLsMMBqG8YlwfhvB0X
Spur, a new object representation for Cog: http://www.slideshare.net/esug/spur-a-new-object-representation-for-cog
Sista: Improving Cog's JIT performance 1/2: https://www.youtube.com/watch?v=X4E_FoLysJg&list=PLJ5nSnWzQXi_6yyRLsMMBqG8YlwfhvB0X&index=76
Sista: Improving Cog's JIT performance (slides): http://www.slideshare.net/esug/sista-talkesug2..
Lowcode: Redoing NativeBoost: http://www.slideshare.net/esug/03-lowcodeslides
However, I think Ronie was interested in doing such work. If he succeeds
and reports performance improvement, then we can consider using his compiler to compile the VM.
Keep us posted!
Thierry
Hear hear!
-C
[1] http://tinyurl.com/m66fx8y (original message)
-- Craig Latta netjam.org +31 6 2757 7177 (SMS ok) + 1 415 287 3547 (no SMS)
Hello,
I am segmenting this mail into several sections.
---------------------------------------------------------------
- On Lowcode and Cog
I have been working with the Cog VM over the last week, implementing the Lowcode instructions in Cog.
Lowcode is currently a spec of new bytecode instructions. These instructions can be used for:
- Implementing a compiler for a C-like language.
- Making FFI calls.
I am implementing these instructions using a feature of the new bytecode set for SistaV1, which is called "inline primitives". Because of this, these new instructions can be mixed freely with the standard VM bytecode set. This also allows the Sista adaptive optimizer to inline FFI calls.
These instructions provide features for:
- Int32 and Int64 integer arithmetic without type checking.
- Pointers, with pointer arithmetic.
- Memory access and memory manipulation.
- Single- and double-precision floating-point arithmetic.
- Conversion between primitive types.
- Boxing and unboxing of primitive types.
- Unchecked comparisons.
- Native function calls, direct and indirect.
- The atomic operation compare-and-swap.
- Object pin/unpin (requires Spur).
- VM releasing and grabbing for threaded FFI.
Currently I have implemented the following backends:
- A C interpreter plugin.
- An LLVM-based backend.
Currently I am working on getting this running with the Cog code generator. So far I am generating code for int32/pointer/float32/float64. I am starting to generate C function calls and object boxing/unboxing.
During this work I learned a lot about Cog. In particular, Cog is missing a better Slang generator, one that allows forcing better inlining, and more code reviews. There is a lot of code duplication in Cog that can be attributed to limitations of Slang. In my opinion, if we could use Slang for more than just building the VM, we would end up with a better code generator. In addition, we need more people working on Cog: people who perform code reviews and write documentation for Cog.
After these weeks, I learned that working on the Cogit is not that hard. Our biggest problem is lack of documentation. Our second problem could be the lack of documentation about Slang.
---------------------------------------------------------------
- Smalltalk -> LLVM ?
As for having a Smalltalk -> LLVM code generator: the truth is that we would not gain anything. LLVM is a C compiler, designed to optimize things such as loops with a lot of arithmetic; it is designed to optimize large sections of code. In Smalltalk, our code consists mostly of message sends, and LLVM cannot optimize a message send.
To optimize a message send, you first have to determine which method is going to respond to the message. Then you have to inline that method. Only then can you start performing the actual optimizations, such as constant folding, common subexpression elimination, dead-branch elimination, loop unrolling, and so on.
Because we don't have information in the language itself (e.g. static types à la C/C++/Java/C#) that tells us which method a message send will actually invoke, we have the following alternatives for determining it:
- Don't optimize anything.
- Perform a costly static global analysis of the whole program.
- Measure at runtime and hope for the best.
- Extend the language.
In other words, our best bet is Clément's work on Sista. The only problem with this bet is real-time applications.
Real-time applications require an upper-bound guarantee on their response time. In some cases, the lack of this guarantee is just an annoyance, as in video games. In mission-critical applications, the consequences of missing the time constraint can be severe. Examples of mission-critical systems are the flight controls of an airplane or the cooling system of a nuclear reactor.
For these applications, it is not possible to rely on an adaptive optimizer that may kick in at unpredictable times. In these applications you have to either:
- Extend the language to hand-optimize some performance-critical sections of code.
- Use another language to optimize those critical sections.
- Use another language for the whole project.
And of course, you have to perform a lot of profiling.
Greetings, Ronie
Hoi Ronie--
Nice summary. Thanks!
-C
Hi Ronie,
On Mon, Sep 15, 2014 at 2:37 PM, Ronie Salgado roniesalg@gmail.com wrote:
Hello,
I am segmenting this mail into several sections.
- On Lowcode and Cog
I have been working with the Cog VM over the last week, implementing the Lowcode instructions in Cog.
Remember to send me code for integration. I'm eagerly waiting to use your code!
Lowcode is currently a spec of new bytecode instructions. These instructions can be used for:
- Implementing a compiler for a C-like language.
- Making FFI calls.
I am implementing these instructions using a feature of the new bytecode set for SistaV1, which is called "inline primitives". Because of this, these new instructions can be mixed freely with the standard VM bytecode set. This also allows the Sista adaptive optimizer to inline FFI calls.
These instructions provide features for:
- Int32 and Int64 integer arithmetic without type checking.
- Pointers, with pointer arithmetic.
- Memory access and memory manipulation.
- Single- and double-precision floating-point arithmetic.
- Conversion between primitive types.
- Boxing and unboxing of primitive types.
- Unchecked comparisons.
- Native function calls, direct and indirect.
- The atomic operation compare-and-swap.
- Object pin/unpin (requires Spur).
- VM releasing and grabbing for threaded FFI.
Currently I have implemented the following backends:
- A C interpreter plugin.
- An LLVM-based backend.
Currently I am working on getting this running with the Cog code generator. So far I am generating code for int32/pointer/float32/float64. I am starting to generate C function calls and object boxing/unboxing.
During this work I learned a lot about Cog. In particular, Cog is missing a better Slang generator, one that allows forcing better inlining, and more code reviews. There is a lot of code duplication in Cog that can be attributed to limitations of Slang. In my opinion, if we could use Slang for more than just building the VM, we would end up with a better code generator. In addition, we need more people working on Cog: people who perform code reviews and write documentation for Cog.
After these weeks, I learned that working on the Cogit is not that hard. Our biggest problem is lack of documentation. Our second problem could be the lack of documentation about Slang.
Yes, and that's difficult because it's a moving target and I have been lazy, not writing tests, instead using the Cog VM as "the test".
I am so happy to have your involvement. You and Clément bring such strength and competence.
---------------------------------------------------------------
- Smalltalk -> LLVM ?
As for having a Smalltalk -> LLVM code generator: the truth is that we would not gain anything. LLVM is a C compiler, designed to optimize things such as loops with a lot of arithmetic; it is designed to optimize large sections of code. In Smalltalk, our code consists mostly of message sends, and LLVM cannot optimize a message send.
To optimize a message send, you first have to determine which method is going to respond to the message. Then you have to inline that method. Only then can you start performing the actual optimizations, such as constant folding, common subexpression elimination, dead-branch elimination, loop unrolling, and so on.
Because we don't have information in the language itself (e.g. static types à la C/C++/Java/C#) that tells us which method a message send will actually invoke, we have the following alternatives for determining it:
- Don't optimize anything.
- Perform a costly static global analysis of the whole program.
- Measure at runtime and hope for the best.
- Extend the language.
In other words, our best bet is Clément's work on Sista. The only problem with this bet is real-time applications.
Ah! But! Sista has an advantage that other adaptive optimizers don't. Because it optimizes from bytecode to bytecode it can be used during a training phase and then switched off.
Real-time applications require an upper-bound guarantee on their response time. In some cases, the lack of this guarantee is just an annoyance, as in video games. In mission-critical applications, the consequences of missing the time constraint can be severe. Examples of mission-critical systems are the flight controls of an airplane or the cooling system of a nuclear reactor.
For these applications, it is not possible to rely on an adaptive optimizer that may kick in at unpredictable times. In these applications you have to either:
- Extend the language to hand-optimize some performance-critical sections of code.
- Use another language to optimize those critical sections.
- Use another language for the whole project.
The additional option is to "train" the optimizer by running the application before deploying and capturing the optimized methods. Discuss this with Clément and he'll explain how straightforward it should be. This still leaves the latency in the Cogit when it compiles from bytecode to machine code. But
a) I've yet to see anybody raise JIT latency as an issue in Cog
b) it would be easy to extend the VM to cause the Cogit to precompile specified methods. We could easily provide a "lock-down" facility that would prevent Cog from discarding specific machine code methods.
And of course, you have to perform a lot of profiling.
Early and often :-).
Because we can have complete control over the optimizer, and because Sista is bytecode-to-bytecode and can hence store its results in the image in the form of optimized methods, I believe that Sista is well-positioned for real-time use, since it can be run before deployment. In fact we should emphasise this in the papers we write on Sista.
Greetings,
Ronie
2014-09-16 1:46 GMT+02:00 Eliot Miranda eliot.miranda@gmail.com:
After these weeks, I learned that working on the Cogit is not that hard. Our biggest problem is lack of documentation. Our second problem could be the lack of documentation about Slang.
Lack of documentation ?
About Cog there is this documentation:
Back to the future: http://ftp.squeak.org/docs/OOPSLA.Squeak.html
About VMMaker: http://wiki.squeak.org/squeak/2105
Object engine: http://www.rowledge.org/resources/tim%27s-Home-page/Squeak/OE-Tour.pdf
General information: http://squeakvm.org/index.html
Blue book part 4: http://stephane.ducasse.free.fr/FreeBooks/BlueBook/Bluebook.pdf
Deep into Pharo, part 4, about blocks and exceptions: http://www.deepintopharo.com/
VMIL paper about the Cogit: http://design.cs.iastate.edu/vmil/2011/papers/p03-miranda.pdf
The Cog blog: http://www.mirandabanda.org/cogblog/
About Spur: a summary http://clementbera.wordpress.com/2014/02/06/7-points-summary-of-the-spur-memory-manager/ and the object format http://clementbera.wordpress.com/2014/01/16/spurs-new-object-format/
This post: http://clementbera.wordpress.com/2013/08/09/the-cog-vm-lookup/
And many useful class and method comments that taught me a lot.
When I try to work with Pharo frameworks, even recent ones, it is very rare that I see as much documentation as exists for Cog. Some frameworks are documented in the Pharo books and a few others, such as Zinc, have good documentation, but in general there is little documentation and *even fewer people writing documentation*. The website about Cog has existed for over 6 years now. I think Cog is far from the worst-documented part of Pharo.
Yes, and that's difficult because it's a moving target and I have been lazy, not writing tests, instead using the Cog VM as "the test".
It's also difficult because the first tests to write are the hardest to write.
The additional option is to "train" the optimizer by running the application before deploying and capturing the optimized methods. Discuss this with Clément and he'll explain how straightforward it should be. This still leaves the latency in the Cogit when it compiles from bytecode to machine code. But
a) I've yet to see anybody raise JIT latency as an issue in Cog
b) it would be easy to extend the VM to cause the Cogit to precompile specified methods. We could easily provide a "lock-down" facility that would prevent Cog from discarding specific machine code methods.
And of course, you have to perform a lot of profiling.
Early and often :-).
Because we can have complete control over the optimizer, and because Sista is bytecode-to-bytecode and can hence store its results in the image in the form of optimized methods, I believe that Sista is well-positioned for real-time use, since it can be run before deployment. In fact we should emphasise this in the papers we write on Sista.
Eliot's solution makes sense. To write a paper about that, I need benchmarks showing results on real-time applications, so there's quite some work to do first.
Greetings,
Ronie
-- best, Eliot
What would be valuable is a reading list / path to VM enlightenment.
The Blue book is useful.
Then a tour of the Object Engine by Tim.
Then plugin articles + Slang.
The bytecode set.
Primitives...
Context-to-stack mapping.
Blocks.
Non-local returns.
Display/Sensor/event loop/timer implementation (like in the porting document).
And only then would one move to more advanced topics.
I saw that Clément had a set of VM-related books on his desk at INRIA; maybe posting the list would be great!
All the best, Phil
2014-09-16 14:55 GMT+02:00 phil@highoctane.be phil@highoctane.be:
What would be valuable is a reading list / path to VM enlightenment.
I saw that Clement had a set of VM related books on his desk at INRIA, maybe posting the list would be great!
The book that best explains how, and why, to implement a high-performance VM for Smalltalk is Urs Hölzle's PhD thesis: http://www.cs.ucsb.edu/~urs/oocsb/self/papers/urs-thesis.html
Other relevant books in my office focus on specific topics, such as Advanced Compiler Design and Implementation by Steven Muchnick for optimizing compilers, or The Garbage Collection Handbook by Richard Jones, Antony Hosking and Eliot Moss.
All the best,
Phil
On Tue, Sep 16, 2014 at 11:48 AM, Clément Bera bera.clement@gmail.com wrote:
2014-09-16 1:46 GMT+02:00 Eliot Miranda eliot.miranda@gmail.com:
Hi Ronie,
On Mon, Sep 15, 2014 at 2:37 PM, Ronie Salgado roniesalg@gmail.com wrote:
Hello,
I am segmenting this mail in several sections.
- On Lowcode and Cog
I have been working in the last week with the Cog VM, implementing the Lowcode instructions in Cog.
Remember to send me code for integration. I'm eagerly waiting to use your code!
Lowcode is currently a spec of new bytecode instructions. These instructions can be used for:
- Implementing a C like language compiler.
- Making FFI calls
I am implementing these instructions using a feature of the new bytecode set for SistaV1, which is called "inline primitives". Because of this, these new instructions can be mixed freely with the standard VM bytecode set. This also allows the Sista adaptive optimizer to inline FFI calls.
These instructions provide features for:
- Int32 and Int64 integer arithmetic without type checking.
- Pointers, with pointer arithmetic.
- Memory access and memory manipulation.
- Single- and double-precision floating-point arithmetic.
- Conversion between primitive types.
- Boxing and unboxing of primitive types.
- Unchecked comparisons.
- Native function calls, both direct and indirect.
- The atomic compare-and-swap operation.
- Object pin/unpin (requires Spur).
- VM releasing and grabbing for threaded FFI.
Currently I have implemented the following backends:
- A C interpreter plugin.
- A LLVM based backend.
Currently I am working on getting this working with the Cog code generator. So far I am already generating code for int32/pointer/float32/float64. I am starting to generate C function calls and object boxing/unboxing.
During this work I learned a lot about Cog. Especially that Cog is missing a better Slang generator, one that allows forcing better inlining, and more code reviews. There is a lot of code duplication in Cog that can be attributed to limitations of Slang. In my opinion, if we could use Slang for more than just building the VM, we would end up with a better code generator. In addition, we need more people working on Cog. We need people to perform code reviews and write documentation for Cog.
After these weeks, I learned that working on the Cogit is not that hard. Our biggest problem is the lack of documentation of Cog itself; our second problem could be the lack of documentation about Slang.
Lack of documentation?
About Cog there is this documentation:
- Back to the future: http://ftp.squeak.org/docs/OOPSLA.Squeak.html
- About VMMaker: http://wiki.squeak.org/squeak/2105
- Object engine: http://www.rowledge.org/resources/tim%27s-Home-page/Squeak/OE-Tour.pdf
- General information: http://squeakvm.org/index.html
- Blue book, part 4: http://stephane.ducasse.free.fr/FreeBooks/BlueBook/Bluebook.pdf
- Deep into Pharo, part 4, about blocks and exceptions: http://www.deepintopharo.com/
- The VMIL paper about the Cogit: http://design.cs.iastate.edu/vmil/2011/papers/p03-miranda.pdf
- The Cog blog: http://www.mirandabanda.org/cogblog/
- About Spur: a summary http://clementbera.wordpress.com/2014/02/06/7-points-summary-of-the-spur-memory-manager/ and the object format http://clementbera.wordpress.com/2014/01/16/spurs-new-object-format/
- This post on the Cog VM lookup: http://clementbera.wordpress.com/2013/08/09/the-cog-vm-lookup/
And many useful class and method comments that taught me a lot.
When I try to work with Pharo frameworks, even recent ones, it is very rare that I see as much documentation as exists for Cog. Some frameworks are documented in the Pharo books and a few others, such as Zinc, have good documentation, but in general there is little documentation and *even fewer people writing documentation*. The website about Cog has existed for over 6 years now. I think Cog is far from the worst documented part of Pharo.
Yes, and that's difficult because it's a moving target and I have been lazy, not writing tests, instead using the Cog VM as "the test".
It's also difficult because the first tests to write are the hardest to write.
I am so happy to have your involvement. You and Clément bring such
strength and competence.
- Smalltalk -> LLVM ?
As for having a Smalltalk -> LLVM code generator: the truth is that we would not gain anything. LLVM is a compiler backend designed for C-like languages, built to optimize things such as loops with a lot of arithmetic. It is designed to optimize large sections of code. In Smalltalk, most of our code is composed of message sends, and LLVM cannot optimize a message send.
To optimize a message send, you have to determine which method is going to respond to the message. Then you have to inline that method. Only then can you start performing the actual optimizations, such as constant folding, common subexpression elimination, dead branch elimination, loop unrolling, and so on.
Because we don't have information in the actual language (e.g. static types a la C/C++/Java/C#) that tells us what is going to be the actual method invoked by a message send, we have the following alternatives to determine it:
- Don't optimize anything.
- Perform a costly static global analysis of the whole program.
- Measure at runtime and hope for the best.
- Extend the language.
In other words, our best bet is the work of Clément on Sista. The only problem with this bet is real-time applications.
Ah! But! Sista has an advantage that other adaptive optimizers don't. Because it optimizes from bytecode to bytecode it can be used during a training phase and then switched off.
Real-time applications require an upper-bound guarantee on their response time. In some cases, the lack of this guarantee is just an annoyance, as happens in video games. In mission-critical applications the results can be disastrous if this time constraint is not met. An example of a mission-critical system could be the flight controls of an airplane, or the cooling system of a nuclear reactor.
For these applications, it is not possible to rely on an adaptive optimizer that can be triggered at arbitrary times. In these applications you have to either:
- Extend the language to hand-optimize some performance-critical sections of code.
- Use another language to optimize these critical sections.
- Use another language for the whole project.
The additional option is to "train" the optimizer by running the application before deploying and capturing the optimized methods. Discuss this with Clément and he'll explain how straightforward it should be. This still leaves the latency in the Cogit when it compiles from bytecode to machine code. But:
a) I've yet to see anybody raise JIT latency as an issue in Cog, and
b) it would be easy to extend the VM to cause the Cogit to precompile specified methods. We could easily provide a "lock-down" facility that would prevent Cog from discarding specific machine code methods.
And of course, you have to perform lot of profiling.
Early and often :-).
Because we can have complete control over the optimizer, and because Sista is bytecode-to-bytecode and can hence store its results in the image in the form of optimized methods, I believe that Sista is well-positioned for real-time use, since it can be applied before deployment. In fact we should emphasise this in the papers we write on Sista.
Eliot's solution makes sense. To write a paper about that I need benchmarks showing results on real-time applications, so there's quite some work to do beforehand.
Greetings,
Ronie
2014-09-15 16:38 GMT-03:00 Craig Latta craig@netjam.org:
Hear hear!
-C
[1] http://tinyurl.com/m66fx8y (original message)
-- Craig Latta netjam.org +31 6 2757 7177 (SMS ok)
- 1 415 287 3547 (no SMS)
-- best, Eliot
On 09/16/2014 06:34 AM, Clément Bera wrote:
The book that best explains how, and why, to implement a high-performance VM for Smalltalk is Urs Hölzle's PhD thesis: http://www.cs.ucsb.edu/~urs/oocsb/self/papers/urs-thesis.html.
Agreed. This is good (almost required) reading for anyone who wants to understand how to implement dynamic languages in a way that is not slow, and to understand why performance of dynamic languages does not need to be much slower than that of statically-typed languages.
After reading this paper, it's also good to think about the fact that it describes work that was done over 20 years ago, and that hardware has changed a great deal in the interim, and think hard about what improvements might be made today over the techniques that Urs and the Self team came up with back then.
Regards,
-Martin
Hi Eliot and all!
Since I work with Ron at 3DICC and Cog is vital to us, I wanted to chime in here.
On 09/15/2014 06:23 PM, Eliot Miranda wrote:
I find this whole discussion depressing. It seems people would rather put their energy in chasing quick fixes or other technologies instead of contributing to the work that is being done in the existing VM. People discuss using LLVM as if the code generation capabilities inside Cog were somehow poor or have no chance of competing. Spur is around twice as fast as the current memory manager, and has much better support for the FFI. Clément and I, now with help from Ronie, are making excellent progress towards an adaptive optimizer/speculative inliner that will give us similar performance to V8 (the Google JavaScript VM, led by Lars Bak, who implemented the HotSpot VM (Smalltalk and Java)) et al.
One thing you need to understand Eliot is that most of us don't have the mind power or time to be able to contribute on that level.
But still, a lot of us are tickled by ideas on the low level - and thus ideas like reusing LLVM, reusing some other base VM, cross compilation etc - pop up.
Don't put too much into it - I am always toying with similar ideas in my head for "fun", it doesn't mean we don't also see/know that *real* VM work like Cog is the main road.
We are trying to get person-power for a high-quality FFI and have a prototype for a non-blocking VM. When we succeed C won't be any better and so it won't be an interesting target. One will be able to program entirely in Smalltalk and get excellent performance. But we need effort. Collaboration.
Let me just mention LuaJIT2 - besides very good performance, among other things it sports a *very* good FFI. Well, in fact Lua in general has several FFIs and tons of C++ bindings tools too - so IMHO anyone doing work in that area should take a sneak peek at LuaJIT2.
And this is a truly "sore" area in Smalltalk since eternity. If we had something as solid as the stuff in the Lua community - then Cog and Smalltalk could go places where they haven't been before, I suspect.
If we look at the codebase we have at 3DICC - a very large part consists of complicated plugin code to external libraries and accompanying complicated Smalltalk glue.
Also, if we compare the Lua community with the Squeak/Pharo community, it is quite obvious that the lack of really good FFI solutions leads us to "reinvent" stuff over and over, often poorly, while the Lua people simply wrap high quality external libraries and that's it. Done.
Of course this also stems from the very different backgrounds and motives behind the two languages and their respective domains, but still.
Personally I feel so discouraged when people talk about using LLVM or libffi or whatever instead of having the courage and energy to make our system world-class.
Don't feel discouraged - it's just that 99% of the community can't help you. :) Instead we should feel blessed that we have 1 Eliot, 1 Clement, 1 Igor and 1 Ronie. Do we have more?
I have the confidence in our abilities to compete with the best and am saddened that people in the community don't value the technology we already have and can't show faith in our abilities to improve it further. Show some confidence and express support and above all get involved.
Let me then make sure you know that 3DICC values *all* work in Cog *tremendously*.
As soon as you have something stable on the Linux side - we would start trying it. Just let me know, on Linux (server) we run your upstream Cog "as is". In fact, I should probably update what we use at the moment :)
Every bit of performance makes a big impact for us - but to be honest, what we would value even more than performance would be ... robustness. I mean, *really* robust. As in a freaking ROCK.
An example deployment: More than 3000 users running the client on private laptops (all Windows variants and hw you can imagine, plus some macs) and the server side running on a SLEW of FAT EC2 servers. We are talking about a whole BUNCH of Cogs running 24x7 on a bunch of servers.
We experience VM blow ups on the client side, both Win32 and OSX. OSX may be due to our current VM being built by clang, but I am not sure. Our Win32 VM is old, we need to rebuild it ASAP. Hard to know if these are Cog related or more likely 3DICC plugin related, but still.
But the client side is still not the "painful" part - we also experience Linux server side Cogs going berserk (100% CPU, no response) or just locking up or suddenly failing to resolve localhost :) etc. I suspect the networking code in probably all these cases. Here we do NOT have special 3DICC plugins so no, here we blame Cog or more likely, Socket plugin. Often? No, but "sometimes" is often enough to be a big problem. In fact, a whole new networking layer would make sense to me.
Also... we need to be able to use more RAM. We are now deploying to cloud servers more and more - and using instances with 16 GB RAM or more is normal. But our Cogs can't utilize it. I am not up to speed on what Spur gives us, or whether we in fact need to go 64-bit for that.
regards, Göran
On Tue, Sep 16, 2014 at 12:56 AM, Göran Krampe goran@krampe.se wrote:
Hi Eliot and all!
Since I work with Ron at 3DICC and Cog is vital to us, I wanted to chime in here.
On 09/15/2014 06:23 PM, Eliot Miranda wrote:
I find this whole discussion depressing. It seems people would rather put their energy in chasing quick fixes or other technologies instead of contributing to the work that is being done in the existing VM. People discuss using LLVM as if the code generation capabilities inside Cog were somehow poor or have no chance of competing. Spur is around twice as fast as the current memory manager, and has much better support for the FFI. Clément and I, now with help from Ronie, are making excellent progress towards an adaptive optimizer/speculative inliner that will give us similar performance to V8 (the Google JavaScript VM, led by Lars Bak, who implemented the HotSpot VM (Smalltalk and Java)) et al.
One thing you need to understand Eliot is that most of us don't have the mind power or time to be able to contribute on that level.
Time is the issue. I'm no brighter than anyone here, but I have my passion. And one can learn. Doug McPherson just contributed the ThreadedARMPlugin having never read the ABI (because he never needed to) before he started the project.
But still, a lot of us are tickled by ideas on the low level - and thus ideas like reusing LLVM, reusing some other base VM, cross compilation etc pop up.
Don't put too much into it - I am always toying with similar ideas in my head for "fun", it doesn't mean we don't also see/know that *real* VM work like Cog is the main road.
We are trying to get person-power for a high-quality FFI and have a
prototype for a non-blocking VM. When we succeed C won't be any better and so it won't be an interesting target. One will be able to program entirely in Smalltalk and get excellent performance. But we need effort. Collaboration.
Let me just mention LuaJIT2 - besides very good performance, among other things it sports a *very* good FFI. Well, in fact Lua in general has several FFIs and tons of C++ bindings tools too - so IMHO anyone doing work in that area should take a sneak peek at LuaJIT2.
And this is a truly "sore" area in Smalltalk since eternity. If we had something as solid as the stuff in the Lua community - then Cog and Smalltalk could go places where they haven't been before, I suspect.
If we look at the codebase we have at 3DICC - a very large part consists of complicated plugin code to external libraries and accompanying complicated Smalltalk glue.
Also, if we compare the Lua community with the Squeak/Pharo community, it is quite obvious that the lack of really good FFI solutions leads us to "reinvent" stuff over and over, often poorly, while the Lua people simply wrap high quality external libraries and that's it. Done.
Well I hear you and think that the FFI is extremely important. That's why I implemented proper callbacks for Squeak, why Spur supports pinning, and why I did the MT prototype; it is also one of the main areas the Pharo team is working on.
Of course this also stems from the very different backgrounds and motives behind the two languages and their respective domains, but still.
Personally I feel so discouraged when people talk about using LLVM or
libffi or whatever instead of having the courage and energy to make our system world-class.
Don't feel discouraged - it's just that 99% of the community can't help you. :) Instead we should feel blessed that we have 1 Eliot, 1 Clement, 1 Igor and 1 Ronie. Do we have more?
Collaborators http://www.mirandabanda.org/cogblog/collaborators/
I have the confidence in our abilities to compete
with the best and am saddened that people in the community don't value the technology we already have and can't show faith in our abilities to improve it further. Show some confidence and express support and above all get involved.
Let me then make sure you know that 3DICC values *all* work in Cog *tremendously*.
As soon as you have something stable on the Linux side - we would start trying it. Just let me know, on Linux (server) we run your upstream Cog "as is". In fact, I should probably update what we use at the moment :)
Every bit of performance makes a big impact for us - but to be honest, what we would value even more than performance would be ... robustness. I mean, *really* robust. As in a freaking ROCK.
An example deployment: More than 3000 users running the client on private laptops (all Windows variants and hw you can imagine, plus some macs) and the server side running on a SLEW of FAT EC2 servers. We are talking about a whole BUNCH of Cogs running 24x7 on a bunch of servers.
Without error reports, in fact, without an ability to debug in place (run the assert VM, for example, using the -blockonerror switch to freeze it when an assert fails), there's not a lot I can do. We use a CI server to run regressions at Cadence and my boss makes sure I fix VM bugs promptly when the CI system shows them. We deploy on Linux and so reliability thereon is important to us. So perhaps we can discuss how to debug your server issues.
We experience VM blow ups on the client side, both Win32 and OSX. OSX may
be due to our current VM being built by clang, but I am not sure. Our Win32 VM is old, we need to rebuild it ASAP. Hard to know if these are Cog related or more likely 3DICC plugin related, but still.
There are ways of finding out.
But the client side is still not the "painful" part - we also experience
Linux server side Cogs going berserk (100% CPU, no response) or just locking up or suddenly failing to resolve localhost :) etc. I suspect the networking code in probably all these cases. Here we do NOT have special 3DICC plugins so no, here we blame Cog or more likely, Socket plugin. Often? No, but "sometimes" is often enough to be a big problem. In fact, a whole new networking layer would make sense to me.
So we should talk.
Also... we need to be able to use more RAM. We are now deploying to cloud servers more and more - and using instances with 16 GB RAM or more is normal. But our Cogs can't utilize it. I am not up to speed on what Spur gives us, or whether we in fact need to go 64-bit for that.
Yes. Spur 32-bit will allow you to use a little more memory than 32-bit Cog, but by tens of percent, not large factors. You'll need to go to 64-bit Spur to be able to access more than 2, or perhaps 3, GB at the outside.
regards, Göran
Hi Goran
Also, if we compare the Lua community with the Squeak/Pharo community, it is quite obvious that the lack of really good FFI solutions leads us to "reinvent" stuff over and over, often poorly, while the Lua people simply wrap high quality external libraries and that's it. Done.
With Pharo, ***every*** single day we improve the system. We asked Clément, more than a year ago, to work with Eliot. If people understood that we created a consortium so that we can put more forces on the VM parts, including the FFI, then it would have an impact. Now, comparing Smalltalk with Lua, which was designed to interact with C, is not really fair, but we will get there.
We are attracting smart guys to the VM now because the spirit of the VM guys CHANGED. I remember, not so long ago, Mariano being told to do his homework. And Mariano, as well as all the smart guys in our team, was shocked. How could we expect smart guys to join and help? Now this period is over and this is good. We are already seeing the difference: Clément, Ronie, and others will follow.
I hope that we will be able to edit a book based on Clément's blog posts and other information, but this is taking time.
RMoD invested in the build, and in the fact that everybody can compile a VM, to attract people too. We proposed to help with the server infrastructure to push commit validation, and we will see what can be done.
Every bit of performance makes a big impact for us - but to be honest, what we would value even more than performance would be ... robustness. I mean, *really* robust. As in a freaking ROCK.
This is why I would like to push more regression testing. Göran, do you have a regression system for your deployment? I want to check the work of Jan Vrany that he proposed to us more than a year ago.
Here we do NOT have special 3DICC plugins so no, here we blame Cog or more likely, Socket plugin. Often? No, but "sometimes" is often enough to be a big problem. In fact, a whole new networking layer would make sense to me.
To me that seems normal: jumping over the dirt catches up with you after a while; this is a law of nature. Now the point is how we can reverse the tendency, as we have started to do.
Do you have money to put on the table for that? Or do you just pray hard enough to see it happen magically? :) Noury and Luc were so fed up with this code that they started to rewrite it and test it, but they got exhausted after a while, because testing a network layer is hard. Now these are typical points that we want to discuss within the Pharo consortium. Esteban will work on the 64-bit port; this is on his official (Inria) roadmap. But again, we will play it with the people who want to play it. 1000/2000 Euros to be in the consortium is not even the cost of a trip to the US or Germany.
Stef
Le 15/09/2014 18:23, Eliot Miranda a écrit :
I find this whole discussion depressing. It seems people would rather put their energy in chasing quick fixes or other technologies instead of contributing to the work that is being done in the existing VM. People discuss using LLVM as if the code generation capabilities inside Cog were somehow poor or have no chance of competing. Spur is around twice as fast as the current memory manager, and has much better support for the FFI. Clément and I, now with help from Ronie, are making excellent progress towards an adaptive optimizer/speculative inliner that will give us similar performance to V8 (the Google JavaScript VM, led by Lars Bak, who implemented the HotSpot VM (Smalltalk and Java)) et al. We are trying to get person-power for a high-quality FFI and have a prototype for a non-blocking VM. When we succeed C won't be any better and so it won't be an interesting target. One will be able to program entirely in Smalltalk and get excellent performance. But we need effort. Collaboration.
Hi Eliot,
Not everybody has the necessary skills to help and contribute to your work; my assembly skills are really far away and outdated now (... a little frustration here :( ...), but IMHO your work is invaluable to the Pharo and Smalltalk community. Just to mention it, I noticed a 30 to 50% gain with the last Spur VM in a small bench I wrote for fun recently (a very dumb chess pawn move generator); I was shocked :) 64 bits + x2 perf + a non-blocking (or multi-threaded?) VM are giant steps forward that make it possible for Pharo Smalltalk to compete with mainstream technologies.
Regards,
Alain
vm-dev@lists.squeakfoundation.org