Since there has lately been some interest in C/C++ compilation, and some people mentioned how cool it would be to have everything compiled dynamically, I decided to post a preview announcement of research I have done during the last few months.
The project name is a weird one, and it definitely needs a more appropriate name so as not to scare off potential users/developers, but this is not an issue right now :)
Let me briefly describe the key features of the project and the goals it pursues:
- The main goal is to create a Smalltalk language environment (similar to other Smalltalks). I avoid calling it a VM, because it is not really a VM: there is no VM at all.
- Everything is written in Smalltalk.
- The system is completely self-sustaining: Smalltalk code is compiled down to native code (there is no initial need for bytecodes). Of course, no one prevents you from implementing a bytecode interpreter on top of it, but that is beyond the scope of the current project. :)
- There are no primitives, and no need to write external code in C (or in any other statically typed language). Primitives are replaced by native methods (methods with the <native> pragma), which you can use to implement any low-level behavior.
- Everything in the system (well, 99.9% of it ;) is up to the implementor. There are a few 'glue' semantics used by the compiler, but the compiler itself makes extensive use of static inlining (inlining native methods from well-known classes such as CompiledMethod, ProtoObject or StackContext). Memory management/relocation, FFI, and a diverse set of what we currently know as 'primitives' will be implemented in the system itself. This opens a potentially huge playground for what the system could look like :)
- Avoid global state. All state that code can potentially refer to is placed in literals. Native methods and Smalltalk methods use the same compiled method format; they differ only in how they are compiled. Of course there will be some global state: I think it will be a single 'lobby' object containing the symbol table (required to keep symbols unique throughout the whole system). But even so, it will be reachable only from method literals.
- Generated native code is location-independent: all jumps are relative, and everything location-dependent is either held in literals or computed. Therefore CompiledMethod instances can be relocated freely in memory (by the GC and friends) without any risk of breaking them.
- The compiler translates Smalltalk code to a lambda representation. Then, through a series of transformations, it generates low-level lambdas representing virtual machine CPU instructions. No AST is used, nor a bunch of different classes to represent the semantic elements of code. Lambdas all the way down.
- The object memory model is initially based on Ian's minimal object system, with some changes.
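To illustrate why relative jumps make code relocatable, here is a small sketch in Python (not Smalltalk, and not CorruptVM's actual instruction encoding). It uses a toy machine whose instruction names (`add`, `dec`, `jnz`, `jnz_abs`, `halt`) are hypothetical. Moving a method's code is just a verbatim copy: the loop with a relative jump still works at the new address, while the same loop with an absolute jump target escapes the moved code. Location-dependent values kept in literals are not modeled here.

```python
# Hypothetical toy machine, for illustration only (not CorruptVM's encoding).

def run(memory, entry, regs):
    """Execute from `entry` until halt; return the final registers."""
    pc = entry
    while True:
        op = memory[pc]  # KeyError means we jumped outside the loaded code
        kind = op[0]
        if kind == "add":
            regs[op[1]] += regs[op[2]]
            pc += 1
        elif kind == "dec":
            regs[op[1]] -= 1
            pc += 1
        elif kind == "jnz":      # relative: target = current pc + offset
            pc = pc + op[2] if regs[op[1]] != 0 else pc + 1
        elif kind == "jnz_abs":  # absolute: target address baked into the code
            pc = op[2] if regs[op[1]] != 0 else pc + 1
        elif kind == "halt":
            return regs
        else:
            raise ValueError(f"unknown instruction {op!r}")


def load(code, base):
    """'Relocate' by copying the instructions verbatim to a new base address."""
    return {base + i: instr for i, instr in enumerate(code)}


def sum_at(code, base):
    """Load `code` at `base`, run it with r0 = 5, and return r1."""
    return run(load(code, base), base, {"r0": 5, "r1": 0})["r1"]


# Sum 5 + 4 + 3 + 2 + 1 into r1, looping with a relative backward jump.
RELATIVE = [("add", "r1", "r0"), ("dec", "r0"), ("jnz", "r0", -2), ("halt",)]

print(sum_at(RELATIVE, 0x1000))  # 15
print(sum_at(RELATIVE, 0x8000))  # 15: same instructions, different address

# The same loop with an absolute target only works where it was built for.
ABSOLUTE = [("add", "r1", "r0"), ("dec", "r0"), ("jnz_abs", "r0", 0x1000), ("halt",)]
print(sum_at(ABSOLUTE, 0x1000))  # 15
try:
    sum_at(ABSOLUTE, 0x8000)
except KeyError as exc:
    print("absolute jump escaped the moved code:", hex(exc.args[0]))
```

A GC that moves such code only has to copy it; it never needs to patch jump targets inside the method body.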
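One possible reading of "lambdas all the way down" can be sketched in Python. It assumes that a code fragment is simply a function taking a lowering target. The names `literal`, `send`, `lower` and `evaluate` are hypothetical and are not taken from CorruptVM. There are no AST node classes: each transformation is just a different target passed to the same lambdas. Here one target collects stack-machine instructions and another executes them directly.

```python
from typing import Callable, List, Tuple

# A lowering target: a function that receives one low-level instruction.
Emit = Callable[[Tuple], None]
# Code is just a lambda from a target to nothing; there are no node classes.
Code = Callable[[Emit], None]


def literal(value) -> Code:
    return lambda emit: emit(("push", value))


def send(receiver: Code, selector: str, *args: Code) -> Code:
    def code(emit: Emit) -> None:
        receiver(emit)          # lower the receiver first
        for arg in args:        # then each argument, left to right
            arg(emit)
        emit(("send", selector, len(args)))
    return code


def lower(code: Code) -> List[Tuple]:
    """One transformation: collect the instructions the lambdas produce."""
    out: List[Tuple] = []
    code(out.append)
    return out


def evaluate(code: Code):
    """Another transformation: execute the instructions on a stack at once."""
    stack = []
    ops = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

    def emit(instr):
        if instr[0] == "push":
            stack.append(instr[1])
        else:
            _, selector, argc = instr
            args = [stack.pop() for _ in range(argc)][::-1]
            receiver = stack.pop()
            stack.append(ops[selector](receiver, *args))

    code(emit)
    return stack.pop()


# Smalltalk: 3 + 4 * 2  (binary messages go left to right: (3 + 4) * 2)
expr = send(send(literal(3), "+", literal(4)), "*", literal(2))
print(lower(expr))
# [('push', 3), ('push', 4), ('send', '+', 1), ('push', 2), ('send', '*', 1)]
print(evaluate(expr))  # 14
```

Adding another pass (for example, one that emits machine code) means writing another target, not another set of node classes.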
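The object model bullet refers to "Ian's minimal object system". Assuming this means Ian Piumarta's open, extensible object model, where vtables hold method dictionaries and lookup is delegated to a parent vtable, a heavily simplified Python sketch looks like this. The class and function names are hypothetical. In the original model, vtables are themselves objects and lookup is itself a message send; this sketch omits both.

```python
class VTable:
    """A method dictionary plus a parent to delegate to when a lookup misses."""

    def __init__(self, parent=None):
        self.parent = parent
        self.methods = {}

    def add_method(self, selector, fn):
        self.methods[selector] = fn

    def lookup(self, selector):
        vt = self
        while vt is not None:  # walk the delegation chain
            if selector in vt.methods:
                return vt.methods[selector]
            vt = vt.parent
        raise LookupError(f"doesNotUnderstand: {selector}")

    def delegated(self):
        """Create a new vtable that delegates to this one."""
        return VTable(parent=self)


class Obj:
    """An object is just state plus a pointer to the vtable describing it."""

    def __init__(self, vtable, **slots):
        self.vtable = vtable
        self.slots = slots


def send(receiver, selector, *args):
    return receiver.vtable.lookup(selector)(receiver, *args)


object_vt = VTable()
object_vt.add_method("printString", lambda self: "an Object")

point_vt = object_vt.delegated()
point_vt.add_method("x", lambda self: self.slots["x"])

p = Obj(point_vt, x=3, y=4)
print(send(p, "x"))            # 3
print(send(p, "printString"))  # an Object (found via the parent vtable)

# Behavior can be redefined at run time by editing the method dictionary.
point_vt.add_method("printString", lambda self: f"{self.slots['x']}@{self.slots['y']}")
print(send(p, "printString"))  # 3@4
```

Because method dictionaries are ordinary mutable data, redefining `printString` immediately affects every object that uses `point_vt`.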
You can download a snapshot of the project from SqueakSource: http://www.squeaksource.com/CorruptVM
What should currently work:
    CVMachineSimulator bootstrap -- bootstraps an object memory for simulation
    CVSimulationTests run -- runs various tests on the bootstrapped object memory
There is also an initial implementation of translation to native code using Exupery (you need to load Exupery for that). Do it:
    CVExuperyCompiler test inspect
I am open to suggestions, advice, or discussion about how best to implement a system based on such a design. I would be glad to read your comments.
There is also a wiki page for the project: http://wiki.squeak.org/squeak/6041