[squeak-dev] [ANN] CorruptVM preview

Tue Jul 1 06:06:35 UTC 2008

Since there has lately been some interest in C/C++ compiling, and some people
mentioned how cool it would be to have everything dynamically
compiled,
I decided to make a preview announcement of the research I have done over
the last few months.

The project name is weird, and it definitely needs a more appropriate
name so as not to scare off potential users/developers, but this is not an
issue right now :)

Let me briefly describe the key features of the project and the goals it pursues:

- the main goal is to create a Smalltalk language environment (similar
to other Smalltalks), but I avoid calling it a VM, because it is not
really a VM: there is no VM at all.
- everything is written in Smalltalk
- the system is completely self-sustaining: Smalltalk code is compiled down
to native code (there is no initial need for bytecodes). Of course, no one
prevents you from implementing a bytecode interpreter on top of it,
but this is beyond the scope of the current project. :)
- there are no primitives, nor any need to write external code in C (or in
any other statically typed language). Primitives are replaced by native
methods (methods with the <native> pragma), with which you can
implement any low-level behavior.

- everything (99.9% of it ;) in the system is up to the implementor. There are
a few 'glue' semantics used by the compiler, but the compiler itself
makes extensive use of static inlining (inlining native methods from
well-known classes such as CompiledMethod/ProtoObject or
StackContext). Memory management/relocation, FFI, and the diverse set of
what we currently know as 'primitives' will be implemented inside the
system. This opens up a potentially huge playground for what the system could
look like :)

- avoid using global state. All state that code can potentially refer
to is placed in literals. There is no difference between native
methods and Smalltalk methods in the compiled method format; the
only difference is how they are compiled. Of course there will be some
global state; I think it will be a single 'lobby' object, which
contains a symbol table (required to keep symbols unique
throughout the whole system). But anyway, references to it will be possible
only from method literals.
- generated native code is location independent, since all jumps will
be relative, and all location-dependent stuff is either held in
literals or computed. Therefore CompiledMethod instances can be
relocated freely in memory (by the GC and friends) without any chance that
this will cause harm.
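
To illustrate why relative jumps make code relocatable, here is a minimal
sketch in Python (not Smalltalk, and not CorruptVM code). The toy
instruction set, `run` and `load` are all hypothetical names invented for
this example; it only assumes that a jump offset is counted from the
instruction following the jump.

```python
# Toy stack machine, invented for illustration only.
#   ("push", n)     push a constant
#   ("add",)        pop two values, push their sum
#   ("jmp_rel", d)  jump d cells forward, counted from the next instruction
#   ("jmp_abs", a)  jump to absolute memory cell a
#   ("halt",)       pop and return the top of the stack

def run(memory, entry):
    """Execute instructions starting at cell `entry` until a halt."""
    stack = []
    pc = entry
    while True:
        op = memory[pc]
        pc += 1
        if op[0] == "push":
            stack.append(op[1])
        elif op[0] == "add":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op[0] == "jmp_rel":
            pc += op[1]          # depends only on where we are now
        elif op[0] == "jmp_abs":
            pc = op[1]           # depends on where the code was loaded
        elif op[0] == "halt":
            return stack.pop()
        else:
            raise ValueError(f"unknown instruction {op!r}")

def load(code, base, size=64):
    """Return a fresh memory with `code` copied in at cell `base`."""
    memory = [("halt",)] * size
    memory[base:base + len(code)] = code
    return memory

# Computes 1 + 2, jumping over a "push 100" that must never execute.
REL_CODE = [("push", 1), ("jmp_rel", 1), ("push", 100),
            ("push", 2), ("add",), ("halt",)]
# Same program, but the jump target is an absolute cell number.
ABS_CODE = [("push", 1), ("jmp_abs", 3), ("push", 100),
            ("push", 2), ("add",), ("halt",)]

rel_at_0 = run(load(REL_CODE, 0), 0)
rel_at_40 = run(load(REL_CODE, 40), 40)   # "relocated" copy
abs_at_0 = run(load(ABS_CODE, 0), 0)
abs_at_40 = run(load(ABS_CODE, 40), 40)   # jumps out of the moved code
```

The relative version gives the same answer at both addresses, while the
absolute version only works at the address it was generated for. A real
method would also need its literals to move with it, which is the other
half of the point above.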

- the compiler translates Smalltalk code to a lambda representation. Then,
using different transformations, it generates low-level lambdas,
which represent the instructions of a virtual machine CPU. No AST and no bunch
of different classes are used to represent the semantic elements of code.
Lambdas all the way down.
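
A rough sketch of the "one node type at every level" idea, in Python rather
than Smalltalk. This is not how the CorruptVM compiler is written; the `Lam`
class, the `lower`/`emit`/`evaluate` functions and the operation names are
all assumptions made up for this example. It only shows a single uniform
node being rewritten from a high-level message send into low-level,
machine-like operations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lam:
    """The only node type: an operation name applied to arguments."""
    op: str
    args: tuple = ()

# Hypothetical mapping from message selectors to low-level operations.
LOW_LEVEL = {"+": "add", "*": "mul"}

def lower(node):
    """Rewrite high-level 'send' lambdas into low-level lambdas."""
    if not isinstance(node, Lam):
        return Lam("const", (node,))          # bare literal -> load constant
    if node.op == "send":
        selector, *operands = node.args
        return Lam(LOW_LEVEL[selector], tuple(lower(x) for x in operands))
    raise ValueError(f"cannot lower {node.op!r}")

def emit(node, out=None):
    """Flatten low-level lambdas into a linear stack-machine listing."""
    if out is None:
        out = []
    if node.op == "const":
        out.append(("push", node.args[0]))
    else:
        for arg in node.args:
            emit(arg, out)
        out.append((node.op,))
    return out

def evaluate(node):
    """Directly evaluate low-level lambdas, to check the lowering."""
    if node.op == "const":
        return node.args[0]
    left, right = (evaluate(arg) for arg in node.args)
    if node.op == "add":
        return left + right
    if node.op == "mul":
        return left * right
    raise ValueError(f"unknown low-level op {node.op!r}")

# High-level form of the Smalltalk expression  3 + (4 * 5)
source = Lam("send", ("+", 3, Lam("send", ("*", 4, 5))))
lowered = lower(source)
listing = emit(lowered)
value = evaluate(lowered)
```

Both `source` and `lowered` are built from the same `Lam` class; only the
operation names change between levels, so no separate AST class hierarchy
is needed in this sketch.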

- the object memory model is initially based on Ian's minimal object
system, with some changes.

You can download a snapshot of the project from SqueakSource:
http://www.squeaksource.com/CorruptVM

What should currently work:

CVMachineSimulator bootstrap   -- bootstrap an object memory for simulation
CVSimulationTests run -- run different tests on the bootstrapped object memory

There is also an initial implementation of translation to native code
using Exupery (you need to load Exupery for that).
Do it:
CVExuperyCompiler test inspect

I am open to suggestions, advice, and discussion on how
best to implement a system based on such a design.
I would be glad to read your comments.

There is also a wiki page for the project:
http://wiki.squeak.org/squeak/6041

-- 
Best regards,
Igor Stasenko AKA sig.