Hi Igor,

This looks cool.  It is related to David Ungar's Klein, which was an attempt at a self-hosted Self system, and to Exupery, Typed Smalltalk, and Ian's Cola.  Whenever I've thought about this style of VM I've always been put off by one big issue: how are you going to deal with hard crashes?

One needs some form of symbolic debugging at the machine code level.  If one is debugging the Squeak VM (or any other Smalltalk VM I've worked on) one can compile a version with debug symbols, use the platform's debugger (e.g. gdb) and write debugging functions in C to be called from that debugger.
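
To make that concrete, here is a minimal sketch of the kind of C debugging function I mean.  The object header layout, the class-name table and the names describe_oop/print_oop are all hypothetical, invented for illustration; a real VM's layout and helpers will differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical object header; a real VM's layout will differ. */
typedef struct {
    uint32_t class_index;   /* index into the hypothetical class table below */
    uint32_t slot_count;    /* number of slots that follow the header */
} ObjHeader;

/* Hypothetical class-name table standing in for the VM's real metadata. */
static const char *const class_names[] = {
    "UndefinedObject", "SmallInteger", "Array", "CompiledMethod"
};
#define CLASS_COUNT (sizeof class_names / sizeof class_names[0])

/* Format a one-line description of an object into buf.
   Returns the number of characters snprintf would have written. */
int describe_oop(const ObjHeader *oop, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (oop == NULL)
        return snprintf(buf, len, "<null oop>");
    const char *name = oop->class_index < CLASS_COUNT
        ? class_names[oop->class_index]
        : "<bad class index>";
    return snprintf(buf, len, "%s with %u slots",
                    name, (unsigned)oop->slot_count);
}

/* Convenience wrapper intended to be invoked by hand from the debugger. */
void print_oop(const ObjHeader *oop)
{
    char buf[128];
    describe_oop(oop, buf, sizeof buf);
    fprintf(stderr, "%s\n", buf);
}
```

With the VM built with debug symbols and stopped in gdb, something like "call print_oop(p)" (p being whatever pointer is in scope) would then print a readable description instead of raw memory.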

If one has a self-hosted Smalltalk system with no symbolic information that a platform debugger can read (because the system, being Smalltalk, has its own fully reflective self-description), then it seems to me one really is fishing about in a vast hex dump of the entire system, and that doesn't seem workable.  Note that after a hard crash one doesn't have the system available to debug itself, because it has just crashed.

So are you going to export symbolic information that a platform debugger can consume (and if so, how?) or are you going to do something else (e.g. mirrors)?

On Mon, Jun 30, 2008 at 11:06 PM, Igor Stasenko <siguctua@gmail.com> wrote:
Since there has lately been some interest in C/C++ compiling, and some
people mentioned how cool it would be to have everything dynamically
compiled, I decided to make a preview announcement of research I have
done over the last few months.

The project name is weird, and it definitely needs a more appropriate
name so as not to scare off potential users/developers, but this is not
an issue right now :)

Let me briefly describe the key features of the project and the goals it pursues:

- the main goal is to create a Smalltalk language environment (similar
to other Smalltalks), but I avoid calling it a VM, because it is not
really one: there is no VM at all.
- everything is written in Smalltalk
- the system is completely self-sustaining: Smalltalk code is compiled
down to native code (no initial need for bytecodes). Of course no one
prevents you from implementing a bytecode interpreter on top of it,
but this is beyond the scope of the current project. :)
- there are no primitives, nor any need to write external code in C (or
in any other statically typed language). Primitives are replaced by
native methods (methods with a <native> pragma), with which you can
implement any low-level behavior.

- everything (99.9% of it ;) in the system is up to the implementor.
There are a few 'glue' semantics used by the compiler, but the compiler
itself makes extensive use of static inlining (inlining native methods
from well-known classes such as CompiledMethod/ProtoObject or
StackContext). Memory management/relocation, FFI, and the diverse set
of what we currently know as 'primitives' will be implemented in the
system. This opens up a potentially huge playground for how the system
could look :)

- avoid using global state. All state which code can potentially refer
to is placed in literals. There is no difference between native
methods and Smalltalk methods in the compiled method format; the only
difference is how they are compiled. Of course there will be some
global state; I think it would be a single 'lobby' object, which
contains a symbol table (required to keep symbols unique throughout
the system). But anyway, references to it will be possible only from
method literals.
- generated native code is location independent, since all jumps will
be relative, and all location-dependent stuff is either held in
literals or computed. Therefore CompiledMethod instances can be
relocated freely in memory (by the GC and friends) without any chance
that this will cause harm.

- the compiler translates Smalltalk code to a lambda representation.
Then, using different transformations, it generates low-level lambdas,
which represent virtual machine CPU instructions. No AST and no bunch
of different classes are used to represent the semantic elements of
code. Lambdas all the way down.

- the object memory model is initially based on Ian's minimal object
system, with some changes.

You can download a snapshot of the project at SqueakSource:
http://www.squeaksource.com/CorruptVM

What should currently work:

CVMachineSimulator bootstrap   -- bootstrap an object memory for simulation
CVSimulationTests run -- run different tests on the bootstrapped object memory

There is also an initial implementation of translation to native code
using Exupery (you need to load Exupery for that).
Do it:
CVExuperyCompiler test inspect


I am open to suggestions, advice, or discussion on how best to
implement a system based on such a design.
I would be glad to read your comments.

There is also a wiki page for the project:
http://wiki.squeak.org/squeak/6041

--
Best regards,
Igor Stasenko AKA sig.