[squeak-dev] The solution of (Was: Creating an image from first principles)

Wed Aug 6 10:14:43 UTC 2008

2008/7/8 Andreas Raab <andreas.raab at gmx.de>:
> Folks -
>
> Eliot and I had a great lunch conversation today and it convinced me that I
> really should write up an idea that I had earlier and that is actually
> pretty simple: How to create your own image from scratch.
> Here is how it goes.
>
> Start with the interpreter simulator and a (literally) empty object memory.
> Read a series of class definitions (you can use either MC class defs or
> simply parse simple class definitions from sources) that are sufficient to
> define all of the kernel structures that are required by the running VM
> (incl. Object, Behavior, Class, Integer, Array, Process, CompiledMethod,
> ContextPart, Semaphore etc. etc. etc.). Create those by calling the
> allocators explicitly and set them up such that the structure is correct
> (format, superclasses, metaclasses etc). Create nil, true and false based on
> these definitions.
>
> At this point we have a skeleton of classes that we can use to instantiate
> all behaviors required by a running image.
>
> Next, make a modification to the compiler that allows one to create a
> compiled method in the simulator from a MethodNode (which should be
> straightforward since the simulator exposes all of the good stuff for
> creating new objects and instances).
>
> Now we can create new compiled methods in the new image as long as they
> don't refer to any globals.
>
> Next, find a way of dealing with two issues: a) adding the compiled method
> "properly" (e.g., deal with symbol interning and modifying
> MethodDictionaries) and b) global name lookups performed by the compiler
> (since the image is prototypical we can't have it send actual messages; not
> even simulated ones ;-)
>
> The latter issue is the only one that doesn't seem completely obvious which
> is why I would advocate that a bootstrap kernel mustn't use class variables
> or shared pools (in which case the lookup is again trivial since you know
> all the possible names from compiling the original structure).
>
> Now we can load all the source we want to be in our bootstrap image.
>
> Lastly, do the bootstrap: Instantiate the first process, its first context,
> the first message. Run it in the simulator to set up the remaining parts of
> the kernel image (Delay, ProcessorScheduler etc).
>
> Voila, at this point we have a fully functioning kernel image, created
> completely from first principles.
>
> Once you have the kernel image there is no end to the fun: Since you can now
> start sending messages "into" the image (by way of the simulator) you can
> compile any code you want (incl. pools and class vars) and lookup the names
> properly by sending a message to the interpreter simulator. And then you
> just save the image and are ready to go.
>
> Anyone interested?
>
> Cheers,
>  - Andreas
>
> PS. Oh, and I'd be also interested in defining a good interface to do this
> by means of Hydra, i.e., instead of having to run the simulator run the
> compiled VM on an "empty image" to do all of this "for real" instead of in
> the simulator.
>
>

I think I have found a simple and lean way to instantiate an
object memory based on analysis and processing done within another
object memory.
For this we need a single primitive, which can be added to the Hydra VM
to do everything for us, with no need to use the simulator!

In the first stage, the running image designates all objects that
have to be cloned into the new heap by providing a collection of those
objects. This can be done entirely with the available tools and by
writing the proper code.
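
One way to build such a collection is to walk the object graph from a
few chosen roots and keep everything reachable. The Python sketch below
is only a toy model of that idea, not Squeak or Hydra code: `Obj`,
`slots` and `reachable_from` are hypothetical names made up for
illustration.

```python
class Obj:
    """Toy stand-in for a heap object: a name plus a list of references."""

    def __init__(self, name):
        self.name = name
        self.slots = []  # outgoing references to other Obj instances


def reachable_from(roots):
    """Return every object reachable from `roots`, each exactly once."""
    seen = set()      # ids of objects already collected
    collected = []
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        collected.append(obj)
        stack.extend(obj.slots)  # follow outgoing references
    return collected
```

A collection built this way is closed under references by construction,
so it already forms an isolated subgraph.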

Then we simply call a primitive, which creates a new (empty) object
memory and clones all objects found in the collection into the new heap.
It should also follow the rule that if (objects at: x) has a
reference to (objects at: y), then the cloned (objects at: x) will have
a reference to the cloned (objects at: y).
In this way, if our collection forms an isolated subgraph (one that does
not reference any objects outside itself), we are able to
instantiate a new object memory without any risk of broken references.
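
The reference-preserving rule can be modelled in a few lines. This is a
hedged Python sketch over a toy heap, not the actual primitive: `Obj`,
`slots` and `clone_into_new_heap` are hypothetical names, and a real VM
would copy raw object bodies rather than Python objects.

```python
class Obj:
    """Toy stand-in for a heap object: a name plus a list of references."""

    def __init__(self, name):
        self.name = name
        self.slots = []


def clone_into_new_heap(objects):
    """Clone `objects` so that clone x refers to clone y wherever x referred to y."""
    # Map each original object to its position in the collection.
    index = {id(o): i for i, o in enumerate(objects)}
    # Pass 1: allocate an empty clone for every object.
    clones = [Obj(o.name) for o in objects]
    # Pass 2: rewrite each reference to point at the matching clone.
    for original, clone in zip(objects, clones):
        for ref in original.slots:
            if id(ref) not in index:
                raise ValueError("reference leaves the collection: " + ref.name)
            clone.slots.append(clones[index[id(ref)]])
    return clones
```

Allocating all clones before fixing up references is what lets cycles
in the collection be handled without recursion.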

There are three ways the primitive can deal with the situation where
some object points to an object that is outside the given collection:
- be stupid, and simply fail
- fail as well, but return an array of the objects that caused the failure
- do not fail, and treat such references as far references. This is
for future use, when a miracle happens and we can have cross-heap
references.
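
The three options can be compared side by side in a toy model. The
Python sketch below is an illustration under stated assumptions, not a
proposed VM interface: `Obj`, `FarRef`, `CloneFailed`,
`clone_with_policy` and the policy names are all hypothetical, and it
assumes that "the objects that caused the failure" means the outside
objects being pointed at.

```python
class Obj:
    """Toy stand-in for a heap object: a name plus a list of references."""

    def __init__(self, name):
        self.name = name
        self.slots = []


class FarRef:
    """Hypothetical placeholder for a reference into the old heap."""

    def __init__(self, target):
        self.target = target


class CloneFailed(Exception):
    """Raised when a reference leaves the collection and is not allowed to."""

    def __init__(self, offenders):
        super().__init__("references leave the collection")
        self.offenders = offenders  # empty under the plain "fail" policy


def clone_with_policy(objects, policy="fail"):
    """Clone `objects`; `policy` is one of "fail", "report" or "far"."""
    if policy not in ("fail", "report", "far"):
        raise ValueError("unknown policy: " + policy)
    index = {id(o): i for i, o in enumerate(objects)}
    # Collect each outside target once (assumed meaning of "cause of failure").
    outside, seen = [], set()
    for o in objects:
        for ref in o.slots:
            if id(ref) not in index and id(ref) not in seen:
                seen.add(id(ref))
                outside.append(ref)
    if outside and policy == "fail":
        raise CloneFailed([])        # option 1: just fail
    if outside and policy == "report":
        raise CloneFailed(outside)   # option 2: fail and say why
    clones = [Obj(o.name) for o in objects]
    for original, clone in zip(objects, clones):
        for ref in original.slots:
            i = index.get(id(ref))
            # option 3: keep outside references as far references
            clone.slots.append(clones[i] if i is not None else FarRef(ref))
    return clones
```

The second option is the most useful while building the collection: the
returned objects can simply be added to it before retrying.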

I want to give credit to Klaus, because this idea was born during a
discussion with him :)

P.S. As you may have noticed, the way the new object memory is created
is very similar to http://en.wikipedia.org/wiki/Cell_division
The implementor doesn't need to care about object formats, different
bits, etc. - he just needs to designate the full set of objects that
should appear in the new object memory.

-- 
Best regards,
Igor Stasenko AKA sig.