[Vm-dev] Re: [Pharo-dev] [ANN] Pharo bootstrap

Eliot Miranda eliot.miranda at gmail.com
Fri Jan 22 18:06:39 UTC 2016

Hi Guille,

On Wed, Jan 20, 2016 at 4:52 AM, Guillermo Polito <guillermopolito at gmail.com
> wrote:

> Well, there is no formal specification… But I could summarize the features
> as follows (and as Christophe points mostly out)
> OzVM supports having two images in logically separated spaces. This
> separation is achieved by using the free bit in the object header. Thus we
> support only two spaces by now.
> - The main addition is actually a new primitive:
> #resumeFromSpecialObjectsArray that receives a special objects array as
> argument and:
>     - resumes the execution from the active process of such array
>     - on context switch, if the VM is running from a second special
> objects array, it will return to the first one instead.
> - fixes to primitives/bytecodes that assume a single special objects
> array. e.g.,
>    - #someObject and #nextObject should only iterate objects from the
> correct “space”
>    - #isNil comparisons should take the correct nil
>    - the GC should set the correct nil object on Weak containers
> The extensions I made are about 20 overrides to StackPrimitive and
> NewObjectMemory methods.
> The rest is implemented completely on image side. We use mirror primitives
> to manipulate objects from other space (as each image/space has its own
> selector table).
> I have in my long TODO list to see if the "clever hack" I did can be
> supported in a nice manner on top of Spur’s segmented memory. However, I
> should stop commencing new projects and start finishing them :P.

Indeed, I can join you in wanting things to work first :-).

I want to respond both by sketching how you could port the architecture to
Spur and by suggesting what I think is a much more attractive alternative.

First, how to support two spaces under Spur or Cog Spur.  Spur's memory map
looks like a low segment, followed by any number of segments higher in
memory.  The first segment is composed of several spaces and is allocated
at startup.  Subsequent segments are obtained by mmap and the only
requirement is that they be at higher addresses than the first segment, so
start-up goes to some effort to obtain a suitably sized chunk of memory at
as low as possible an address.

In Cog Spur the first segment looks like this, from low to high:

1. a code zone holding the jit's machine code, any size up to 16Mb, but 1Mb
by default.
2. an object zone holding Spur's new space, an eden and two small survivor
spaces for a classical generation scavenger.  It can be of any size up to
1Gb but is 4Mb by default, with eden being 5/7th and each survivor space
being 1/7th of the space.
3. the object zone holding the first old space segment.  This is big enough
to hold the loaded image plus a free chunk from which new old space objects
(e.g. objects tenured from new space) are allocated.

The Spur interpreter simply omits the code zone.
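
The layout above can be modelled in a few lines.  This is only a sketch in
Python, not VM code: the Zone class, the first_segment function, the order of
the survivor spaces and eden inside new space, and the old space size passed
in are all assumptions made for illustration.  Only the default sizes and the
5/7 : 1/7 : 1/7 split come from the description above.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass(frozen=True)
class Zone:
    name: str
    start: int
    size: int

    @property
    def end(self) -> int:
        return self.start + self.size


def first_segment(base, old_space_size, code_zone_size=1 * MB,
                  new_space_size=4 * MB, with_jit=True):
    """Lay out the zones of the first segment from low to high addresses."""
    zones = []
    cursor = base
    if with_jit:  # the plain interpreter has no code zone
        zones.append(Zone("code zone", cursor, code_zone_size))
        cursor += code_zone_size
    survivor = new_space_size // 7           # each survivor space is 1/7th
    eden = new_space_size - 2 * survivor     # eden takes the remaining 5/7ths
    # Assumption: survivors below eden; the text does not give this order.
    for name, size in (("survivor A", survivor),
                       ("survivor B", survivor),
                       ("eden", eden),
                       ("first old space segment", old_space_size)):
        zones.append(Zone(name, cursor, size))
        cursor += size
    return zones


if __name__ == "__main__":
    for zone in first_segment(base=0x100000, old_space_size=32 * MB):
        print(f"{zone.start:#010x}-{zone.end:#010x} {zone.name}")
```

Passing with_jit=False gives the interpreter-only layout, which simply
starts with new space.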

Now the placing of eden beneath the first old space segment is "baked in".
The store check checks for new objects being stored into old objects, since
the old objects must be added to the remembered table (which lives in old
space).  The first object in the first old space segment is nil, and the
store check checks for objects less than nil being stored into objects
greater than or equal to nil.  Inside the Spur memory manager nilObject,
the variable that holds onto nil, is a variable, not a constant, and so can
be changed, e.g. when switching between two sets of spaces.  Inside machine
code, nil is a constant, referred to directly from machine code.
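
As an illustration of the store check just described, here is a minimal
sketch in Python.  The names needs_remembering and RememberedSet are
hypothetical, objects are reduced to bare addresses, and immediates are
ignored; the real check lives in the Spur memory manager and in JITted
machine code.

```python
def needs_remembering(stored_address, target_address, nil_address):
    """True when a new-space object is stored into an old-space object.

    New space lies below nil and old space starts at nil, so the whole
    test is two address comparisons against nil.
    """
    return stored_address < nil_address <= target_address


class RememberedSet:
    """Toy remembered table keyed by the address of the old object."""

    def __init__(self, nil_address):
        # Kept as a plain attribute, so it can be reassigned when
        # switching between two sets of spaces.
        self.nil_address = nil_address
        self.old_objects = set()

    def store(self, target_address, stored_address):
        if needs_remembering(stored_address, target_address, self.nil_address):
            self.old_objects.add(target_address)


if __name__ == "__main__":
    table = RememberedSet(nil_address=0x500000)
    table.store(target_address=0x600000, stored_address=0x200000)  # old <- new
    table.store(target_address=0x600100, stored_address=0x700000)  # old <- old
    print(sorted(hex(a) for a in table.old_objects))
```

The mutable nil_address attribute stands in for the memory manager's
nilObject variable; the constant embedded in machine code has no such
flexibility, which is what constrains the layouts discussed next.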

So in having two object spaces one approach that seems feasible is to have
two initial segments, each with distinct code zones, edens, and first old
space segments:

low: code zone for space 1; new space for space 1; first old space segment
for space 1; code zone for space 2; new space for space 2; first old space
segment for space 2; high

which would be very easy to switch between, but would cause problems for
the store check since objects in new space for space 2 appear greater than
space 1's nil.

So a refinement would be to organize the space like this:

low: code zone for space 1; code zone for space 2; new space for space 1;
new space for space 2; first old space segment for space 1; first old space
segment for space 2; high

and always use space 1's nil for the store check.
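
To make the difference between the two layouts concrete, the following
Python sketch lays out each ordering with equal-sized zones and compares the
"less than nil into greater-or-equal nil" test, always using space 1's nil,
against what the store check ought to answer.  The zone names, sizes and
the store_check_errors helper are assumptions for illustration only.

```python
INTERLEAVED = ["code 1", "new 1", "old 1", "code 2", "new 2", "old 2"]
GROUPED = ["code 1", "code 2", "new 1", "new 2", "old 1", "old 2"]
ZONE_SIZE = 0x1000  # arbitrary; every zone gets the same size here


def zone_starts(order, base=0x10000):
    """Map each zone name to its start address, low to high."""
    return {name: base + i * ZONE_SIZE for i, name in enumerate(order)}


def store_check_errors(order):
    """List (stored zone, target zone) pairs the check gets wrong."""
    starts = zone_starts(order)
    nil_1 = starts["old 1"]  # nil is the first object of old space 1
    object_zones = [name for name in order if not name.startswith("code")]
    errors = []
    for stored in object_zones:
        for target in object_zones:
            stored_address = starts[stored] + 0x10
            target_address = starts[target] + 0x10
            expected = stored.startswith("new") and target.startswith("old")
            actual = stored_address < nil_1 <= target_address
            if actual != expected:
                errors.append((stored, target))
    return errors


if __name__ == "__main__":
    print("interleaved:", store_check_errors(INTERLEAVED))
    print("grouped:    ", store_check_errors(GROUPED))
```

In this model the interleaved ordering misses stores of space 2's new
objects into old objects, because new space 2 sits above space 1's nil; the
grouped ordering reports no mismatches.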

You still need to consult both remembered sets during scavenging to find
all the roots, but it would seem feasible.  The VM would need two
CogMethodZones, and your swap primitive would need to exchange them.  The
segment manager could mark old space segments as belonging to either space
1 or space 2, and you could add a snapshot primitive that would snapshot
only space 2. I'm sure there are lots of other problems but I expect you
could make this work.

However, I think there's a /much/ nicer architecture, and I use it myself
for the Spur bootstrap both from the V3 memory manager you're working with
(NewObjectMemory) to Spur 32-bit, and from Spur 32-bit to Spur 64-bit.  And
that is to use the VM simulator and mirrors.

In the VM simulator one has either a JIT or the Stack Interpreter.  But we
can eliminate the JIT because simulating machine code is slower than the
Stack Interpreter, so the fastest execution engine in the simulator is the
StackInterpreter.  And please suspend your disbelief as to this simulator
being fast enough for work.  I will address that later on.  First let's
look at the architecture.

In the simulator, the heap is represented by a single ByteArray "memory",
which I'll call "heapMemory" to avoid confusion.  So in a Spur
StackInterpreter the start of heapMemory is new space, followed immediately
by the first old space segment, whose first object is nil.  When the
simulator grows memory by adding new segments it grows heapMemory, and
leaves a gap between the end of what was the last segment and the new
highest last segment.  So through heapMemory one can access any object in
the space.
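
A rough model of that single growing byte array, in Python rather than
Smalltalk.  SimulatedHeap and its methods are hypothetical names, the word
size and byte order are arbitrary choices, and the real simulator's
bookkeeping is certainly richer than this.

```python
class SimulatedHeap:
    """One byte array covering every segment, with unused gaps between them."""

    WORD = 4  # assumption: a 32-bit target

    def __init__(self, first_segment_size):
        self.memory = bytearray(first_segment_size)
        self.segments = [(0, first_segment_size)]  # (start, size) pairs

    def add_segment(self, start, size):
        last_start, last_size = self.segments[-1]
        if start < last_start + last_size:
            raise ValueError("new segments must lie above the existing ones")
        # Grow the array up to the end of the new segment; the bytes between
        # the old end and `start` are the gap and belong to no segment.
        self.memory.extend(bytes(start + size - len(self.memory)))
        self.segments.append((start, size))

    def is_mapped(self, address):
        return any(s <= address < s + n for s, n in self.segments)

    def word_at(self, address):
        return int.from_bytes(self.memory[address:address + self.WORD], "little")

    def word_at_put(self, address, value):
        self.memory[address:address + self.WORD] = value.to_bytes(self.WORD, "little")


if __name__ == "__main__":
    heap = SimulatedHeap(first_segment_size=64)
    heap.add_segment(start=128, size=32)
    heap.word_at_put(132, 0xCAFE)
    print(len(heap.memory), heap.is_mapped(100), hex(heap.word_at(132)))
```

Because every segment lives in the same array, an address is just an index,
which is what lets the host reach any object in the target.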

The simulator contains a limited set of mirror objects right now, but these
can be extended.  But currently, for example, there are mirror objects that
can be attached to "methods" in heapMemory (which are actually just
sequences of bytes) and these mirrors make those sequences of bytes appear
to be real CompiledMethod instances in the host and hence allow the host to
e.g. decompile them or print their bytecodes.  For example, this is
SmalltalkImage>>at:ifAbsent: in a Squeak image in the simulator's heap
pretty printed by the host's InstructionPrinter:
17 <00> pushRcvr: 0
18 <10> pushTemp: 0
19 <11> pushTemp: 1
20 <F0> send: a VMObjectProxy for #at:ifAbsent: (2 args)
21 <7C> returnTop

So with suitable mirrors one can access the sequences of bytes in
heapMemory as if they were objects.  The reverse would be possible, where a
special kind of proxy in heapMemory would cause the simulator to escape
back out to the host and send messages to the host system, but I've not
done this.  [The other kind of proxy I use is one that takes a host object
and makes it appear to be a sequence of bytes.  This allows for example,
the Cogit to JIT a method in the host system to heapMemory to test the
Cogit compiler without having to start the simulator. But that's not useful
to you.]
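
The mirror idea can be sketched as a thin wrapper that interprets a span of
bytes on demand.  Everything about the layout below is a toy assumption (a
header word holding only the literal count, 32-bit little-endian words), and
MethodMirror is a hypothetical name; it is not the simulator's mirror class
or the real CompiledMethod format.

```python
class MethodMirror:
    """Presents a byte range in heap memory as a method-like object."""

    WORD = 4  # assumption: 32-bit target words

    def __init__(self, heap_memory, address, byte_size):
        self.heap_memory = heap_memory
        self.address = address
        self.byte_size = byte_size

    def _word(self, index):
        start = self.address + index * self.WORD
        return int.from_bytes(self.heap_memory[start:start + self.WORD], "little")

    @property
    def num_literals(self):
        return self._word(0)  # toy header: the whole word is the count

    def literal_at(self, index):
        # Returns the raw word; a fuller mirror would wrap it in a proxy.
        return self._word(1 + index)

    @property
    def bytecodes(self):
        first = self.address + (self.num_literals + 1) * self.WORD
        return bytes(self.heap_memory[first:self.address + self.byte_size])


if __name__ == "__main__":
    heap_memory = bytearray(16)                        # unrelated bytes
    heap_memory += (1).to_bytes(4, "little")           # header: one literal
    heap_memory += (0xCAFE).to_bytes(4, "little")      # the literal's word
    heap_memory += bytes([0x00, 0x10, 0x11, 0xF0, 0x7C])  # bytes from the listing
    mirror = MethodMirror(heap_memory, address=16, byte_size=13)
    print(mirror.num_literals, mirror.bytecodes.hex(" "))
```

Host-side tools then only need the mirror's accessors, never the raw bytes,
which is the property that lets a decompiler or printer work unchanged.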

An example of the use of the simulator for bootstrapping is in converting a
V3 image to a Spur image.  Two simulators are created, one for the V3 image
and one, initially empty, for the Spur image.  A conversion pass is run
which clones the relevant objects from the V3 heap to the Spur heap.  Not
all objects get cloned; for example Character objects in V3 get replaced by
immediate Character objects in Spur.

Interesting things happen when moving methods.  In V3 a method's primitive
number is embedded in the header, but in Spur there is a one-bit flag that
identifies methods with a primitive, and the first bytecode is a 3-byte
CallPrimitive bytecode if there is a primitive.  So cloning methods with
primitives requires allocating three extra bytes and inserting the
CallPrimitive bytecode at the start of the method.  That's an example of a
simple transformation.  But a number of methods in Spur, for object
allocation, object enumeration (allObjects, allInstances, etc) and indeed
compiled methods, are different and so we need to compile the correct code
for these methods since they don't, indeed can't exist in V3, because
they're incompatible with V3.  So what we do is

a) compile from source in the host to a CompiledMethod in the host
b) use the disassembler I wrote as part of my MethodMassage assembler to
generate assembly (a sequence of Message objects, one per bytecode
with the same signatures used in Context's interpretation messages such as
pushReceiverVariable: etc, but you could use Marcus' assembler too)
c) process the set of literals to map them into proxies for objects in the
Spur heapMemory.
d) use the assembler in MethodMassage to generate a new CompiledMethod
e) clone this into the Spur heapMemory
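
The simpler transformation mentioned before the list, prepending a 3-byte
CallPrimitive bytecode when a method has a primitive, can be sketched as
follows.  The opcode value 0x8B (139) and the low-byte-first operand order
are assumptions that should be checked against the VMMaker sources, and
add_call_primitive is a hypothetical helper, not the bootstrap's code.

```python
CALL_PRIMITIVE = 0x8B  # assumption: opcode 139, followed by two operand bytes


def add_call_primitive(bytecodes, primitive_index):
    """Return the method's bytecodes with a CallPrimitive prefix if needed.

    Methods without a primitive (index 0) are copied unchanged; methods
    with one grow by exactly three bytes.
    """
    if primitive_index == 0:
        return bytes(bytecodes)
    if not 0 < primitive_index < 0x10000:
        raise ValueError("primitive index must fit in two bytes")
    prefix = bytes([CALL_PRIMITIVE,
                    primitive_index & 0xFF,   # assumption: low byte first
                    primitive_index >> 8])
    return prefix + bytes(bytecodes)


if __name__ == "__main__":
    original = bytes([0x00, 0x7C])
    converted = add_call_primitive(original, primitive_index=60)
    print(len(original), len(converted), converted.hex(" "))
```

A real conversion would also set the one-bit "has primitive" flag in the
cloned method's header; that step is left out here.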

Up until now we haven't needed to run /any/ code in the Spur image.  But in
cloning, given that Spur has a 22-bit per-object identityHash and V3 has
only an 11-bit identityHash we at least need to rehash all hashed
collections.  To make this convenient the simulator is extended to provide
a perform:-like abstraction that allows us to send messages to specific
objects in the heap and have them executed.  The simulator provides
object:perform:withArguments: and uses its bytecode interpreter to execute
methods.  If we wanted to evaluate arbitrary expressions we could compile a
doit to a CompiledMethod, clone the method into heapMemory and provide e.g.

So with this approach
- the host is given god-like powers to reach inside the target heap and
alter whatever it likes.  The target system doesn't have to be complete, or
capable of execution.
- the target can still self-organise, being able to execute using the
simulator's StackInterpreter
- no special VM support is needed; you get to use a normal Cog (or Spur
Cog) VM for the host running at full speed
- you can deal with 64-bits from 32-bits or 32-bits from 64-bits; there is
no relationship between word size in the host and word size in the target,
beyond the fact that bytecode pcs are affected by the width of literals, so
in a one literal method (methods also have a one word header) the first pc
is 8 in 32-bits and 16 in 64-bits.
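
The pc arithmetic in the last bullet is just "header word plus literal
words, times the word size".  A small Python check, where
first_bytecode_offset is a hypothetical helper name and the result is the
zero-based byte offset used in the figures above:

```python
def first_bytecode_offset(num_literals, word_size):
    """Byte offset of the first bytecode: one header word plus the literals."""
    return (num_literals + 1) * word_size


if __name__ == "__main__":
    for word_size in (4, 8):
        print(word_size * 8, "bit:", first_bytecode_offset(1, word_size))
```

So any pc recorded in one word size has to be remapped when a method is
cloned into a heap with the other word size.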

Now, the simulator is slooooooow.  Even so (IIRC) the V3 to Spur bootstrap
takes about 10 minutes and the 32-bit Spur to 64-bit Spur bootstrap takes
about 2 minutes.  But the StackInterpreter is slow because all of the code
is full of expensive error-checking asserts.  I bet you could speed up the
StackInterpreter simulator by at least an order of magnitude by using a
special version of the bytecode compiler (e.g. Opal) that recompiled all of
StackInterpreter, SpurMemoryManager et al to eliminate these asserts.
Earlier you've claimed that the difference between the StackInterpreter and
Cog is about a factor of two.  That's a big underestimate.  While the
ratio does depend on how much work is being done in Smalltalk vs how much
work is done external to the system (since Cog only speeds up Smalltalk
code) if you're talking about pure Smalltalk code the difference is more
like 5 to 15 times faster.  So I think if you use Cog Spur and recompile
without asserts you'd be able to get a simulated StackInterpreter that
was, say, a hundred to ten times slower than the (non-Oz) Stack VM, and
depending on how much computation needs to be done in the host, you'd find
performance was /better/ in this architecture than in the Oz multiple
spaces VM.

Remember that Clément is stabilizing Sista and that will give us another
factor of 3 performance boost for pure Smalltalk from Cog Spur.  Sista is
well suited to optimizing the kinds of loops that exist in the simulator's
StackInterpreter, so that code should be well-optimized, so if my estimate
was correct that would mean three to thirty times slower than the Stack
VM.  And it means you get out of the "hack and maintain a VM" business and
focus on the bootstrap.
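
The "three to thirty times" figure follows from dividing the earlier
estimate by the expected Sista factor.  A quick check in Python, using only
the numbers quoted above (which are themselves estimates, not
measurements); slowdown_after_speedup is a hypothetical helper name:

```python
def slowdown_after_speedup(slowdown_low, slowdown_high, host_speedup):
    """Scale a slowdown range by a speed-up of the host executing the simulator."""
    return slowdown_low / host_speedup, slowdown_high / host_speedup


if __name__ == "__main__":
    low, high = slowdown_after_speedup(10, 100, host_speedup=3)
    print(f"about {low:.1f}x to {high:.1f}x slower than the Stack VM")
```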

When you focus on the bootstrap using the above architecture
- the target system *doesn't even have to have a compiler*, since the host
can be used to compile all source, or /any/ tools at all
- the target system *doesn't have to have a complete implementation of self
reflection*.  While it still needs classes CompiledMethod, Behavior,
ClassDescription, MethodDictionary, etc, these don't have to provide
development-time methods such as CompiledMethod
class>>newBytes:trailerBytes:nArgs:nTemps:nStack:nLits:primitive:, or
ClassDescription>>addSelector:withMethod:notifying:.  The target only needs
to support what the target needs to do.  If all you want is Transcript
show: 'Hello world' a lot can be left out, and images below 1Mb are

Let me ask you to at least think about this before you respond and dismiss
it.  I've been using it for a couple of years now as it was key to the V3
to Spur transition, and I like it and can see that with the right mirrors
it can be very powerful.  Doru, Clément and I are intrigued by the
possibilities of combining mirrors with GT to provide really good tools for
visualizing and manipulating objects in heapMemory. There could be real
synergy if the Oz bootstrap took a similar approach.  I think you lose a
lot of complexity and maintenance activity and gain a lot of power.  Worth
thinking about.


> On 20 ene 2016, at 9:03 a.m., Christophe Demarey <
> Christophe.Demarey at inria.fr> wrote:
> Hi Eliot,
> Le 19 janv. 2016 à 19:29, Eliot Miranda a écrit :
> Hi All,
>     great news!  Where can I read a specification of the Oz VM facilities?
> I do not know all the details of the Oz VM but basically, it is a standard
> stack interpreter vm specialized to be able to run 2 images at the same
> time on top of it.
> Guillermo made it possible by using one available bit in the object header
> of objects to mark the ownership of an object (e.g. is my object from
> image1 or image2?). Then, he mainly had to specialize VM primitives dealing
> with objects retrieval from memory. By example, he had to modify the
> garbage collector to set the right nil instance (yes, we have 2 images so 2
> nil instances. we need to take the one of the active image) when an object
> is garbage collected; the primitive someObject has to return an object from
> the active image, etc.
> There is also a way to switch execution from an image to the other just by
> switching the special objects array reference.
> You can find some information in:
> https://guillep.github.io/files/publications/Poli15Thesis.pdf.
> The Oz code is in http://smalltalkhub.com/#!/~Guille/ObjectSpace and in
> https://github.com/guillep/OzVM
> The current implementation uses Cog 6.6.1 and OzVM-GuillermoPolito.22.
> I do not know if Guille has a more formal specification of the Oz VM.
> If you have advices on what is the best way to handle two distinct object
> (memory) spaces in Spur, they will be welcomed :)
> Cheers,
> Christophe
> _,,,^..^,,,_ (phone)
> On Jan 19, 2016, at 6:29 AM, Christophe Demarey <
> christophe.demarey at inria.fr> wrote:
> Hi all,
> In case you do not know, we work on bootstrapping Pharo, i.e. create a
> Pharo image from sources, not based on a previous image (well, we use a
> pharo image to produce it but no code / state from it).
> This process will allow to define a minimal Pharo kernel (currently 52
> packages but we could have it far smaller) and to modularize the whole
> image (currently packages have too much dependencies on packages already
> loaded into the image).
> The bootstrap process also allows to write down the recipe to initialize a
> new image from scratch (some code is missing in the image or is wrong). In
> addition, I think we will clean a lot of historical objects that are not
> used anymore.
> With the amazing work done by Guillermo Polito during his Phd (around
> Espell, Oz): https://guillep.github.io/files/publications/Poli15Thesis.pdf
> , *we succeeded to get a first prototype of a bootstrapped Pharo 5 image
> (from 5.392)*.
> This prototype is able to run an eval command line handler and to log
> output / errors. Not all classes are yet initialized and you cannot yet
> save / restart this image but it is a big step forward.
> It is a 4 mb image (could be half the size without unicode data). You can
> download it at:
> http://chercheurs.lille.inria.fr/~demarey/pmwiki/pub/pharo-bootstrap/pharo-bootstrap.zip
> .
> Next steps are to have a bootstrapped image fully working, then to load
> packages on top of it (like network, sunit) to produce a minimal image.
> Then, we need to implement an Oz VM on top of Spur.
> After that, we need to work on a reliable way to build the bootstrap (not
> too sensitive to changes in the image).
> Christophe.
> -------
> demarey at 193-51-236-143:~/dev/rmod/bootstrap/bootstrap-2016-01-19$
> ../pharo bootstrap.image --no-default-preferences eval "1 + 1"
> 2
> demarey at 193-51-236-143:~/dev/rmod/bootstrap/bootstrap-2016-01-19$
> ../pharo bootstrap.image --no-default-preferences eval "'a' , 'b'"
> 'ab'
> demarey at 193-51-236-143:~/dev/rmod/bootstrap/bootstrap-2016-01-19$
> ../pharo bootstrap.image --no-default-preferences eval "1 / 0"
> ZeroDivide
> SmallInteger>>/
> UndefinedObject>>DoIt
> OpalCompiler>>evaluate
> OpalCompiler(AbstractCompiler)>>evaluate:
> SmalltalkImage>>evaluate:
> EvaluateCommandLineHandler>>no (source is Undeclared)
> no source in EvaluateCommandLineHandler>>evaluate: in Block: no source
> BlockClosure>>on:do:
> EvaluateCommandLineHandler>>evaluate:
> EvaluateCommandLineHandler>>evaluateArguments
> EvaluateCommandLineHandler>>activate
> EvaluateCommandLineHandler class(CommandLineHandler class)>>activateWith:
> BasicCommandLineHandler>>no (source is Undeclared)
> no source in
> PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand: in
> Block: no source
> BlockClosure>>on:do:
> PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
> PharoCommandLineHandler(BasicCommandLineHandler)>>handleSubcommand
> PharoCommandLineHandler(BasicCommandLineHandler)>>handleArgument:
> BasicCommandLineHandler>>no (source is Undeclared)
> no source in PharoCommandLineHandler(BasicCommandLineHandler)>>activate in
> Block: no source
> BlockClosure>>on:do:
> PharoCommandLineHandler(BasicCommandLineHandler)>>activate
> PharoCommandLineHandler>>activate
> PharoCommandLineHandler class(CommandLineHandler class)>>activateWith:
> PharoCommandLineHandler class>>no (source is Undeclared)
> no source in PharoCommandLineHandler class>>activateWith: in Block: no
> source
> NonInteractiveUIManager(UIManager)>>defer:
> PharoCommandLineHandler class>>activateWith:
> no source in BasicCommandLineHandler>>activateSubCommand: in Block: no
> source
> BlockClosure>>on:do:
> BasicCommandLineHandler>>activateSubCommand:
> BasicCommandLineHandler>>handleSubcommand
> BasicCommandLineHandler>>handleArgument:
> no source in BasicCommandLineHandler>>activate in Block: no source
> SmallInteger>>no (source is Undeclared)
> UndefinedObject>>no (source is Undeclared)
> AbstractCompiler>>no (source is Undeclared)
> SmalltalkImage>>no (source is Undeclared)
> BlockClosure>>no (source is Undeclared)
> EvaluateCommandLineHandler>>no (source is Undeclared)
> EvaluateCommandLineHandler>>no (source is Undeclared)
> CommandLineHandler class>>no (source is Undeclared)
> BasicCommandLineHandler>>no (source is Undeclared)
> BasicCommandLineHandler>>no (source is Undeclared)
> PharoCommandLineHandler>>no (source is Undeclared)
> UIManager>>no (source is Undeclared)
> UndefinedObject>>no (source is Undeclared)
> CommandLineUIManager>>no (source is Undeclared)
> SmalltalkImage>>no (source is Undeclared)
> DelayMicrosecondScheduler>>no (source is Undeclared)
> BlockClosure>>no (source is Undeclared)
> SmalltalkImage>>no (source is Undeclared)
> WeakArray class>>no (source is Undeclared)
> ps: source cannot be displayed because there is no formatter available in
> the bootstrap
> _,,,^..^,,,_
best, Eliot