[Vm-dev] Re: [Pharo-project] [Ann] Ephemerons for Cog

Tue May 24 23:44:16 UTC 2011

On 24 May 2011 23:51, Gerardo Richarte <gera at corest.com> wrote:
>
>> Wouldn't it be simpler to treat machine code as 'alive' as long
>> as its corresponding compiled method is alive as well?
>> And objects reachable directly from a compiled method are the same as
>> those reachable from machine code. No?
>
> Tracing objects has two ends: keeping reachable objects alive,
> and fixing references if the referred object moves.
>
> Even if we could assume that tracing and marking could be done
> without paying attention to the native code, fixing the references will
> still need some knowledge of native code.
>

Yes. But currently my interest does not extend beyond the mark phase.
Ephemerons don't need special handling during other GC phases, only
during the mark phase, while fixing refs is usually done in the
compaction phase.
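
To make concrete why only the mark phase is affected, here is a minimal
sketch of ephemeron-aware marking. It is written in Python as a toy model,
not in Slang, and the names (Obj, Ephemeron, mark) are made up for
illustration; this is not how Cog implements it. It also stops at finding
the ephemerons whose keys are unreachable and leaves out what happens to
them afterwards.

```python
from collections import deque

class Obj:
    """A plain heap object; `refs` are its strong outgoing references."""
    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)
        self.marked = False

class Ephemeron(Obj):
    """The value is traced only once the key is known to be reachable."""
    def __init__(self, name, key, value):
        super().__init__(name)
        self.key = key
        self.value = value

def mark(roots):
    """Mark from roots; return the ephemerons whose key stayed unmarked."""
    worklist = deque(roots)
    pending = []  # ephemerons reached while their key was not yet marked

    def drain():
        while worklist:
            obj = worklist.pop()
            if obj.marked:
                continue
            obj.marked = True
            if isinstance(obj, Ephemeron):
                pending.append(obj)        # defer: trace neither key nor value
            else:
                worklist.extend(obj.refs)

    drain()
    # Iterate to a fixpoint: tracing one value may mark another ephemeron's key.
    progress = True
    while progress:
        progress = False
        for eph in list(pending):
            if eph.key.marked:
                pending.remove(eph)
                worklist.append(eph.value)
                progress = True
        drain()
    return pending

# k1 is reachable from the root, and v1 (e1's value) keeps k2 alive.
# k3 is reachable only through its own ephemeron e3.
k1, k2, k3 = Obj("k1"), Obj("k2"), Obj("k3")
v2, v3 = Obj("v2"), Obj("v3")
v1 = Obj("v1", [k2])
e1, e2, e3 = Ephemeron("e1", k1, v1), Ephemeron("e2", k2, v2), Ephemeron("e3", k3, v3)
root = Obj("root", [e3, e2, e1, k1])
unreachable_keys = mark([root])
```

In this model e1 and e2 get their values traced (e2 only after e1's value
has marked k2), while e3 is returned as the one ephemeron whose key is
reachable from nowhere else.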

> Two observations about this:
>
> . While a CompiledMethod is being executed in its native form,
> and if this same CompiledMethod changes itself, for example,
> the two versions of the CompiledMethod should be kept alive,
> and the only references to the literals in the old CompiledMethod
> could be the native code itself. Unless you somehow keep a reference
> to the old CompiledMethod (for example, in the stack).
>

A CompiledMethod can "change" itself only via #become:.
If you just replace an old compiled method in a class with a new one,
nothing bad happens, since the method activation record (aka context)
keeps a reference to the activated method, and that reference does not
change if you simply manipulate the class's method dictionary.
You can even remove a method from a class, but that doesn't mean that all
contexts with such a method should magically disappear from the stack,
right? :)
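
A toy model of that point, in Python with made-up names (Behavior,
Context, method_dict are illustrative, not the real class layout): the
context holds its own reference to the method, so replacing or removing
the method dictionary entry does not touch it.

```python
class CompiledMethod:
    def __init__(self, selector, literals):
        self.selector = selector
        self.literals = literals

class Context:
    """An activation record; it points at the method it was activated with."""
    def __init__(self, method):
        self.method = method

class Behavior:
    """A class, reduced to its method dictionary."""
    def __init__(self):
        self.method_dict = {}

    def install(self, method):
        self.method_dict[method.selector] = method

    def activate(self, selector):
        return Context(self.method_dict[selector])

cls = Behavior()
old = CompiledMethod("foo", ["old literal"])
cls.install(old)
ctx = cls.activate("foo")                             # a context running the old method

cls.install(CompiledMethod("foo", ["new literal"]))   # recompile foo
still_old_after_replace = ctx.method is old

del cls.method_dict["foo"]                            # remove foo from the class
still_old_after_remove = ctx.method is old
```

Both flags end up true: the old method, and hence its literals, stay
reachable through the context alone.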

So the only special case is when a compiled method is turned into
another object via #become:.
And you can deal with it pretty easily: simply deoptimize all stack
frames that run machine code for that method,
and make sure that those contexts run interpreted code when they
are activated.
In any case, just make sure that you never invoke machine code
which has been invalidated by #become:.

Btw, what happens with contexts whose compiled methods are turned into
something that is not a compiled method?
For instance:

thisContext sender method becomeForward: (XYZ new)

?

I would expect the VM to behave gracefully in such cases by simply sending
#cannotInterpret: (or a variant of it) when it attempts to activate
a context whose method slot holds something that is not a method.
(Currently such an expression just violently crashes the VM.)

> . An alternative option to fix the references from native methods,
> is to have arrays of pairs, Object-Offset. Where the object is what's
> pointed to by the native array, and the offset is where in the code
> cache the reference is. Regular GC will fix these auxiliary arrays,
> and a final step in GC should fix the native references.
>

Again, fixing refs may be needed during the compaction phase. It depends
on the implementation, of course
(you can generate native code which has no inlined literals, but
refers to them via the CompiledMethod, as the interpreter does).

But not during the mark phase. Frankly, I don't see a good reason
to scan native code during the mark phase
(apart from detecting unused code, but then I think a better place
for that is closer to the code responsible for the compaction phase).
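
For reference, the Object-Offset pairs idea quoted above can be modelled
roughly like this. It is only a sketch in Python: the 8-byte word size,
the bytearray standing in for the code cache, and all names are
assumptions for illustration, not any real VM's layout.

```python
import struct

WORD = 8  # assumed word size in bytes

class HeapObject:
    def __init__(self, name, address):
        self.name = name
        self.address = address

def read_word(code, offset):
    return struct.unpack_from("<Q", code, offset)[0]

def fix_native_references(code, aux_pairs):
    """Final GC step: rewrite each inlined reference from its (object, offset) pair."""
    for obj, offset in aux_pairs:
        struct.pack_into("<Q", code, offset, obj.address)

code_cache = bytearray(4 * WORD)
lit_a = HeapObject("a", 0x1000)
lit_b = HeapObject("b", 0x2000)
aux_pairs = [(lit_a, 0), (lit_b, 2 * WORD)]
fix_native_references(code_cache, aux_pairs)   # initial emission of the references

# Compaction moves the objects. The regular GC fixes the object slots of the
# auxiliary array (modelled here by the HeapObject carrying its new address).
lit_a.address = 0x0800
lit_b.address = 0x0810
fix_native_references(code_cache, aux_pairs)   # patch the code cache afterwards
```

Note that nothing here needs the mark phase to look into the code cache:
the pairs live in an ordinary heap object, and the code is only patched
after objects have moved.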

> I'm not sure about Cog, but I'm sure that at least one other VM
> implements this latter technique.
>
> It has three different references from native code to Objects,
> and one is special: literals referred from CompiledMethods are
> inlined references from native code. The CompiledMethod itself
> is referenced from native code, as it's pushed into the stack on
> the prologue. And finally, the "class pointer" of the receiver is
> checked in the prologue to see if the receiver is of the right
> "class".
>
> These last references, let's call them classCheckReferences, are modified
> when there's polymorphism and the same native code is activated for
> a different class. For this case we would need to also change the class
> in the auxiliary array, but to do that we'd need to know where to change
> it, etc. It could have been an indirect reference through the auxiliary
> array, however, the auxiliary array may also move, so we enter a circle.
>
> The solution is to just have offsets in the auxiliary array for
> classCheckReferences. The GC walks this array in a special way:
> dereferencing the "offsets", and treating that word in the code cache
> as a regular object slot. This is one way of knowing how to trace into
> the native code for object references.
>
> You could also have some special format with metainformation on every
> native code, to tell where the references are. Or you could make all
> references indirect through the PC (on platforms where it's supported),
> and have all references together in memory, etc.
>
> oh well... I hope you skipped part of this :)

No. It was informative.

Still, it doesn't answer my question: why do you need to trace the native
code during the mark phase? Why are the CompiledMethods, which
represent the native code, not enough?

Auxiliary array, okay. Let's assume that if a CompiledMethod has
machine code, it requires 1 extra object on the heap (the auxiliary array),
which of course could hold some arbitrary references.
But tracing this aux array can be done automatically during the usual
tracing phase, if it detects that:
a) an oop is a compiled method
b) the method has native code
then c) we also trace the auxiliary array

So then, when you trace stuff, you deal with it in place, for each
discovered compiled method.
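
Steps a) to c) amount to something like the following. This is a sketch in
Python under the assumption above that a jitted method has exactly one
auxiliary array reachable from it; the `aux` slot and all names are
hypothetical, not Cog's.

```python
class Obj:
    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)
        self.marked = False

class CompiledMethod(Obj):
    """`aux` is the hypothetical auxiliary array, or None if there is no native code."""
    def __init__(self, name, literals=(), aux=None):
        super().__init__(name, literals)
        self.aux = aux

    def has_native_code(self):
        return self.aux is not None

def mark_and_trace(roots):
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj.marked:
            continue
        obj.marked = True
        stack.extend(obj.refs)
        # a) the oop is a compiled method, b) it has native code,
        # so c) trace its auxiliary array too, right here.
        if isinstance(obj, CompiledMethod) and obj.has_native_code():
            stack.append(obj.aux)

inlined = Obj("object referenced only from native code")
aux = Obj("aux array", [inlined])
jitted = CompiledMethod("jitted", [Obj("literal")], aux)
interpreted = CompiledMethod("interpreted")
unreferenced = Obj("garbage")
mark_and_trace([jitted, interpreted])
```

With this shape there is no separate pass over machine code: whatever the
native code refers to is found at the moment its method is found.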

But what i see in Cog's #markPhase: is different:

[ trace roots ]
[ trace & free stack pages ].
[ trace & free machine code ]  << why is it here?

I cannot say if it is good or bad, robust or not, because I largely
don't know the logic behind it.
But what I clearly see is that such a composition is not so nice for
introducing ephemerons.

So, Eliot, please shed some more light on the purpose of
#markAndTraceOrFreeMachineCode:

-- 
Best regards,
Igor Stasenko AKA sig.