[Vm-dev] Re: [Pharo-project] [Ann] Ephemerons for Cog

Wed May 25 03:12:31 UTC 2011

On 25 May 2011 03:56, Gerardo Richarte <gera at corest.com> wrote:
>
> On 05/24/2011 08:44 PM, Igor Stasenko wrote:
>>> Even if we could assume that tracing and marking could be done
>>> without paying attention to the native code, fixing the references will
>>> still need some knowledge of native code.
>>>
>> Yes. But currently my interest does not extend beyond the
>> mark phase.
> well, it depends on how you are going to later fix the addresses.
> If you are, for example, using proxy/forwarders or threading the
> referrers, then you usually do this during the mark phase, and
> take advantage of that during the compact phase (if there's
> such a phase). I don't know anything about Squeak's GC: how
> does it assign new addresses and do the compaction? Does it
> have two GCs (a faster, generational one and a slower one for
> full compaction, for example)? Does it have an incremental GC?
>>> Two observations about this:
>>>
>>> . While a CompiledMethod is being executed in its native form,
>>> and if this same CompiledMethod changes itself, for example,
>>> the two versions of the CompiledMethod should be kept alive,
>>> and the only references to the literals in the old CompiledMethod
>>> could be the native code itself. Unless you somehow keep a reference
>>> to the old CompiledMethod (for example, in the stack).
>>>
>> A CompiledMethod could "change" itself only via #become:
> uhm, not sure what you mean, what about something like:
>
> thisContext sender method literalAt: 1 put: 'sarasa'
>
> (I'm taking your syntax, and inventing what I don't know, use
> your imagination to understand :)
>
> This will "kill" the reference to the literal from the CM,
> while the native code should still have it.
>
> could be funnier, like changing the bytecodes, but that's
> somewhat strange (though NOT unheard of).
>
>> If you are just replacing the old compiled method in a class with a
>> new one, nothing bad happens,
>> since the method activation record (aka context) keeps a reference to
>> the activated method and it
> true if the prologue pushes the address of the CM (we are talking
> about a JIT VM, right?), for which the prologue must have a
> pointer to the CM, which must be fixed if the CM changes. Same
> for:
>> Again, fixing refs may be needed during the compaction phase. It depends
>> on the implementation, of course
>> (you can generate native code which has no inlined literals, but
>> refers to them via the CompiledMethod as the interpreter does).
> You'll need a reference to the CM from native code to do this,
> unless of course you are doing all activations via some sort of
> interpreter/dispatcher, in which case you lose some of the speed
> you could gain from direct native-to-native method calls.
> (I'm sure Eliot knows the name for this :)
>> Still, it doesn't answer my question: why do you need to trace the
>> native code during the mark phase? Why are the CompiledMethods, which
>> represent the native code, not enough?
> Again, not sure about Squeak, but if you use threading or forwarding
> you do that during the mark phase, and take advantage of it later (if
> there's any later at all... I mean, some copying GCs with semi-spaces
> don't have anything other than a trace phase, in which they trace and
> copy)
>> Auxiliary array.. okay. Let's assume that if a CompiledMethod has
>> machine code, it requires 1 extra object on the heap (an auxiliary array),
>> which of course could have some arbitrary references.
>> But tracing this aux array can be done automatically during the usual
>> tracing phase, if it detects that:
>> a) an oop is a compiled method
>> b) the method has native code
>> then c) we also trace the auxiliary array
> right, you could do that, an implementation I know very closely has
> some of these auxiliary arrays, only three for all CMs.
>> But what i see in Cog's #markPhase: is different:
>>
>> [ trace roots ]
>> [ trace & free stack pages ].
>> [ trace & free machine code ]  << why is it here?
> uhm, that's interesting. I think Eliot said that machine code has structure,
> it is interesting indeed, and lets the GC collect code too. Why there?
> I think anywhere during the tracing phase is the same, isn't it? It does
> look robust (aside from possible bugs, of course).
>> But what I clearly see is that such a composition is not so nice for
>> introducing ephemerons.

> That shouldn't be a problem, but of course it all depends on your
> implementation of ephemerons and how the rest of the code
> reuses tracing algorithms, so I can't say.
>

For ephemerons there is a queue which should be considered after the
whole heap is already marked.
Then each item in that queue should be examined and could trigger
additional marking
(because an ephemeron's value should be traced only if its key is
reachable from objects other than the ephemeron itself).

Because of that, no object should be considered "free" until the
ephemeron queue is fully processed
and no additional values are found to trace.
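
The queue processing described above can be sketched in Python. This is
only an illustration under stated assumptions, not Cog's or Squeak's code:
the names Obj, Ephemeron and mark_with_ephemerons are hypothetical, the
heap is a plain object graph, and "marking" is membership in a set.

```python
class Obj:
    """Ordinary heap object holding strong references (hypothetical model)."""
    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)


class Ephemeron:
    """Key/value pair whose value is traced only once the key is marked."""
    def __init__(self, name, key, value):
        self.name = name
        self.key = key
        self.value = value


def mark_with_ephemerons(roots):
    """Return (marked set, ephemerons whose keys were never reached)."""
    marked = set()
    queue = []  # ephemerons met during tracing, examined afterwards

    def trace(start):
        stack = [start]
        while stack:
            obj = stack.pop()
            if obj in marked:
                continue
            marked.add(obj)
            if isinstance(obj, Ephemeron):
                queue.append(obj)  # do not follow key or value yet
            else:
                stack.extend(obj.refs)

    for root in roots:
        trace(root)

    # Re-scan the queue until a full pass finds no value left to trace.
    progress = True
    while progress:
        progress = False
        for eph in list(queue):
            if eph.key in marked:
                queue.remove(eph)
                trace(eph.value)  # may mark more keys or queue more ephemerons
                progress = True

    # Only at this point is it safe to call anything outside `marked` free.
    return marked, queue


key, value = Obj("key"), Obj("value")
eph = Ephemeron("eph", key, value)
marked, unreached = mark_with_ephemerons([Obj("root", [key, eph])])
```

The loop has to repeat because tracing one ephemeron's value can make the
key of another ephemeron reachable, which is why nothing can be declared
free before the queue stops producing new work.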

So, if you put it like that:

[ mark usual stuff ]
[ mark and free machine code ]
[ process ephemerons queue ]

then it is wrong, because you can mistakenly free machine code
which is reachable only via one of the ephemerons.
Or if you do it like that:

[ mark usual stuff ]
[ process ephemerons queue ]
[ mark and free machine code ]

then it is also wrong, since the queue should be considered last.
The only way it can work is to have a unified trace procedure,
which makes no distinction between machine code and heap, and to
postpone freeing any memory
until ephemeron processing is done:

[ mark usual stuff ]  << machine code, stack pages etc. should be traced here
[ process ephemerons queue ]  << while processing the queue, newly discovered
                                 reachable machine code, stack pages etc.
                                 should be traced too
[ free machine code, stack pages etc ]
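
To make the ordering argument concrete, here is a small Python sketch that
compares the first ordering with the last one. It is a toy model under
loud assumptions: Node, Ephemeron, trace, process_ephemerons and collect
are invented names, a "zone" tag stands in for the machine code area, and
none of this is taken from Cog's sources.

```python
class Node:
    """Anything the tracer can visit; zone is 'heap' or 'code' (machine code)."""
    def __init__(self, name, refs=(), zone="heap"):
        self.name, self.refs, self.zone = name, list(refs), zone


class Ephemeron(Node):
    def __init__(self, name, key, value):
        super().__init__(name)
        self.key, self.value = key, value


def trace(start, marked, queue):
    """Single trace procedure for every zone; ephemerons are only queued."""
    stack = [start]
    while stack:
        node = stack.pop()
        if node in marked:
            continue
        marked.add(node)
        if isinstance(node, Ephemeron):
            queue.append(node)
        else:
            stack.extend(node.refs)


def process_ephemerons(marked, queue):
    """Trace values of ephemerons with marked keys until nothing changes."""
    progress = True
    while progress:
        progress = False
        for eph in list(queue):
            if eph.key in marked:
                queue.remove(eph)
                trace(eph.value, marked, queue)
                progress = True


def collect(roots, all_nodes, free_code_early):
    """Return (names freed, names live) for the chosen phase ordering."""
    marked, queue, freed = set(), [], set()
    for root in roots:
        trace(root, marked, queue)
    if free_code_early:
        # First ordering: machine code is freed before the queue is processed.
        freed |= {n.name for n in all_nodes
                  if n.zone == "code" and n not in marked}
    process_ephemerons(marked, queue)
    # Last ordering: everything is freed only after the queue is exhausted.
    freed |= {n.name for n in all_nodes if n not in marked}
    return freed, {n.name for n in marked}


key = Node("key")
jit = Node("jit", zone="code")        # machine code reachable only via eph
eph = Ephemeron("eph", key, jit)
root = Node("root", [key, eph])
nodes = [root, key, jit, eph]

early_freed, early_live = collect([root], nodes, free_code_early=True)
late_freed, late_live = collect([root], nodes, free_code_early=False)
```

In this model the early ordering ends with "jit" in both the freed and the
live set, i.e. a dangling reference, while the postponed ordering frees
nothing that is still reachable.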

>    gera

-- 
Best regards,
Igor Stasenko AKA sig.