[Vm-dev] Questions about Cog internals

Eliot Miranda eliot.miranda at gmail.com
Tue May 3 17:17:55 UTC 2011


On Tue, May 3, 2011 at 4:49 AM, Mariano Martinez Peck <marianopeck at gmail.com> wrote:

>
> Hi Eliot. I am really trying (with all my lack of knowledge) to understand
> a little about how Cog works internally. I am also reading your posts, and I
> have a couple of (probably newbie) questions. If any of them are answered in
> the blog, please point me to them (I couldn't read all of them yet):
>
> 1) Suppose you have a CompiledMethod XXX that you JIT, and you get a
> CogMethod YYY. While doing the GC (#lastPointerOf:,
> #lastPointerWhileForwarding:, etc.), you need to check whether XXX is a
> CogMethodReference because, if so, you need to fetch XXX's header from YYY.
> Perfect. To prevent the GC from looking into the CogMethod "objects", you give
> them a header with the special format for empty objects, so the GC doesn't
> follow the non-existent "instVars" of a CogMethod. Perfect. A CogMethod has a
> pointer to its original CM (a back-pointer) called 'methodObject'. In this
> case, YYY has a pointer to XXX. So my question is: during a GC compaction or
> a #become, where the address of XXX changes, how do you update YYY so that it
> points to the new address of XXX? If you flag YYY as an empty object, the GC
> doesn't update it.
>

The garbage collector uses NewObjectMemory>>#mapPointersInObjectsFrom:to: to
update pointers for compactions and becomes.  This always invokes
CoInterpreter>>mapInterpreterOops, which always invokes
CoInterpreter>>mapMachineCode, which always invokes
Cogit>>mapObjectReferencesInMachineCode:.  That dispatches to either
Cogit>>mapObjectReferencesInMachineCodeForFullGC or
Cogit>>mapObjectReferencesInMachineCodeForIncrementalGC, depending on whether
the GC is incremental or full.  The CogMethodZone maintains a list of Cog
methods containing young references, so an incremental GC scans only these
methods.
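To make the remapping step concrete, here is a toy Python model. It is not Cog's code: ToyCogMethod, ToyMethodZone and their fields are hypothetical names. The model assumes a Cog method can be reduced to its back-pointer plus a list of object addresses embedded in its machine code. It also assumes an incremental GC only moves young objects, so only methods on the young-referrer list need rewriting.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCogMethod:
    # Address of the bytecoded CompiledMethod (the 'methodObject' back-pointer).
    method_object: int
    # Object addresses embedded in the machine code (literals, inline-cache entries).
    obj_refs: list = field(default_factory=list)

    def remap(self, forwarding):
        """Rewrite every embedded address that the GC has moved."""
        self.method_object = forwarding.get(self.method_object, self.method_object)
        self.obj_refs = [forwarding.get(a, a) for a in self.obj_refs]


class ToyMethodZone:
    def __init__(self, young_start):
        self.young_start = young_start  # addresses >= young_start count as young
        self.methods = []
        self.young_referrers = []       # methods holding at least one young reference

    def add(self, cog_method):
        self.methods.append(cog_method)
        refs = [cog_method.method_object] + cog_method.obj_refs
        if any(a >= self.young_start for a in refs):
            self.young_referrers.append(cog_method)

    def map_object_references(self, forwarding, incremental):
        # An incremental GC only moves young objects, so only methods on the
        # young-referrer list can hold stale addresses; a full GC checks them all.
        targets = self.young_referrers if incremental else self.methods
        for m in targets:
            m.remap(forwarding)
        return len(targets)


zone = ToyMethodZone(young_start=1000)
old = ToyCogMethod(method_object=100, obj_refs=[200])
young = ToyCogMethod(method_object=1200, obj_refs=[300, 1500])
zone.add(old)
zone.add(young)
scanned = zone.map_object_references({1200: 1050, 1500: 1100}, incremental=True)
```

In this run only the method holding young references is visited and rewritten. Passing incremental=False walks every method in the zone, mirroring the full-GC path.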


> 2) As far as I understand, a CogMethod doesn't store or duplicate the
> literals of the CompiledMethod. Hence, even when you have a jitted method,
> when you need a particular literal, you ask the CM for it, using the
> back-pointer 'methodObject'. Is this correct?
>

That's not correct.  Literals are embedded in machine code, both in inline
caches (selectors and classes) and in literal references.  See
Cogit>>annotate:objRef:.
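Here is a rough Python illustration of what "literals embedded in machine code" means. It is a toy code buffer whose load-literal instruction carries the object address as an immediate operand. A side table records where each such operand lives, so a GC can find and rewrite it. This is only in the spirit of Cogit>>annotate:objRef:. Cog records these locations in its own compact per-method map, and the class and method names here are hypothetical.

```python
import struct


class ToyMachineCode:
    """Machine code with object literals baked into the instruction stream."""

    def __init__(self):
        self.code = bytearray()
        self.obj_ref_offsets = []  # byte offset of each embedded oop in self.code

    def emit_nop(self):
        self.code += b"\x90"       # x86 NOP, used only as padding here

    def emit_load_literal(self, oop):
        # x86 "mov eax, imm32" is opcode 0xB8 followed by a 4-byte immediate,
        # so the literal's address lives directly inside the instruction.
        self.code += b"\xB8"
        self.annotate_obj_ref(len(self.code))
        self.code += struct.pack("<I", oop)

    def annotate_obj_ref(self, offset):
        # Remember where the oop is so the GC can find it without decoding code.
        self.obj_ref_offsets.append(offset)

    def literals(self):
        return [struct.unpack_from("<I", self.code, off)[0] for off in self.obj_ref_offsets]

    def update_literals(self, forwarding):
        # What a GC would do after objects move: patch the immediates in place.
        for off in self.obj_ref_offsets:
            old = struct.unpack_from("<I", self.code, off)[0]
            struct.pack_into("<I", self.code, off, forwarding.get(old, old))


mc = ToyMachineCode()
mc.emit_nop()
mc.emit_load_literal(0x1000)
mc.emit_load_literal(0x2000)
mc.update_literals({0x2000: 0x3000})
```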

>
> 3) This is the most stupid question, but I don't see WHERE the machine code
> is kept. When I JIT a method, I get a CogMethod structure, perfect. But
> where is the generated machine code kept? How can I find, from a
> CogMethod, its associated machine code?
>

Look at CoInterpreter>>readImageFromFile:HeapSize:StartingAt: (for the real
VM) and CogVMSimulator>>openOn:extraMemory: (for the simulator).  These
set up the heap base (the variable memory in the real VM, or 0 in the
simulator, where the heap starts at address 0) and cogCodeSize.  Then see
Cogit>>initializeCodeZoneFrom:upTo: for initialization.  The CogMethodZone
is at the start of the heap.
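Here is a toy Python model of the layout described above. It assumes a single contiguous heap whose first cogCodeSize bytes form the code zone, with object memory following it. It also assumes, as a simplification, that each method is bump-allocated as its CogMethod header immediately followed by its machine code. That layout is how the code is found from the header. ToyCodeZone and allocate_method are hypothetical names, not Cogit selectors.

```python
class ToyCodeZone:
    def __init__(self, heap_base, cog_code_size):
        # The code zone occupies [heap_base, heap_base + cog_code_size).
        self.start = heap_base
        self.limit = heap_base + cog_code_size
        self.free_start = self.start
        # Object memory begins right after the code zone.
        self.object_memory_start = self.limit

    def allocate_method(self, header_size, code_size):
        """Bump-allocate a method: its header immediately followed by its machine code."""
        total = header_size + code_size
        if self.free_start + total > self.limit:
            return None  # a real VM would reclaim space in the zone at this point
        header_addr = self.free_start
        self.free_start += total
        # Return (address of the header, address of the first instruction).
        return header_addr, header_addr + header_size


# Simulator-style heap starting at address 0 with a 1 KB code zone.
zone = ToyCodeZone(heap_base=0, cog_code_size=1024)
first = zone.allocate_method(48, 100)
too_big = zone.allocate_method(48, 900)
```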

> 4) I guess my reading of 2) is not correct, because otherwise I don't
> understand why you need CoInterpreter>>markAndTraceOrFreeMachineCode:. The
> comment says "Deal with a fullGC's effects on machine code.  Either mark and
> trace oops in machine code or free machine-code methods that refer to
> freed oops.  The stack pages have already been traced so any method of
> live stack activations have already been marked and traced."
>
> Which oops do you mean by "oops in machine code"? Literals? The
> back-pointer to the CM?
>

Both, and oops in inline caches.


> And by "free machine-code methods that refer to freed oops", what do you
> mean? Literals, or oops such as the back-pointer? I would think you mean the
> back-pointer, since the original CM could have been garbage collected and
> you flag the CogMethod as empty...
>

This is the tracing step that marks live objects.  It must identify all
object references in a Cog method.  But if the Cog method's bytecoded method
isn't marked, it frees the Cog method.  See
Cogit>>markAndTraceOrFreeCogMethod:firstVisit:.
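The per-method decision made during a full GC can be sketched as follows. This is a hypothetical Python model, not Cogit>>markAndTraceOrFreeCogMethod:firstVisit:. It assumes the marks from tracing the heap roots and stack pages already exist. It frees any Cog method whose bytecoded method is unmarked and traces the embedded oops of the methods that survive.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCogMethod:
    method_object: str                            # back-pointer to the bytecoded method
    obj_refs: list = field(default_factory=list)  # oops embedded in the machine code
    free: bool = False


def mark_and_trace_or_free(cog_methods, marked, trace):
    """Toy full-GC pass over the code zone.

    marked: set of oops already marked (roots and stack pages were traced first).
    trace:  callback that marks an oop (and, in a real GC, everything it reaches).
    """
    survivors = []
    # Decide using only the marks that exist before this pass.
    for m in cog_methods:
        if m.method_object in marked:
            survivors.append(m)
        else:
            m.free = True  # its bytecoded method is garbage, so drop the machine code
    # Keep alive everything the surviving machine code refers to.
    for m in survivors:
        for oop in m.obj_refs:
            trace(oop)
    return survivors


marked = {"CM_a"}
a = ToyCogMethod("CM_a", ["#foo", "Point"])
b = ToyCogMethod("CM_b", ["#bar"])
survivors = mark_and_trace_or_free([a, b], marked, marked.add)
```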


> 5) This is not a question; rather, I would like to know whether I
> understood correctly. You JIT a method the second time it is used, that
> is, when you find it in the cache. To know how to generate the machine code
> for a particular bytecode, you check the table that you generate with
> #initializeBytecodeTableForClosureV3, where you basically map bytecodes to
> methods that generate the machine code for each bytecode. If it is a
> primitive, you instead use #compilePrimitive, which checks a similar table,
> but for primitives, which is set up in #initializePrimitiveTableForSqueakV3.
>

Methods are jitted either when found in the cache, or when a block is
invoked in the same method twice in a row (on the second block invocation)
or on the Nth backward jump in a loop or when a method is evaluated via
withArgs:executeMethod: (a doit).  Look for transitive senders of
Cogit>>cog:selector:.
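The JIT triggers listed above are sketched here as a Python policy object. This is hypothetical: the event names, the per-method backward-jump counter and the threshold value are assumptions for illustration. The real VM's bookkeeping differs; see the transitive senders of Cogit>>cog:selector:.

```python
from enum import Enum, auto


class Event(Enum):
    METHOD_CACHE_HIT = auto()  # method found in the interpreter's method cache
    BLOCK_ACTIVATION = auto()  # a block whose home method is `method` is invoked
    BACKWARD_JUMP = auto()     # a loop in `method` branched backwards
    DOIT = auto()              # evaluated via withArgs:executeMethod:


class ToyJitPolicy:
    def __init__(self, backward_jump_threshold):
        # The "N" in "the Nth backward jump"; the value is an assumption.
        self.threshold = backward_jump_threshold
        self.last_block_home = None
        self.backward_jumps = {}

    def should_jit(self, event, method):
        if event in (Event.METHOD_CACHE_HIT, Event.DOIT):
            return True
        if event is Event.BLOCK_ACTIVATION:
            # JIT on the second consecutive block invocation in the same method.
            repeat = self.last_block_home == method
            self.last_block_home = method
            return repeat
        if event is Event.BACKWARD_JUMP:
            n = self.backward_jumps.get(method, 0) + 1
            self.backward_jumps[method] = n
            return n >= self.threshold
        return False


policy = ToyJitPolicy(backward_jump_threshold=3)
```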


> Now, I have a compiled method XXX (selector xxx) which sends #foo. XXX was
> jitted to CogMethod YYY (selector yyy). When xxx is executed, YYY is
> executed. When YYY was jitted, the table built by
> #initializeBytecodeTableForClosureV3 determined which generator method to
> use for each bytecode; for normal message sends it is #genSend:numArgs:.
> The machine code that method generates includes a call to a "trampoline"
> (which is looked up in 'sendTrampolines'; #generateSendTrampolines shows
> how you map from one to the other), and the trampoline invokes the
> associated routine, in this case #ceSend:super:to:numArgs:. So the #foo
> send will finally be handled in ceSend:super:to:numArgs:. This is ONLY true
> if the send is "unlinked". If #foo was in fact also jitted, then you try to
> link the send (to avoid searching the cache next time?). Suppose you could
> link both of them; then the next time YYY is executed, it will call the
> CogMethod of #foo DIRECTLY. In this case, the method executed in the VM is
> #executeCogMethodFromLinkedSend:withReceiver: instead of
> #ceSend:super:to:numArgs:.
>
> So... am I delirious, or is that more or less correct?
>

More or less.  Yes.  Have you read
http://www.mirandabanda.org/cogblog/2011/03/01/build-me-a-jit-as-fast-as-you-can/ ?
It covers ceSend:... in detail.
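The unlinked-to-linked transition discussed in this thread can be modeled as a monomorphic inline cache. In this hypothetical Python sketch, the first send from a site goes through a trampoline that looks the method up (standing in for ceSend:super:to:numArgs:) and then patches the site. Later sends with a receiver of the same class call the target directly. Cog's handling of a class mismatch at a linked site is more involved than the simple relink shown here.

```python
class ToySendSite:
    """A monomorphic inline cache at one send site (hypothetical model)."""

    def __init__(self, selector, lookup):
        self.selector = selector
        self.lookup = lookup      # (class, selector) -> callable standing in for machine code
        self.cached_class = None  # None means the site is still unlinked
        self.target = None
        self.trampoline_calls = 0

    def send(self, receiver, *args):
        cls = type(receiver)
        if self.cached_class is cls:
            return self.target(receiver, *args)  # linked: direct call, no lookup
        return self._send_trampoline(receiver, cls, *args)

    def _send_trampoline(self, receiver, cls, *args):
        # Plays the role of the runtime routine an unlinked send calls:
        # look the method up, then patch the site so the next send is direct.
        self.trampoline_calls += 1
        method = self.lookup(cls, self.selector)
        self.cached_class, self.target = cls, method
        return method(receiver, *args)


class Point:
    pass


class Circle:
    pass


METHODS = {
    (Point, "foo"): lambda r: "Point>>foo",
    (Circle, "foo"): lambda r: "Circle>>foo",
}
site = ToySendSite("foo", lambda cls, sel: METHODS[(cls, sel)])
```

Sending #foo twice to Point instances goes through the trampoline only once. A Circle receiver then misses the cache and relinks the site.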


>
> Thanks a lot in advance,
>

You're welcome.


>
> --
> Mariano
> http://marianopeck.wordpress.com