[Vm-dev] Weird problem when adding a method to StackInterpreter

Mariano Martinez Peck marianopeck at gmail.com
Fri Dec 30 11:13:58 UTC 2011


On Wed, Dec 28, 2011 at 8:28 PM, Eliot Miranda <eliot.miranda at gmail.com> wrote:

>
>
>
> On Wed, Dec 28, 2011 at 3:14 AM, Mariano Martinez Peck <
> marianopeck at gmail.com> wrote:
>
>>
>>
>>
>> On Tue, Dec 27, 2011 at 7:02 PM, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>>
>>>
>>> Hi Mariano,
>>>
>>> On Tue, Dec 27, 2011 at 7:05 AM, Mariano Martinez Peck <
>>> marianopeck at gmail.com> wrote:
>>>
>>>>
>>>> Hi Eliot. Now I found another thing which caught my attention. I would
>>>> also like to trace when objects receive messages from the special
>>>> selectors (those with an associated special bytecode). For example, I
>>>> would like to trace an object that receives #new, #x, etc. With a
>>>> StackVM I need to call my method #traceObjectUsage: from the
>>>> bytecodePrim* methods, usually only in the paths where those methods
>>>> answer before reaching #normalSend. For example, in #bytecodePrimAdd I
>>>> trace both the argument and the receiver when they are floats. If I do
>>>> not add my sends of #traceObjectUsage:, then the receivers are
>>>> (logically) not marked.
>>>>
>>>> Now, what I don't understand is what happens with the CogVM. In Cog,
>>>> even if I don't put in my calls to #traceObjectUsage:, the receiver is
>>>> always marked. I guess this is because I have put #traceObjectUsage: in
>>>> a lot of general places in Cog. The "problem" is that with #class and
>>>> #== the receiver is not marked (right now I don't want to discuss
>>>> whether I should trace these or not). Previously, with the StackVM, if I
>>>> had the call to #traceObjectUsage: in #bytecodePrimClass and
>>>> #bytecodePrimEquivalent, the receiver was marked perfectly. But with Cog
>>>> I noticed that it doesn't matter what I put in #bytecodePrim*, because
>>>> those methods seem never to be executed. Is this correct? Are these
>>>> special bytecodes always jitted from the very first time, or are they
>>>> jitted on demand (when they are found in the cache) like the rest of the
>>>> normal methods? And the main question: why can I trace with all the
>>>> #bytecodePrim* but not with #class and #==? I am obviously missing a
>>>> place where I should trace....
>>>>
>>>
>>> #class and #== are always inlined in jitted code and so if you want to
>>> trace you'll have to modify the jit to add the tracing code as part of the
>>> inlined code.
>>>
>>
>> Ahhh, that was it :)  I didn't know that. So now I see that in
>> #initializeBytecodeTableForClosureV3 and friends, you define them as
>> notMapped:
>>         #(1 198 198 genSpecialSelectorEqualsEquals needsFrameNever: notMapped -1). "not mapped because it is directly inlined (for now)"
>>         #(1 199 199 genSpecialSelectorClass needsFrameNever: notMapped 0). "not mapped because it is directly inlined (for now)"
>> And you have comments there and at the beginning of the method. OK, got
>> it :)
>>
>>
>>>  Note that #class and #== must be inlined and not sent for the semantics
>>> to be the same as the interpreter.  In the interpreter these are never
>>> sent; the bytecode for them is executed.  Likewise in jitted code the
>>> fetch of the class and the comparison are executed inline, not sent.
>>>
>>>
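A minimal C sketch of the point above, assuming a toy object model invented for illustration (`Object`, `Class`, `user_equals`, `send_equals` and the `bytecode_prim_*` functions are hypothetical, not VM code). It shows why the special bytecodes for #== and #class never reach the send path: the identity test and the class fetch happen inline, so even a class that overrides #== is not consulted, and a tracing hook placed only on the send path never fires.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object model (hypothetical), not the real Squeak object format. */
typedef struct Object Object;

typedef struct Class {
    const char *name;
    /* An override of #== that a full send would find and run. */
    int (*user_equals)(Object *self, Object *arg);
} Class;

struct Object {
    Class *cls;
};

static int sends_performed = 0;  /* counts full message sends */

/* Example override: claims every pair of objects is equal. */
static int always_equal(Object *self, Object *arg) {
    (void)self;
    (void)arg;
    return 1;
}

/* Stand-in for a full send of #==: "look up" the method and run it.
   A tracing hook placed here would see the receiver. */
static int send_equals(Object *rcvr, Object *arg) {
    sends_performed++;
    if (rcvr->cls->user_equals != NULL)
        return rcvr->cls->user_equals(rcvr, arg);
    return rcvr == arg;
}

/* Special bytecode 198 (#==): identity test done inline, no lookup, no send. */
static int bytecode_prim_equivalent(Object *rcvr, Object *arg) {
    return rcvr == arg;
}

/* Special bytecode 199 (#class): class fetch done inline, no lookup, no send. */
static Class *bytecode_prim_class(Object *rcvr) {
    assert(rcvr != NULL);
    return rcvr->cls;
}
```

With a class whose #== override answers true for everything, the inline bytecode still answers false for two distinct objects and `sends_performed` stays at 0; only an explicit `send_equals` call goes through the override.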
>> I understand and it makes sense. I have only one small doubt. The rest
>> of the special shortcut bytecodes, such as #bytecodePrimAdd,
>> #bytecodePrimNew, #bytecodePrimGreaterThan, etc., usually share the same
>> behavior: check whether the receiver is of a certain type (integers,
>> floats, booleans, arrays, etc.) and, if so, execute C code instead of
>> the regular message send. If the receiver or argument is not of the
>> expected type, they continue with a #commonSend. Some other shortcut
>> bytecodes, such as #bytecodePrimAtEnd, just set the selector and
>> argument count. And then of course you have #class and #==.
>>
>> Now, in the JIT, you seem to use the same method for all of them (all but
>> #class and #==), #genSpecialSelectorSend. That method seems to only set
>> the selector and argument count, in the style of the #bytecodePrimAtEnd
>> that I mentioned. So my question is: is it OK to assume that once you
>> JIT those special selectors they make much less sense, since the only
>> thing you do is set the selector and argumentCount? What I mean is that
>> the jitted version of #+, for example, will be generated as a regular
>> send (using genSend: selector numArgs: numArgs) rather than checking
>> that the receivers are integers and, if so, answering directly (as
>> #bytecodePrimAdd does). Am I correct?
>>
>
> Nearly correct :)  There are two JITs, SimpleStackBasedCogit that does no
> inlining except for #class and #== (because Squeak assumes these are
> executed without lookup) and StackToRegisterMappingCogit that inlines
> SmallInteger arithmetic and comparison, #class and #==, and short-cuts
> SmallInteger comparison followed by conditional jumps.
>  SimpleStackBasedCogit compiles the special selector bytecodes for #+, #-,
> #<, #> et al merely by generating normal sends.
>  StackToRegisterMappingCogit compiles (currently) #+ #- #bitAnd: #bitOr: as
> a test for SmallInteger arguments, inlined code, possibly an overflow test,
> and a fall-back conventional send if not SmallIntegers or overflow (see
> genSpecialSelectorArithmetic).  It compiles #< #> #<= #>= #= #~= as tests
> for SmallIntegers followed by inline comparison, and possibly, if followed
> by a conditional branch, the inlined conditional branch, with a fall-back
> conventional send if not SmallIntegers (see genSpecialSelectorComparison).
>  It will also constant fold #+, #-, #bitAnd: & #bitOr: so that e.g. (1 + 2
> bitAnd: 5) bitOr: 8 is compiled to a load of 9.  And I reserve the right to
> add additional optimizations as time passes ;)
>
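The fast path described above for #+ in StackToRegisterMappingCogit (tag test, inline add, overflow test, fall-back send) can be sketched in C at the interpreter level. This is not the machine code genSpecialSelectorArithmetic emits; it assumes the classic 32-bit Squeak SmallInteger encoding (low tag bit 1, 31-bit signed value), and `send_plus` is a hypothetical stand-in for the conventional send.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding: 32-bit oops, SmallIntegers tagged with low bit 1. */
typedef uint32_t oop;

#define MIN_SMALLINT (-(1 << 30))      /* -1073741824 */
#define MAX_SMALLINT ((1 << 30) - 1)   /*  1073741823 */

static int fallback_sends = 0;

static int is_small_int(oop o) { return (o & 1u) != 0; }

static oop small_int_oop(int32_t v) {
    assert(v >= MIN_SMALLINT && v <= MAX_SMALLINT);
    return ((uint32_t)v << 1) | 1u;
}

/* Decode the 31-bit two's-complement payload portably. */
static int32_t small_int_value(oop o) {
    uint32_t payload = o >> 1;
    int32_t low30 = (int32_t)(payload & 0x3FFFFFFFu);
    return (payload & 0x40000000u) ? low30 - (1 << 30) : low30;
}

/* Hypothetical stand-in for the conventional send of #+. */
static oop send_plus(oop rcvr, oop arg) {
    (void)rcvr;
    (void)arg;
    fallback_sends++;
    return 0; /* placeholder: a real send would answer e.g. a LargeInteger */
}

/* Inlined #+: tag test, inline add, overflow test, fall-back send. */
static oop special_selector_add(oop rcvr, oop arg) {
    if (is_small_int(rcvr) && is_small_int(arg)) {
        /* Two 31-bit values cannot overflow a 32-bit int when added. */
        int32_t sum = small_int_value(rcvr) + small_int_value(arg);
        if (sum >= MIN_SMALLINT && sum <= MAX_SMALLINT)
            return small_int_oop(sum);
    }
    return send_plus(rcvr, arg);
}
```

The constant folding mentioned above is a compile-time step on top of this: `(1 + 2 bitAnd: 5) bitOr: 8` is `(3 bitAnd: 5) bitOr: 8`, i.e. `1 bitOr: 8`, which is 9.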

Thanks Eliot. Much clearer now, I understand. Indeed, I was looking at
SimpleStackBasedCogit when I wrote that ;)


> So in summary, the old simple JIT did nothing special, compiling the
> special selectors to normal sends, the new JIT does some simple inlining,
> just for SmallIntegers.
>
> I think this has no implications for your tracing code.  You're unlikely
> to unload the SmallIntegers and so you don't need to trace them.
>

Exactly. I don't care at all about tracing SmallIntegers. Even if I wanted
to, I couldn't right now, because they are immediate objects and I am using
a bit in the object header to store the mark.


>  Instead, I would try and define the abstract semantic model for Smalltalk
> and come up with the minimal set of trace points.  For example, for any
> non-immediate object not created as a side-effect of execution (by which I
> mean contexts, blocks and indirection vectors for closed-over variables) it
> can only be accessed via a send.  So it seems to me that the only six
> places in the VM you need to trace objects are sends in the interpreter,
> sends in jitted code, and the inlining of #class & #== in the interpreter
> and jitted code.
>
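The trace-point argument above can be sketched with a toy interpreter in C. Everything here is hypothetical (`Obj`, `full_send`, `inlined_class`, `inlined_equals`, `trace_inlined`); it models one execution tier only, whereas the VM has these paths in both the interpreter and the JIT. Turning `trace_inlined` off reproduces the symptom from the start of the thread: receivers of #class and #== stay unmarked because those paths never go through the send.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy object: a single flag standing in for the header mark bit. */
typedef struct {
    bool marked;
} Obj;

/* false reproduces the missing marks for #class and #== receivers */
static bool trace_inlined = true;

static void trace_object_usage(Obj *o) { o->marked = true; }

/* Trace point: every full send marks its receiver. */
static void full_send(Obj *rcvr) {
    trace_object_usage(rcvr);
    /* ... method lookup and activation would follow here ... */
}

/* Trace point: inlined #class never reaches full_send(), so it needs
   its own hook (the class fetch itself is omitted). */
static void inlined_class(Obj *rcvr) {
    if (trace_inlined)
        trace_object_usage(rcvr);
}

/* Trace point: inlined #== likewise. Only the receiver is marked; the
   argument's identity is compared but its contents are never touched. */
static bool inlined_equals(Obj *rcvr, Obj *arg) {
    if (trace_inlined)
        trace_object_usage(rcvr);
    return rcvr == arg;
}
```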


I think that if you only want to trace when an object receives a message,
then you could be right. In my particular case, I need to go a little bit
further: I need to trace when an object receives a message or when it is
"directly used by the VM". For example, if I send a message to anObject,
an instance of MyClass, I would like to trace its class, its method
dictionary, the compiled method, and all the classes and method
dictionaries involved in the lookup (assuming it was a full lookup, not a
cache hit). In this case, those objects (classes, method dictionaries and
the compiled method) do not receive any message; instead they are used by
the VM. Since I am tracing object usage in order to decide later whether to
swap objects out or not, this is important. This was just one example.



>  For performance, you could inline the bit test on the receiver in jitted
> code either into each method's prolog or into the ceTraceLinkedSend
> trampoline, avoiding going to C on every send, which kills performance.
>
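A rough C model of that suggestion, under stated assumptions: the mark lives in a header bit (the position `EXPERIMENTAL_OBJECT_BIT` is hypothetical, since the thread doesn't give `ExperimentalObjectBit`'s value), and `trace_object_usage_slow` stands in for the costly transition from machine code into C. The inlined part is only a load, a bit test and a branch, so once a receiver is marked, further sends to it never leave the fast path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit position for the mark. */
#define EXPERIMENTAL_OBJECT_BIT ((uint32_t)1 << 29)

typedef struct {
    uint32_t base_header;
} Obj;

static bool has_to_trace = true;
static unsigned slow_path_calls = 0;  /* trips from "machine code" into C */

/* Stands in for the expensive call out of jitted code into the C runtime. */
static void trace_object_usage_slow(Obj *rcvr) {
    assert(rcvr != NULL);
    slow_path_calls++;
    rcvr->base_header |= EXPERIMENTAL_OBJECT_BIT;
}

/* What would be inlined into each method prolog (or into the
   ceTraceLinkedSend trampoline): test the bit, call C only if unset.
   Assumes a non-immediate receiver. */
static inline void trace_prolog(Obj *rcvr) {
    if (has_to_trace && (rcvr->base_header & EXPERIMENTAL_OBJECT_BIT) == 0)
        trace_object_usage_slow(rcvr);
}
```

In this model 1000 sends to the same receiver cost a single trip into C; the trade-off is a few extra instructions on every send, even for receivers that are already marked.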

Mmm, interesting. I am not sure how to start with this. Any deeper hint, or
an example I could look at? :)   My methods so far are:

traceObjectUsage: anOop
    ((self isIntegerObject: anOop) not and: [hasToTrace])
        ifTrue: [
            objectMemory setExperimentalBitOf: anOop to: true.
            ]


setExperimentalBitOf: anOop to: boolean
    | hdr |
    self inline: true.
    "Don't check here whether it is an integer. Check in the sender of this."
    hdr := self baseHeader: anOop.
    boolean
        ifTrue: [ self baseHeader: anOop put: (hdr bitOr: ExperimentalObjectBit). ]
        ifFalse: [ self baseHeader: anOop put: (hdr bitAnd: AllButExperimentalBit). ]
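For comparison, here is roughly what the two Slang methods above correspond to in C. It is a hand-written sketch, not actual Slang output: the `oop_t` typedef, the header-as-first-word layout and the values of `ExperimentalObjectBit`/`AllButExperimentalBit` are assumptions. It also shows why SmallIntegers must be filtered out first: an immediate carries its tag in the low bit and has no header to write into.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t oop_t;  /* assumption: an oop is a machine word */

/* Hypothetical bit assignments; the real values are not in the thread. */
#define ExperimentalObjectBit ((uint32_t)1 << 29)
#define AllButExperimentalBit (~ExperimentalObjectBit)

static bool hasToTrace = true;

/* SmallIntegers are immediates: low tag bit set, no object header. */
static bool isIntegerObject(oop_t oop) { return (oop & 1) != 0; }

/* Assumed layout: the base header is the first 32-bit word of the object. */
static uint32_t baseHeader(oop_t oop) { return *(uint32_t *)oop; }
static void baseHeaderPut(oop_t oop, uint32_t hdr) { *(uint32_t *)oop = hdr; }

static bool isExperimentalBitSet(oop_t oop) {
    return (baseHeader(oop) & ExperimentalObjectBit) != 0;
}

/* Counterpart of setExperimentalBitOf:to:; callers exclude immediates. */
static void setExperimentalBitOfTo(oop_t oop, bool flag) {
    assert(!isIntegerObject(oop));
    uint32_t hdr = baseHeader(oop);
    baseHeaderPut(oop, flag ? (hdr | ExperimentalObjectBit)
                            : (hdr & AllButExperimentalBit));
}

/* Counterpart of traceObjectUsage:. Testing the global flag first skips
   the tag test when tracing is off; && short-circuits like and:. */
static void traceObjectUsage(oop_t oop) {
    if (hasToTrace && !isIntegerObject(oop))
        setExperimentalBitOfTo(oop, true);
}
```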



>
>>
>>> But given that the stack vm and the cog vm are semantically equivalent
>>> do you even need to add tracing code to the jit? If you're tracing e.g. to
>>> discover how much of the object graph a given computation uses and you;re
>>> going to use this information for something later on, like creating a
>>> kernel image or something, why not just use the stack vm for tracing?
>>>
>>
>> Indeed :)
>> Thanks for going beyond my questions. For this thing I am doing (I call
>> it ObjectUsageTracer) we have two users so far:
>> - Luc is trying to do a bootstrap/kernel. In that scenario he can
>> PERFECTLY use the StackVM, since the computation of used/unused objects
>> is mostly done once and then that information is used.
>> - In Marea (what I am doing for my PhD), I want to dynamically detect
>> unused objects, swap them out and replace them by proxies. That means
>> the system needs to detect these unused objects all the time; it is not
>> something I just do once. Anyway, I could use the StackVM, no problem.
>> But... with Cog I can improve the performance of my solution hehehhe. So I
>> wanted to give it a try and see if I could make the ObjectUsageTracer work
>> in Cog. So far it is working more or less well, and the only problem I
>> have found is #class and #==. And I am not even sure that's a problem in
>> my case (I need to think a little bit about it).
>>
>
> Think about the abstract semantics.  How can an object be used?
>  Encapsulation is your friend.
>
>
Not sure I understood. Do you mean intercepting the bytecodes for variable
access rather than message reception?

Thanks



>
>> Best regards,
>>
>>
>>
>>
>>>
>>>
>>>>
>>>> Thanks a lot in advance,
>>>>
>>>>
>>>>
>>>> On Mon, Dec 26, 2011 at 10:00 AM, Mariano Martinez Peck <
>>>> marianopeck at gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>>>> Then the test fails in self assert: (tracer isMarked: obj2).  I
>>>>>>> imagine it is because it is executing the machine code of #foo:. So my
>>>>>>> question is whether there is a way I could intercept and trace the
>>>>>>> receiver there as well. I tried to do it but failed.
>>>>>>>
>>>>>>
>>>>>> See the flag word traceLinkedSends in cogit.c.  A bit in the flags
>>>>>> causes the JIT to generate a call at the start of a method for tracing:
>>>>>>
>>>>>> #define recordSendTrace() (traceLinkedSends & 2)
>>>>>>
>>>>>> The result is that ceTraceLinkedSend is called on every send.
>>>>>>
>>>>>>
>>>>> Wow. I cannot believe how easy it was :)  Thanks Eliot. So what I did
>>>>> was change Cogit class >> declareCVarsIn:
>>>>> to set it to 2 rather than 8:
>>>>>
>>>>>         var: #traceLinkedSends
>>>>>             declareC: 'int traceLinkedSends = 2';
>>>>>
>>>>>
>>>>> And then I just added my tracing code to #ceTraceLinkedSend
>>>>>
>>>>> Thank you very much Eliot and Happy Christmas to all VM hackers
>>>>>
>>>>>
>>>>>
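A toy model of the flag mechanism, assuming only what the thread states: `recordSendTrace()` tests bit 2 of `traceLinkedSends`, and when it is set the JIT generates a call at the start of each method. `CompiledMethod`, `jit_compile` and `run_method` are hypothetical. The default value 8 has bit 2 clear, which is why changing it to 2 turned tracing on. OR-ing the bits (8 | 2) instead of replacing them would enable send tracing while keeping whatever bit 8 controls; what bit 8 does is not covered in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Same test as the macro quoted from cogit.c. */
static int traceLinkedSends = 8;
#define recordSendTrace() (traceLinkedSends & 2)

static int traced_sends = 0;

/* Stand-in for the C routine called from each traced method's prolog. */
static void ceTraceLinkedSend(void) { traced_sends++; }

/* Toy "machine code": whether the prolog contains the trace call is
   decided once, when the method is compiled. */
typedef struct {
    bool prolog_calls_trace;
} CompiledMethod;

static CompiledMethod jit_compile(void) {
    CompiledMethod m = { recordSendTrace() != 0 };
    return m;
}

static void run_method(const CompiledMethod *m) {
    if (m->prolog_calls_trace)
        ceTraceLinkedSend();
    /* ... method body ... */
}
```

In this model, a method compiled before the flag changes keeps its old prolog, so only methods jitted afterwards are traced; setting the flag in the declaration avoids that window.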
>>>>>> HTH
>>>>>> Eliot
>>>>>>
>>>>>>
>>>>>>> Thanks a lot in advance,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Dec 23, 2011 at 11:23 AM, Mariano Martinez Peck <
>>>>>>> marianopeck at gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>> Weird, eh? You use #internalStackValue: throughout
>>>>>>>>>> StackInterpreter in a lot of other places and you don't have problems
>>>>>>>>>> there.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Turns out it's not weird at all.  Since
>>>>>>>>>  lookupInMethodCacheSel:class: is used outside of interpret in
>>>>>>>>> findNewMethodInClass: and in callback lookup it can't be inlined and hence
>>>>>>>>> can't access localSP.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Eliot. Thanks for your answer. It also turns out that I don't
>>>>>>>> know enough about Slang ;)  so it was not weird at all but expected. OK, I
>>>>>>>> am learning along the way. So I understand that sentence. But (see below)
>>>>>>>>
>>>>>>>>
>>>>>>>>> If you want to get the receiver you'll need to use stackValue:
>>>>>>>>> *and* you'll need to ensure that stackPointer is updated when
>>>>>>>>> calling lookupInMethodCacheSel:class: from internalFindNewMethod (see
>>>>>>>>> externalizeFPandSP), which may slow down the interpreter slightly.
>>>>>>>>>
>>>>>>>>>
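A minimal C model of the internal/external stack-pointer split described above, with hypothetical names (`localSP` as a local of the simulated interpreter loop, `stackPointer` as the global, `lookup_receiver` for an out-of-line routine). Code inlined into the interpreter loop sees `localSP`; a routine that cannot be inlined only sees the global, so the loop must write its local copy back (the role `externalizeFPandSP` plays) before calling it. In the model, a caller that skips that step hands the routine a stale stack pointer.

```c
#include <assert.h>

#define STACK_SLOTS 16

static long stack[STACK_SLOTS];
static int stackPointer;  /* external (global) copy: index of the top slot */

/* Put two older values on the stack so a stale read stays in bounds. */
static void reset_stack(void) {
    stack[0] = -1;
    stack[1] = -2;
    stackPointer = 1;
}

/* Like stackValue:, reads relative to the external stack pointer. */
static long stackValue(int offset) {
    assert(stackPointer - offset >= 0);
    return stack[stackPointer - offset];
}

/* Out-of-line routine (cf. lookupInMethodCacheSel:class:): it cannot see
   the interpreter's local copy, only the global stackPointer. */
static long lookup_receiver(int numArgs) { return stackValue(numArgs); }

/* Simulated interpreter loop: pushes receiver and argument via localSP,
   then calls the out-of-line routine for a one-argument send. */
static long interpret_send(long rcvr, long arg, int externalize) {
    int localSP = stackPointer;  /* internalize */
    assert(localSP + 2 < STACK_SLOTS);
    stack[++localSP] = rcvr;
    stack[++localSP] = arg;
    if (externalize)
        stackPointer = localSP;  /* cf. externalizeFPandSP */
    return lookup_receiver(1);
}
```

The write-back is also where the slight slowdown comes from: it is an extra store on a path that otherwise keeps the stack pointer in a local.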
>>>>>>>> I DO understand what #externalizeFPandSP does, but what I don't
>>>>>>>> understand is why I should only do it in #internalFindNewMethod. I mean,
>>>>>>>> what about all the other senders of #lookupInMethodCacheSel:class:?
>>>>>>>> If one of those senders does not update stackPointer
>>>>>>>> (externalizeFPandSP), won't I be accessing something wrong in
>>>>>>>> #lookupInMethodCacheSel:class:?
>>>>>>>>
>>>>>>>> Anyway, I wanted to trace the receiver in
>>>>>>>> #lookupInMethodCacheSel:class: to avoid doing it in all its senders. But
>>>>>>>> given the problem I found, I worked around it by tracing the receiver in
>>>>>>>> its senders (only the inlined ones), and that seems to work :)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>  You're going to have to delve into the inliner in Slang.  This
>>>>>>>>>>> is, um, not fun.  I liken it to getting hit on the head with a stick by
>>>>>>>>>>> your guru, except that no enlightenment results.  Good luck.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> :(   thanks.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Mariano
>>>>>>>>>>>> http://marianopeck.wordpress.com
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> best,
>>>>>>>>>>> Eliot
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>


-- 
Mariano
http://marianopeck.wordpress.com