[Vm-dev] Re: Problem with alien callbacks
elpochodelagente at gmail.com
Mon Apr 4 03:52:12 UTC 2011
We have been researching this problem for some days now. We also met with
Richie yesterday, and he explained a bit of what we were seeing. If we are
right, the problem is that the interpreter is not fully reentrant, so for the
callback mechanism to work you first have to put the interpreter in a state
where it waits for the callback to come in. That way, when the callback arrives,
the interpreter is ready to handle it, and when it finishes handling it, the
state is correctly restored. Is that right?
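A minimal C sketch of that entry/return pattern, assuming a setjmp/longjmp design like the one the jmpbuf-based entry point later in this thread suggests. Every name here (callbackEntry, interpretCallback, returnFromCallback, activeContext as a plain int) is hypothetical and not the Alien plugin's actual code. The point it illustrates: only the state the entry code explicitly saves survives the nested interpreter run.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical interpreter global (stand-in for the real active context). */
static int activeContext = 0;
static int callbackDepth = 0;

/* jmp_buf of the innermost pending callback, and its result slot. */
static jmp_buf *currentCallbackJmpbuf = NULL;
static long callbackResult = 0;

/* Called by the nested interpreter when the callback's Smalltalk code returns. */
static void returnFromCallback(long result) {
    assert(currentCallbackJmpbuf != NULL);
    callbackResult = result;
    longjmp(*currentCallbackJmpbuf, 1);
}

/* Stand-in for running the interpreter on the callback block.
   It clobbers activeContext, as a real nested send would. */
static void interpretCallback(long argument) {
    activeContext = 1000 + callbackDepth;
    returnFromCallback(argument * 2);
    /* not reached */
}

/* Hypothetical thunk entry: save state, run the nested interpreter, restore. */
long callbackEntry(long argument) {
    jmp_buf here;
    jmp_buf *outerJmpbuf = currentCallbackJmpbuf; /* supports nested callbacks */
    int savedContext = activeContext;             /* only this is preserved */

    callbackDepth++;
    currentCallbackJmpbuf = &here;
    if (setjmp(here) == 0) {
        interpretCallback(argument);              /* leaves via longjmp */
    }
    /* Reached when returnFromCallback longjmps back here. */
    currentCallbackJmpbuf = outerJmpbuf;
    callbackDepth--;
    activeContext = savedContext;
    return callbackResult;
}
```

With activeContext set to 7 beforehand, callbackEntry(21) returns 42 and activeContext is 7 again afterwards; any global not copied into a local before setjmp would keep whatever value the nested run left in it.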
But in the multithreaded Stack VM, Eliot, you solved this in another way,
right? I think you said that you set the stack as if it were full, so that on
the next method activation (well, here I'm just guessing) or the next
heartbeat you detect it and make room for the callback to come in. Sorry if
I've said something nonsensical.
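The deferral guessed at above can be modeled in a few lines of C. This is a simplified sketch, not Cog's code: it assumes a downward-growing stack whose limit is compared on every method activation, and all names (stackLimit, realStackLimit, requestCallback, activateMethod) are hypothetical. The asynchronous request only moves the limit; the callback is then serviced at the next activation, which is a safe point. That is also why such a scheme cannot handle a callback that must run instantly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated stack, growing downward: a smaller index means a deeper stack. */
#define STACK_SLOTS 1024
#define FRAME_SLOTS 8
static size_t stackPointer = STACK_SLOTS;
static const size_t realStackLimit = 64; /* genuine overflow threshold */
static size_t stackLimit = 64;           /* what activations compare against */

static int callbackPending = 0;
static int callbacksServiced = 0;

/* Asynchronous request (in a real VM this might come from a heartbeat
   thread; this sketch is single-threaded). It does not touch interpreter
   state: it only forces the next overflow check to fail. */
void requestCallback(void) {
    callbackPending = 1;
    stackLimit = SIZE_MAX;
}

/* Slow path, entered only when the cheap limit check fails. */
static void handleStackLimitFailure(void) {
    stackLimit = realStackLimit;         /* undo the forced failure */
    if (callbackPending) {
        callbackPending = 0;
        callbacksServiced++;             /* safe point: the callback would run here */
    }
    /* A genuine overflow (stackPointer < realStackLimit) would also be
       handled here, e.g. by moving to a new stack page; omitted. */
}

/* Method activation: a single comparison on the fast path. */
void activateMethod(void) {
    stackPointer -= FRAME_SLOTS;
    if (stackPointer < stackLimit) {
        handleStackLimitFailure();
    }
}

void returnFromMethod(void) {
    assert(stackPointer + FRAME_SLOTS <= STACK_SLOTS);
    stackPointer += FRAME_SLOTS;
}
```

The design choice being illustrated is that the common path pays nothing extra: the same comparison that detects stack overflow also detects "something asynchronous is waiting".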
The thing is that for our paging application we need to handle the
callback immediately, no matter what the interpreter is doing at that moment.
So if I was right about the Stack VM, we couldn't use that either. In
that case, what we'd need for this special type of callback is to be able to
save the whole context of the interpreter. We don't need a perfect
solution, just a good enough one (we can improve it after moving to
Cog, but we must finish this step first, one at a time). Could you tell us
which interpreter variables must be saved and which don't need to be?
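One way to frame the question is as a snapshot of the interpreter's global "registers", taken when the callback is entered and restored when it leaves. The field list below is only the state Eliot names in his reply quoted further down (lkupClass, messageSelector, primitiveFunctionPointer, newMethod, framePointer, instructionPointer, stackPointer, lastMethodCacheProbeWrite). Whether it is sufficient is an open assumption, and grouping the globals into one struct is a hypothetical refactoring; in the VM discussed here they are separate C globals.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t Word;   /* assumption: a word-sized VM value */

/* Hypothetical grouping of interpreter "registers" into one struct. */
typedef struct {
    Word lkupClass, messageSelector, newMethod;
    void (*primitiveFunctionPointer)(void);
    Word framePointer, instructionPointer, stackPointer;
    Word lastMethodCacheProbeWrite;
} InterpreterRegisters;

static InterpreterRegisters vm;

/* Callbacks may nest, so snapshots are kept on a small stack. */
#define MAX_CALLBACK_DEPTH 16
static InterpreterRegisters snapshots[MAX_CALLBACK_DEPTH];
static int snapshotDepth = 0;

void saveInterpreterState(void) {
    assert(snapshotDepth < MAX_CALLBACK_DEPTH);
    snapshots[snapshotDepth++] = vm;   /* one struct copy saves every field */
}

void restoreInterpreterState(void) {
    assert(snapshotDepth > 0);
    vm = snapshots[--snapshotDepth];
}

/* Stand-in for a nested callback: any send it makes overwrites the registers. */
static void runCallback(void) {
    vm.messageSelector = 0xBAD;
    vm.lkupClass = 0xBAD;
    vm.lastMethodCacheProbeWrite = 0xBAD;
}

/* Models the failure described in this thread: a primitive sets the
   registers, a callback arrives mid-primitive, and the primitive then
   reads messageSelector again (e.g. to rewrite the method cache). */
Word selectorSeenAfterCallback(Word selector, Word cls, int protectState) {
    vm.messageSelector = selector;
    vm.lkupClass = cls;
    if (protectState) saveInterpreterState();
    runCallback();
    if (protectState) restoreInterpreterState();
    return vm.messageSelector;
}
```

Copying is the easy part; deciding what belongs in the struct is the hard part. Note also that a snapshot kept outside the interpreter's own globals is invisible to the garbage collector, so if the callback triggers a GC that moves objects, restored object pointers such as newMethod or lkupClass may be stale. That may be one reason a partial save can still corrupt the image.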
Igor, maybe you had some experience with this in HydraVM? Also, how
do the NativeBoost callbacks work? They might be just what we are looking for.
On Thu, Mar 31, 2011 at 2:27 PM, Javier Pimás <elpochodelagente at gmail.com> wrote:
> Hi! The callback comes in right here:
> "Clean up session id and external primitive index"
> self storePointerUnchecked: 2 ofObject: lit withValue: ConstZero. <- here
> self storePointerUnchecked: 3 ofObject: lit withValue: ConstZero.
> I know, because I'm debugging with gdb, that writing to that location causes a
> page fault (the target object's page is marked read-only), and the page-fault
> handling mechanism issues the callback to handle it. After all that, the
> original primExternalCall continues execution and uses the wrong values of
> messageSelector and lkupClass (even if it found the primitive, I think it would
> write to the wrong place in the cache).
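The write-fault trigger described above can be reproduced outside the VM with a POSIX sketch like the one below. Assumptions: Linux or macOS, a page obtained with mmap, and a handler that only counts the fault and unprotects the page where the real paging system would issue its Smalltalk callback; the function names are hypothetical. It shows why the callback can land between any two instructions of primitiveExternalCall: the faulting store is simply re-executed when the handler returns. Strictly, mprotect is not on POSIX's async-signal-safe list, although this pattern is widely used in practice.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *protectedPage = NULL;
static size_t pageSize = 0;
static volatile sig_atomic_t faultsHandled = 0;

/* Runs in the middle of whatever instruction performed the store. */
static void onWriteFault(int sig, siginfo_t *info, void *context) {
    char *addr = (char *)info->si_addr;
    (void)context;
    if (protectedPage == NULL || addr < protectedPage ||
        addr >= protectedPage + pageSize) {
        signal(sig, SIG_DFL);      /* not our page: let the default action run */
        return;
    }
    faultsHandled++;
    /* A paging system would issue its callback here; this sketch only
       makes the page writable so the faulting store can be retried. */
    mprotect(protectedPage, pageSize, PROT_READ | PROT_WRITE);
}

/* Stores value into a read-only page and returns what was stored, or -1. */
int storeIntoProtectedPage(int value) {
    struct sigaction sa, oldSegv, oldBus;
    int result;

    pageSize = (size_t)sysconf(_SC_PAGESIZE);
    void *page = mmap(NULL, pageSize, PROT_READ,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) return -1;
    protectedPage = page;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = onWriteFault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &oldSegv);
    sigaction(SIGBUS, &sa, &oldBus);   /* macOS reports this fault as SIGBUS */

    volatile int *slot = (volatile int *)page;
    *slot = value;                     /* faults once; the store is then re-executed */
    result = *slot;

    sigaction(SIGSEGV, &oldSegv, NULL);
    sigaction(SIGBUS, &oldBus, NULL);
    munmap(page, pageSize);
    protectedPage = NULL;
    return result;
}
```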
> I know that the VM has a lot of state and of course you don't want to save
> everything, but the callback could come in anywhere, not just in
> primExternalCall, so any variable could be in use. I was actually surprised
> that just saving the active context and creating a new one was enough to
> save all the state of the VM. Working out what is enough will not be easy. I
> tried manually saving messageSelector and lkupClass before the callback and
> restoring them after it, which fixed the problem for a few iterations
> of the interpreter but seemed to corrupt the image, which crashed shortly
> afterwards. Is there anything else you'd recommend saving to work around this
> for now?
> On Thu, Mar 31, 2011 at 12:53 PM, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>> On Thu, Mar 31, 2011 at 6:11 AM, Javier Pimás <elpochodelagente at gmail.com> wrote:
>>> Hi, we are having a problem with callbacks in Alien and we would like to
>>> know whether we are doing something wrong or whether it is a bug in the
>>> implementation (for the standard old VM).
>>> We are receiving the callback right in the middle of a
>>> primitiveExternalCall (actually to a function that will fail because the
>>> plugin is not present, but I don't think that's important). We pinned it down
>>> to always occurring on the same line, which is
>>> longAtput((lit + (BASE_HEADER_SIZE)) + (2 << (SHIFT_FOR_WORD)),
>>> of primitiveExternalCall. When the callback occurs, thunkEntry is
>>> called, which, if we understand correctly, saves the active context and runs
>>> the interpreter by calling sendInvokeCallbackStackRegistersJmpbuf. The
>>> problem is that things like messageSelector and lkupClass, which are global
>>> variables, are not saved along with the context, and when the callback
>>> returns, the last line of primitiveExternalCall,
>>> rewriteMethodCacheSelclassprimIndex(messageSelector, lkupClass, 0);
>>> puts a 0 in the wrong place. Also, probably because the last message sent
>>> was primReturnFromContext:through: (since we just returned from the
>>> context), we get a primitiveFailed, but for primReturnFromContext:through:
>>> rather than for the function originally called.
>>> What do you think? Are we missing something?
>> Hmmm, looking at it I think you must be taking a callback before the
>> external call occurs. Here's how the code reads in Cog:
>> addr := self ioLoadExternalFunction: functionName + BaseHeaderSize
>>             OfLength: functionLength
>>             FromModule: moduleName + BaseHeaderSize
>>             OfLength: moduleLength.
>> addr = 0
>>     ifTrue: [index := -1]
>>     ifFalse: ["add the function to the external primitive table"
>>         index := self addToExternalPrimitiveTable: addr].
>> "Store the index (or -1 if failure) back in the literal"
>> objectMemory storePointerUnchecked: 3 ofObject: lit withValue:
>>     (objectMemory integerObjectOf: index).
>> "If the function has been successfully loaded cache and call it"
>> index >= 0
>>     ifTrue:
>>         [self rewriteMethodCacheEntryForExternalPrimitiveToFunction: (self cCode:
>>             [addr] inSmalltalk: [1000 + index]).
>>         self callExternalPrimitive: addr]
>>     ifFalse: ["Otherwise void the primitive function and fail"
>>         self rewriteMethodCacheEntryForExternalPrimitiveToFunction: 0.
>>         ^self primitiveFailFor: PrimErrNotFound]
>> So the rewrite to zero (self
>> rewriteMethodCacheEntryForExternalPrimitiveToFunction: 0) isn't done if no
>> callout is made. Where is your callback coming from? It looks like it's
>> coming from the internals of things like ioLoadExternalFunction...
>> It is hard to save and restore all the VM state around a callback.
>> There's too much of it in the current VM design. Take a look
>> at rewriteMethodCacheEntryForExternalPrimitiveToFunction:. It is written to
>> be fast, using lastMethodCacheProbeWrite to avoid work in rewriting the
>> cache entry if the module and/or function load fails. That's state one
>> doesn't want to have to save and restore around callbacks, along with
>> lkupClass, messageSelector, primitiveFunctionPointer, newMethod,
>> framePointer, instructionPointer and stackPointer, which are already a lot.
>> This needs more thought.
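Why lastMethodCacheProbeWrite belongs on that list can be seen in a simplified model of such a cache. The single-probe hashing, sizes and function names below are assumptions for illustration, not the VM's actual code: the lookup remembers which slot it wrote, and the later rewrite patches that slot directly instead of probing again. If a callback performs its own lookup in between, the remembered slot belongs to a different (selector, class) pair, so the rewrite lands in the wrong entry, which matches the "0 in the wrong place" symptom reported above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t Word;             /* assumption: a word-sized VM value */
typedef void (*PrimFn)(void);

#define CACHE_ENTRIES 64           /* power of two, so masking works */

typedef struct {
    Word selector, cls;
    PrimFn primitive;
} CacheEntry;

static CacheEntry methodCache[CACHE_ENTRIES];
static int lastMethodCacheProbeWrite = -1;

void somePrimitive(void) {}

static int probeSlot(Word selector, Word cls) {
    return (int)((selector ^ cls) & (CACHE_ENTRIES - 1));
}

/* Simplified lookup: one probe, overwrite on miss, remember the slot. */
void lookupAndCache(Word selector, Word cls, PrimFn primitive) {
    int slot = probeSlot(selector, cls);
    methodCache[slot].selector = selector;
    methodCache[slot].cls = cls;
    methodCache[slot].primitive = primitive;
    lastMethodCacheProbeWrite = slot;
}

/* Fast rewrite: trusts the remembered slot instead of probing again. */
void rewriteLastEntryPrimitive(PrimFn primitive) {
    assert(lastMethodCacheProbeWrite >= 0);
    methodCache[lastMethodCacheProbeWrite].primitive = primitive;
}

/* Returns the cached primitive for (selector, cls), or NULL if not cached. */
PrimFn cachedPrimitive(Word selector, Word cls) {
    CacheEntry *e = &methodCache[probeSlot(selector, cls)];
    return (e->selector == selector && e->cls == cls) ? e->primitive : NULL;
}
```

For example, an outer lookup of (3, 5) followed by a nested lookup of (17, 9) and then rewriteLastEntryPrimitive(NULL) voids the (17, 9) entry and leaves (3, 5) untouched.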
>>> Javier Pimás
>>> Ciudad de Buenos Aires