[Vm-dev] Event-driven Cog still crashing (more observations)

Tue Jul 19 13:20:38 UTC 2011

Hi,

David T. Lewis <lewis at mail.msen.com > wrote:

>> I haven't tried to work around this; just wondering whether using
>> malloc instead of alloca is in any way harmful...

>If you use malloc(), then you would also need to call free(), which
>might be tricky given that you are longjmp'ing over the situation.

Well, not really. StackEvtInterpreter >> initStackPagesAndInterpret
is called once and never returns: it tail-calls into interpret(),
setting a global flag to record that the stack pages have been
allocated. Since interpret() was originally never supposed to return,
nothing bad happens. In fact, I have to retain these stack pages in
memory even when I exit the interpreter, so that they are in place
when I reenter. longjmp discarded the alloca'ed stack space, but some
references in the heap were still pointing there: hence the observed
behavior. In fact, any return from the interpreter would create this
problem.
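
To illustrate the point, here is a minimal C sketch (not the actual VM
code) of stack pages that are malloc'ed once, flagged in a global, and
left in place across a longjmp out of the interpreter. All names
(stack_pages, stack_pages_ready, heap_reference, enter_interpreter) and
the page size are hypothetical, and heap_reference merely stands in for
a heap object that points into the stack pages.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

#define STACK_PAGE_BYTES 4096          /* hypothetical size */

static char *stack_pages = NULL;       /* must outlive every interpreter exit */
static int stack_pages_ready = 0;      /* the "global flag" from the text */
static char *heap_reference = NULL;    /* stand-in for a heap pointer into the pages */
static jmp_buf interpreter_exit;

/* One pass through a toy interpreter: allocate on first entry, then leave via longjmp. */
static void interpret_once(void) {
    if (!stack_pages_ready) {
        stack_pages = malloc(STACK_PAGE_BYTES);   /* malloc, not alloca */
        assert(stack_pages != NULL);
        memset(stack_pages, 0, STACK_PAGE_BYTES);
        heap_reference = stack_pages + 16;
        stack_pages_ready = 1;
    }
    *heap_reference += 1;              /* state kept inside the stack pages */
    longjmp(interpreter_exit, 1);      /* exit without returning normally */
}

/* Enter the interpreter and report the counter stored in the stack pages. */
int enter_interpreter(void) {
    if (setjmp(interpreter_exit) == 0) {
        interpret_once();
    }
    /* Reached after the longjmp: the malloc'ed pages are still valid here. */
    return *heap_reference;
}
```

Calling enter_interpreter() twice yields 1 and then 2: the counter
lives in the malloc'ed pages and survives the longjmp. Had the pages
come from alloca inside interpret_once(), heap_reference would be left
pointing at discarded stack space. The pages are deliberately never
freed, since they have to stay in place for the next reentry.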

Stack Cog (VMMaker.oscog-eem.105.mcz):

-----------------------------------------------------------------------
interpret
	"This is the main interpreter loop. It normally loops forever,
fetching and executing bytecodes. When running in the context of a
browser plugin VM, however, it must return control to the browser
periodically. This should done only when the state of the currently
running Squeak thread is safely stored in the object heap. Since this
is the case at the moment that a check for interrupts is performed,
that is when we return to the browser if it is time to do so.
Interrupt checks happen quite frequently."

	<inline: false>
	"If stacklimit is zero then the stack pages have not been initialized."
	stackLimit = 0 ifTrue:
		[^self initStackPagesAndInterpret].
	"record entry time when running as a browser plug-in"
	self browserPluginInitialiseIfNeeded.
	self internalizeIPandSP.
	self fetchNextBytecode.
	[true] whileTrue: [self dispatchOn: currentBytecode in: BytecodeTable].
	localIP := localIP - 1.  "undo the pre-increment of IP before returning"
	self externalizeIPandSP.
	^nil
-----------------------------------------------------------------------

Classic VM (VMMaker-dtl.244.mcz):

-----------------------------------------------------------------------
interpret
	"This is the main interpreter loop. It normally loops forever,
fetching and executing bytecodes. When running in the context of a
browser plugin VM, however, it must return control to the browser
periodically. This should done only when the state of the currently
running Squeak thread is safely stored in the object heap. Since this
is the case at the moment that a check for interrupts is performed,
that is when we return to the browser if it is time to do so.
Interrupt checks happen quite frequently."

	<inline: false> "should not be inlined into any senders"
	"record entry time when running as a browser plug-in"
	self browserPluginInitialiseIfNeeded.
	self initializeImageFormatVersionIfNeeded.
	self internalizeIPandSP.
	self fetchNextBytecode.
	[true] whileTrue: [self dispatchOn: currentBytecode in: BytecodeTable].
	localIP := localIP - 1.  "undo the pre-increment of IP before returning"
	self externalizeIPandSP.
-----------------------------------------------------------------------

>Note that alloca is used in various places in the interpreter and
>in plugins, both in the slang and in support code for the various
>platforms.

The only place I found is in gcc-interp. Plugins definitely use it; I
have not investigated them yet, but of course I will.

>1. Set a jmp_buf to hold the jump target at the start of the
>interpret() function
>2. longjmp to this jmp_buf in the transferTo function if its argument
>is 0 (thus exiting the interpreter once no processes are ready to
>run).

Yes, this is the main idea.
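
The two steps quoted above can be sketched in plain C roughly as
follows. This is only an illustration under stated assumptions, not the
generated interpreter: transfer_to() stands in for transferTo, a process
is just an int, and the 0-terminated ready_queue is a hypothetical
replacement for the real scheduler.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf interpret_entry;    /* step 1: jump target set in interpret() */
static const int *ready_queue;     /* hypothetical: process ids, terminated by 0 */
static int switches;               /* number of process switches performed */

/* Step 2: leave the interpreter when there is no process to run. */
static void transfer_to(int process) {
    if (process == 0) {
        longjmp(interpret_entry, 1);
    }
    switches++;                    /* a real VM would switch processes here */
}

/* Runs until the ready queue is exhausted; returns the number of switches. */
int interpret(const int *queue) {
    assert(queue != NULL);
    ready_queue = queue;
    switches = 0;
    if (setjmp(interpret_entry) != 0) {
        return switches;           /* arrived here via longjmp from transfer_to */
    }
    for (;;) {
        transfer_to(*ready_queue++);
    }
}
```

The state that has to survive the jump is kept in statics rather than
in locals of interpret(), because locals modified between setjmp and
longjmp have indeterminate values afterwards unless they are declared
volatile.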

>I wonder if there is some other approach that you could use to
>accomplish this?

Another way to accomplish this would be to place the interpreter into
a coroutine with its own stack, e.g. using the setcontext and
getcontext functions. A big problem with this: Android (my main target
beyond just getting an embeddable VM) does not support these (I got
confirmation from Google). And if I port these functions to ARM, I am
not sure how JNI would work if the native stack is switched this way.
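
For reference, on a platform that does provide the ucontext functions
(glibc on Linux, for instance), the coroutine approach could look
roughly like the sketch below. It is not something that runs on
Android, as noted above. The function names, the stack size, and the
toy "interpreter" that only counts steps are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define CORO_STACK_BYTES (64 * 1024)   /* hypothetical coroutine stack size */

static ucontext_t host_ctx;            /* the caller's (host) context */
static ucontext_t interp_ctx;          /* the interpreter coroutine */
static int steps_done = 0;

/* Toy interpreter: does one unit of work, then yields back to the host. */
static void interpreter_body(void) {
    for (;;) {
        steps_done++;
        swapcontext(&interp_ctx, &host_ctx);   /* yield; resumes here later */
    }
}

/* Create the coroutine on its own heap-allocated stack. Returns 0 on success. */
int start_interpreter(void) {
    void *stack = malloc(CORO_STACK_BYTES);
    assert(stack != NULL);
    if (getcontext(&interp_ctx) != 0) {
        return -1;
    }
    interp_ctx.uc_stack.ss_sp = stack;         /* SP will point into the heap */
    interp_ctx.uc_stack.ss_size = CORO_STACK_BYTES;
    interp_ctx.uc_link = &host_ctx;            /* where to go if the body returns */
    makecontext(&interp_ctx, interpreter_body, 0);
    return 0;
}

/* Switch into the coroutine until it yields; returns the steps done so far. */
int resume_interpreter(void) {
    swapcontext(&host_ctx, &interp_ctx);
    return steps_done;
}
```

After start_interpreter(), each resume_interpreter() call runs the
coroutine until its next yield. While it runs, the CPU's SP lies inside
the malloc'ed block, which is exactly the situation that worries me
with JNI callbacks.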

In Android, there is no single continuous thread of execution: an
Activity (application) has several onXXX() methods that are called for
various reasons (user input event, application backgrounded or about
to be terminated, timer, etc.). And I cannot avoid making JNI callbacks
from within the interpreter. So it may happen that onTouchEvent() is
called and makes a JNI call to reenter the interpreter, the coroutine
stack is switched in (that is, the CPU's SP will be somewhere in the
heap range), and then I make a JNI callback from within the
interpreter. I am not sure how the JVM can handle this. And given that
native debugging capabilities on a tablet are quite limited, I would
avoid this if at all possible.

Thanks.

-- 
Dimitry Golubovsky

Anywhere on the Web