[Vm-dev] rearchitecting the FFI implementation for reentrancy

Eliot Miranda eliot.miranda at gmail.com
Thu Aug 6 00:56:02 UTC 2009


Hi All,
    I'm looking at making the Squeak FFI reentrant to support nested calls
and possibly threading.  The current FFI has a couple of issues which render
it non-reentrant.

1. The FFI plugin has a number of static state variables which convey
information from the pre-call side of a call-out to the return-side of a
call-out.  These include ffiRetOop, ffiRetClass (oops), ffiRetSpec (IIUC a
pointer into an object body, the result box for a struct result),
ffiRetSpecSize (the size of the struct result box), etc.  Since these
variables are set up before a call-out and used once the call-out has
returned, any nested call-out that happens within a call-back from the
call-out will overwrite these variables.  Further, any GC that occurs during
a call-back can move the struct result box object, rendering the ffiRetSpec
pointer invalid, and likely resulting in heap corruption.

2. The platform support code for argument marshalling has static areas in
which to marshall the outgoing arguments (which is fine since they're only
used on the way out).  But there are also static areas used to hold copies
of string arguments.  These are malloced, stored in a static array of
pointers, and freed on the return side by ffiCleanup.  Any nested call-out
will overwrite the outer call's strings and/or prematurely free the string
copies when the nested call-out does an ffiCleanup.

Making the variables in 1. reentrant is straightforward.  All state is
derived from the external call spec in an external callout method
(primitive 120 method).  The callback machinery is careful to restore
newMethod to its state on return from a call-back.  So the state can be
refetched from newMethod on the return side of the call-out.  This also
avoids having to update ffiRetOop & ffiRetClass in any GC, and allows them
to be kept as local variables of the call-out side.
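
To illustrate the difference between static and per-activation state, here
is a toy sketch.  It is not the plugin's code: calloutStatic, calloutLocal
and nestingCallback are hypothetical names, and a plain int stands in for
state such as ffiRetSpecSize.

```c
#include <stddef.h>

/* Hypothetical callback type: foreign code calling back into the VM. */
typedef int (*callback_fn)(int depth);

/* Mimics a static such as ffiRetSpecSize (illustration only). */
static int staticRetSize;

/* Non-reentrant: the return side reads a static set on the pre-call side. */
int calloutStatic(int retSize, callback_fn cb, int depth)
{
    staticRetSize = retSize;     /* pre-call side */
    if (cb != NULL)
        cb(depth);               /* the callee may call back and nest */
    return staticRetSize;        /* return side: may have been clobbered */
}

/* Reentrant: the same state lives in a local of this activation. */
int calloutLocal(int retSize, callback_fn cb, int depth)
{
    int localRetSize = retSize;
    if (cb != NULL)
        cb(depth);
    return localRetSize;
}

/* A callback that performs nested call-outs with a different size. */
int nestingCallback(int depth)
{
    if (depth > 0) {
        calloutStatic(99, NULL, depth - 1);
        calloutLocal(99, NULL, depth - 1);
    }
    return 0;
}
```

With these definitions, calloutStatic(8, nestingCallback, 1) comes back
with the nested call's value, while calloutLocal(8, nestingCallback, 1)
still sees its own.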

Making 2. reentrant is a little more involved, which is why I wanted to
raise this on the list.  Here is a sketch of a generic solution that I want
y'all to sanity-check for different architectures with which you're
familiar.


The basic idea is to use alloca to stack allocate all non-movable state for
a particular call-out, using stack discipline to reclaim it automatically
when the FFI call-out primitive returns.  The alloca'ed state comprises
three regions.  In the quasi-diagram below the stack grows down.
    region 1: space for any copied strings and a temporary result for
              structure returns
    region 2: an array of pointers, one for each argument, each pointing to
              the start of its corresponding argument further on in the
              alloca'ed space
    region 3: the marshalled arguments, ready for call-out

The call-out machinery can either make a pass over the arguments before
marshalling to compute the size of the alloca'ed region, or it can
guesstimate, based on e.g. a precomputation of the size of any struct
arguments.
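
A minimal sketch of the sizing pass and the three regions, under
simplifying assumptions: every argument marshals to one word, the
struct-return temporary is omitted, and a plain C function stands in for
the platform stub.  Arg, ArgKind, Target, callOut, demoTarget and demo are
all hypothetical names, and alloca itself is non-standard (here taken from
<alloca.h>, as on glibc and macOS).

```c
#include <alloca.h>   /* non-standard; assumed available on this platform */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical argument description; a real plugin would decode the call spec. */
typedef enum { ARG_INT, ARG_STRING } ArgKind;
typedef struct {
    ArgKind     kind;
    intptr_t    intValue;  /* used when kind == ARG_INT */
    const char *chars;     /* used when kind == ARG_STRING; not NUL-terminated */
    size_t      length;
} Arg;

/* Stand-in for the platform stub that would load registers and jump. */
typedef intptr_t (*Target)(void **argPointers, size_t argCount);

intptr_t callOut(const Arg *args, size_t argCount, Target target)
{
    /* Pre-pass: size region 1 (NUL-terminated copies of string arguments). */
    size_t stringBytes = 0;
    for (size_t i = 0; i < argCount; i++)
        if (args[i].kind == ARG_STRING)
            stringBytes += args[i].length + 1;

    size_t region3Size = argCount * sizeof(intptr_t);
    size_t region2Size = argCount * sizeof(void *);

    /* One allocation in this frame, released automatically on return.
       Region 3 sits at the lowest address, i.e. nearest the top of a
       downward-growing stack. */
    char     *base    = alloca(region3Size + region2Size + stringBytes);
    intptr_t *region3 = (intptr_t *)base;
    void    **region2 = (void **)(base + region3Size);
    char     *region1 = base + region3Size + region2Size;

    for (size_t i = 0; i < argCount; i++) {
        if (args[i].kind == ARG_STRING) {
            memcpy(region1, args[i].chars, args[i].length);
            region1[args[i].length] = '\0';
            region3[i] = (intptr_t)region1;   /* pass the copy's address */
            region1 += args[i].length + 1;
        } else {
            region3[i] = args[i].intValue;
        }
        region2[i] = &region3[i];             /* pointer to marshalled arg */
    }
    return target(region2, argCount);
}

/* Demo target: expects (intptr_t n, const char *s), returns n + strlen(s). */
intptr_t demoTarget(void **argPointers, size_t argCount)
{
    (void)argCount;
    intptr_t    n = *(intptr_t *)argPointers[0];
    const char *s = (const char *)*(intptr_t *)argPointers[1];
    return n + (intptr_t)strlen(s);
}

intptr_t demo(void)
{
    const char raw[5] = { 'h', 'e', 'l', 'l', 'o' };   /* no terminator */
    Arg args[2] = {
        { ARG_INT,    40, NULL, 0 },
        { ARG_STRING, 0,  raw,  sizeof raw },
    };
    return callOut(args, 2, demoTarget);
}
```

Because the string copies live in callOut's own frame, a nested call-out
made from a call-back gets its own copies and nothing needs an explicit
ffiCleanup-style free.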

One question is whether marshalling can be done generically in the FFI plugin
based on e.g. functions such as alignStruct, alignDouble, alignLongLong
implemented in the platform to tell the generic code how to align elements
greater than a word in size, or whether it would be better to put the code
in the platform.  I'm guessing the alignStruct approach would work fine and
could be implemented with macros.
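
A rough sketch of what such macros might look like.  The macro names follow
the ones mentioned above, but their signatures, ALIGN_UP, marshalDouble and
exampleLayoutSize are assumptions for illustration.  The alignments are
taken from the compiler via C11 _Alignof; a real platform file might
hard-code ABI values instead, since argument alignment in a calling
convention can differ from in-memory alignment.

```c
#include <stddef.h>
#include <string.h>

/* Round offset up to a power-of-two alignment. */
#define ALIGN_UP(offset, alignment) \
    (((offset) + (alignment) - 1) & ~((size_t)(alignment) - 1))

/* Hypothetical platform hooks telling generic code how to align elements. */
#define alignDouble(offset)                  ALIGN_UP(offset, _Alignof(double))
#define alignLongLong(offset)                ALIGN_UP(offset, _Alignof(long long))
#define alignStruct(offset, structAlignment) ALIGN_UP(offset, structAlignment)

/* Generic marshalling step: place a double at the next aligned offset in
   the marshalled-argument region and return the offset just past it. */
size_t marshalDouble(char *region, size_t offset, double value)
{
    offset = alignDouble(offset);
    memcpy(region + offset, &value, sizeof value);
    return offset + sizeof value;
}

/* Example: an int-sized slot followed by a double. */
size_t exampleLayoutSize(void)
{
    char   region[32];
    size_t offset = sizeof(int);   /* pretend an int was already marshalled */
    return marshalDouble(region, offset, 3.5);
}
```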

An important question is whether the array-of-pointers scheme will allow
platforms that pass values in registers to locate any and all arguments.
Perhaps some platforms will require a separate array of pointers to
floating-point values?  Please speak up if you know of any issues here.
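
To make the question concrete, here is a rough sketch of how a platform
layer might split the single array of pointers into integer and
floating-point register candidates, given a per-argument class.  The
register counts are those of the System V x86-64 convention (six integer,
eight SSE argument registers); ArgClass, RegisterPlan, planRegisters and
demoPlan are hypothetical names, and structs that straddle registers are
ignored.

```c
#include <stddef.h>

typedef enum { CLASS_INTEGER, CLASS_FLOAT } ArgClass;   /* hypothetical */

enum { INT_REGS = 6, FLOAT_REGS = 8 };   /* System V x86-64 register counts */

typedef struct {
    void  *intArgs[INT_REGS];      /* pointers to integer/pointer arguments */
    size_t intCount;
    void  *floatArgs[FLOAT_REGS];  /* pointers to floating-point arguments */
    size_t floatCount;
    size_t stackCount;             /* arguments left over for the stack */
} RegisterPlan;

/* Walk the generic array of pointers once, assigning each argument to the
   next free register of its class, or counting it as a stack argument. */
void planRegisters(void **argPointers, const ArgClass *classes,
                   size_t argCount, RegisterPlan *plan)
{
    plan->intCount = plan->floatCount = plan->stackCount = 0;
    for (size_t i = 0; i < argCount; i++) {
        if (classes[i] == CLASS_FLOAT && plan->floatCount < FLOAT_REGS)
            plan->floatArgs[plan->floatCount++] = argPointers[i];
        else if (classes[i] == CLASS_INTEGER && plan->intCount < INT_REGS)
            plan->intArgs[plan->intCount++] = argPointers[i];
        else
            plan->stackCount++;
    }
}

/* Demo for a signature like (long, double, long); returns 1 if the plan
   matches what is expected. */
int demoPlan(void)
{
    long   a = 1, c = 3;
    double b = 2.0;
    void    *argPointers[3] = { &a, &b, &c };
    ArgClass classes[3] = { CLASS_INTEGER, CLASS_FLOAT, CLASS_INTEGER };
    RegisterPlan plan;

    planRegisters(argPointers, classes, 3, &plan);
    return plan.intCount == 2 && plan.floatCount == 1 && plan.stackCount == 0
        && plan.floatArgs[0] == &b && plan.intArgs[1] == &c;
}
```

In this sketch one array of pointers is enough as long as each argument's
class is also available; whether that holds for every ABI is exactly the
open question.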

Finally, I'm assuming that, once the arguments have been marshalled, the
call itself can be made through some assembler stub (perhaps one for each
argument count up to the number of register arguments, times the number of
basic calling conventions: integer/pointer result, double result, struct
result).  The stub takes pointers to the three regions and the function to
be called, cuts back the stack to the top of the alloca'ed area, puts the
return address in the right place, and tail-calls/jumps to the function.

Does this sound sane?  I'm sure it'll work on a number of platforms I'm
familiar with (which e.g. does not include iPhone). I'm sure it won't work
for x86-64 structures that partially fit within registers, but the current
implementation is broken for those anyway.  I've used alloca successfully in
the VW FFI but there were different implementations of marshalling for each
ABI.  Can you say either way whether you think this would work for
particular platforms you're familiar with?

TIA
Eliot