[Vm-dev] Robust FFI with Memory Protection Keys

Eliot Miranda eliot.miranda at gmail.com
Sat Jun 29 00:40:38 UTC 2019


Hi Ben,

On Thu, Jun 27, 2019 at 10:31 PM Ben Coman <btc at openinworld.com> wrote:

>
> Hi Eliot,
>
> On Tue, 7 Aug 2018 at 03:32, Eliot Miranda <eliot.miranda at gmail.com>
> wrote:
>
>>
>> Hi Ben,
>>
>>
>> > On Aug 5, 2018, at 6:34 PM, Ben Coman <btc at openinworld.com> wrote:
>> >
>> >
>> >> On 5 August 2018 at 23:10, Eliot Miranda <eliot.miranda at gmail.com>
>> wrote:
>> >>
>> >> Hi Ben,
>> >>
>> >>
>> >>> On Aug 4, 2018, at 8:40 AM, Ben Coman <btc at openinworld.com> wrote:
>> >>>
>> >>>
>> >>> A problem with FFI is that if a callout segfaults, all of memory
>> >>> including that of the Image is suspect, and execution of the Image
>> terminates.
>> >>>
>> >>> Occasionally I hunt around hoping to find technology to mitigate that
>> problem.
>> >>> Maybe this time I found something... Memory Protection Keys [1].
>> >>> Perhaps these could keep Image memory safe when an FFI callout
>> segfaults.
>> >>>
>> >>> IIUC the main problem with protecting Image memory on every FFI
>> callout
>> >>> is the time it would take to update the flags on every page of Image
>> memory.
>> >>> Would being able to change the protection of a massive number of pages
>> >>> with one syscall make it feasible to wrap them around FFI callouts?
>> >>>
>> >>> This may be useful at least where the FFI use is more about reuse of
>> >>> existing functionality than about performance.
>> >>> Or at least useful while someone is learning/experimenting with FFI
>> for
>> >>> the first time or while becoming familiar with some external library.
>> >>> Further info at [2] & [3].
>> >>
>> >> I think there’s a much simpler improvement that doesn’t go this far.
>> I implemented it in VisualWorks and it’s been in production for more than a
>> decade.  It should be easy to add to Cog.
>> >> The idea is simply to add a flag that tracks if the VM is in an FFI
>> call or not and to test this flag in the VM’s exception handlers for
>> SIGBUS, SIGILL, SIGSEGV and their equivalents on Windows.  The exception
>> handlers then respond when in an FFI call by failing the FFI call
>> primitive, answering a primitive fail code that includes the exception
>> information.  Recently we extended Cog’s failure codes to allow a
>> structured object (I don’t have the details handy; I’ll check soon).  In
>> this case we need a pc and/or address and an exception code.
>> >> Would this approach satisfy you?
>> >
>> > That sounds good.  Although the argument I've seen is that a memory
>> > access error means you "can't recover because you don't know what may
>> have been corrupted"
>> > I think it's worthwhile to be optimistic that the Image may last a bit
>> > longer to get more information about what call from the Image
>> invoked the FFI failure.
>> > And if you've been notified (e.g. via Growl message) you can still
>> > take steps to move to a new Image if the current one is suspect.
>>
>> >> While it’s possible that an FFI call could damage the Smalltalk heap
>> and VM state, it’s often not the case, so one wants to be able to at least
>> reap the error code and hence identify where the error occurred, inspect
>> the arguments to the call, etc.  It’s hence about having enough
>> functionality to gather what information is available from outside the
>> call, not about being able to continue for a long time after.
>>
>> > Also, the approach you suggest would be a pre-requisite for what I
>> > suggested anyway, and make it easier to later experiment with MPKs.
>>
>> Cool.
>>
>> > Let me know what I can do to help (probably more capable on the testing
>> side).
>>
>> Will do.  If you’re happy with C programming and the simulator, part of it
>> is adding a global flag variable, setting and unsetting it in FFI calls and
>> callbacks, and then responding to the flag in the exception handler.  The
>> tricky bit is arranging failure.  That I can work on when time allows.
>>
>
> I believe you did some work on this catching of segfaults in FFI callouts
> to return a primitive failure.
> Where did it get up to?  Can you point me at the code that sets/tests this
> flag and sets up the primitive failure?
>

Indeed I did.  Here's the status.  It works on Unix and MacOS but fails on
Windows due to a failure in structured exception handling (stack walking)
with the MinGW toolchain.  I may revisit the code in the context of MSVC
Community Edition 2017, which I'm using for the Terf VM.  Thanks for the
reminder.  Here's the structure of the code:

In SmalltalkImage>>#recreateSpecialObjectsArray the error table is extended
to include a prototype instance of ExceptionInFFICallError.  When wanting
to deliver such an error the VM creates a shallow copy of this object,
fills it in, and supplies it as the errorCode in an FFI primitive method.
This change was introduced in System-eem.1041.

newArray at: 52 put: #(nil "nil => generic error" #'bad receiver'
#'bad argument' #'bad index'
#'bad number of arguments'
#'inappropriate operation'  #'unsupported operation'
#'no modification' #'insufficient object memory'
#'insufficient C memory' #'not found' #'bad method'
#'internal error in named primitive machinery'
#'object may move' #'resource limit exceeded'
#'object is pinned' #'primitive write beyond end of object'
#'object moved' #'object not pinned' #'callback error'),
{PrimitiveError new errorName: #'operating system error'; yourself.
ExceptionInFFICallError new errorName: #'exception in FFI call'; yourself}.

ExceptionInFFICallError allInstVarNames answers #('errorName' 'errorCode' 'pc').
So errorName is #'exception in FFI call', errorCode will be either the
second argument to the signal handler on Unix, or the Win32 exception code
on Win32.  The pc is the pc at which the exception took place.  The error
code will only be delivered if the method contains a primitive error code.
There is a flag in the VM to provide overriding of this behavior, but as
yet there is no primitive to access this flag.  See the two implementors
of primitiveFailForFFIException:at: in the VMMaker source code.

Within the VM the fatal exception handlers (sigsegv in the Unix & MacOS
VMs; squeakExceptionHandler within the win32 VM) always
call primitiveFailForFFIExceptionat.  primitiveFailForFFIExceptionat checks
to see if the VM is in an FFI call and if not, simply returns.  If so, it
does the relevant stack switching actions to discard the C stack and fail
the primitive with the supplied error code & pc.

For reasons unknown, the exception handler squeakExceptionHandler seems not
to be reached if the VM is compiled with clang and/or gcc on win32.  [I
have to confirm this; it's been ten months].

N.B. I had to add an error code variable
to ExternalFunction>>#invokeWithArguments:. Pharo should ensure it also has
one.

HTH
_,,,^..^,,,_
best, Eliot