[Vm-dev] Robust FFI with Memory Protection Keys

Mon Aug 6 19:32:10 UTC 2018

Hi Ben,

> On Aug 5, 2018, at 6:34 PM, Ben Coman <btc at openinworld.com> wrote:
> 
> 
>> On 5 August 2018 at 23:10, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>> 
>> Hi Ben,
>> 
>> 
>>> On Aug 4, 2018, at 8:40 AM, Ben Coman <btc at openinworld.com> wrote:
>>> 
>>> 
>>> A problem with FFI is that if a callout segfaults, all of memory
>>> including that of the Image is suspect, and execution of the Image terminates.
>>> 
>>> Occasionally I hunt around hoping to find technology to mitigate that problem.
>>> Maybe this time I found something... Memory Protection Keys [1]
>>> Perhaps these could keep Image memory safe when an FFI callout segfaults.
>>> 
>>> IIUC the main problem with protecting Image memory on every FFI callout
>>> is the time it would take to update the flags on every page of Image memory.
>>> Would being able to change the protection of a massive number of pages
>>> with one syscall make it feasible to wrap them around FFI callouts?
>>> 
>>> This may be useful at least where the FFI use is more about reuse of
>>> existing functionality than about performance.
>>> Or at least useful while someone is learning/experimenting with FFI for
>>> the first time or while becoming familiar with some external library.
>>> Further info at [2] & [3].
>> 
>> I think there’s a much simpler improvement that doesn’t go this far.  I implemented it in VisualWorks and it’s been in production for more than a decade.  It should be easy to add to Cog.
>> 
>> The idea is simply to add a flag that tracks whether the VM is in an FFI call or not, and to test this flag in the VM’s exception handlers for SIGBUS, SIGILL, SIGSEGV and their equivalents on Windows.  When in an FFI call, the exception handlers respond by failing the FFI call primitive, answering a primitive fail code that includes the exception information.  Recently we extended Cog’s failure codes to allow a structured object (I don’t have the details handy; I’ll check soon).  In this case we need a pc and/or address and an exception code.
>> 
>> Would this approach satisfy you?
> 
> That sounds good.  Although the argument I've seen is that a memory
> access error
> means you "can't recover because you don't know what may have been corrupted"
> I think it's worthwhile to be optimistic that the Image may last a bit
> longer to get more information about what call from the Image invoked
> the FFI failure.
> And if you've been notified (e.g. via Growl message) you can still
> take steps to move to a new Image if the current one is suspect.

While it’s possible that an FFI call could damage the Smalltalk heap and VM state, it’s often not the case, so one wants to be able to at least reap the error code and hence identify where the error occurred, inspect the arguments to the call, etc.  It’s hence about having enough functionality to gather what information is available from outside the call, not about being able to continue for a long time after.

> I guess you'd want to be able to turn it off for native level debugging,
> and for critical production applications where it's judged better to
> crash than continue.

Perhaps, but debuggers like gdb often give one the control one needs to stop at an exception before it is delivered.

> Also, the approach you suggest would be a pre-requisite for what I
> suggested anyway,
> and make it easier to later experiment with MPKs.

Cool.

> Let me know what I can do to help (probably more capable on the testing side).

Will do.  If you’re happy with C programming and the simulator, part of it is adding a global flag variable, setting and unsetting it in FFI calls and callbacks, and then responding to the flag in the exception handler.  The tricky bit is arranging failure.  That I can work on when time allows.
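
To make the flag-and-handler idea concrete, here is a minimal POSIX sketch in C.  It is not Cog or VisualWorks code: every name in it (inFFICall, FFIFailure, guardedCallout, and so on) is hypothetical, and it assumes a single-threaded VM with no nested callouts or callbacks.  It arranges failure with sigsetjmp/siglongjmp, which is only one possible way to get back out of the handler; the real primitive-failure path in the VM would look different.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of what the handler saw; a real VM would turn
 * this into the structured primitive failure code. */
typedef struct {
    int   signo;   /* SIGBUS, SIGILL or SIGSEGV */
    void *addr;    /* si_addr; only meaningful for a real hardware fault */
} FFIFailure;

typedef int (*Callout)(void *arg);

/* The global flag: non-zero only while an external function is running. */
static volatile sig_atomic_t inFFICall = 0;
static sigjmp_buf ffiRecovery;
static FFIFailure lastFailure;

static void faultHandler(int signo, siginfo_t *info, void *context)
{
    (void)context;
    if (!inFFICall) {
        /* Fault in the VM itself: restore the default action and
         * re-raise so the process dies as it would have anyway. */
        signal(signo, SIG_DFL);
        raise(signo);
        return;
    }
    lastFailure.signo = signo;
    lastFailure.addr  = info ? info->si_addr : NULL;
    siglongjmp(ffiRecovery, 1);   /* abandon the callout */
}

/* Install the handler for the three signals named above.
 * Answers 0 on success, -1 if sigaction fails. */
static int installFaultHandlers(void)
{
    static const int signals[] = { SIGBUS, SIGILL, SIGSEGV };
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = faultHandler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof signals / sizeof signals[0]; i++)
        if (sigaction(signals[i], &sa, NULL) != 0)
            return -1;
    return 0;
}

/* Run fn(arg) with the flag set.  Answers 0 and stores the result on
 * success; answers -1 and fills in *failure if the callout faulted. */
static int guardedCallout(Callout fn, void *arg, int *result, FFIFailure *failure)
{
    assert(fn != NULL);
    /* Passing 1 saves the signal mask, so the signal is unblocked
     * again after the handler jumps back here. */
    if (sigsetjmp(ffiRecovery, 1) != 0) {
        inFFICall = 0;
        *failure = lastFailure;
        return -1;
    }
    inFFICall = 1;
    int r = fn(arg);
    inFFICall = 0;
    *result = r;
    return 0;
}

/* Two stand-in "external" functions for trying it out. */
static int goodCallout(void *arg) { return *(int *)arg + 1; }

static int faultingCallout(void *arg)
{
    (void)arg;
    raise(SIGSEGV);   /* stand-in for a genuine bad memory access */
    return 0;
}
```

After installFaultHandlers(), calling guardedCallout(faultingCallout, ...) answers -1 with the signal number recorded in the FFIFailure, and a later guardedCallout(goodCallout, ...) still works.  Callbacks would need to clear the flag on entry to Smalltalk and set it again on return, which this sketch leaves out.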

> 
> cheers -ben
> 
>>> [1] https://lwn.net/Articles/643797/
>>> [2] http://man7.org/linux/man-pages/man7/pkeys.7.html
>>> [3] https://lwn.net/Articles/689395/