[squeak-dev] Loading FFI is broken

Wed Nov 20 03:13:02 UTC 2013

On Tue, Nov 19, 2013 at 7:00 PM, Andres Valloud <
avalloud at smalltalk.comcastbiz.net> wrote:

> There are other points of view worth considering.  Let's require that the
> resulting system works correctly, and backtrack from there to determine how
> to achieve that goal.
>
> Sometimes, such as with Single Unix Specification / POSIX sockets, it is
> *impossible* to use an FFI correctly because the standard is such that
> using an FFI cannot be guaranteed to produce correct results.  Another way
> of saying the same thing is that you can use an FFI, as long as you don't
> care about the presence of undefined behavior in the general case.
>
> (note that "undefined behavior" is specification language short hand for
> "execute arbitrary instructions", basically.  Usually this results in a
> segfault, but data corruption and security holes are possible too)
>
>
> > Show me how you can replace the SocketPlugin with FFI, and
> > I'll consider it. ;)
>
> Specifically, SUS / POSIX sockets rely on partially specified structs that
> can change size, field, and field order from Unix to Unix. Moreover, the
> functions you'd call using those structs as arguments can be defined as
> macros.  Even trivial things like malloc() can be macros.  It's impossible
> to use those kinds of APIs in a sane manner from an FFI.

That's not so. I came up with a scheme and implemented a prototype for VW.
 All one need do is generate a wrapper and compile it on the platform.  One
can autogenerate and autocompile the wrapper.  The wrapper can either be
something that outputs metadata interpreted by the image or something that
actually wraps the platform functions.  If it can be called from C then,
with a little ingenuity, it can be called through an FFI.  An FFI is not
just a marshaller.
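
A minimal sketch of the second kind of wrapper (one that actually wraps the
platform functions), not the VW prototype itself.  The names
ffi_wrap_symlink, ffi_wrap_htons and ffi_wrap_errno are hypothetical.
Because the file is compiled on the target platform, any macro definitions
of symlink(), htons() or errno are expanded by the C compiler, and the
wrappers are ordinary exported functions that dlsym() can find.

```c
#include <arpa/inet.h>   /* htons(), which may be a macro */
#include <errno.h>       /* errno, which is typically a macro */
#include <stdint.h>
#include <unistd.h>      /* symlink(), which POSIX allows to be a macro */

/* Hypothetical wrapper: a real, exported function whose body is compiled
   against this platform's headers, so a macro symlink() is expanded here. */
int ffi_wrap_symlink(const char *target, const char *linkpath)
{
    return symlink(target, linkpath);
}

/* Hypothetical wrapper for a function-like thing that is often a macro. */
uint16_t ffi_wrap_htons(uint16_t host_value)
{
    return htons(host_value);
}

/* Hypothetical accessor: errno cannot be looked up as a plain symbol on
   many platforms, so expose its current value through a function. */
int ffi_wrap_errno(void)
{
    return errno;
}
```

Built as a shared library, the image would bind to the ffi_wrap_* entry
points rather than to the platform names directly.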

I would argue that in fact the best way to deal with differing UNIX
implementations is this approach.  For example, ioctl defines, socket
constant defines, struct layouts, etc, etc all differ markedly between UNIX
implementations, and hence one easy way to extract exact information is to
generate, compile and either run or load a program that reveals the
implementation details.
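
As an illustration of the metadata approach, here is a minimal sketch,
assuming a POSIX system; the function name describe_socket_abi and the
"name value" output format are hypothetical.  It asks the platform's own C
compiler for constants and struct layout instead of guessing them from
header files, and writes them in a form an image could parse.

```c
#include <assert.h>
#include <netinet/in.h>   /* struct sockaddr_in */
#include <stddef.h>       /* offsetof */
#include <stdio.h>
#include <sys/socket.h>   /* AF_INET, SOCK_STREAM, SOL_SOCKET, SO_REUSEADDR */

/* Hypothetical helper: write one "name value" line per fact about this
   platform's socket ABI into buf.  Returns the number of characters
   written, or -1 if the buffer is too small or formatting fails. */
int describe_socket_abi(char *buf, size_t cap)
{
    assert(buf != NULL);
    int n = snprintf(buf, cap,
        "AF_INET %d\n"
        "SOCK_STREAM %d\n"
        "SOL_SOCKET %d\n"
        "SO_REUSEADDR %d\n"
        "sizeof(sockaddr_in) %zu\n"
        "offsetof(sockaddr_in,sin_family) %zu\n"
        "offsetof(sockaddr_in,sin_port) %zu\n"
        "offsetof(sockaddr_in,sin_addr) %zu\n",
        (int)AF_INET, (int)SOCK_STREAM, (int)SOL_SOCKET, (int)SO_REUSEADDR,
        sizeof(struct sockaddr_in),
        offsetof(struct sockaddr_in, sin_family),
        offsetof(struct sockaddr_in, sin_port),
        offsetof(struct sockaddr_in, sin_addr));
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

The values differ between UNIX implementations, which is the point: the
generated program reports whatever the local headers and compiler define.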

> Theoretically it's conceivable, but at the cost of breaking C's
> encapsulation mechanism, thus making the whole application non portable
> across SUS / POSIX compliant implementations.  If one wanted to go that
> route, keep in mind the resulting never ending maintenance homework is
> extremely time consuming, and the application's behavior cannot ever be
> proven correct.  In real life, the FFI approach to these APIs means
> applications are not rationally supportable due to undefined behavior.
>
> Speaking of symlinks, the function-like-things symlink() and stat() can
> also be macros as per SUS / POSIX.  So, even if there was a function called
> "symlink" you could find via dlsym() or an equivalent, it's *unsafe* to
> assume you can use an FFI to call that something called "symlink" and
> produce the same effect as writing "symlink" in a C source file that is
> given to a C compiler.
>
> This problem has already been satisfactorily addressed in the form of a C
> compiler and a properly configured compilation environment producing
> primitives (or things equivalent to primitives), such that you write
> something like
>
>         make fooPrimitivesOrBarPlugin
>
> and in O(1 second) you have something that could possibly work correctly.
>  Note that I mean "correctly" as in
>
>         "if it doesn't work, then it's conceivable you can file a well
> documented bug report with the maintainer after a modest amount of effort",
>
> as opposed to
>
>         "send the author a circumstantial account to the effect that after
> looking at random .h files with a random (perhaps human) .h file parser,
> using binaries compiled with random optimization switches on a random
> machine, and violating the relevant specification that describes the
> rational use of the feature in question, the resulting application fails
> due to an unspecified cause --- help!".
>
> For some reason, code maintainers tend to pay attention to the former and
> ignore the latter.
>
> In short, an issue with these types of FFIs is that all too often they
> merely *appear* to work.  The only rational usage model for some (most?) of
> the APIs mentioned in this thread involves a C compiler, which in practice
> means a C primitive or a C plugin.
>
> The above points, argued strictly on technical grounds, are not intended
> to "cause a confrontation" or to "negate benefits of FFIs and plugins".  I
> just strongly care that applications Work(TM).  That goal sometimes implies
> dealing with SUS / POSIX (or, gasp, MSDN) and a C compiler. Maybe it's not
> necessarily the most enjoyable activity, but at least then the C stuff will
> be used as intended.  The alternative is non stop stochastic crashes
> preventing everyone's progress.
>
> ... my 2 cents...
>
>
> On 11/19/13 10:35 , Eliot Miranda wrote:
>
>> Hi All,
>>
>>      this is an important discussion that is taking a religious tone
>> that we should strive to avoid.  There are good arguments for plugins,
>> namely security and encapsulation.  There are good arguments for an FFI,
>> namely extensibility and platform compatibility.
>>
>> Plugins provide security because they allow the system to control any
>> and all access to the underlying platform, permitting access only
>> through plugins.  With an FFI the underlying platform is exposed and one
>> needs other mechanisms, for example Newspeak mirrors, to prevent
>> untrusted code from accessing the platform with potentially disastrous
>> effects (self shell: '/bin/rm -rf /*').
>>
>> Plugins encapsulate all sorts of details behind a potentially simple
>> primitive interface.  This can avoid confusing the newcomer (but at the
>> same time frustrate them by hiding details), provide portability, can
>> make it easier to determine the extent of work in moving to a new OS
>> platform, and so on.
>>
>> An FFI allows immediate extensibility.  External functionality can be
>> invoked immediately.  With plugins a primitive interface must be
>> designed and then implemented. With the FFI the API is already defined;
>> it must "merely" be accessed.  This immediacy can itself provide
>> simplicity, especially where callbacks and threads are involved.
>>   Plugins can hide a lot of complexity (e.g. the SocketPlugin
>> encapsulates platform threads that are waiting on blocking calls so that
>> Squeak itself is provided with an interrupt-driven interface,
>> necessitated by the Squeak platform's lack of native thread support).
>>
>> An FFI allows all underlying functionality to be accessed.  The plugin
>> approach necessitates defining a lowest common denominator approach to
>> functionality, especially irksome in some applications where setting the
>> right flag, e.g. on a socket stream, can have a significant performance
>> impact.
>>
>> So there are good arguments either way.  In a system oriented towards
>> safe play plugins make excellent sense.  In a platform oriented towards
>> industrial development an FFI is a must-have, and a weak one will really
>> hurt acceptance.
>>
>> IMO Squeak needs to have both.  It needs plugins to provide its
>> hallmarks such as eToys.  But to be a more general platform it needs an
>> FFI.  Managing this split personality will take work but I don't see any
>> fundamental issues.  Having a well-factored base into which packages can
>> be loaded to create different personalities is key, and good work is
>> being done here.  There may be a half-way house where the FFI is
>> strictly encapsulated, but this is hypothetical.  I know how to solve
>> threads, pinning, etc, but I don't know off the top of my head how to
>> encapsulate the FFI, so I can't propose it as a solution.
>>
>> A number of straw men have been raised against the FFI in this
>> discussion.  OK, that's unfair.  A number of important questions have
>> been asked of the FFI in this discussion.
>>
>> Levente asks "Show me how you can replace the SocketPlugin with FFI, and
>> I'll consider it. ;)".
>> The issue here is threads.  The SocketPlugin encapsulates blocking
>> calls, spawning hidden OS threads to make these calls and then signal
>> semaphores when they complete.  To solve this one needs both native
>> thread support in the VM (and I have a prototype that needs Spur's
>> facilities to make practicable) and pinning (the ability to stop certain
>> objects moving).  Spur provides pinning.
>>
>> David says "I remember when somebody on the Pharo list suggested
>> reimplementing the
>> OSProcessPlugin in FFI. I told them it was a really great idea, and they
>> should give it a try. That settled the matter quite quickly ;-)".  Again
>> they failed because of the lack of necessary underlying functionality
>> from the VM.  With threads, pinning and a way of expressing the array of
>> pointers to strings idiom (a simple extension to marshalling, and/or
>> pinning, e.g. provide an address of first field primitive) an FFI can do
>> all the OSProcessPlugin can do and significantly simpler.
>>
>> David also says "it is a complete mystery to me why people are willing
>> to work so hard to avoid writing a VM plugin. VM plugins are reliable,
>> portable, and debuggable. They work across a range of processors. They
>> work on 64-bit platforms. So why would someone prefer to switch to a
>> calling interface that basically only works on 32-bit Intel processors
>> and that may require low level knowledge of calling conventions, word
>> alignment, and platform-specific data types?"
>>
>> This is a non-sequitur.  The sentences beginning "So why would
>> someone..." don't follow from the first sentences.  Writing the plugin
>> requires even more knowledge than writing the FFI interface because one
>> needs to know the VM facilities for mating Squeak objects to plugins.
>>   Writing plugins /and/ writing interfaces above FFIs are hard.  But in
>> my experience a powerful FFI provides a faster and easier development
>> experience.  Both can be difficult to port, but plugins have the
>> advantage that only the innards have to be ported while facing the C
>> code face.  My experience in that regard leaves me with a preference for
>> FFIs.  The lack of a 64-bit FFI is a bad weakness of the Squeak
>> platform, something Spur again makes easy to rectify.
>>
>> Bert asks "Suppose we add a new VM platform, like a VM running on
>> JavaScript in the browser. Do you really want to re-implement all the C
>> libraries utilized via FFI? Or rather a handful of primitives in your
>> language of choice?".  First it is not clear that one *can* implement
>> these primitives taking either approach.  If the platform, e.g.
>> JavaScript in a browser, takes the Squeak plugin approach of preventing
>> access to the platform except through a restricted set of facilities,
>> then certain functionality will simply be off-limits, whether one has an
>> FFI or not.  Second, reimplementing all the C libraries isn't
>> obligatory.  If the platform provides an FFI one simply mates to its FFI
>> and accesses the underlying libraries.  If it doesn't then that
>> functionality is off-limits, but that doesn't mean the rest of the
>> system doesn't work.  It also means that Squeak running in that context
>> is no less useful than any other platform, because the underlying
>> platform (just as Squeak does with plugins)
>>
>> --
>> best,
>> Eliot
>>
>
>

-- 
best,
Eliot