[squeak-dev] Loading FFI is broken

Andres Valloud avalloud at smalltalk.comcastbiz.net
Wed Nov 20 03:00:38 UTC 2013


There are other points of view worth considering.  Let's require that 
the resulting system works correctly, and backtrack from there to 
determine how to achieve that goal.

Sometimes, such as with Single Unix Specification / POSIX sockets, it is 
*impossible* to use an FFI correctly because the standard is such that 
using an FFI cannot be guaranteed to produce correct results.  Another 
way of saying the same thing is that you can use an FFI, as long as you 
don't care about the presence of undefined behavior in the general case.

(note that "undefined behavior" is specification language shorthand for 
"execute arbitrary instructions", basically.  Usually this results in a 
segfault, but data corruption and security holes are possible too)

 > Show me how you can replace the SocketPlugin with FFI, and
 > I'll consider it. ;)

Specifically, SUS / POSIX sockets rely on partially specified structs 
whose size, fields, and field order can change from Unix to Unix. 
Moreover, the functions you'd call using those structs as arguments can 
be defined as macros.  Even trivial things like malloc() can be macros. 
  It's impossible to use those kinds of APIs in a sane manner from an 
FFI.  Theoretically it's conceivable, but at the cost of breaking C's 
encapsulation mechanism, thus making the whole application non-portable 
across SUS / POSIX compliant implementations.  If one wanted to go that 
route, keep in mind the resulting never-ending maintenance homework is 
extremely time-consuming, and the application's behavior cannot ever be 
proven correct.  In real life, the FFI approach to these APIs means 
applications are not rationally supportable due to undefined behavior.
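
To make the struct point concrete, here is a minimal sketch in C.  The 
names sockaddr_in_layout, describe_sockaddr_in and 
print_sockaddr_in_layout are made up for illustration.  The sketch only 
reports the numbers an FFI binding would have to hard-code; the values 
come from whatever headers and compiler are in use, so no particular 
output should be assumed.

```c
#include <assert.h>
#include <stddef.h>      /* offsetof, size_t */
#include <stdio.h>
#include <netinet/in.h>  /* struct sockaddr_in, struct in_addr */

/* Hypothetical record of the layout facts an FFI declaration would need. */
struct sockaddr_in_layout {
    size_t size;            /* total size, including any padding or extra fields */
    size_t family_offset;   /* byte offset of sin_family */
    size_t port_offset;     /* byte offset of sin_port */
    size_t address_offset;  /* byte offset of sin_addr */
};

/* Ask the compiler for the layout instead of guessing it from a header. */
struct sockaddr_in_layout describe_sockaddr_in(void)
{
    struct sockaddr_in_layout layout;
    layout.size = sizeof(struct sockaddr_in);
    layout.family_offset = offsetof(struct sockaddr_in, sin_family);
    layout.port_offset = offsetof(struct sockaddr_in, sin_port);
    layout.address_offset = offsetof(struct sockaddr_in, sin_addr);
    return layout;
}

/* Print the layout for this particular compilation environment. */
void print_sockaddr_in_layout(void)
{
    struct sockaddr_in_layout layout = describe_sockaddr_in();
    /* Sanity check: the address field lies inside the struct. */
    assert(layout.address_offset + sizeof(struct in_addr) <= layout.size);
    printf("sizeof(struct sockaddr_in) = %zu\n", layout.size);
    printf("offsetof(sin_family)       = %zu\n", layout.family_offset);
    printf("offsetof(sin_port)         = %zu\n", layout.port_offset);
    printf("offsetof(sin_addr)         = %zu\n", layout.address_offset);
}
```

C code that refers to the fields by name never needs these numbers.  An 
FFI declaration written by hand has to get every one of them right, on 
every platform, every time the headers change.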

Speaking of symlinks, the function-like-things symlink() and stat() can 
also be macros as per SUS / POSIX.  So, even if there were a function 
called "symlink" you could find via dlsym() or an equivalent, it's 
*unsafe* to assume you can use an FFI to call that something called 
"symlink" and produce the same effect as writing "symlink" in a C source 
file that is given to a C compiler.
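
A rough sketch of what "writing symlink in a C source file" buys you 
follows.  The wrapper names wrapped_symlink and wrapped_file_size are 
hypothetical, chosen only for this example.  Whether the system headers 
provide symlink and stat as functions, macros, or something else, the 
compiler resolves them here, and the wrappers export only plain C 
functions with scalar and pointer arguments.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>  /* stat, struct stat */
#include <unistd.h>    /* symlink */

/* Hypothetical wrapper: create a symbolic link.
   Returns 0 on success, or the errno value on failure. */
int wrapped_symlink(const char *target, const char *link_path)
{
    assert(target != NULL && link_path != NULL);
    return symlink(target, link_path) == 0 ? 0 : errno;
}

/* Hypothetical wrapper: fetch a file's size without exposing struct stat.
   Returns 0 on success and stores the size, or returns the errno value. */
int wrapped_file_size(const char *path, long long *size_out)
{
    struct stat info;  /* layout is known only to the compiler */
    assert(path != NULL && size_out != NULL);
    if (stat(path, &info) != 0)
        return errno;
    *size_out = (long long)info.st_size;
    return 0;
}
```

The caller on the other side of the interface never sees struct stat or 
the real symlink entry point, so nothing there depends on how a given 
Unix chose to implement them.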

This problem has already been satisfactorily addressed in the form of a 
C compiler and a properly configured compilation environment producing 
primitives (or things equivalent to primitives), such that you write 
something like

	make fooPrimitivesOrBarPlugin

and in O(1 second) you have something that could possibly work 
correctly.  Note that I mean "correctly" as in

	"if it doesn't work, then it's conceivable you can file a well 
documented bug report with the maintainer after a modest amount of effort",

as opposed to

	"send the author a circumstantial account to the effect that after 
looking at random .h files with a random (perhaps human) .h file parser, 
using binaries compiled with random optimization switches on a random 
machine, and violating the relevant specification that describes the 
rational use of the feature in question, the resulting application fails 
due to an unspecified cause --- help!".

For some reason, code maintainers tend to pay attention to the former 
and ignore the latter.

In short, an issue with these types of FFIs is that all too often they 
merely *appear* to work.  The only rational usage model for some (most?) 
of the APIs mentioned in this thread involves a C compiler, which in 
practice means a C primitive or a C plugin.
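
For illustration only, a primitive-style C entry point for sockets might 
look like the sketch below.  The names open_loopback_listener and 
listener_port are assumptions made for this example; this is not the 
SocketPlugin's actual interface.  The point is that struct sockaddr_in 
is filled in by compiled C, and only integers cross the boundary.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <arpa/inet.h>   /* htons, htonl, ntohs */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_LOOPBACK */
#include <sys/socket.h>  /* socket, bind, listen, getsockname */
#include <unistd.h>      /* close */

/* Hypothetical primitive: open a TCP listener on 127.0.0.1.
   Port 0 asks the OS to pick a free port.
   Returns a descriptor, or -1 with errno set. */
int open_loopback_listener(unsigned short port)
{
    struct sockaddr_in address;
    int descriptor = socket(AF_INET, SOCK_STREAM, 0);
    if (descriptor < 0)
        return -1;

    memset(&address, 0, sizeof address);  /* clears platform-specific fields too */
    address.sin_family = AF_INET;
    address.sin_port = htons(port);
    address.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(descriptor, (struct sockaddr *)&address, sizeof address) != 0
        || listen(descriptor, 1) != 0) {
        int saved_errno = errno;  /* close() may overwrite errno */
        close(descriptor);
        errno = saved_errno;
        return -1;
    }
    return descriptor;
}

/* Hypothetical primitive: report the locally bound port in host byte order,
   or -1 with errno set. */
int listener_port(int descriptor)
{
    struct sockaddr_in address;
    socklen_t length = sizeof address;
    assert(descriptor >= 0);
    if (getsockname(descriptor, (struct sockaddr *)&address, &length) != 0)
        return -1;
    return ntohs(address.sin_port);
}
```

If this fails on some platform, the bug report is about a small C file 
built with known headers and switches, which is the kind of report a 
maintainer can act on.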

The above points, argued strictly on technical grounds, are not intended 
to "cause a confrontation" or to "negate benefits of FFIs and plugins". 
  I just strongly care that applications Work(TM).  That goal sometimes 
implies dealing with SUS / POSIX (or, gasp, MSDN) and a C compiler. 
Maybe it's not necessarily the most enjoyable activity, but at least 
then the C stuff will be used as intended.  The alternative is non-stop 
stochastic crashes preventing everyone's progress.

... my 2 cents...

On 11/19/13 10:35 , Eliot Miranda wrote:
> Hi All,
>
>      this is an important discussion that is taking a religious tone
> that we should strive to avoid.  There are good arguments for plugins,
> namely security and encapsulation.  There are good arguments for an FFI,
> namely extensibility and platform compatibility.
>
> Plugins provide security because they allow the system to control any
> and all access to the underlying platform, permitting access only
> through plugins.  With an FFI the underlying platform is exposed and one
> needs other mechanisms, for example Newspeak mirrors, to prevent
> untrusted code from accessing the platform with potentially disastrous
> effects (self shell: '/bin/rm -rf /*').
>
> Plugins encapsulate all sorts of details behind a potentially simple
> primitive interface.  This can avoid confusing the newcomer (but at the
> same time frustrate them by hiding details), provide portability, can
> make it easier to determine the extent of work in moving to a new OS
> platform, and so on.
>
> An FFI allows immediate extensibility.  External functionality can be
> invoked immediately.  With plugins a primitive interface must be
> designed and then implemented. With the FFI the API is already defined;
> it must "merely" be accessed.  This immediacy can itself provide
> simplicity, especially where callbacks and threads are involved.
>   Plugins can hide a lot of complexity (e.g. the SocketPlugin
> encapsulates platform threads that are waiting on blocking calls so that
> Squeak itself is provided with an interrupt-driven interface,
> necessitated by the Squeak platform's lack of native thread support).
>
> An FFI allows all underlying functionality to be accessed.  The plugin
> approach necessitates defining a lowest common denominator approach to
> functionality, especially irksome in some applications where setting the
> right flag, e.g. on a socket stream, can have a significant performance
> impact.
>
> So there are good arguments either way.  In a system oriented towards
> safe play plugins make excellent sense.  In a platform oriented towards
> industrial development an FFI is a must-have, and a weak one will really
> hurt acceptance.
>
> IMO Squeak needs to have both.  It needs plugins to provide its
> hallmarks such as eToys.  But to be a more general platform it needs an
> FFI.  Managing this split personality will take work but I don't see any
> fundamental issues.  Having a well-factored base into which packages can
> be loaded to create different personalities is key, and good work is
> being done here.  There may be a half-way house where the FFI is
> strictly encapsulated, but this is hypothetical.  I know how to solve
> threads, pinning, etc, but I don't know off the top of my head how to
> encapsulate the FFI, so I can't propose it as a solution.
>
> A number of straw men have been raised against the FFI in this
> discussion.  OK, that's unfair.  A number of important questions have
> been asked of the FFI in this discussion.
>
> Levente asks "Show me how you can replace the SocketPlugin with FFI, and
> I'll consider it. ;)".
> The issue here is threads.  The SocketPlugin encapsulates blocking
> calls, spawning hidden OS threads to make these calls and then signal
> semaphores when they complete.  To solve this one needs both native
> thread support in the VM (and I have a prototype that needs Spur's
> facilities to make practicable) and pinning (the ability to stop certain
> objects moving).  Spur provides pinning.
>
> David says "I remember when somebody on the Pharo list suggested
> reimplementing the
> OSProcessPlugin in FFI. I told them it was a really great idea, and they
> should give it a try. That settled the matter quite quickly ;-)".  Again
> they failed because of the lack of necessary underlying functionality
> from the VM.  With threads, pinning and a way of expressing the array of
> pointers to strings idiom (a simple extension to marshalling, and/or
> pinning, e.g. provide an address of first field primitive) an FFI can do
> all the OSProcessPlugin can do and significantly simpler.
>
> David also says "it is a complete mystery to me why people are willing
> to work so hard to avoid writing a VM plugin. VM plugins are reliable,
> portable, and debuggable. They work across a range of processors. They
> work on 64-bit platforms. So why would someone prefer to switch to a
> calling interface that basically only works on 32-bit Intel processors
> and that may require low level knowledge of calling conventions, word
> alignment, and platform-specific data types?"
>
> This is a non-sequitur.  The sentences beginning "So why would
> someone..." don't follow from the first sentences.  Writing the plugin
> requires even more knowledge than writing the FFI interface because one
> needs to know the VM facilities for mating Squeak objects to plugins.
>   Writing plugins /and/ writing interfaces above FFIs are hard.  But in
> my experience a powerful FFI provides a faster and easier development
> experience.  Both can be difficult to port, but plugins have the
> advantage that only the innards have to be ported while facing the C
> code face.  My experience in that regard leaves me with a preference for
> FFIs.  The lack of a 64-bit FFI is a bad weakness of the Squeak
> platform, something Spur again makes easy to rectify.
>
> Bert asks "Suppose we add a new VM platform, like a VM running on
> JavaScript in the browser. Do you really want to re-implement all the C
> libraries utilized via FFI? Or rather a handful of primitives in your
> language of choice?".  First it is not clear that one *can* implement
> these primitives taking either approach.  If the platform, e.g.
> JavaScript in a browser, takes the Squeak plugin approach of preventing
> access to the platform except through a restricted set of facilities,
> then certain functionality will simply be off-limits, whether one has an
> FFI or not.  Second, reimplementing all the C libraries isn't
> obligatory.  If the platform provides an FFI one simply mates to its FFI
> and accesses the underlying libraries.  If it doesn't then that
> functionality is off-limits, but that doesn't mean the rest of the
> system doesn't work.  It also means that Squeak running in that context
> is no less useful than any other platform, because the underlying
> platform (just as Squeak does with plugins)
>
> --
> best,
> Eliot

