[Vm-dev] Cog status & FFI directions [was rearchitecting the FFI implementation for reentrancy]

Eliot Miranda eliot.miranda at gmail.com
Thu Aug 6 18:00:26 UTC 2009


On Thu, Aug 6, 2009 at 10:29 AM, Igor Stasenko <siguctua at gmail.com> wrote:

>
> 2009/8/6 Eliot Miranda <eliot.miranda at gmail.com>:
> >
> >
> >
> > 2009/8/6 Göran Krampe <goran at krampe.se>
> >>
> >> Hi Eliot and all!
> >>
> >> Eliot Miranda wrote:
> >>>
> >>> Hi All,
> >>>    I'm looking at making the Squeak FFI reentrant to support nested
> calls
> >>> and possibly threading.  The current FFI has a couple of issues which
> render
> >>> it non-reentrant.
> >>
> >> The tech stuff is over my head, but I do have three questions related to
> this:
> >>
> >> 1. What about Alien? Shouldn't we try to move towards Alien instead of
> current FFI? Or is that too much work at this point?
> >
> > I intend to merge Alien into the current FFI to allow the current FFI to
> marshal Aliens.  Aliens are fine for modelling external data but the Alien
> FFI call-out mechanism is a little too naive for general use.  It works well
> on x86 but has issues on anything with an exotic calling convention (passes
> arguments in integer and/or floating-point registers).  And see the next
> point about callbacks.
> >>
> >> 2. Callbacks have been a sore point in Squeak for a long time. AFAIK
> there is a patch available on www.squeakgtk.org/wiki/download.html, not
> sure what it does or if it is the original patch from Andreas when wxSqueak
> was being built. wxSqueak had a patched VM I recall. Perhaps that stuff is
> not related.
> >
> > One thing that IMO is much better about Alien is the callback mechanism
> which allows one effectively to pass function pointers to blocks.  The
> current FFI's callback mechanism is weak.  It simply does a process switch
> away from the process calling out and requires further work in the image,
> e.g. a process waiting on a semaphore that is signalled by external code, to
> then collect information for performing the callback. So adding in the Alien
> callback mechanism is also something I intend to do.
> >
> >>
> >> 3. Could we possibly ask for a status update on Cog and related
> activities? We are itching for news! :) Also curious about your interest in
> Factor and its lower bowels (definitely cool stuff going on there).
> >
> > The status is as follows.
> > The Cog stack VM is being reviewed for release to the community.  We hope
> to have this done soon, certainly before the end of September, but we're
> busy and this isn't on the critical path.  Once it is released there will
> have to be some integration and merge activities before it is part of the
> standard VMs because we have effectively forked (although not a lot).
> > The first incarnation of the Cog JIT is complete (for x86 only) and in
> use at Qwaq.  We are gearing up for a new server release and the Cog VM is
> the VM beneath it.  The next client release will include it also.  This VM
> has a naive code generator (every push or pop in the bytecode results in a
> push or pop in machine code) but good inline caching.  Performance is as
> high as 5x the current interpreter for certain computer-language-shootout
> benchmarks.  The naive code generator means there is poor loop performance
> (1 to: n do: ... style code can be 4 times slower than VisualWorks) and the
> object model means there is no machine code instance creation and no machine
> code at:put: primitive.  But send performance is good and block activation
> almost as fast as VisualWorks.  In our real-world experience we were last
> week able to run almost three times as many Qwaq Forums clients against a QF
> server running on the Cog VM as we were able to on the interpreter.
>  So the Cog JIT is providing significant speedups in real-world use.
> > I am (clearly) looking at FFI issues right now.  In the Autumn I intend
> to start work on a less naive code generator, a better object model and a
> faster garbage collector, the three of which should raise performance levels
> to VisualWorks levels, i.e. a further 2x to 3x increase over the 4x - 5x
> already achieved for pure Smalltalk execution.
>
> Yes, an FFI is heavily used in Croquet (and Qwaq Forums, I suppose) to
> render graphics using OpenGL. So it is critical for high performance.
> Btw, do you plan to use the JIT for generating callout code?


Eventually yes.  IMO this is the best way to go to get a correct and
portable FFI.  ABIs like x86-64 sysV are too complicated to interpret
efficiently and very complicated to implement in a low-level language.  I
think the right architecture is one where the FFI compiler is written in
Smalltalk and lives in the image.  When the image starts up on a different
platform all the FFI callout methods have their generated code flushed.  The
first time an FFI method is invoked the invocation will fail because there
is no generated code.  E.g. one writes call-outs thus:

ffiPrintString: aString
        <cdecl: char* 'ffiPrintString' (char *) error: errorCode>
        ^self externalCallFailedWith: errorCode

A call failing due to no code will return e.g. #'need to compile code' or
perhaps simply #'not yet linked'.  externalCallFailedWith: then invokes the
ABI compiler to compile the FFI spec to some sort of abstract register
transfer language, looks up the function name, stores the info in the
ExternalFunction which, as it is now, is the method's first literal, and
retries the invocation.  The JIT then translates the RTL into actual machine
code and executes it.
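
To make the fail, link and retry cycle concrete, here is a minimal sketch in
C.  It models only the linking half of the scheme: the RTL compilation and JIT
translation steps are left out, and every name in it (external_function,
try_callout, lookup_symbol, the FFI_* codes) is hypothetical, chosen for
illustration rather than taken from any VM source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical failure codes, standing in for #'not yet linked' etc. */
enum { FFI_OK = 0, FFI_NOT_YET_LINKED = 1, FFI_NO_SUCH_FUNCTION = 2 };

typedef int (*ffi_target)(const char *);

/* Stand-in for ExternalFunction: a name plus a lazily filled address. */
typedef struct {
    const char *name;
    ffi_target  address;     /* NULL until the function has been linked */
    int         link_count;  /* how many times linking has happened */
} external_function;

/* Stand-in for a foreign function; returns the length of its argument. */
static int ffi_print_string(const char *s) { return (int)strlen(s); }

/* Stand-in for the module/symbol search done on failure. */
static ffi_target lookup_symbol(const char *name) {
    if (strcmp(name, "ffiPrintString") == 0) return ffi_print_string;
    return NULL;
}

/* The "primitive": fails with FFI_NOT_YET_LINKED if there is no address. */
static int try_callout(external_function *fn, const char *arg, int *result) {
    assert(fn != NULL && result != NULL);
    if (fn->address == NULL) return FFI_NOT_YET_LINKED;
    *result = fn->address(arg);
    return FFI_OK;
}

/* Failure handler: link the function, store the address, retry once. */
static int external_call_failed_with(int error, external_function *fn,
                                     const char *arg, int *result) {
    if (error != FFI_NOT_YET_LINKED) return error;
    fn->address = lookup_symbol(fn->name);
    if (fn->address == NULL) return FFI_NO_SUCH_FUNCTION;
    fn->link_count++;
    return try_callout(fn, arg, result);
}

/* What a call-out method does: try the primitive, fall back on failure. */
int callout(external_function *fn, const char *arg, int *result) {
    int error = try_callout(fn, arg, result);
    if (error == FFI_OK) return FFI_OK;
    return external_call_failed_with(error, fn, arg, result);
}

/* Done for every call-out when starting up on a different platform. */
void flush_linkage(external_function *fn) { fn->address = NULL; }
```

The first callout() on a fresh external_function takes the slow path and
links; later calls go straight through until flush_linkage() is applied.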

One may need an additional layer which is Smalltalk code that exists to
coerce arguments, raising errors for arguments that can't be coerced.  e.g.
ffiPrintString: aString
        ^self ffiPrintStringInner: aString asNullTerminatedCString

ffiPrintStringInner: aString
        <cdecl: char* 'ffiPrintString' (char *) error: errorCode>
        ^self externalCallFailedWith: errorCode

This kind of approach can move much of the complexity up into Smalltalk
where it can be mastered, and the system extended on the fly, leaving the
lower-level VM the simpler task of generating platform-specifics.  In
particular, lifting the dll/module searching machinery up into the image is
a good idea.

I also like the following idea for accessing platform-specific constants.  I
implemented a prototype of this for VisualWorks but it hasn't been deployed
yet.

"For example, we want to move the socket layer out of the VM almost
entirely.  To do this the VI must be able to reference the correct values
for defines such as O_NONBLOCK which have an annoying habit of having
different values on different unix variants.  One way to do this is to have
the VI spit out a C file containing a table of all the constants it needs,
name to value.  This gets compiled into a dll on each platform and loaded to
retrieve the relevant values.  When developing, the VI will need to get hold
of new values, and so new versions of the dll will need to get spat out,
compiled and reloaded.  Again providing recompilation as a service would
enable users to deploy across platforms for which they have no C compiler."

My VW prototype generated a C file from a shared pool.  E.g. here's a
snippet of an autogenerated socketconstants.c:
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
typedef struct {
            char * name;
            int value, flags;
        } constantTable;
constantTable constants[] = {
{"AF_DECnet", (int)
#ifdef AF_DECnet
AF_DECnet, 1},
#else
0, 0},
#endif
{"AF_FILE", (int)
#ifdef AF_FILE
AF_FILE, 1},
#else
0, 0},
#endif

where flags indicated the size of the field amongst other things.
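
A table of this shape could be consulted by name roughly as follows.  This is
only a sketch under assumptions: find_constant and constant_value are
hypothetical helpers, GR_NO_SUCH_CONSTANT is a made-up name included to show
the undefined case, and a zero flags field is read here simply as "not defined
on this platform" (the prototype's flags carried more than that).  It assumes
a POSIX system for the headers.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

typedef struct {
    const char *name;
    int value, flags;
} constantTable;

/* Same entry pattern as the generated file: value and flags are 0, 0
   when the platform headers do not define the name. */
static const constantTable constants[] = {
{"AF_INET", (int)
#ifdef AF_INET
AF_INET, 1},
#else
0, 0},
#endif
{"O_NONBLOCK", (int)
#ifdef O_NONBLOCK
O_NONBLOCK, 1},
#else
0, 0},
#endif
{"GR_NO_SUCH_CONSTANT", (int)   /* hypothetical, never defined */
#ifdef GR_NO_SUCH_CONSTANT
GR_NO_SUCH_CONSTANT, 1},
#else
0, 0},
#endif
};

/* Returns the table entry for name, or NULL if the table has no such row. */
const constantTable *find_constant(const char *name) {
    assert(name != NULL);
    size_t count = sizeof constants / sizeof constants[0];
    for (size_t i = 0; i < count; i++)
        if (strcmp(constants[i].name, name) == 0)
            return &constants[i];
    return NULL;
}

/* Stores the platform's value in *value and returns 1 if name is present
   and defined on this platform; returns 0 and leaves *value alone otherwise. */
int constant_value(const char *name, int *value) {
    assert(value != NULL);
    const constantTable *entry = find_constant(name);
    if (entry == NULL || entry->flags == 0) return 0;
    *value = entry->value;
    return 1;
}
```

The point of the 0, 0 rows is that the image can tell "this platform has no
such constant" apart from "the constant's value is zero".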

These files get compiled to dlls which can be loaded and inspected by the
image.  To do a portable distribution of an application one needs to deploy
a dll for each platform.  But one only needs to compile it when the set of
constants changes.  The image therefore only loads the constants dll when
it finds it is starting up on a different operating system from the one it
was saved on.  So a shrink-wrap application for a specific platform need not
be deployed with the constants dll.

My VW prototype automagically generated and compiled the dll when constants
were added to the shared pool.  The C compiler was invoked automatically
using VW's equivalent of OSProcess.  One could imagine providing
compilation-as-a-service or a central library of these constant dlls so that
developers and application deployers didn't need to have the C compiler for
all platforms upon which they wish to deploy.
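
The generation step itself is mechanical.  The prototype did it from Smalltalk;
the sketch below does the same thing in C purely for illustration, with
hypothetical function names (emit_constant_entry, emit_constant_table).  It
writes the typedef and one #ifdef-guarded row per name, and assumes the caller
has already written whatever #include lines the names need.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes one table row for name, in the same pattern as the generated
   socketconstants.c.  Returns 0 on success, -1 on an empty name or a
   write error. */
int emit_constant_entry(FILE *out, const char *name) {
    assert(out != NULL && name != NULL);
    if (strlen(name) == 0) return -1;
    int written = fprintf(out,
                          "{\"%s\", (int)\n"
                          "#ifdef %s\n"
                          "%s, 1},\n"
                          "#else\n"
                          "0, 0},\n"
                          "#endif\n",
                          name, name, name);
    return written < 0 ? -1 : 0;
}

/* Writes the typedef, the opening of the table, one row per name, and
   the closing brace.  Returns 0 on success, -1 on any failure. */
int emit_constant_table(FILE *out, const char *const names[], size_t count) {
    assert(out != NULL && names != NULL);
    if (fputs("typedef struct {\n"
              "    char *name;\n"
              "    int value, flags;\n"
              "} constantTable;\n"
              "constantTable constants[] = {\n", out) == EOF)
        return -1;
    for (size_t i = 0; i < count; i++)
        if (emit_constant_entry(out, names[i]) != 0)
            return -1;
    return fputs("};\n", out) == EOF ? -1 : 0;
}
```

Feeding it the names held in the shared pool and handing the result to the
platform's C compiler gives the dll described above.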



> > I expect we'll be in a position to release some version of the Cog JIT to
> the community by Christmas.
> > I'll be giving a guided tour of the current Cog JIT VM at SqueakFest LA
> on Monday.
> >>
> >>
> >> regards, Göran
> >
> > Best
> > Eliot
> >
> >
>
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>