[Pharo-project] [Vm-dev] Re: Can OSProcess functionality be implemented using FFI instead of plugin?

Mariano Martinez Peck marianopeck at gmail.com
Sun Jan 17 15:23:39 UTC 2016


Hi Eliot,

Thanks, much clearer now. Sometimes I am slow :) I was confused because
I was only thinking of libc-like libraries (very core and very likely used
by the VM). But when you gave the SQL example, I understood the general
case you were trying to explain. So it's clear now.

I would like to add 2 more comments:

1) Do you agree that, besides the name/value pairs, it would also help to
have the result of sizeof()? Otherwise I may still run into problems when I
need to allocate from FFI and the size of a struct is not clear (as was my
case some days ago). So in this case the output would be a kind of array
rather than key/value pairs.

2) As for the autogenerated C file, do you think X Macros are a good idea?
See
http://stackoverflow.com/questions/264269/what-is-a-good-reference-documenting-patterns-of-use-of-x-macros-in-c-or-possib/265560#265560


Thanks,


On Sun, Jan 17, 2016 at 12:40 AM, Eliot Miranda <eliot.miranda at gmail.com>
wrote:

> Hi Mariano,
>
> On Sat, Jan 16, 2016 at 6:25 PM, Mariano Martinez Peck <
> marianopeck at gmail.com> wrote:
>
>>
>>
>> On Sat, Jan 16, 2016 at 11:02 PM, Eliot Miranda <eliot.miranda at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Sat, Jan 16, 2016 at 6:00 AM, Mariano Martinez Peck <
>>> marianopeck at gmail.com> wrote:
>>>
>>>>
>>>> Hi all,
>>>>
>>>> Sorry for reviving an old thread but I thought it was better to
>>>> continue the discussion here because of the context.
>>>> As you may have read, the other day I released a first approach to a
>>>> subset of OSProcess based on FFI (posix_spawn() family of functions):
>>>>
>>>> https://github.com/marianopeck/OSSubprocess
>>>>
>>>>  And with that in mind, I wanted to share a few things with you. The
>>>> 2 main problems I found with implementing this with FFI were:
>>>>
>>>> 1) We have all already discussed and agreed that fork+exec cannot be
>>>> done in separate FFI calls. So at the very minimum you need either a plugin
>>>> method that does the fork()+exec() OR to wrap a function like posix_spawn().
>>>>
>>>> 2) The other main problem, as you all said (and mostly Nicolas),
>>>> is the preprocessor (constants, macros, etc).
>>>>
>>>> With all that said, I was able to get my stuff working. However, I am
>>>> still using some primitives of OSProcess plugin because of 2).
>>>>
>>>> I read Eliot's idea and what I don't like is the need for a C compiler on
>>>> the user's machine. I think that's a heavy constraint. Then Igor suggested that
>>>> WE (developers and maintainers of a certain tool) are the ones who
>>>> compile the little C program to extract constant values etc., and then WE
>>>> provide, as part of our source code, some packages with some SharedPool
>>>> depending on the platform/OS. And Igor's approach looked a bit better to me.
>>>>
>>>
>>>
>>>
>>> You misunderstand the proposal.
>>>
>>
>> I think I did. But let me confirm that below ;)
>>
>>
>>> The C compiler is needed /only when changing the set of constants/, i.e.
>>> when /developing/ the interface.  The C compiler is /not/ needed when
>>> deploying.
>>>
>>> The idea is to
>>> a) at development time, e.g. when a new variable is added to a
>>> SharedPool containing platform constants, a C program is autogenerated that
>>> outputs in some format a description of the names and values of all the
>>> constants defined in the pool.  One convenient notation is e.g. STON.  For
>>> the purposes of this discussion let's assume we're using ston, but any
>>> format the image can parse (or indeed a shared object the image can load on
>>> the current platform) will do.  The output of the autogenerated C program
>>> would be called something like <SharedPoolName>.<PlatformName>.ston, e.g.
>>> UnixConstants.MacOSX64.ston or UnixConstants.Linux32.ston.  The ston files
>>> can easily be parsed by facilities in the Smalltalk image.
>>>
>>> b) when deploying the system to a set of platforms one includes all the
>>> relevant platform-specific ston files.
>>>
>>>
>> OK. But let me ask something. Below you said "be it a plugin or a dll
>> doesn't matter". To autogenerate the C program, I must know which header
>> files to include for each platform and probably a few other things. For
>> example, besides exporting the value, I would also like to export the
>> sizeof(). And that depends on how the VM was compiled, right?   So...my
>> question is...if such autogenerated C code could be part of the VM
>> build (considering all the settings assumed when building), can't
>> I reuse the knowledge the VM already has? Like which header files
>> to include, whether it was compiled for 32 bits or 64 bits, which C compiler
>> to use, etc.
>>
>
> I actually said that using text is easier than a dll.  So I'm saying
>  autogenerate a C program that outputs name-value pairs in some convenient
> textual representation, e.g. ston.  But answering your question...
>
> The knowledge in the VM as to what header files are included *applies only
> to the include files the VM uses*.  The VM uses a subset of the platform.
> It doesn't for example include any headers that define a database
> interface.  It doesn't include header files that define the interface to a
> UI toolkit such as GTK.  Etc, etc.  So in fact the VM *doesn't* include the
> knowledge one needs to determine the set of include files for an arbitrary
> FFI interface.  And even so, the include files that it does use are in the
> VM's platform source files, and that information is not readily accessible.
>
> Let me summarise.  No, the VM cannot be used to determine the set of
> include files needed to generate constants used in an arbitrary FFI
> interface.
>
>> What I mean is whether it would be easier if I take the SharedPool at VM
>> building time, and from there I autogenerate (and run) the C code that
>> would generate the output. Then, when we "deploy" the VM, we can deploy it
>> with relevant platform specific ston files as you said.
>>
>
> No.  The VM is something that provides an FFI.  It doesn't *define* an
> FFI.   One must be able to develop an FFI interface without needing to
> rebuild the VM.  So computing the values of constants should be *separate*
> from building a VM.  Now let me give you more of an example.
>
> Let's say we define a subclass of SharedPool called FFISharedPool.
> FFISharedPool 's job is to manage autogenerating a C file, compiling it for
> the platform, and organizing parsing the relevant output.  Let's say we use
> a convention like class-side pragmas to define include files, and compiler
> flags.  The VM provides two crucial pieces of information:
>
> 1. the platform name
> 2. the word size
>
> One can't run a Mac OS VM on Linux, and one can't run a 64-bit VM on a
> 32-bit operating system.  So taking this information from the VM accurately
> tells the current system what ABI (application binary interface) to use,
> and that's what's important in generating the right constants.
>
> So we use these two pieces of information to index the method pragmas that
> tell us what specific files to include.
>
> Let's imagine we subclass FFISharedPool to add a shared pool for constants
> for an SQL database.  We might have a class declaration like
>
> FFISharedPool subclass: #MYSQLInterface
> instanceVariableNames: ''
> classVariableNames: 'MYSQL_DEFAULT_AUTH MYSQL_ENABLE_CLEARTEXT_PLUGIN
> MYSQL_INIT_COMMAND MYSQL_OPT_BIND MYSQL_OPT_CAN_HANDLE_EXPIRED_PASSWORDS
> MYSQL_OPT_COMPRESS
> MYSQL_OPT_CONNECT_ATTR_DELETE MYSQL_OPT_CONNECT_ATTR_RESET'
> poolDictionaries: ''
> category: 'MYSQLInterface-Pools'
>
> The job of FFISharedPool is to compute the right values for the class
> variables on every platform we want to deploy the MYSQL interface on.
>
> So we need to know the relevant include files and C flags for each
> platform/word-size combination.  A few of them might look like
>
>
> MYSQLInterface class methods for platform information
> mac32
>     "I describe the include files and C flags to use when developing a
> 32-bit MYSQL FFI interface on Mac OS X"
>     <platformName: 'Mac OS' wordSize: 4>
>     <cFlags: #('-m32') includeFiles: #('/opt/mysql/include32')>
>     ^self "all the info is in the pragmas"
>
> mac64
>     "I describe the include files and C flags to use when developing a
> 64-bit MYSQL FFI interface on Mac OS X"
>     <platformName: 'Mac OS' wordSize: 8>
>     <cFlags: #('-m64') includeFiles: #('/opt/mysql/include64')>
>
> The above might cause FFISharedPool to autogenerate files called
> MYSQLInterface.mac32.c & MYSQLInterface.mac64.c.  And these, when run,
> might output ston notation to MYSQLInterface.mac32.ston &
> MYSQLInterface.mac64.ston (or maybe to stdout which has to be redirected to
> MYSQLInterface.mac32.ston; whatever).
>
> Now, you might use pragmas, or you might answer a Dictionary instance.
> Whatever style pleases you and seems convenient and readable.  But these
> methods define the necessary metadata (C flags, include paths, and ...?)
> for FFISharedPool to autogenerate the C program that, when compiled with
> the supplied C flags and run on the current platform, outputs the values
> for the constants the shared pool wants to define.
>
>
> You can get fancy and have FFISharedPool autogenerate the C programs
> whenever one adds or removes a constant name.  Or you can require the
> programmer run something, e.g. MYSQLInterface generateInterfaces.  It's
> really nice if FFISharedPool submits the file to the C compiler
> automatically, but this can only work for e.g. 32 & 64 bit versions on a
> single platform.  You have to compile the autogenerated program on the
> relevant platform, with the necessary libraries and include files installed.
>
> You could imagine a set of servers for different platforms so one could
> submit the autogenerated program for compilation and execution on each
> platform.  That's a facility I'd make it easy to implement.  I could
> imagine that a programmer whose company develops an FFI interface and
> deploys it on a number of platforms would love to be able to automate
> compiling and running the relevant autogenerated code on a set of servers.
> I could imagine the Pharo community providing a set of servers upon which
> lots of software is installed for precisely this purpose. That means that
> people could develop FFI interfaces without even having to have the C
> compiler installed on their platform.
>
> You could also add a C parser to FFISharedPool  that parses the
> post-preprocessed code and extracts function declarations.  But the
> important thing is autogenerating the C program so that it generates easily
> parsable output containing the values for the constants.  You can extend
> the system in interesting ways once you have this core functionality
> implemented.
>
> So once the program is autogenerated and compiled for the current
> platform, it is run and its output collected in a file whose name can be
> recognised by FFISharedPool.
>
>
> Now the class side of FFISharedPool might be declared as
>
> FFISharedPool class
> instanceVariableNames: 'platformName wordSize'
>
> and on start-up FFISharedPool could examine its subclasses, and for each
> whose platformName & wordSize do not match the current platform, search for
> all the matching FOOInterface.plat.ston files, parse them and update the
> subclasses' variables, and update that pool's platformName & wordSize.  It
> could emit a warning on the Transcript or stdout (headful vs headless)
> indicating which subclasses it couldn't find the relevant
> FOOInterface.plat.ston files for.
>
> But the end result is that
>
> a) providing the system is deployed with FOOInterface.plat.ston files for
> each interface and platform used, a cross-platform application can be
> deployed *that does not require a C compiler*.
> b) providing that a system's FOOInterface files have been initialized on
> the intended platform, a platform-specific application can be deployed for
> a single platform *without needing the ston files*.
>
> Does this make more sense now?
>
>>> c) at startup the image checks its current platform.  If the platform is
>>> the same one it was saved on, no action is taken.  But if the platform has
>>> changed then the relevant ston file is selected, parsed, and the values for
>>> the variables in the shared pool updated to reflect the values of the
>>> current platform.
>>>
>>> So the C compiler is only needed when developing the interface, not when
>>> deploying it.
>>>
>>>
>> OK
>>
>>
>>>
>>>> Then Nicolas made a point that if we plan to manage all that complexity
>>>> at the image level it may become a hell too.
>>>>
>>>> So.... what if we take a simpler (probably not better) approach and we
>>>> consider the "c program that exports constants and sizes" a VM Plugin?
>>>> Let's say we have a UnixPreprocessorPlugin (that would work for OSX, Linux
>>>> and other Unixes, I imagine, for the time being) which provides a function
>>>> (that is exported) which answers an array of arrays. For each constant, we
>>>> include the name of the constant, the value, and the sizeof().  Then from
>>>> image side, we simply do one FFI call, we get the large array and we adapt
>>>> it to a SharedPool or whatever kind of object representing that info.
>>>>
>>>
>>>
>>>
>>> This is what I suggested in the first place.  That what is
>>> autogenerated is a shared object (be it a plugin or a dll doesn't matter, it
>>> is machine code generated by a C compiler from an autogenerated C program
>>> compiled with the platform's C compiler) that can be loaded at run-time and
>>> interrogated to fetch the values of a set of variables
>>>
>>
>> OK, got it. But still, it would be easier if the "platform" in this case
>> is the "machine where we build the VM we will then distribute" right? I
>> mean, I would like to put this in the CI jobs that automatically builds the
>> VM, and not myself building for each platform.
>>
>
> NO!  For example, why would a company that has some proprietary arithmetic
> package implemented in its secret labs in C or C++ and accessed through the
> FFI want to have that code on the Pharo community's build servers?
>
>
>>
>> *I mean, my main doubt is whether this job of autogenerating the C code,
>> compiling it, running it, exporting the text file, and distributing the text
>> file with the VM could be done as part of the VM build.*
>>
>
> For fuck's sake.  Developing an FFI is not something one does when
> building a VM.  It is something one does when using the system.  If you want
> to do this you *use a plugin*.  The FFI is a different beast.  It is to
> allow programmers to interface to external libraries that are *independent
> from the VM*.
>
> I'm not going to answer this one again.  OK?
>
>
>
>>
>>
>>
>>> .  But I think that the textual notation suggested above is simpler.
>>> The text files are easier to distribute and change.  Shared objects and
>>> plugins have a habit of going stale, and there needs to be metadata in
>>> there to describe the set of constants etc, which is tricky to generate and
>>> parse because it is binary (pointer sizes, etc, etc).  Instead a simple
>>> textual format should be much more robust.  One could even edit by hand to
>>> add new constants.  It would be easy to make the textual file a versioned
>>> file.  Etc, etc.
>>>
>>>
>>
>> OK. Got it. And do you think using X Macros for the autogenerated C (from
>> the SharedPool) is a good idea?
>> And then I simply write a text file out of it.
>>
>>
>>
>>>
>>>> I know that different users will need different constants. But let's
>>>> say the infrastructure (plugin etc) is already done. And let's say I am a
>>>> user who wants to build something with FFI and I need some constants that
>>>> I see are not defined. Then I can simply add the ones I need in the plugin,
>>>> and the next VM release will have those. If Cog gets moved to Github, then this
>>>> is even easier. Everybody can do a PR with the constants he needs. And in
>>>> fact, if we have the infrastructure in place, I think that if each of us
>>>> spends half an hour, we may have almost everything we need.
>>>>
>>>> For example, I can add myself all those for signals (to use kill() from
>>>> FFI), all those from fcntl (to make non-blocking pipes), all those from
>>>> wait()/waitpid() family (so that I can do a waitpid() with WNOHANG), etc
>>>> etc etc.
>>>>
>>>> I know it's not the best approach but it's something that could be done
>>>> very easily and would allow A LOT of stuff to be moved to FFI just because
>>>> we have no access to preprocessor constants or sizeof()  (to know how to
>>>> allocate). I also know this won't cover macros and other stuff. But still.
>>>>
>>>> If you think this is a good idea, I can spend the time to do it.
>>>>
>>>> Cheers,
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, May 10, 2012 at 10:09 AM, Nick Ager <nick.ager at gmail.com>
>>>> wrote:
>>>>
>>>>> <snip>
>>>>> Well, like opendbx, maybe because opengl has quite standard
>>>>> interface...
>>>>> </snip>
>>>>>
>>>>> and
>>>>>
>>>>> <snip>
>>>>> It's not that it's not doable, it's that we gonna reinvent gaz plant
>>>>> and it gonna be so boring...
>>>>> I'd like to see a proof of concept, even if we restrict to libc, libm,
>>>>> kernel.dll, msvcrt.dll ...
>>>>> </snip>
>>>>>
>>>>> <snip>
>>>>> Is the unix style select()
>>>>> ubiquitous or should I use WaitForMultipleObject() on Windows? Are
>>>>> specification of read/write streams implementation machine independant
>>>>> (bsd/sysv/others...)
>>>>> </snip>
>>>>>
>>>>> Perhaps *a* way forward is to try to find existing projects which have
>>>>> already created cross-platform abstractions for platform specific
>>>>> functionality. Then we can use FFI to access that interface in a similar
>>>>> way to OpenGL and OpenDBX. For example NodeJs works across unixes - perhaps
>>>>> they have a useful cross-platform abstraction, boost  has abstractions of
>>>>> IPC etc
>>>>>
>>>>> Nick
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Mariano
>>>> http://marianopeck.wordpress.com
>>>>
>>>>
>>>
>>>
>>> --
>>> _,,,^..^,,,_
>>> best, Eliot
>>>
>>
>>
>>
>> --
>> Mariano
>> http://marianopeck.wordpress.com
>>
>
>
>
> --
> _,,,^..^,,,_
> best, Eliot
>



-- 
Mariano
http://marianopeck.wordpress.com