[Pharo-project] [Vm-dev] Re: Can OSProcess functionality be
implemented using FFI instead of plugin?
Mariano Martinez Peck
marianopeck at gmail.com
Sun Jan 17 15:23:39 UTC 2016
Thanks, much clearer now. Sometimes I am slow :) I was confused because
I was only thinking of libc-like libraries (very low-level and very likely used
by the VM). But when you gave the SQL example, I understood the general
case you were trying to explain. So it's clear now.
I would like to add 2 more comments:
1) Do you agree that besides the name / value it would also help to have the
result of sizeof()? Otherwise, I may still run into problems when I need to
allocate from FFI and the size of a struct is not clear (as was my case
some days ago). So in this case, each entry would be kind of an array rather
than a key / value pair.
2) As for the autogenerated C file, do you think X Macros is a good idea?
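To make both points concrete, here is a minimal sketch of what such an autogenerated C file could look like with X Macros. Everything in it is an assumption for illustration: the CONSTANTS and TYPES lists, the function names (constant_value, size_of_type, dump_all) and the plain "kind name value" output format are hypothetical and not part of OSProcess or any existing tool. A real generator would emit one X(...) line per variable of the SharedPool.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical list of constants; one X(...) entry per pool variable. */
#define CONSTANTS(X) \
    X(SIGKILL)       \
    X(SIGTERM)       \
    X(WNOHANG)       \
    X(O_NONBLOCK)

/* Hypothetical list of types whose sizeof() the image needs for allocation. */
#define TYPES(X) \
    X(pid_t)     \
    X(struct sigaction)

struct entry { const char *name; long value; };

/* Each expander turns one list item into a { "name", value } initializer. */
#define AS_CONSTANT(c) { #c, (long)(c) },
#define AS_SIZE(t)     { #t, (long)sizeof(t) },
#define COUNT(a)       (sizeof (a) / sizeof (a)[0])

static const struct entry constants[] = { CONSTANTS(AS_CONSTANT) };
static const struct entry sizes[]     = { TYPES(AS_SIZE) };

/* Linear search by name; answers -1 when the name is not in the table. */
static long lookup(const struct entry *table, size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].value;
    return -1;
}

long constant_value(const char *name)
{
    return lookup(constants, COUNT(constants), name);
}

long size_of_type(const char *name)
{
    return lookup(sizes, COUNT(sizes), name);
}

/* Write one "kind name value" line per entry; answers the number of lines. */
int dump_all(FILE *out)
{
    int lines = 0;
    assert(out != NULL);
    for (size_t i = 0; i < COUNT(constants); i++, lines++)
        fprintf(out, "constant %s %ld\n", constants[i].name, constants[i].value);
    for (size_t i = 0; i < COUNT(sizes); i++, lines++)
        fprintf(out, "sizeof %s %ld\n", sizes[i].name, sizes[i].value);
    return lines;
}
```

The generated program would only need a main() that calls dump_all(stdout), with the output redirected to the per-platform text file. The nice part of X Macros here is that the generator only has to write the two lists; the name, the value and the sizeof() all come from the same entry, so they cannot drift apart.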
On Sun, Jan 17, 2016 at 12:40 AM, Eliot Miranda <eliot.miranda at gmail.com>
> Hi Mariano,
> On Sat, Jan 16, 2016 at 6:25 PM, Mariano Martinez Peck <
> marianopeck at gmail.com> wrote:
>> On Sat, Jan 16, 2016 at 11:02 PM, Eliot Miranda <eliot.miranda at gmail.com>
>>> On Sat, Jan 16, 2016 at 6:00 AM, Mariano Martinez Peck <
>>> marianopeck at gmail.com> wrote:
>>>> Hi all,
>>>> Sorry for reviving an old thread but I thought it was better to
>>>> continue the discussion here because of the context.
>>>> As you may have read, the other day I released a first approach to a
>>>> subset of OSProcess based on FFI (posix_spawn() family of functions):
>>>> And with that in mind, I wanted to share a few things with you. The
>>>> main 2 problems I found with implementing this with FFI was:
>>>> 1) We have all already discussed and agreed that fork+exec cannot be
>>>> done in separate FFI calls. So at the very minimum you need either a plugin
>>>> method that does the fork()+exec() OR wrapping a lib like posix_spawn()
>>>> 2) The other main problem, as you all said (and mostly Nicolas),
>>>> is the preprocessor (constants, macros, etc).
>>>> With all that said, I was able to get my stuff working. However, I am
>>>> still using some primitives of OSProcess plugin because of 2).
>>>> I read Eliot's idea and what I don't like is the need for a C compiler on
>>>> the user's machine. I think that's a strong constraint. Then Igor suggested
>>>> that WE (developers and maintainers of a certain tool) are the ones that
>>>> compile the little C program to extract constant values etc and then WE
>>>> provide, as part of our source code, some packages with some SharedPool
>>>> depending on the platform/OS. And Igor's approach looked a bit better to me.
>>> You misunderstand the proposal.
>> I think I did. But let me confirm that below ;)
>>> The C compiler is needed /only when changing the set of constants/, i.e.
>>> when /developing/ the interface. The C compiler is /not/ needed when
>>> The idea is to
>>> a) at development time, e.g. when a new variable is added to a
>>> SharedPool containing platform constants, a C program is autogenerated that
>>> outputs in some format a description of the names and values of all the
>>> constants defined in the pool. One convenient notation is e.g. STON. For
>>> the purposes of this discussion let's assume we're using ston, but any
>>> format the image can parse (or indeed a shared object the image can load on
>>> the current platform) will do. The output of the autogenerated C program
>>> would be called something like <SharedPoolName>.<PlatformName>.ston, e.g.
>>> UnixConstants.MacOSX64.ston or UnixConstants.Linux32.ston. The ston files
>>> can easily be parsed by facilities in the Smalltalk image.
>>> b) when deploying the system to a set of platforms one includes all the
>>> relevant platform-specific ston files.
>> OK. But let me ask something. Below you said "be it a plugin or a dll
>> doesn't matter". To autogenerate the C program, I must know which header
>> files to include for each platform and probably a few others things. For
>> example, besides exporting the value, I would also like to export the
>> sizeof(). And that depends on how the VM was compiled, right? So...my
>> question is...if such autogenerated C code could be part of the VM
>> build (considering all the settings assumed when building), cannot
>> I reuse the knowledge the VM already has? Like which header files
>> to include, if it was compiled 32 bits or 64 bits, which C compiler to use,
> I actually said that using text is easier than a dll. So I'm saying
> autogenerate a C program that outputs name-value pairs in some convenient
> textual representation, e.g. ston. But answering your question...
> The knowledge in the VM as to what header files are included *applies only
> to the include files the VM uses*. The VM uses a subset of the platform.
> It doesn't for example include any headers that define a database
> interface. It doesn't include header files that define the interface to a
> UI toolkit such as GTK. Etc, etc. So in fact the VM *doesn't* include the
> knowledge one needs to determine the set of include files for an arbitrary
> FFI interface. And even so, the include files that it does use are in the
> VM's platform source files, and that information is not readily accessible.
> Let me summarise. No, the VM cannot be used to determine the set of
> include files needed to generate constants used in an arbitrary FFI
>> What I mean is if it would be easier if I take the SharedPool at VM
>> building time, and from there I autogenerate (and run) the C code that
>> would generate the output. Then, when we "deploy" the VM, we can deploy it
>> with relevant platform specific ston files as you said.
> No. The VM is something that provides an FFI. It doesn't *define* an
> FFI. One must be able to develop an FFI interface without needing to
> rebuild the VM. So computing the values of constants should be *separate*
> from building a VM. Now let me give you more of an example.
> Let's say we define a subclass of SharedPool called FFISharedPool.
> FFISharedPool 's job is to manage autogenerating a C file, compiling it for
> the platform, and organizing parsing the relevant output. Let's say we use
> a convention like class-side pragmas to define include files, and compiler
> flags. The VM provides two crucial pieces of information:
> 1. the platform name
> 2. the word size
> One can't run a Mac OS VM on Linux, and one can't run a 64-bit VM on a
> 32-bit operating system. So taking this information from the VM accurately
> tells the current system what ABI (application binary interface) to use,
> and that's what's important in generating the right constants.
> So we use these two pieces of information to index the method pragmas that
> tell us what specific files to include.
> Let's imagine we subclass FFISharedPool to add a shared pool for constants
> for an SQL database. We might have a class declaration like
> FFISharedPool subclass: #MYSQLInterface
> instanceVariableNames: ''
> classVariableNames: 'MYSQL_DEFAULT_AUTH MYSQL_ENABLE_CLEARTEXT_PLUGIN
> MYSQL_INIT_COMMAND MYSQL_OPT_BIND MYSQL_OPT_CAN_HANDLE_EXPIRED_PASSWORDS
> MYSQL_OPT_CONNECT_ATTR_DELETE MYSQL_OPT_CONNECT_ATTR_RESET'
> poolDictionaries: ''
> category: 'MYSQLInterface-Pools'
> The job of FFISharedPool is to compute the right values for the class
> variables on every platform we want to deploy the MYSQL interface on.
> So we need to know the relevant include files and C flags for each
> platform/word-size combination. A few of them might look like
> MYSQLInterface class methods for platform information
> "I describe the include files and C flags to use when developing a
> 32-bit MYSQL FFI interface on Mac OS X"
> <platformName: 'Mac OS' wordSize: 4>
> <cFlags: #('-m32') includeFiles: #('/opt/mysql/include32')>
> ^self "all the info is in the pragmas"
> "I describe the include files and C flags to use when developing a
> 64-bit MYSQL FFI interface on Mac OS X"
> <platformName: 'Mac OS' wordSize: 8>
> <cFlags: #('-m64') includeFiles: #('/opt/mysql/include64')>
> The above might cause FFISharedPool to autogenerate files called
> MYSQLInterface.mac32.c & MYSQLInterface.mac64.c. And these, when run,
> might output ston notation to MYSQLInterface.mac32.ston &
> MYSQLInterface.mac64.ston (or maybe to stdout which has to be redirected to
> MYSQLInterface.mac32.ston; whatever).
> Now, you might use pragmas, or you might answer a Dictionary instance.
> Whatever style pleases you and seems convenient and readable. But these
> methods define the necessary metadata (C flags, include paths, and ...?)
> for FFISharedPool to autogenerate the C program that, when compiled with
> the supplied C flags and run on the current platform, outputs the values
> for the constants the shared pool wants to define.
> You can get fancy and have FFISharedPool autogenerate the C programs
> whenever one adds or removes a constant name. Or you can require the
> programmer run something, e.g. MYSQLInterface generateInterfaces. It's
> really nice if FFISharedPool submits the file to the C compiler
> automatically, but this can only work for e.g. 32 & 64 bit versions on a
> single platform. You have to compile the autogenerated program on the
> relevant platform, with the necessary libraries and include files installed.
> You could imagine a set of servers for different platforms so one could
> submit the autogenerated program for compilation and execution on each
> platform. That's a facility I'd make easy to implement. I could
> imagine that a programmer whose company develops an FFI interface and
> deploys it on a number of platforms would love to be able to automate
> compiling and running the relevant autogenerated code on a set of servers.
> I could imagine the Pharo community providing a set of servers upon which
> lots of software is installed for precisely this purpose. That means that
> people could develop FFI interfaces without even having to have the C
> compiler installed on their platform.
> You could also add a C parser to FFISharedPool that parses the
> post-preprocessed code and extracts function declarations. But the
> important thing is autogenerating the C program so that it generates easily
> parsable output containing the values for the constants. You can extend
> the system in interesting ways once you have this core functionality
> So once the program is autogenerated and compiled for the current
> platform, it is run and its output collected in a file whose name can be
> recognised by FFISharedPool.
> Now the class side of FFISharedPool might be declared as
> FFISharedPool class
> instanceVariableNames: 'platformName wordSize'
> and on start-up FFISharedPool could examine its subclasses, and for each
> whose platformName & wordSize do not match the current platform, search for
> all the matching FOOInterface.plat.ston files, parse them and update the
> subclasses' variables, and update that pool's platformName & wordSize. It
> could emit a warning on the Transcript or stdout (headful vs headless)
> indicating which subclasses it couldn't find the relevant
> FOOInterface.plat.ston files for.
> But the end result is that
> a) providing the system is deployed with FOOInterface.plat.ston files for
> each interface and platform used, a cross-platform application can be
> deployed *that does not require a C compiler*.
> b) providing that a system's FOOInterface files have been initialized on
> the intended platform, a platform-specific application can be deployed for
> a single platform *without needing the ston files*.
> Does this make more sense now?
>>> c) at startup the image checks its current platform. If the platform is
>>> the same that it was saved on, no action is taken. But if the platform has
>>> changed then the relevant ston file is selected, parsed, and the values for
>>> the variables in the shared pool updated to reflect the values of the
>>> current platform.
>>> So the C compiler is only needed when developing the interface, not when
>>> deploying it.
>>>> Then Nicolas made a point that if we plan to manage all that complexity
>>>> at the image level it may become a hell too.
>>>> So.... what if we take a simpler (probably not better) approach and we
>>>> consider the "c program that exports constants and sizes" a VM Plugin?
>>>> Let's say we have a UnixPreprocessorPlugin (that would work for OSX, Linux
>>>> and other Unixes, I imagine, for the time being) which provides a function
>>>> (that is exported) which answers an array of arrays. For each constant, we
>>>> include the name of the constant, the value, and the sizeof(). Then from
>>>> image side, we simply do one FFI call, we get the large array and we adapt
>>>> it to a SharedPool or whatever kind of object representing that info.
>>> This is what I suggested in the first place. That what is
>>> autogenerated is a shared object (be it a plugin or a dll doesn't matter, it
>>> is machine code generated by a C compiler from an autogenerated C program
>>> compiled with the platform's C compiler) that can be loaded at run-time and
>>> interrogated to fetch the values of a set of variables
>> OK, got it. But still, it would be easier if the "platform" in this case
>> is the "machine where we build the VM we will then distribute" right? I
>> mean, I would like to put this in the CI jobs that automatically build the
>> VM, rather than building it myself for each platform.
> NO! For example, why would a company that has some proprietary arithmetic
> package implemented in its secret labs in C or C++ and accessed through the
> FFI want to have that code on the Pharo community's build servers?
>> *I mean, my main doubt is if this job of autogenerating C code, compile
>> it, run it, export text file, and distribute text file with the VM, could
>> be done as part of the VM building. *
> For fuck's sake. Developing an FFI is not something one does when
> building a VM. It is something one does when using the system. If you want
> to do this you *use a plugin*. The FFI is a different beast. It is to
> allow programmers to interface to external libraries that are *independent
> from the VM*.
> I'm not going to answer this one again. OK?
>>> . But I think that the textual notation suggested above is simpler.
>>> The text files are easier to distribute and change. Shared objects and
>>> plugins have a habit of going stale, and there needs to be metadata in
>>> there to describe the set of constants etc, which is tricky to generate and
>>> parse because it is binary (pointer sizes, etc, etc). Instead a simple
>>> textual format should be much more robust. One could even edit by hand to
>>> add new constants. It would be easy to make the textual file a versioned
>>> file. Etc, etc.
>> OK. Got it. And do you think using X Macros for the autogenerated C (from
>> the SharedPool) is a good idea?
>> And then I simply write a text file out of it.
>>>> I know that different users will need different constants. But let's
>>>> say the infrastructure (plugin etc) is already done. And let's say I am a
>>>> user that I want to build something with FFI and I need some constants that
>>>> I see are not defined. Then I can simply add the ones I need in the plugin,
>>>> and next VM release will have those. If Cog gets moved to Github, then this
>>>> is even easier. Everybody can do a PR with the constants he needs. And in
>>>> fact, if we have the infrastructure in place, I think that if each of us
>>>> spends half an hour, we may have almost everything we need.
>>>> For example, I can add myself all those for signals (to use kill() from
>>>> FFI), all those from fcntl (to make non-blocking pipes), all those from
>>>> wait()/waitpid() family (so that I can do a waitpid() with WNOHANG), etc
>>>> etc etc.
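For reference, this is roughly how those constants end up being used on the C side, and therefore what an FFI caller has to reproduce with the right numeric values. It is only a sketch under stated assumptions: the helper names make_nonblocking and poll_child are made up for illustration, while fcntl(), waitpid(), O_NONBLOCK, F_GETFL, F_SETFL and WNOHANG are the standard POSIX ones.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: set O_NONBLOCK on fd while keeping its other
 * status flags. Answers 0 on success, -1 on error (errno is set). */
int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: check on a child without blocking.
 * Answers 1 if the child changed state (status is filled in),
 * 0 if it is still running, -1 on error (errno is set). */
int poll_child(pid_t pid, int *status)
{
    pid_t result;
    assert(status != NULL);
    result = waitpid(pid, status, WNOHANG);
    if (result == -1)
        return -1;
    return result == pid ? 1 : 0;
}
```

From FFI the two functions themselves are easy to call; the hard part is exactly that O_NONBLOCK, F_GETFL, F_SETFL and WNOHANG are preprocessor constants whose values differ between platforms, so the image has no way to know them without some exported table.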
>>>> I know it's not the best approach but it's something that could be done
>>>> very easily and would allow A LOT of stuff to be moved to FFI just because
>>>> we have no access to preprocess constants or sizeof() (to know how to
>>>> allocate). I also know this won't cover macros and other stuff. But still.
>>>> If you think this is a good idea, I can spend the time to do it.
>>>> On Thu, May 10, 2012 at 10:09 AM, Nick Ager <nick.ager at gmail.com>
>>>>> Well, like opendbx, maybe because opengl has quite standard
>>>>> It's not that it's not doable, it's that we gonna reinvent gaz plant
>>>>> and it gonna be so boring...
>>>>> I'd like to see a proof of concept, even if we restrict to libc, libm,
>>>>> kernel.dll, msvcrt.dll ...
>>>>> Is the unix style select()
>>>>> ubiquitous or should I use WaitForMultipleObject() on Windows? Are
>>>>> specification of read/write streams implementation machine independent
>>>>> Perhaps *a* way forward is to try to find existing projects which have
>>>>> already created cross-platform abstractions for platform specific
>>>>> functionality. Then we can use FFI to access that interface in a similar
>>>>> way to OpenGL and OpenDBX. For example NodeJs works across unixes - perhaps
>>>>> they have a useful cross-platform abstraction, boost has abstractions of
>>>>> IPC etc
>>> best, Eliot
> best, Eliot