[Vm-dev] [ANN] FFICHeaderExtractor first milestone (for early code reviewers) [WAS] Can OSProcess functionality be implemented using FFI instead of plugin?

Mariano Martinez Peck marianopeck at gmail.com
Mon Jan 25 22:27:08 UTC 2016

Hi guys,

OK, I have a first working version and so I wanted to share it with you.

I have not yet had time to start writing the docs, since I just finished the
first pass on the code. Tomorrow I will start on them. But I thought
some of you might be interested in taking a look even without formal docs
(and some feedback/iteration may avoid rewriting the docs).

If you have no clue what I am talking about, then this summary is for you:

*When we use FFI to call a certain library, it's quite common that we need
to pass certain constants as arguments (for example, SIGKILL to kill()).
These constants are defined in C header files and can even change their
value across platforms.*
*These constants are also sometimes defined by the C preprocessor, so
there is no way to get those values from FFI. If you don't have the value
of those constants, you cannot make the FFI call.*

I have tested the tool on OS X and CentOS using the latest Pharo 5.0. It won't
work on Windows right now. As usual, all classes and methods have comments
and there are enough tests.

In the end, I decided the C program will output a very naive Smalltalk
literal array kind of thingy. The tool then parses that output and directly
creates an init method (which is compiled into the SharedPool class) for
that platform, which is then called automatically at startup (only if
initialization is needed).
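
To make the idea concrete, here is a minimal sketch of what such a C
program could look like. It is not the code the tool actually generates:
the helper names (format_constant, FORMAT_CONSTANT, emit_signal_constants)
and the exact "(NAME value)" layout are assumptions for illustration only.

```c
#include <signal.h>
#include <stdio.h>

/* Hypothetical helper: formats one constant as "(NAME value)".
 * Returns the number of characters snprintf wanted to write. */
int format_constant(char *buf, size_t size, const char *name, long value)
{
    return snprintf(buf, size, "(%s %ld)", name, value);
}

/* #name turns the macro argument into a string before it is expanded,
 * while (long)(name) is expanded by the preprocessor to the value the
 * platform's headers define. This is how preprocessor-only constants
 * become visible as plain data. */
#define FORMAT_CONSTANT(buf, size, name) \
    format_constant((buf), (size), #name, (long)(name))

/* Writes a Smalltalk-style literal array of a few signal constants.
 * The layout is an assumption, not the tool's real output format. */
void emit_signal_constants(FILE *out)
{
    char buf[64];

    fputs("#(", out);
    FORMAT_CONSTANT(buf, sizeof buf, SIGKILL);
    fprintf(out, " %s", buf);
    FORMAT_CONSTANT(buf, sizeof buf, SIGTERM);
    fprintf(out, " %s", buf);
    FORMAT_CONSTANT(buf, sizeof buf, SIGINT);
    fprintf(out, " %s", buf);
    fputs(" )\n", out);
}
```

Calling emit_signal_constants(stdout) from a main() compiled on the target
platform prints one array whose values come from that platform's own
headers, which is the part FFI alone cannot give you.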

As for real examples, I started to write constants for libc: signal.h (to
use kill()), wait.h (to use the wait() family), fcntl.h (to use ... xxx()),
and errno.h. You can take a look at the package 'FFICHeaderExtractor-LibC'.

Note that for running the tests you need 'cc' findable on the PATH on OS X
and 'gcc' on Unix.

To load the code in the latest Pharo 5.0, execute:

Metacello new
    baseline: 'FFICHeaderExtractor';

Any feedback is appreciated.

I will start writing the doc now.

BTW: Big thanks to Eliot Miranda, who helped me by answering noob questions
and providing useful code and guidelines.


On Sat, Jan 23, 2016 at 1:12 PM, Eliot Miranda <eliot.miranda at gmail.com>
wrote:

> Hi Denis,
> On Jan 23, 2016, at 7:30 AM, Denis Kudriashov <dionisiydk at gmail.com>
> wrote:
> 2016-01-22 22:35 GMT+01:00 Eliot Miranda <eliot.miranda at gmail.com>:
>> Let's measure this.  Let's say we have 8 platforms (that's an
>> underestimate, because different Linux distributions may have different
>> values for certain constants), but 8, which is 4 basic platforms times 32-
>> & 64-bits.  We have Mac x86 32-bit, Mac x64 64-bit, Windows x86
>> 32-bit, Windows x64 64-bit, Linux x86 32-bit, Linux ARM 32-bit, Linux x64
>> 64-bit, and soon enough there will be more.  Further, there may be
>> different versions over time.
>> So each of those initialization methods has
>> - 1 slot for the global variable to be assigned
>> - 1 slot for the literal value to assign to it
>> - 3 bytes of bytecode per initialization for small methods, 4 for large
>> methods.  Let's say 4.
>> So the overhead in 32-bits is 12 bytes per constant, and in 64-bits is 20
>> bytes.  So the overhead per constant for all platforms is 96 bytes per
>> constant in 32-bits and 160 bytes per constant for 64-bits.  A full system
>> with sockets, files, a database connexion etc could easily exceed 100
>> constants.  I think it would be nearer 1000.  So the overheads are in the
>> 10- to 100-k byte range (100k ~= 0.5% of the image) on 32-bits.  That's low
>> but it's also pure overhead.  Every GC has to visit them.  Every senders
>> and implementors search has to visit them, but they offer nothing of value.
>> Whereas the small parser for whatever notation is used to store the
>> constants externally (if they are needed in a given deployment) has a small
>> constant overhead; it's simple code.
>> Further, you still need the machinery to export the constants to be able
>> to generate these initialization methods.  If you've got the machinery and
>> you don't need the methods why bother to generate the methods?
>> As the Scots say, many a mickle makes a muckle.
> Thanks, Eliot, for such a detailed explanation. It makes sense.
> But personally I prefer the Smalltalk solution, although Smalltalk itself is
> pure overhead compared to C.
> I can see the draw of the pure Smalltalk. Simplicity and browsability.
> But imagine a tiny headless image deployed on containers, say 2mb.  Now
> 100kb of initialization code doesn't look so good :-).  Anyway I'm beating
> a dead horse.  Mariano is generating initialization methods.
> My question was raised by Mariano's idea to save ston files in methods. I
> think it can reduce the problems you described.
> But then literal array syntax can be more suitable than ston.
> I just want to be clear, I'm neutral about the notation used to export
> info from the C file.  Literal array syntax, chunk source format, ston,
> xml.  It doesn't matter as long as it's convenient at expressing an
> attribute dictionary from names to attributes such as value, size, offset.
> Don't get hung up on the specific notation.  If one were to go with the
> external file the only real requirements are that it be reasonably compact
> and quick to parse.  That might kill xml but leave plenty of other
> candidates.
> _,,,^..^,,,_ (phone)
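
For anyone who wants to check the arithmetic in Eliot's estimate above,
here is a small sketch in C. The function and constant names are made up
for illustration; the inputs (2 slots per constant, 4 bytes of bytecode,
8 platforms) are the ones stated in the quoted text.

```c
#include <stdio.h>

enum {
    SLOTS_PER_CONSTANT = 2, /* one for the variable, one for the literal */
    BYTECODE_BYTES     = 4, /* the "let's say 4" figure from the estimate */
    PLATFORMS          = 8  /* 4 basic platforms times 32- and 64-bit */
};

/* Bytes one constant costs inside a single platform's init method,
 * for an image whose slots are word_size bytes wide. */
unsigned long bytes_per_constant(unsigned word_size)
{
    return (unsigned long)SLOTS_PER_CONSTANT * word_size + BYTECODE_BYTES;
}

/* Total bytes when every platform's init method is kept in the image. */
unsigned long total_overhead(unsigned word_size, unsigned long constants)
{
    return bytes_per_constant(word_size) * PLATFORMS * constants;
}
```

With a word size of 4 this gives 12 bytes per constant per platform and
96 bytes across all 8 platforms; with a word size of 8 it gives 20 and
160. For 100 to 1000 constants in a 32-bit image that is 9,600 to 96,000
bytes, which matches the quoted 10- to 100-k byte range.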
