[squeak-dev] FFI interfacing to thin C layers over C++ libraries [was Re: Squeak and Tesseract]

Ben Coman btc at openinworld.com
Sun Nov 4 03:48:40 UTC 2018


While I've done a lot of C programming that is useful for FFI interfacing,
I've not done much C++.  So just sharing something new I learnt today to
help with FFI interfacing to combined C/C++ libraries.  I thought maybe
others in the same boat could be interested in this.
[Original question asked in squeak-dev, cross-posting to pharo-dev]

On Fri, 2 Nov 2018 at 21:06, Ben Coman <btc at openinworld.com> wrote:

>
> On Fri, 2 Nov 2018 at 18:44, Edwin Ancaer <eancaer at gmail.com> wrote:
>
>> As I'm looking at a way to automate the search of documents in my humble
>> administration, I read some articles about OCR. I came along an article
> >> about using Python with Tesseract, to transform a scan of a document into
>> text, that is searchable.
>>
>> My question now is if I can do something similar with Squeak. To my
>> inexperienced eye, it seems like I should use FFI to call the functions in
>> the Tesseract API, but this API is in  C++, and I don't know if it is
>> possible to use FFI to call C++ functions?
>>
>
> You are right, C++ is difficult because of the name mangling of function
> symbols, but by good fortune I notice Tesseract has C bindings...
>     https://github.com/tesseract-ocr/tesseract#for-developers
>     https://github.com/tesseract-ocr/tesseract/blob/master/src/api/capi.h
> so it looks like you are in the clear.
>

Browsing a bit deeper, I got quite confused for a while.
I could see a typedef definition for TessResultRenderer here...
https://github.com/tesseract-ocr/tesseract/blob/master/src/api/capi.h#L83
      "typedef struct TessResultRenderer TessResultRenderer"
which I understood must refer to an *existing* struct, but I couldn't find
the definition of that struct anywhere. In particular, I tried...
   $ git clone git@github.com:tesseract-ocr/tesseract.git
   $ cd tesseract
   $ find . -type f -name "*.h" -exec grep -Hn TessResultRenderer {} \;
but that didn't turn up any struct definitions.

I could only find TessResultRenderer as a class definition...
https://github.com/tesseract-ocr/tesseract/blob/master/src/api/renderer.h#L45-L139
and the only thing that I guessed could possibly make sense was that C++
classes and structs could be used interchangeably.  My google-fu failed to
find anything useful, so an experiment...
$ vi test.cpp
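        #include <stdio.h>
        // A class with public data members, declared with the "class" keyword.
        class SomeClass {
          public:
            int a;
            int b;
        };
        // Refer to the same type with the "struct" keyword, as capi.h does.
        typedef struct SomeClass SomeTypeDef;
        int main()
        {
                SomeTypeDef x;
                x.a = 5;
                x.b = 7;
                printf("Answer is %d\n", x.a + x.b);
                return 0;
        }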
$ gcc test.cpp
$ ./a.out
Answer is 12

Now I noticed that the TessResultRenderer member variables were private...
https://github.com/tesseract-ocr/tesseract/blob/master/src/api/renderer.h#L131-L139
Curious about that, I changed my test example from public to private,
which, as expected, produced compile errors.

So those TessResultRenderer member variables must only be accessed from a
member function, but how is that C++ member function called from C to
operate on a particular object?
An example is TessResultRendererInsert...
    C Declaration:
https://github.com/tesseract-ocr/tesseract/blob/c375f4fbf73b8f761b2e65e0e3ad6776b9fbee78/src/api/capi.h#L135
    C Definition:
https://github.com/tesseract-ocr/tesseract/blob/c375f4fbf73b8f761b2e65e0e3ad6776b9fbee78/src/api/capi.cpp#L90-L93
    C++ Declaration:
https://github.com/tesseract-ocr/tesseract/blob/master/src/api/renderer.h#L52
    C++ Definition:
https://github.com/tesseract-ocr/tesseract/blob/master/src/api/renderer.cpp#L59-L70

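So in the C definition, the C++ member function insert() is called through
the object pointer passed in as the first argument, i.e.
renderer->insert(next).  I first pictured this as a call via a function
pointer stored in the struct, but as far as I can tell that only applies to
virtual member functions (reached through a hidden table pointer).  A
non-virtual member function is an ordinary function that receives the
object as a hidden "this" argument.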

In this case, because of the private member variables, our FFI would treat
TessResultRenderer as an opaque object, which simplifies things.  I would
guess direct in-image access to the member variables would need to account
for an offset at the start of the object, since a class with virtual member
functions typically carries a hidden pointer to its table of function
pointers.

cheers -ben


P.S. for Tesseract FFI it might be good to start with reproducing this
example...
https://github.com/tesseract-ocr/tesseract/wiki/APIExample#example-using-the-c-api-in-a-c-program