While I've done a lot of C programming that is useful for FFI interfacing, I've not done much C++. So I'm just sharing something new I learnt today that helps with FFI interfacing to combined C/C++ libraries, in case others in the same boat are interested.
[Original question asked in squeak-dev, cross-posting to pharo-dev]

On Fri, 2 Nov 2018 at 21:06, Ben Coman <btc@openinworld.com> wrote:

On Fri, 2 Nov 2018 at 18:44, Edwin Ancaer <eancaer@gmail.com> wrote:
As I'm looking for a way to automate searching the documents in my humble administration, I read some articles about OCR. I came across an article about using Python with Tesseract to transform a scan of a document into searchable text.

My question now is whether I can do something similar with Squeak. To my inexperienced eye, it seems like I should use FFI to call the functions in the Tesseract API, but this API is in C++, and I don't know: is it possible to use FFI to call C++ functions?

You are right, C++ is difficult because of the name mangling of function symbols,
but by good fortune I notice Tesseract has C bindings...
so it looks like you are in the clear.

Browsing a bit deeper, I got quite confused for a while.
I could see a typedef definition for TessResultRenderer here... https://github.com/tesseract-ocr/tesseract/blob/master/src/api/capi.h#L83
      "typedef struct TessResultRenderer TessResultRenderer"  
which I understood must refer to an *existing* struct, but I couldn't find the definition of that struct anywhere. In particular...
   $ git clone git@github.com:tesseract-ocr/tesseract.git 
   $ cd tesseract
   $ find . -type f -name "*h" -exec grep -Hn TessResultRenderer {} \;
but didn't find any struct definitions.

I could only find TessResultRenderer as a class definition... https://github.com/tesseract-ocr/tesseract/blob/master/src/api/renderer.h#L45-L139
and the only explanation I could guess at was that C++ classes and structs can be used interchangeably. My google-fu failed to find anything useful, so, an experiment...
$ vi test.cpp
        #include <stdio.h>
        class SomeClass {
          public:
            int a;
            int b;
        };
        typedef struct SomeClass SomeTypeDef;
        int main()
        {
                SomeTypeDef x;
                x.a = 5;
                x.b = 7;
                printf("Answer is %d\n", x.a + x.b);
        }
$ gcc test.cpp
$ ./a.out
Answer is 12

Now I noticed that the TessResultRenderer member variables were private... https://github.com/tesseract-ocr/tesseract/blob/master/src/api/renderer.h#L131-L139
and, curious about that, I changed the members in my test example from public to private,
which (somewhat expectedly) produced compile errors.

So those TessResultRenderer member variables can only be accessed from member functions (or friends), but how is such a C++ member function called from C to operate on a particular object?
An example is TessResultRendererInsert...  

So in the C definition, I'd describe it as "the C++ member function insert() being called via a function pointer in the struct". (Is that a reasonable way to describe it?)

In this case, because the member variables are private, our FFI would treat TessResultRenderer as an opaque object, which simplifies things. I would guess that direct in-Image access to the member variables would need to account for the offset due to the variables holding the function pointers to the member functions.

cheers -ben


P.S. For Tesseract FFI, it might be good to start by reproducing this example...