[squeak-dev] FFI interfacing to thin C layers over C++ libraries [was Re: Squeak and Tesseract]
Ben Coman
btc at openinworld.com
Sun Nov 4 03:48:40 UTC 2018
While I've done a lot of C programming that is useful for FFI interfacing,
I've not done much C++. So I'm just sharing something new I learnt today
that helps with FFI interfacing to combined C/C++ libraries, in case others
in the same boat are interested.
[Original question asked in squeak-dev, cross-posting to pharo-dev]
On Fri, 2 Nov 2018 at 21:06, Ben Coman <btc at openinworld.com> wrote:
>
> On Fri, 2 Nov 2018 at 18:44, Edwin Ancaer <eancaer at gmail.com> wrote:
>
>> As I'm looking at a way to automate the search of documents in my humble
>> administration, I read some articles about OCR. I came across an article
>> about using Python with Tesseract to transform a scan of a document into
>> text that is searchable.
>>
>> My question now is if I can do something similar with Squeak. To my
>> inexperienced eye, it seems like I should use FFI to call the functions in
>> the Tesseract API, but this API is in C++, and I don't know if it is
>> possible to use FFI to call C++ functions?
>>
>
> You are right, C++ is difficult because of the name mangling of function
> symbols, but by good fortune I notice Tesseract has C bindings...
> https://github.com/tesseract-ocr/tesseract#for-developers
> https://github.com/tesseract-ocr/tesseract/blob/master/src/api/capi.h
> so it looks like you are in the clear.
>
Browsing a bit deeper, I got quite confused for a while.
I could see a typedef definition for TessResultRenderer here...
https://github.com/tesseract-ocr/tesseract/blob/master/src/api/capi.h#L83
"typedef struct TessResultRenderer TessResultRenderer"
which I understood must refer to an *existing* struct, but I couldn't find
the definition of that struct anywhere. In particular, I searched with...
$ git clone git@github.com:tesseract-ocr/tesseract.git
$ cd tesseract
$ find . -type f -name "*h" -exec grep -Hn TessResultRenderer {} \;
but didn't find any struct definitions.
I could only find TessResultRenderer as a class definition...
https://github.com/tesseract-ocr/tesseract/blob/master/src/api/renderer.h#L45-L139
and the only explanation I could guess at was that C++ classes and structs
can be used interchangeably. My google-fu failed to find anything useful,
so I tried an experiment...
$ vi test.cpp
#include <stdio.h>

class SomeClass {        // declared with 'class'...
public:
    int a;
    int b;
};

typedef struct SomeClass SomeTypeDef;   // ...but referred to with 'struct'

int main()
{
    SomeTypeDef x;
    x.a = 5;
    x.b = 7;
    printf("Answer is %d\n", x.a + x.b);
    return 0;
}
$ gcc test.cpp
$ ./a.out
Answer is 12
Now I noticed that the TessResultRenderer member variables are private...
https://github.com/tesseract-ocr/tesseract/blob/master/src/api/renderer.h#L131-L139
and, curious about that, I changed my test example from public to private,
which (somewhat expectedly) produced compile errors, since main() then
accesses private members.
So those TessResultRenderer member variables can only be accessed from
member functions (or friends), but how is such a C++ member function called
from C to operate on a particular object?
An example is TessResultRendererInsert...
C Declaration:
https://github.com/tesseract-ocr/tesseract/blob/c375f4fbf73b8f761b2e65e0e3ad6776b9fbee78/src/api/capi.h#L135
C Definition:
https://github.com/tesseract-ocr/tesseract/blob/c375f4fbf73b8f761b2e65e0e3ad6776b9fbee78/src/api/capi.cpp#L90-L93
C++ Declaration:
https://github.com/tesseract-ocr/tesseract/blob/master/src/api/renderer.h#L52
C++ Definition:
https://github.com/tesseract-ocr/tesseract/blob/master/src/api/renderer.cpp#L59-L70
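So in the C definition, TessResultRendererInsert() is itself compiled as
C++ but exported with C linkage (extern "C"): it receives the object pointer
as an ordinary argument and calls the C++ member function insert() on it,
so that pointer becomes the implicit "this" inside insert(). (Is that a
reasonable way to describe it?)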
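In this case, because the member variables are private, our FFI would
treat TessResultRenderer as an opaque object, which simplifies things. I
would guess that in-Image direct access to the member variables would need
to account for any hidden data the compiler adds to the object: member
functions themselves take no space, but if the class has virtual member
functions, each object carries a pointer to its class's vtable, which shifts
the data members. The exact layout depends on the compiler's C++ ABI.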
cheers -ben
P.S. For Tesseract FFI, it might be good to start by reproducing this
example...
https://github.com/tesseract-ocr/tesseract/wiki/APIExample#example-using-the-c-api-in-a-c-program