On Fri, 2 Nov 2018 at 18:44, Edwin Ancaer <eancaer@gmail.com> wrote:
Hello list,

As I'm looking for a way to automate searching the documents in my humble administration, I read some articles about OCR. I came across an article about using Python with Tesseract to transform a scan of a document into searchable text.

My question now is whether I can do something similar with Squeak. To my inexperienced eye, it seems like I should use FFI to call the functions in the Tesseract API, but this API is in C++, and I don't know if it is possible to use FFI to call C++ functions.

You are right that C++ is difficult because of the name mangling of function symbols,
but by good fortune I notice Tesseract has C bindings...
    https://github.com/tesseract-ocr/tesseract#for-developers
    https://github.com/tesseract-ocr/tesseract/blob/master/src/api/capi.h
so it looks like you are in the clear.
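
To illustrate why the C bindings matter, here is a minimal sketch in Python's ctypes (not Squeak FFI, just the same idea in a language from the article you read). It assumes a Tesseract shared library is installed where the loader can find it, and it only calls TessVersion() from capi.h. The helper names load_tesseract and tesseract_version are mine, purely for illustration.

```python
import ctypes
import ctypes.util


def load_tesseract():
    """Return the Tesseract shared library, or None if it cannot be loaded."""
    path = ctypes.util.find_library("tesseract")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # capi.h exports plain, unmangled C symbols, so the function can be
    # looked up by its source-level name: const char* TessVersion(void);
    lib.TessVersion.restype = ctypes.c_char_p
    lib.TessVersion.argtypes = []
    return lib


def tesseract_version():
    """Return the Tesseract version string, or None if the library is missing."""
    lib = load_tesseract()
    if lib is None:
        return None
    return lib.TessVersion().decode("ascii")


if __name__ == "__main__":
    print(tesseract_version())
```

A Squeak FFI binding would presumably follow the same pattern: declare each C function from capi.h by name with its argument and return types, starting with something trivial like TessVersion before moving on to TessBaseAPICreate and friends.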


Or should I forget the API and use OSProcess to start the tesseract program? 

FFI will be more flexible.
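
For comparison, the OSProcess route amounts to running the command-line program and capturing its output. A rough sketch in Python, assuming the tesseract executable is on the PATH and that "stdout" is given as the output base so the text is printed rather than written to a file. The function names and the file name scan.png are made up for the example.

```python
import subprocess


def tesseract_command(image_path, lang="eng"):
    """Build the argument list: tesseract <image> stdout -l <lang>."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_with_cli(image_path, lang="eng"):
    """Run the tesseract program and return the recognised text."""
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits non-zero
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical input file; replace with a real scan.
    print(ocr_with_cli("scan.png"))
```

This is simpler to get going, but every page costs a process launch and you only get what the program prints, whereas the API lets you keep one engine instance alive and query it directly.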
 

Could anyone point me in the right direction, or just tell me if the whole idea is insane?

I think it's a great idea, and actually a Tesseract FFI is something I've wanted to play with before but haven't had the time.
I'd be interested to hear how you go with it.

cheers -ben