[squeak-dev] Squeak and Tesseract

Ben Coman btc at openinworld.com
Fri Nov 2 13:06:16 UTC 2018


On Fri, 2 Nov 2018 at 18:44, Edwin Ancaer <eancaer at gmail.com> wrote:

> Hello list,
>
> As I'm looking at a way to automate the search of documents in my humble
> administration, I read some articles about OCR. I came across an article
> about using Python with Tesseract to transform a scan of a document into
> searchable text.
>
> My question now is if I can do something similar with Squeak. To my
> inexperienced eye, it seems like I should use FFI to call the functions in
> the Tesseract API, but this API is in C++, and I don't know if it is
> possible to use FFI to call C++ functions?
>

You are right, C++ is difficult because the compiler mangles function
symbol names, but by good fortune I notice Tesseract has C bindings...
    https://github.com/tesseract-ocr/tesseract#for-developers
    https://github.com/tesseract-ocr/tesseract/blob/master/src/api/capi.h
so it looks like you are in the clear.
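To make the name-mangling point concrete, here is a minimal C++ sketch of the usual pattern behind C bindings such as capi.h: the C++ class stays internal, and it is exposed through extern "C" functions that pass around an opaque handle, so every exported symbol has a plain, unmangled name an FFI can look up. All the Recognizer* names are hypothetical and the "recognition" is a stand-in; this is not Tesseract's actual code, although the TessBaseAPICreate / TessBaseAPIDelete and TessBaseAPIGetUTF8Text / TessDeleteText pairs in capi.h appear to follow the same create/delete and allocate/free shape.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical example API; not part of Tesseract.
// An ordinary C++ class: its member functions get compiler-specific
// ("mangled") symbol names, which an FFI cannot easily look up by name.
class Recognizer {
public:
    explicit Recognizer(std::string lang) : lang_(std::move(lang)) {}
    // Stand-in for real OCR work: just tags the input with the language.
    std::string recognize(const std::string& input) const {
        return "[" + lang_ + "] " + input;
    }
private:
    std::string lang_;
};

// Opaque handle: C callers only ever hold a pointer to this.
struct RecognizerHandle {
    Recognizer impl;
};

// extern "C" gives these functions unmangled names (e.g. "RecognizerCreate"),
// so they can be bound by name from a C-level FFI.
extern "C" {

RecognizerHandle* RecognizerCreate(const char* lang) {
    return new RecognizerHandle{Recognizer(lang)};
}

// Returns a malloc'd copy of the result; release it with RecognizerDeleteText.
char* RecognizerGetText(const RecognizerHandle* handle, const char* input) {
    std::string out = handle->impl.recognize(input);
    char* text = static_cast<char*>(std::malloc(out.size() + 1));
    if (text != nullptr) {
        std::memcpy(text, out.c_str(), out.size() + 1);  // copy including '\0'
    }
    return text;
}

// Frees text returned by RecognizerGetText, inside the library that allocated it.
void RecognizerDeleteText(char* text) {
    std::free(text);
}

void RecognizerDelete(RecognizerHandle* handle) {
    delete handle;
}

}  // extern "C"
```

From Squeak, an FFI binding would then call RecognizerCreate, RecognizerGetText, RecognizerDeleteText and RecognizerDelete by name, treating the handle as an opaque pointer. The separate ...DeleteText function is there because memory allocated by the library is safest freed by the library itself.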


> Or should I forget the API and use OSProcess to start the tesseract
> program?
>

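FFI will be more flexible, since you can call the whole C API in-process
rather than just parsing what the command-line program prints.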


> Could anyone point me in the right direction, or just tell me if the whole
> idea is insane?
>

I think it's a great idea. A Tesseract FFI binding is actually something
I've wanted to play with before but haven't had the time.
I'd be interested to hear how you go with it.

cheers -ben