Hello all,
FFI was a little more complex than I had thought, and the Tesseract API was not helping either. But now I think I'm getting closer to making the example Ben proposed (https://github.com/tesseract-ocr/tesseract/wiki/APIExample, the C program using the C API) work in Squeak.
There is just one thing I cannot find an example for. I have to create the ExternalStructure classes for the structures PIXCMAP and RGBA_QUAD. RGBA_QUAD is easy, but the PIXCMAP structure starts with an array of RGBA_QUADs. RGBA_QUAD[] does not seem to work as a type specification, and RGBA_QUAD* will reserve space for the first element, but not for the whole array. Is there an example for such structures?
From the Header file:
    struct PixColormap
    {
        void     *array;    /* colormap table (array of RGBA_QUAD) */
        l_int32   depth;    /* of pix (1, 2, 4 or 8 bpp) */
        l_int32   nalloc;   /* number of color entries allocated */
        l_int32   n;        /* number of color entries used */
    };
    typedef struct PixColormap PIXCMAP;

    /* Colormap table entry (after the BMP version).
     * Note that the BMP format stores the colormap table exactly
     * as it appears here, with color samples being stored sequentially,
     * in the order (b,g,r,a).
     */
    struct RGBA_Quad
    {
        l_uint8 blue;
        l_uint8 green;
        l_uint8 red;
        l_uint8 reserved;
    };
    typedef struct RGBA_Quad RGBA_QUAD;
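Reading the header above literally, the first field of PIXCMAP is a single pointer, and the RGBA_QUAD entries live in a separate block of memory that the pointer refers to. A minimal C sketch of that layout follows. The l_int32 and l_uint8 typedefs are stand-ins for the ones in Leptonica's environ.h, and the cmap_* helpers are hypothetical names made up for illustration; they are not Leptonica functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: stand-ins for the typedefs in Leptonica's environ.h. */
typedef int32_t l_int32;
typedef uint8_t l_uint8;

typedef struct RGBA_Quad {
    l_uint8 blue;
    l_uint8 green;
    l_uint8 red;
    l_uint8 reserved;
} RGBA_QUAD;

typedef struct PixColormap {
    void    *array;   /* address of a separate block of RGBA_QUAD entries */
    l_int32  depth;   /* of pix (1, 2, 4 or 8 bpp) */
    l_int32  nalloc;  /* number of color entries allocated */
    l_int32  n;       /* number of color entries used */
} PIXCMAP;

/* Hypothetical helper: allocate the struct and, separately, its table. */
PIXCMAP *cmap_create(l_int32 depth)
{
    if (depth != 1 && depth != 2 && depth != 4 && depth != 8)
        return NULL;
    PIXCMAP *cmap = calloc(1, sizeof *cmap);
    if (cmap == NULL)
        return NULL;
    cmap->depth = depth;
    cmap->nalloc = 1 << depth;
    cmap->array = calloc((size_t)cmap->nalloc, sizeof(RGBA_QUAD));
    if (cmap->array == NULL) {
        free(cmap);
        return NULL;
    }
    return cmap;
}

/* Hypothetical helper: append a color; returns its index, or -1 if full. */
l_int32 cmap_add(PIXCMAP *cmap, l_uint8 r, l_uint8 g, l_uint8 b)
{
    assert(cmap != NULL);
    if (cmap->n >= cmap->nalloc)
        return -1;
    RGBA_QUAD *table = (RGBA_QUAD *)cmap->array;  /* cast the void pointer */
    table[cmap->n].red = r;
    table[cmap->n].green = g;
    table[cmap->n].blue = b;
    return cmap->n++;
}

/* Hypothetical helper: entry i sits i * sizeof(RGBA_QUAD) bytes past array. */
const RGBA_QUAD *cmap_entry(const PIXCMAP *cmap, l_int32 i)
{
    assert(cmap != NULL && i >= 0 && i < cmap->n);
    return (const RGBA_QUAD *)cmap->array + i;
}

/* Hypothetical helper: free the table first, then the struct. */
void cmap_destroy(PIXCMAP *cmap)
{
    if (cmap != NULL) {
        free(cmap->array);
        free(cmap);
    }
}
```

If this reading is right, the struct itself only needs room for one pointer and three 32-bit integers, and the individual entries would be read at 4-byte steps from the address held in the pointer field rather than from inside the struct.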
On Sun, 4 Nov 2018 at 06:15, Kjell Godo squeaklist@gmail.com wrote:
Can I just write a simple C shared library or DLL which calls the C++? So you are repackaging the C++ as a C library? I can’t see how this tack could fail to work. Just repackage C++ as C.
You would have to come up with a procedural, less OOP-ish API, I guess. You could have C API functions F which take an Object as F’s first input, and in this way each C++ method becomes a C function. You only need to wrap as much of the C++ API as you want to use, and each C function just calls its C++ method, so making the wrappers is simple and mechanical, I should think. It could even be automated. I know some C++ but have never made anything in it.
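The C-facing side of such a wrapper could be sketched as below: an opaque handle type plus plain functions that take the handle as their first argument. All names here (Recognizer, recognizer_create and so on) are hypothetical. In a real wrapper the function bodies would sit in a C++ source file, be declared extern "C", and forward to the methods of the C++ object; here a plain C struct stands in for that object so the sketch compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* What the public header would expose: an opaque type and flat functions.
 * Callers (an FFI, for example) only ever hold a Recognizer pointer. */
typedef struct Recognizer Recognizer;   /* hypothetical name */

Recognizer *recognizer_create(const char *language);
const char *recognizer_language(const Recognizer *self);
int         recognizer_set_page(Recognizer *self, int page);
void        recognizer_destroy(Recognizer *self);

/* Stand-in for the wrapped object. In a real wrapper this would be a C++
 * class and the functions below would forward to its methods. */
struct Recognizer {
    char language[16];
    int  page;
};

/* "Constructor": returns the handle, or NULL on failure. */
Recognizer *recognizer_create(const char *language)
{
    Recognizer *self = calloc(1, sizeof *self);
    if (self == NULL)
        return NULL;
    /* calloc zeroed the buffer, so the copy stays NUL-terminated. */
    strncpy(self->language, language, sizeof self->language - 1);
    return self;
}

/* "Getter method": the object is the first argument. */
const char *recognizer_language(const Recognizer *self)
{
    assert(self != NULL);
    return self->language;
}

/* "Setter method": stores the new page and returns the previous one. */
int recognizer_set_page(Recognizer *self, int page)
{
    assert(self != NULL);
    int previous = self->page;
    self->page = page;
    return previous;
}

/* "Destructor": releases the object behind the handle. */
void recognizer_destroy(Recognizer *self)
{
    free(self);
}
```

The caller never needs the struct layout, only the pointer, which is why this shape tends to be easy to reach through an FFI.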
I suppose that if Smalltalk cannot contain a C++ Object, then you could make a C struct which can be in Smalltalk, and have the API function copy this struct into the C++ Object, act on it, then copy the Object data back into the struct which is in Smalltalk. But that’s a lot of work. Surely you can have a pointer to a C++ Object in Smalltalk.
Maybe it would be better to have a separate C++ program P that you communicate with by sockets, using Object handles H which are just Integer indexes into an Array of Objects in P? I suppose there could be a shared lib L that FFI could call, which could call back program P if sockets were too slow or something.
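The handle idea can be sketched in a few lines of C: the program that owns the objects keeps them in a table and only ever hands out the integer index. This is a minimal illustration with made-up names (handle_register, handle_lookup, handle_release) and a fixed-size table; the socket transport itself is left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OBJECTS 8

/* Slot i holds the object behind handle i; NULL marks a free slot. */
static void *objects[MAX_OBJECTS];

/* Store an object and return its handle, or -1 if the table is full
 * or the object pointer is NULL. */
int handle_register(void *object)
{
    if (object == NULL)
        return -1;
    for (int i = 0; i < MAX_OBJECTS; i++) {
        if (objects[i] == NULL) {
            objects[i] = object;
            return i;
        }
    }
    return -1;
}

/* Translate a handle back into the object; NULL for unknown handles. */
void *handle_lookup(int handle)
{
    if (handle < 0 || handle >= MAX_OBJECTS)
        return NULL;
    return objects[handle];
}

/* Free the slot so the handle can be reused later. */
void handle_release(int handle)
{
    assert(handle >= 0 && handle < MAX_OBJECTS);
    objects[handle] = NULL;
}
```

Because only small integers cross the boundary, the other side never sees a raw address. Note that in this simple version a released handle can be reissued for a different object, so a stale handle may silently refer to the wrong one.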
I guess Dolphin can pass a Smalltalk BlockClosure B into an FFI call to L, which could pass B into program P, which could call B to get back into Dolphin, but I haven’t tried it myself.
I guess there is a Smalltalk interface to Python via a socket and then from Python to C++ is easy? Seems like a code generator that has all this stuff figured out could be good. I think VisualWorks is probably good at connecting to C++ via FFI. What about chicken scheme or any of the C based Schemes? What about Smalltalk/X?
borgLisp is an idea to make multiple Lisp dialects, each isomorphic to its target language, like C or C++ or Python or Ruby or Prolog or Java or C# or Scheme or Rust etc. Any language can have an isomorphic Lisp dialect targeting it, in order to bind all the languages into a single borgLisp where you can mix and match them. Each Lisp dialect is just a simple Lisp code generator. Once all the languages are in Lisp, all the Lisp tools can be used to combine them, with Nix setting up and configuring everything so that it all works together with one click, in an easy generative format. So every language becomes Lisp and Lisp becomes every language. Using code generation you could even make a Debugger in Lisp and Smalltalk which could source debug any language like the Smalltalk debugger does for Smalltalk.
But I guess this is off topic.
On Fri, Nov 2, 2018 at 06:07 Ben Coman btc@openinworld.com wrote:
On Fri, 2 Nov 2018 at 18:44, Edwin Ancaer eancaer@gmail.com wrote:
Hello list,
As I'm looking for a way to automate the search of documents in my humble administration, I read some articles about OCR. I came across an article about using Python with Tesseract to transform a scan of a document into searchable text.
My question now is if I can do something similar with Squeak. To my inexperienced eye, it seems like I should use FFI to call the functions in the Tesseract API, but this API is in C++, and I don't know if it is possible to use FFI to call C++ functions?
You are right, C++ is difficult because of the name mangling of function symbols, but fortunately I notice Tesseract has C bindings: https://github.com/tesseract-ocr/tesseract#for-developers https://github.com/tesseract-ocr/tesseract/blob/master/src/api/capi.h so it looks like you are in the clear.
Or should I forget the API and use OSProcess to start the tesseract program?
FFI will be more flexible.
Could anyone point me in the right direction, or just tell if the whole idea is insane?
I think it's a great idea, and actually Tesseract FFI is something I've wanted to play with before but have not had the time. I'd be interested to hear how you go with it.
cheers -ben