[squeak-dev] FFI (Plugin) | Question about multi-dimensional arrays (e.g., char**, int**, void*****...)

Marcel Taeumel marcel.taeumel at hpi.de
Mon Jun 15 11:21:47 UTC 2020

Since multi-dimensional arrays can be coerced via RawBitsArray to simple pointers, I am no longer sure that any additional information should be encoded in ExternalType's compiledSpec, especially since new state combinations in compiledSpec would yield many new instances of ExternalType ... which would then have to be managed in the image somehow :-/

So, how would you implement manual interpretation of pointer arrays in Squeak FFI? With a type alias. Then you have your custom subclass of ExternalStructure (or ExternalTypeAlias) in which you can then add methods to read the ExternalData you are aliasing. This would also work for mapping n-dimensional C arrays to a collection or matrix in Squeak.

How could Squeak FFI still help for such cases?

- Accept type names such as "int **" or "int[][]" in FFI-call specs and struct-field specs
- Generate "convenient" accessors for struct fields and a similar mechanism for ExternalData >> #at:(put:)

I am not sure that we can totally avoid having more instances of ExternalType to encode (pointer) indirections, and maybe also whether something is an array or not, so that we can be careful when following those indirections ...

I think that the plugin side cannot do anything to help here, especially because the size of each such indirection/dimension is unknown.

On 15.06.2020 11:51:40, Marcel Taeumel <marcel.taeumel at hpi.de> wrote:
One more thing:

int *ptr[NUM] ... is an array of pointers of size NUM, each pointing to an int, which is like "int **ptr"

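int (*ptr)[NUM] ... is a single pointer to an array of NUM ints. Like "int *ptr", it points into one contiguous block of memory. This is the type you get automatically when you pass a two-dimensional array such as "int a[M][NUM]" to a function (or when you write "&arr" for an "int arr[NUM]"). A plain one-dimensional array passed to a function only decays to "int *".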
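Here is a small C sketch of the two declarations. NUM is fixed to 3 only for the example, and the function names are made up. The first function takes what int *ptr[NUM] decays to. The second takes what a two-dimensional int a[M][NUM] decays to:

```c
#include <stddef.h>

#define NUM 3

/* int *ptr[NUM]: NUM separate pointers. Passed to a function, it decays to int **. */
static int sum_pointed_to(int **ptrs, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += *ptrs[i];          /* one extra load per element: the pointer itself */
    return sum;
}

/* int (*ptr)[NUM]: one pointer to a block of NUM ints. A two-dimensional
   array int a[M][NUM] decays to exactly this type when passed. */
static int sum_matrix(int (*rows)[NUM], size_t m) {
    int sum = 0;
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < NUM; j++)
            sum += rows[i][j];    /* rows + i advances by NUM ints; no pointer is loaded */
    return sum;
}
```

Calling sum_pointed_to with a 2D array would not compile without a cast, and a cast would make the callee read ints as addresses.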


On 15.06.2020 11:35:55, Marcel Taeumel <marcel.taeumel at hpi.de> wrote:
Hi Jakob.

> Consequently you cannot correctly pass an int a[2][3] as an int**, which I learned just yesterday. Never ever say "arrays are just pointers in C" again. :-)

Got it! :-) A two-dimensional array (e.g. int[][]) occupies a single contiguous block of memory. An array of pointers (e.g. int**) has an extra level of indirection and is likely to point to multiple (contiguous) memory blocks, one for each "entry" or "array" (here: int* or int[]). Hmm... only for int* vs. int[] does it not matter. Hmm...

> multi-dimensional arrays and multiple nested pointers are two different things and I doubt that the FFI can really treat them the same.

In an FFI call, for example, an IntegerArray can be coerced to int*. This also covers n-dimensional arrays because their memory is contiguous anyway. If a function expects int** as an argument ... you may try to pass a pointer address via a ByteArray? I am not sure how to express "&ptr" in Squeak FFI when I have an ExternalAddress at hand. ExternalAddress class >> #allocate: already calls malloc(). I am also not sure how to create a pointer "in" the heap that does not point anywhere yet, i.e. "int *ptr;".
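On the C side, "&ptr" means the caller owns a slot big enough for one pointer, and the callee writes an address into it. That slot can even live on the stack. The function create_buffer below is hypothetical. It stands in for a typical C API with an int** out-parameter and says nothing about how Squeak FFI would provide that slot.

```c
#include <stdlib.h>

/* Hypothetical library function: allocates n ints, stores their address
   in *out, and returns 0 on success. The caller owns (and frees) the memory. */
static int create_buffer(int **out, size_t n) {
    int *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        buf[i] = (int)i * 10;
    *out = buf;   /* write the address into the caller's pointer slot */
    return 0;
}
```

Typical usage: `int *ptr = NULL; create_buffer(&ptr, 4); ... free(ptr);`. From the FFI's point of view, the argument is the address of some pointer-sized memory. After the call, that memory is read back as an address.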

Anyway, a struct field of type "int **" is more fun to think about for now :-), because it does not involve an FFI call and argument coercion. And then compare that to "int[][]".

What about encoding not only the ... "levels of indirection" for a pointer but also whether it is an array (= contiguous memory) such as int[][] or has pointers to follow in between such as int**?
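Here is a minimal C sketch of that idea, for illustration only. A hypothetical descriptor (IntMatrixDesc, not an existing Squeak FFI or ExternalType structure) carries one flag for "contiguous vs. pointers in between". A single accessor uses the flag to decide whether to follow pointers. The sketch assumes exactly two levels and int elements.

```c
#include <stddef.h>

/* Hypothetical descriptor, only to illustrate the idea; this is NOT how
   Squeak FFI's ExternalType or compiledSpec is laid out. */
typedef struct {
    int contiguous;  /* 1: int[rows][cols] in one block; 0: int ** with row pointers */
    size_t cols;     /* row length, only needed for the contiguous layout */
} IntMatrixDesc;

/* Read element [i][j] of a two-level int structure described by d. */
static int matrix_get(const void *base, IntMatrixDesc d, size_t i, size_t j) {
    if (d.contiguous) {
        const int *block = base;
        return block[i * d.cols + j];   /* address arithmetic only */
    }
    int *const *rows = base;
    return rows[i][j];                  /* load row pointer i, then index into it */
}
```

The contiguous case needs the row length, but the pointer case does not. Its rows may even have different lengths.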


On 15.06.2020 11:19:43, Jakob Reschke <forums.jakob at resfarm.de> wrote:
On Mon, 15 June 2020 at 10:05, Marcel Taeumel <marcel.taeumel at hpi.de> wrote:

> Do you think that the dimensions are always known?

Yes, how else would you be able to write an FFI interface in the first place? If an interface says "int**" and documents "can be int***" from time to time, then I hope it also gives a hint on how to find that out. Is that even possible with C compilers?

OK, so we have established that the sizes are not important for the "dimensions". And no, the number of effective stars cannot vary except through casting (i.e. pretending wrong things). My question was about knowing the sizes.
> Note that an int** is not a two-dimensional array int[x][y], so it might be misleading to speak of dimensions.

I don't think it makes a difference from the Squeak FFI perspective. Pointer arithmetic for such access is currently implemented in ExternalData >> #at: and #at:put:. I don't think we should use more new terminology than necessary.

But we should also not use wrong or misleading terminology. Is dimension really the word for "level of pointer nesting/number of pointer indirections"?

As you can see in the CredEnum example, multi-dimensional arrays and multiple nested pointers are two different things, and I doubt that the FFI can really treat them the same. For example, an int a[2][3] is just syntactic sugar for int b[2 * 3], which you can access with a[1][2] to get the b[1*3 + 2] element. But for an int **p (an array of indirections), p[1][2] means: dereference the second pointer in my array and get the int at byte offset 2*sizeof(int) from that. In C with int a[2][3], a[1] gives you a pointer to the start of the second slice (like &a[0][1*3]). This makes it look somewhat similar to an array of pointers, but that is not what it is. There is no array of pointers to the slices at &a; &a is the start of the first slice. So I suppose the FFI has to access it differently.
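The indexing difference can be checked in a few lines of C. The helper names below are made up. element_addr spells out the address computation for int a[2][3], and element_addr_nested shows the extra pointer load for int **p:

```c
#include <stddef.h>

#define ROWS 2
#define COLS 3

/* a[i][j] for int a[ROWS][COLS]: pure address arithmetic from the start of
   one block (row-major order); no pointer is read from memory. */
static int *element_addr(int (*a)[COLS], size_t i, size_t j) {
    char *base = (char *)a;   /* start of the whole block, which is also &a[0][0] */
    return (int *)(base + (i * COLS + j) * sizeof(int));
}

/* p[i][j] for int **p: the pointer p[i] must be loaded from memory first,
   then j ints are skipped from wherever it points. */
static int *element_addr_nested(int **p, size_t i, size_t j) {
    return p[i] + j;
}
```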

Consequently you cannot correctly pass an int a[2][3] as an int**, which I learned just yesterday. Never ever say "arrays are just pointers in C" again. :-)

=> Don't treat arrays of pointers or nested pointers as multi-dimensional arrays, and therefore please reconsider using the word dimension here, unless it is well-understood and established to also describe the number of indirections.
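Sometimes a C function expects int**, but all you have is a contiguous int a[2][3]. One common workaround is to build a separate table of row pointers into that block. In the sketch below, sum_all is a hypothetical stand-in for such a library function:

```c
#include <stddef.h>

#define ROWS 2
#define COLS 3

/* Hypothetical library function that expects an array of row pointers. */
static int sum_all(int **rows, size_t nrows, size_t ncols) {
    int sum = 0;
    for (size_t i = 0; i < nrows; i++)
        for (size_t j = 0; j < ncols; j++)
            sum += rows[i][j];
    return sum;
}

/* Passing a itself as int ** would make the callee treat the first ints of
   a as addresses. Instead, fill a separate table with pointers to each row. */
static void make_row_table(int (*a)[COLS], int *table[], size_t nrows) {
    for (size_t i = 0; i < nrows; i++)
        table[i] = a[i];   /* a[i] decays to &a[i][0] */
}
```

The table adds the level of indirection that the contiguous array lacks. It needs its own memory, which must outlive the call.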

> If you want to remember only the number of dimensions, some of my remarks may not apply.

Of course, only the number of dimensions. The length/size has to be provided somewhere else, maybe in another field of my external struct. :-) Or the data may be zero-terminated if the library's documentation says so. Then I have to count manually and store the result in ExternalData >> #size. After that, I can enumerate the data.
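Here is a C sketch of counting a NULL-terminated array of pointers (like argv) before storing the result as the size. count_until_null is a made-up name. The loop reads every entry once, and that is exactly the eager cost to keep in mind for large data:

```c
#include <stddef.h>

/* Count the entries of a NULL-terminated array of pointers.
   Costs one read per entry, plus one for the terminating NULL. */
static size_t count_until_null(char *const *entries) {
    size_t n = 0;
    while (entries[n] != NULL)
        n++;
    return n;
}
```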

Since there is a lot of "flexibility" here, I suppose the FFI can only help with some of the more common patterns that you mentioned. :-) But of course, the FFI must not presuppose that any of the patterns is used unless that is explicitly declared in some way.

Note that eager counting can be costly. Consider a 1 GB null-terminated char[]. ;-)