[Vm-dev] Slang newbie question

Andreas Raab andreas.raab at gmx.de
Tue Sep 15 05:34:52 UTC 2009

Hi -

Generally speaking, any question of the form "I already typed X, now 
tell me how to make it Y" is ill-formed ;-)

Passing a string to a C function is one of the most unpleasant things to 
do in Squeak, since strings have no spare room for the trailing zero. You 
either need to malloc a copy or, if you have control over the kind of 
strings passed into the primitive, you *could* require the string to be 
zero-terminated and fail the primitive if it isn't, letting the 
Smalltalk code decide how to fix it. For example:

FooClass>>primitiveBar: aString
   "Calls primitiveBar with aString. If aString doesn't include
    a trailing zero, the primitive fails and we retry."
    <primitive: 'bar' module: 'FooPlugin'>
    (aString notEmpty and:[aString last asciiValue = 0])
        ifTrue:[^self primitiveFailed]
        ifFalse:[^self primitiveBar: (aString copyWith: (Character value: 0))].

In this case your Slang plugin only needs to check that the input is not 
empty, and contains a trailing zero, and can pass this straight to your 
C function.
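
For illustration, the check on the plugin side might look roughly like 
this in C. It is a minimal sketch, not code from any real plugin: the 
helper name is_zero_terminated is hypothetical, and it assumes the 
plugin has already fetched the byte pointer and byte size of the string.

```c
#include <stddef.h>

/* Hypothetical helper (not part of any Squeak API): answers 1 if the
   counted byte buffer is non-empty and its last byte is zero, so that
   the pointer can be handed to a C function expecting a char *. */
static int is_zero_terminated(const char *bytes, size_t size)
{
    return size > 0 && bytes[size - 1] == '\0';
}
```

If the helper answers 0, the primitive would simply fail and leave the 
retry to the Smalltalk fallback code shown above.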

OTOH, the FFI is specifically designed to take care of such problems, so 
if you're already interfacing a C library you might just use, e.g.,

FooClass>>primitiveBar: aString
   <stdcall: void 'bar' (char*) module: 'FooPlugin.dll'>
   ^self externalCallFailed

This will do the necessary conversion, both on the way in (from Squeak 
to C) and on the way out (from C to Squeak) if required.

Lastly, the concrete question of "how do I pass / return a string from a 
primitive" is actually fairly simple:

   "Primitive passing and returning the string from a bar() call"
   | stringOop sz stringPtr argPtr resultPtr |
   self var: #stringPtr type: 'char *'.
   self var: #argPtr type: 'char *'.
   self var: #resultPtr type: 'char *'.

   "the usual primitive prologue"
   self export: true.
   interpreterProxy methodArgumentCount = 1
     ifFalse:[^interpreterProxy primitiveFail].

   "Prepare the string. You can put this into a utility method."
   stringOop := interpreterProxy stackValue: 0.
   (interpreterProxy isBytes: stringOop)
     ifFalse:[^interpreterProxy primitiveFail].
   sz := interpreterProxy byteSizeOf: stringOop.
   stringPtr := interpreterProxy firstIndexableField: stringOop.
   argPtr := self malloc: sz+1. "might use alloca() instead"
   0 to: sz-1 do:[:i| argPtr at: i put: (stringPtr at: i)].
   argPtr at: sz put: 0.

   "Call the primitive"
   resultPtr := self cCode: 'bar(argPtr)' inSmalltalk:[nil].

   "free() the arg string"
   self free: argPtr.

   "Not clear whether result == NULL should mean failure or empty.
   Assume failure since it simplifies things."
   resultPtr == nil ifTrue:[^interpreterProxy primitiveFail].

   "Return the string. You can put this into another utility method."
   sz := self strlen: resultPtr.
   "ONLY this call may cause GC"
   stringOop := interpreterProxy
     instantiateClass: interpreterProxy classString
     indexableSize: sz.
   stringPtr := interpreterProxy firstIndexableField: stringOop.
   0 to: sz-1 do:[:i| stringPtr at: i put: (resultPtr at: i)].
   interpreterProxy failed ifFalse:[
     interpreterProxy
       pop: interpreterProxy methodArgumentCount+1
       thenPush: stringOop].

The only call that may cause GC in the above is the string allocation 
via #instantiateClass:indexableSize:. By that point the input string has 
already been copied and used, so it doesn't matter if the GC moves it.
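
For reference, the two copy steps look roughly like this when written 
directly in C. This is only a sketch under assumptions: both helper 
names are hypothetical, and dest stands in for the body of the freshly 
allocated Squeak string, which holds exactly strlen(result) bytes and no 
terminator.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a counted, non-terminated byte buffer into
   a freshly malloc'ed zero-terminated C string. Answers NULL if malloc
   fails. The caller must free() the result. */
static char *counted_to_c_string(const char *bytes, size_t size)
{
    char *copy = malloc(size + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, bytes, size);
    copy[size] = '\0';
    return copy;
}

/* Hypothetical helper: copy the first size bytes of a C string into a
   counted buffer, without the trailing zero. */
static void c_string_to_counted(const char *result, char *dest, size_t size)
{
    memcpy(dest, result, size);
}
```

Note that the sketch checks the malloc result, which the Slang version 
above does not do; in a real primitive you would fail the primitive in 
that case.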

   - Andreas

Ronald Spengler wrote:
> Yikes. Looks like returning a string from the prim isn't easy either.
> I googled and didn't find anything in particular; is there a generally
> accepted approach to doing this?
> I already have:
> 	self returnTypeC: 'char *'.
> The best I could find was TR suggesting that one find a primitive that
> already knew what to do with a char* (I grok that the pointer is
> unsafe if garbage collection has occurred:)
> http://lists.squeakfoundation.org/pipermail/vm-dev/2005-December/000387.html
> On Mon, Sep 14, 2009 at 8:30 PM, Andreas Raab <andreas.raab at gmx.de> wrote:
>> Ronald Spengler wrote:
>>> Okay, reread the code, realized that the comments actually answer
>>> every question I have except "how do I guarantee that the garbage
>>> collector doesn't run"
>> Don't call functions that allocate objects. A GC may happen if you call a
>> function that allocates an object, including, but not limited to,
>> instantiateClassIndexableSize(); makePointwithXValueyValue() and others.
>> This will hopefully change in the future to a scheme where primitives
>> *never* cause GC unless explicitly requested (and rather fail the allocation
>> and have the plugin deal with that failure) but for now, the only thing you
>> can do is to either avoid allocations altogether (which isn't as bad as it
>> sounds since mostly you don't need to allocate Squeak objects from
>> primitives) or do it at the end of the primitive where you can ignore the
>> inputs (since they have been used already) and just construct a result, or
>> do proper remapping of your inputs (listed in increasing order of
>> difficulty).
>> If someone has a list of interpreter proxy functions handy, we can tell you
>> which ones may cause GC and which ones are GC-safe.
>> Cheers,
>>  - Andreas
>>> On Mon, Sep 14, 2009 at 7:36 PM, Ronald Spengler <ron.spengler at gmail.com>
>>> wrote:
>>>> Thanks Dave! That helps a lot. This snippet scares me a little bit
>>>> though:
>>>> "in a section of code in which the garbage collector is guaranteed not to
>>>> run"
>>>> I realize now that:
>>>>  - I don't know how to guarantee that the gc won't run
>>>>  - My C library will take its sweet time running, and its runtime is
>>>> a function of its input, could be forever in the extreme case.
>>>> So, to be safe, on the C side of things, should I copy the string
>>>> ASAP? Or does C code escape the garbage collector? Is it safe to
>>>> malloc()?
>>>> Thanks again for your help, and please forgive my ignorance.
>>>> On Sun, Sep 13, 2009 at 6:56 PM, David T. Lewis <lewis at mail.msen.com>
>>>> wrote:
>>>>> On Sun, Sep 13, 2009 at 01:23:32PM -0700, Ronald Spengler wrote:
>>>>>> Hello everyone.
>>>>>> I have a named primitive, and I need to send a ByteString to it, to be
>>>>>> processed and returned by an external library. To get a string into
>>>>>> Slang,
>>>>>> should I send it #asByteArray, and would that let me treat the bytes as
>>>>>> integers on the stack? I'm basically trying to get a char* on the other
>>>>>> side.
>>>>> You can use the ByteString as a parameter to the primitive, no problem.
>>>>> The only tricky bit is that C expects null terminated strings, so you
>>>>> need to copy the contents of the ByteString into a null terminated
>>>>> array before you can let it be used by the C library as a char *.
>>>>> I'm sure there are lots of examples, but you can look at
>>>>> OSProcessPlugin>>cStringFromString: and
>>>>> OSProcess>>transientCStringFromString:
>>>>> for examples of how to copy the string buffer into a null terminated
>>>>> buffer for use in C. Look at senders of these two methods for examples
>>>>> of primitives that pass strings as parameters. (OSProcessPlugin is on
>>>>> SqueakSource if you do not have it).
>>>>> Following are a couple of examples taken from OSPP. In both cases, a
>>>>> buffer is allocated with size one greater than the string length, and
>>>>> the contents of the Smalltalk string are copied into the buffer space
>>>>> with a trailing null terminator. The #primitiveChdir example allocates
>>>>> a new Smalltalk string to use for the buffer, and #primitivePutEnv uses
>>>>> malloc to allocate the new buffer (because in this case the buffer must
>>>>> be "permanently" valid after the primitive exits).
>>>>> primitiveChdir
>>>>>       "Call chdir(2) to change current working directory to the
>>>>>       specified path string. Answer nil for success, or errno on
>>>>>       failure."
>>>>>       | path errno |
>>>>>       self export: true.
>>>>>       self var: 'path' type: 'char *'.
>>>>>       self var: 'errno' type: 'extern int'.
>>>>>       path := self transientCStringFromString:
>>>>>               (interpreterProxy stackObjectValue: 0).
>>>>>       (self chdir: path)
>>>>>               ifTrue: [interpreterProxy pop: 2; push: interpreterProxy nilObject]
>>>>>               ifFalse: [interpreterProxy pop: 2; pushInteger: errno].
>>>>> transientCStringFromString: aString
>>>>>       "Answer a new null-terminated C string copied from aString.
>>>>>       The string is allocated in object memory, and will be moved
>>>>>       without warning by the garbage collector. Any C pointer
>>>>>       reference to the result is valid only until the garbage
>>>>>       collector next runs. Therefore, this method should only be used
>>>>>       within a single primitive in a section of code in which the
>>>>>       garbage collector is guaranteed not to run. Note also that
>>>>>       this method may itself invoke the garbage collector prior
>>>>>       to allocating the new C string.
>>>>>       Warning: The result of this method will be invalidated by the
>>>>>       next garbage collection, including a GC triggered by creation
>>>>>       of a new object within a primitive. Do not call this method
>>>>>       twice to obtain two string pointers."
>>>>>       | len stringPtr newString cString |
>>>>>       self returnTypeC: 'char *'.
>>>>>       self var: 'stringPtr' declareC: 'char *stringPtr'.
>>>>>       self var: 'cString' declareC: 'char *cString'.
>>>>>       len := interpreterProxy sizeOfSTArrayFromCPrimitive:
>>>>>               (interpreterProxy arrayValueOf: aString).
>>>>>       "Allocate space for a null terminated C string."
>>>>>       interpreterProxy pushRemappableOop: aString.
>>>>>       newString := interpreterProxy
>>>>>               instantiateClass: interpreterProxy classString
>>>>>               indexableSize: len + 1.
>>>>>       stringPtr := interpreterProxy arrayValueOf: interpreterProxy popRemappableOop.
>>>>>       "Point to the actual C string."
>>>>>       cString := interpreterProxy arrayValueOf: newString.
>>>>>       "Make a copy of the string."
>>>>>       self cCode: '(char *)strncpy(cString, stringPtr, len)'.
>>>>>       "Null terminate the C string."
>>>>>       cString at: (len) put: 0.
>>>>>       ^ cString
>>>>> primitivePutEnv
>>>>>       "Set an environment variable using a string of the form
>>>>>       'KEY=value'. This implementation allocates a C string using
>>>>>       malloc to allocate from the C heap (using cStringFromString
>>>>>       rather than transientCStringFromString). This is necessary
>>>>>       because the C runtime library does not make a copy of the
>>>>>       string into separately allocated environment memory."
>>>>>       | cStringPtr keyValueString |
>>>>>       self export: true.
>>>>>       self var: 'cStringPtr' declareC: 'char *cStringPtr'.
>>>>>       keyValueString := interpreterProxy stackObjectValue: 0.
>>>>>       cStringPtr := self cStringFromString: keyValueString.
>>>>>       "Set environment variable."
>>>>>       ((self putenv: cStringPtr) == 0)
>>>>>               ifTrue: [interpreterProxy pop: 2; push: keyValueString]
>>>>>               ifFalse: [^ interpreterProxy primitiveFail]
>>>>> cStringFromString: aString
>>>>>       "Answer a new null-terminated C string copied from aString. The
>>>>>       C string is allocated from the C runtime heap. See
>>>>>       transientCStringFromString for a version which allocates from
>>>>>       object memory.
>>>>>       Caution: This may invoke the garbage collector."
>>>>>       | len sPtr cString |
>>>>>       self returnTypeC: 'char *'.
>>>>>       self var: 'sPtr' declareC: 'char *sPtr'.
>>>>>       self var: 'cString' declareC: 'char *cString'.
>>>>>       sPtr := interpreterProxy arrayValueOf: aString.
>>>>>       len := interpreterProxy sizeOfSTArrayFromCPrimitive: sPtr.
>>>>>       "Space for a null terminated C string."
>>>>>       cString := self callocWrapper: len + 1 size: 1.
>>>>>       "Copy the string."
>>>>>       self cCode: '(char *) strncpy (cString, sPtr, len)'.
>>>>>       ^ cString
>>>>> callocWrapper: count size: objectSize
>>>>>       "Using malloc() and calloc() is something I would like to avoid,
>>>>>       since it is likely to cause problems some time in the future if
>>>>>       somebody redesigns object memory allocation. This wrapper just
>>>>>       makes it easy to find senders of calloc() in my code. -dtl"
>>>>>       self returnTypeC: 'void *'.
>>>>>       ^ self cCode: 'calloc(count, objectSize)'
>>>>> Dave
>>>> --
>>>> Ron
