The Do-It-Yourself Guide to Writing Squeak Primitives (DRAFT)

Stephen Travis Pope stp at limbo.create.ucsb.edu
Wed Apr 15 09:43:05 UTC 1998


Hello all,

I'm on a boring cross-country plane flight, so I thought I'd finally
get around to writing a little do-it-yourself guide to adding
primitives to Squeak. Comments are invited. I'll let this simmer for a
few days, and then turn it into a Web page or something more permanent
than an email message (any ideas?).

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Rolling Your Own Primitives for Squeak

Introduction

This outline describes how to extend Squeak with your own hand-written
primitives. It's a bit terse (but you'd better be a pretty advanced
Smalltalk and C programmer before attempting this anyway  :-)  ). The
document walks you through the 13 easy steps (well, at least 8 of them
are easy) of creating the Smalltalk and C sides of the primitive
interface, and making a new virtual machine with your extended primitives.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
1: Be sure you Really Need a Primitive

Generally, there are several reasons to hand-code primitives. In the
example I give below, I need to access an OS-level driver for MIDI
input. There's just no way I can do this in Smalltalk. For
performance-optimization primitives (i.e., where the prim's body is
written in Smalltalk and translated to C for performance reasons only),
look at the sound synthesis classes for examples of how to write
low-level Smalltalk code for translation into C. (I don't really go
into that here.)

Please note that if lots of us start writing random and
not-really-well-motivated primitives we won't be able to share any code
at all any more. The namespace of primitives is limited; there is no
formal mechanism for managing that space with multiple
primitive-writers; and merging two virtual machines with different
primitive extensions can be a *real* pain. Do not do this lightly.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
2: Design the Smalltalk Interface

For the purposes of this example, I'll take a method from the Siren
MIDI I/O interface. This is the input primitive that reads a MIDI data
packet from the OS-level driver. The details are moot for this presentation.

I have a class MIDIPacket that has inst. vars. as shown in the
following definition.

         Object subclass: #MIDIPacket
             instanceVariableNames: 'length time flags data '
            ...

The first three are integers, the last is a ByteArray (which is
pre-allocated to 3 bytes--the max size of normal MIDI messages--system
exclusive packets are handled specially).

The primitive will live in class PrimMIDIPort and will take a
MIDIPacket and pass it down to the VM, who will fill it in with data
read in from the MIDI driver. The primitive returns the number of bytes
read (the length inst. var. of the packet). Since the primitive does
not use the state of its receiver, it could be put in just about any
class. The argument is the important component.

So, the primitive method will look like
    PrimMIDIPort >> primReadPacket: packet data: data

I pass the packet object and the data byte array separately for
simplicity of the C code and for flexibility (in case I decide to split
them into two Smalltalk objects in the future). (Well, it's also easier
to decompose an object in Smalltalk than it is in C.)

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
3: Design the C Interface

The next step is to design the interface for the C-side of the
primitive, and to write a C function prototype for it.
In the C header file, I wrote:

    int sqReadMIDIPacket(int MIDIpacket, int dataBuffer);
        // Read input into a MIDI packet. (prim. 614)
        // 'MIDIpacket' is interpreted as (MIDIPacket *) and is
        // written into with the time-stamp, flags, and length.
        // 'dataBuffer' is interpreted as (unsigned char *) and 
        // gets the MIDI message data.
        // Answer the number of data bytes in the packet.

Note that all arguments are passed as ints; you can cast them into
'whatever' at will in the C code.

Most of my primitives return integers (negative values for common
error conditions) and fail only in extreme situations. (This is a
personal preference--I tend to pass the error return values up to
higher levels of code to handle. Other cases might always want to have
a failure handler right in the method that called the prim--see the
discussion below.)

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
4: Write the Smalltalk Prim-Calling Method

This is the high-level method that will call the direct primitive
method. It is generally part of one of your "application" classes.
In class PrimMIDIPort, instance side, "primitives" protocol, I have
the following,

    get: packet
        "Read the data from the receiver into the argument (a MIDIPacket)."

        | len | "reads packet header and data, answers amt. of data read"
        len := self primReadPacket: packet data: packet data.
        len >= 0
            ifFalse: [...What to do on bad return value rather than failure...].
        ^len

In Siren, this is called by a read loop that's triggered by a
semaphore coming up from the VM, but that's outside of the scope here.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
5: Write the Smalltalk Primitive Method

This is the actual link to the VM primitive. You need to pick an
unassigned primitive number (in the Interpreter's PrimitiveTable); I
found that 614 was free (I had already used 610-613 for useless drivel).
In class PrimMIDIPort, I added the following method,

    primReadPacket: packet data: data
        "Read a packet from the MIDI driver."
        "Write data into the arguments; answer the number of bytes read."

        <primitive: 614>
        self error: 'MIDI read failed.'

The <primitive: XXX> construct is a primitive call--it's Smalltalk's
way of "trapping" into the VM. The body of the method is the primitive.
The primitive number (614) is an index into the table of all primitives
that's in the Interpreter class. 

If the primitive returns successfully, the statements that follow the
primitive call will never be executed. On the other hand, if the
primitive fails, the Smalltalk code that follows the primitive call
*will* be executed. This is quite handy for cases where you want to try
a Smalltalk implementation (e.g., a good number of primitives fail if
the arguments are not of the default types), or re-try the primitive
with different arguments (e.g., coerce one of the arguments and re-send
the method).

The return value from the primitive (actually, the thing left on the
top of the stack by the glue code--see below) will be the return value
of this method.
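
The fail-then-fall-through behavior described above can be modeled in
plain C. This is only a sketch of the control flow, not VM code: the
names prim_add, fallback_add, send_add and fallback_count are made up
for illustration, and the "primitive" here simply refuses sums that do
not fit in 31 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SMALL_MAX  1073741823LL    /*  2^30 - 1 */
#define SMALL_MIN (-1073741824LL)  /* -2^30     */

static int fallback_runs = 0;      /* counts how often the fallback body ran */

/* The "primitive": succeeds only when the sum fits in 31 bits. */
static bool prim_add(int64_t a, int64_t b, int64_t *result) {
    int64_t sum = a + b;
    assert(result != NULL);
    if (sum > SMALL_MAX || sum < SMALL_MIN)
        return false;              /* primitive failure */
    *result = sum;
    return true;
}

/* The "statements after the primitive call": slower, but always works. */
static int64_t fallback_add(int64_t a, int64_t b) {
    fallback_runs++;
    return a + b;
}

/* Models the whole method: try the primitive, fall through on failure. */
int64_t send_add(int64_t a, int64_t b) {
    int64_t result;
    if (prim_add(a, b, &result))
        return result;             /* success: the fallback never runs */
    return fallback_add(a, b);     /* failure: the fallback runs */
}

int fallback_count(void) { return fallback_runs; }
```

Calling send_add with small values never touches the fallback; only a
sum outside the 31-bit range makes the second half of the "method" run.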

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
6: Write the Smalltalk "Glue" Method

OK, this is where it gets a bit more complicated. The Interpreter
class is written in Smalltalk, but all of its instance methods get
translated to C and form the core of the Squeak virtual machine (that
huge file named "interp.c" on most platforms). There are class methods
in this class that create the primitive index table (where element 614
will point to our code--see the next step), and the instance methods
whose names correspond to the names given in the class's primitive
table are the actual bodies of the primitives. These typically unpack
the arguments from the stack, call the actual C code of the primitive,
and handle the return values. Look at the example that follows.

There are several stages to this method:
    1) unpack the argument(s) from the stack;
    2) test the arguments for validity (optional);
    3) call the C function that implements the primitive (optional);
    4) pop the arguments (and possibly the receiver) off of the stack; and
    5) push the return value onto the stack.

I have annotated the method below with these stages (in parentheses).
Also note that I generally include both the Smalltalk method header and
C function prototype as comments in this method; this makes debugging
it much easier.
In the Interpreter (and/or DynamicInterpreter) class, we have to write,

    primitiveReadMIDIPacket
        "Read a message (a MIDIPacket) from the MIDI interface."
        "ST: PrimMIDIPort primReadPacket: packet data: data"
        "C: int sqReadMIDIPacket (int packet, int data);"

        | packet data answer |
    "Get the arguments"
(1)     data := self stackValue: 0.
(1)     packet := self stackValue: 1.
    "Make sure that 'data' is byte-like"
(2)     self success: (self isBytes: data).
    "Call the primitive"
        successFlag
(3)         ifTrue: [answer := self cCode: 'sqReadMIDIPacket (packet, data + 4)'].
    "Pop the args and rcvr object"
        successFlag
(4)         ifTrue: [self pop: 3.
    "Answer the number of data bytes read"
(5)             self push: (self integerObjectOf: answer)]

For (1), note that the arguments come off the stack in reverse order:
the last argument (data) is on top at stackValue: 0, and the first
(packet) sits just below it. There are methods (in ObjectMemory) that
allow you to get integers and other kinds of things from the stack with
automatic conversion. (Look at the other primitive methods in class
Interpreter for lots of examples.)

Step (2) is a simple example of type-checking on primitive arguments.
The success: message sets the primitive success/fail flag based on
whether the second argument is a ByteArray.

Step (3) uses the message "cCode: aString"; it takes a string of C
code as its argument and it is here that we actually call our
C-language primitive. Note that I must use the actual variable names
packet and data in the string. The "data + 4" means that the argument
is a ByteArray but that the C code casts it as (unsigned char *); 4 is
the size of the object header, so I skip it to pass the base address of
the byte array's actual (char *) data.
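
To make the "data + 4" arithmetic concrete, here is a small sketch of
a byte object laid out as a 4-byte header followed directly by its
bytes. The ToyByteArray type and the helper names are hypothetical
stand-ins, not the VM's real structures; the only thing taken from the
text above is the 4-byte header size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HEADER_BYTES 4   /* size of the object header, as stated above */

/* Toy stand-in for a 3-byte ByteArray in the heap: header, then data. */
typedef struct {
    uint8_t raw[HEADER_BYTES + 3];
} ToyByteArray;

static ToyByteArray toy_object;

/* What "data + 4" computes: the address of the first data byte. */
static unsigned char *first_data_byte(ToyByteArray *oop) {
    assert(oop != NULL);
    return (unsigned char *)oop->raw + HEADER_BYTES;
}

/* A C primitive only ever sees the bytes, never the header. */
static void fill_note_on(unsigned char *data) {
    data[0] = 0x90;  /* MIDI note-on, channel 1 */
    data[1] = 60;    /* middle C */
    data[2] = 100;   /* velocity */
}

/* Fill the toy object; answer 1 if the header bytes were left untouched. */
int demo_header_skip(void) {
    uint8_t before[HEADER_BYTES];
    memset(toy_object.raw, 0xAB, sizeof toy_object.raw);
    memcpy(before, toy_object.raw, HEADER_BYTES);
    fill_note_on(first_data_byte(&toy_object));
    return memcmp(before, toy_object.raw, HEADER_BYTES) == 0;
}

/* Read back one raw byte of the toy object (header included). */
int demo_raw_byte(int index) {
    assert(index >= 0 && index < (int)sizeof toy_object.raw);
    return toy_object.raw[index];
}
```

Forgetting the offset would overwrite the header instead, which is
exactly the kind of heap corruption a real primitive must avoid.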

In step (4), we pop the two arguments *and* the receiver object (a
PrimMIDIPort instance) off of the stack if the primitive succeeded.

Step (5) pushes the C function's return value onto the stack as an
integer. There are other coercion functions in ObjectMemory; you can
find them used in other primitive methods in class Interpreter.
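
The five stages can also be walked through on a toy stack in C. This
is a rough model under loud assumptions: the stack, the success flag,
toy_is_bytes and toy_read_packet are all invented for illustration,
"oops" are plain ints, and the result is pushed untagged (real glue
code would wrap it with integerObjectOf:).

```c
#include <assert.h>
#include <stdbool.h>

#define STACK_MAX 16

static int  stack[STACK_MAX];   /* toy operand stack */
static int  sp = 0;             /* number of items currently on the stack */
static bool success_flag = true;

static void push(int v)        { assert(sp < STACK_MAX); stack[sp++] = v; }
static void pop(int n)         { assert(n >= 0 && n <= sp); sp -= n; }
static int  stack_value(int k) { assert(k >= 0 && k < sp); return stack[sp - 1 - k]; }
static void success(bool ok)   { success_flag = success_flag && ok; }

/* Stand-in for isBytes:; here any "oop" >= 100 counts as byte-like. */
static bool toy_is_bytes(int oop) { return oop >= 100; }

/* Stand-in for the real C function; pretends it read 3 bytes. */
static int toy_read_packet(int packet, int data) {
    (void)packet; (void)data;
    return 3;
}

/* Mirrors stages (1) to (5); answers whether the primitive succeeded. */
static bool toy_primitive_read(void) {
    int data, packet, answer;
    success_flag = true;
    data   = stack_value(0);                     /* (1) last argument is on top */
    packet = stack_value(1);                     /* (1) first argument is below it */
    success(toy_is_bytes(data));                 /* (2) validate the argument */
    if (success_flag) {
        answer = toy_read_packet(packet, data);  /* (3) call the C function */
        pop(3);                                  /* (4) two args + receiver */
        push(answer);                            /* (5) leave the result on top */
    }
    return success_flag;
}

/* Simulates a send: push receiver and arguments, then run the primitive.
   -999 is an arbitrary marker meaning "the Smalltalk fallback would run". */
int toy_send(int receiver, int packet, int data) {
    sp = 0;
    push(receiver); push(packet); push(data);
    if (toy_primitive_read())
        return stack_value(0);
    return -999;   /* on failure the stack is left exactly as it was */
}

int toy_depth(void) { return sp; }
```

After a successful call the stack holds one item (the result) where
three used to be; after a failed call all three are still there, which
is what lets the fallback code see the receiver and arguments.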

I have not discussed data sharing between glue code and primitives,
but there are some nifty and flexible facilities for it. Look at John
Maloney's sound primitives, or browse senders of var:declareC: as used
in Interpreter >> primitiveSoundGetRecordingSampleRate.

The glue code method is translated to C when you generate a new
interp.c file (see below), so remember that you can't just send
arbitrary Smalltalk messages from here. Look at the other primitive
glue code methods in Interpreter (or DynamicInterpreter) for more examples.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
7: Add an Entry to the Interpreter's Primitive Table

Look at Interpreter class's initializePrimitiveTable method; edit it
in-place or add your own init. method.
(Be sure to use the same prim. number you used in step 5 above.)

    ...
    (614 primitiveReadMIDIPacket)
    ...

The init. method is called automagically when you regenerate the interpreter.
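
Conceptually, the generated table is just an array of function
pointers indexed by primitive number. The sketch below shows that idea
in C; it is not the code the translator emits. The table size, the
function names, and the choice to fill unused slots with a failing
entry are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 700                 /* arbitrary size for this sketch */

typedef int (*PrimitiveFn)(void);

static int primitive_fail(void)             { return -1; } /* unused slots */
static int primitive_read_midi_packet(void) { return 3; }  /* stand-in body */

static PrimitiveFn primitive_table[TABLE_SIZE];

/* Fill every slot with the failing entry, then install our primitive. */
void initialize_primitive_table(void) {
    size_t i;
    for (i = 0; i < TABLE_SIZE; i++)
        primitive_table[i] = primitive_fail;
    primitive_table[614] = primitive_read_midi_packet;
}

/* What <primitive: N> boils down to: an indexed call through the table. */
int dispatch_primitive(int index) {
    assert(index >= 0 && index < TABLE_SIZE);
    if (primitive_table[0] == NULL)    /* lazily build the table once */
        initialize_primitive_table();
    return primitive_table[index]();
}
```

This also shows why two VMs that both claim slot 614 cannot be merged
without renumbering one of them.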

Although there is no formal method for registering primitive numbers,
Ward Cunningham's Wiki server does have a page for "voluntary"
reservations. I strongly recommend that you coordinate with other
developers by looking here and telling the world what numbers you're using.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
8: Regenerate the Interpreter

This is where you translate the Interpreter class's instance methods
to C, typically with,

    Interpreter translate: 'interp.c' doInlining: true.

This'll take a while, and will create a file named "interp.c" in the
same directory as the virtual image. If you haven't already done so, you
also need to write out all the other VM sources by executing,

    InterpreterSupportCode writeMacSourceFiles

or whatever is appropriate on your platform.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
9: Write the C Code

Now we have to actually write the C code for the primitive's
implementation. In my case, I'm just taking some data out of a data
structure I maintain in the VM (it's updated asynchronously from the
MIDI driver call-back routines) and copying it into the objects I
passed down as arguments to the primitive call. The source (somewhat
simplified) looks like the following.

    /***************************************************************
     * sqReadMIDIPacket -- Sent in response to the input semaphore
     * This is to up-load MIDI messages into the arguments (a MIDIPacket
     * and its data ByteArray -- both passed as ints and cast here).
     */
    int sqReadMIDIPacket(int ipacket, int idata) {
        sqMIDIEvent *outPtr;
        unsigned char *cdata;
        int len, i;
        unsigned char *pdata = (unsigned char *)idata;
                    // idata is a byte array (+4 to skip the object header)
                    // The ipacket object is defined as:
                    //     Object subclass: #MIDIPacket
                    //         instanceVariableNames: 'length time flags data '
    
        success(true);              // set the success flag to true
        if (itemsInInQ == 0)        // return immediately if there is no input
            return (0);
        if (sqCallback == 0)        // answer an error code if input is off
            return (-1);
                                    // Get a pointer to the MIDI event structure
        outPtr = &sqMIDIInQ[itemsInInQ2++];
                                    // Print a message for debugging--yes,
                                    // you can use printf() in the VM!
        if(debug_mode) printf("%x %x %x\n", 
            outPtr->data[0], outPtr->data[1], outPtr->data[2]);
        len = outPtr->len;          // copy the response fields 
                                    // copy the driver data into the packet
                                    // inst vars are 1-based
        instVarAtPutInt(ipacket, 1, len);   // copy length, time, flags
        instVarAtPutInt(ipacket, 2, (int)(outPtr->timeStamp));
        instVarAtPutInt(ipacket, 3, (int)(outPtr->flags));
    
        cdata = &(outPtr->data[0]); // copy MIDI message bytes into the packet
        for (i=0; i<len; i++) 
            *pdata++ = *cdata++;
            
        return (len);               // Answer len
    }       // End of fcn

Most of this should be pretty obvious to the seasoned C programmer.
The cast of the idata argument from int to (unsigned char *) will work
because it's actually a ByteArray (+ 4) in Smalltalk. The
instVarAtPutInt() macro is defined as,

    #define longAtput(i, val)   (*((int *) (i)) = (val))
    #define instVarAtPutInt(obj, slot, val) \
        longAtput(((char *)(obj) + ((slot) << 2)), ((((int)(val)) << 1) | 1))

This is nasty, but allows you to stuff 31-bit integers into instance
variables with abandon. If you look into interp.c, there are more
useful macros for primitive writers that would help you if you need to
write floats, etc.
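
Here is a portable sketch of what that macro does, written as
functions so the tagging step is easier to see. It assumes the layout
used above (word 0 is the header, inst vars start at slot 1, small
integers are stored as value*2+1); the function names and the
toy_packet array are made up, and the range check is an addition the
macro itself does not perform.

```c
#include <assert.h>
#include <stdint.h>

#define SMALL_INT_MAX  1073741823    /*  2^30 - 1 */
#define SMALL_INT_MIN (-1073741824)  /* -2^30     */

/* Toy MIDIPacket: word 0 is the header, words 1..4 are the inst vars. */
static int32_t toy_packet[5];

/* Does v fit in a 31-bit tagged integer? */
int fits_small_int(int32_t v) {
    return v >= SMALL_INT_MIN && v <= SMALL_INT_MAX;
}

/* Same result as (v << 1) | 1, written so negative values are safe in C. */
int32_t tag_small_int(int32_t v) {
    assert(fits_small_int(v));
    return v * 2 + 1;
}

/* Inverse of tag_small_int; the low bit must be set. */
int32_t untag_small_int(int32_t t) {
    assert(t & 1);
    return (t - 1) / 2;
}

/* Store a tagged integer into a 1-based inst var slot. */
void inst_var_at_put_int(int32_t *obj, int slot, int32_t v) {
    assert(slot >= 1);
    obj[slot] = tag_small_int(v);
}

/* Store into the toy packet and read the value back, untagged. */
int32_t demo_store_and_load(int slot, int32_t v) {
    assert(slot >= 1 && slot <= 4);
    inst_var_at_put_int(toy_packet, slot, v);
    return untag_small_int(toy_packet[slot]);
}

/* Peek at the raw (tagged) word actually sitting in the object. */
int32_t demo_raw_slot(int slot) {
    assert(slot >= 0 && slot <= 4);
    return toy_packet[slot];
}
```

Storing 3 leaves the raw word 7 in the slot, and storing -5 leaves -9;
anything outside the 31-bit range has to be rejected or handled some
other way before it reaches the object.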

I also use printf() for debugging. On a Mac, printf() from the VM pops
up an output window for the messages. I use the following macros for
debugging primitives,

    Boolean debug_mode = false;     // Enable/Disable printfs (see macros below)
                                    // Debugging macros
    #define dprint1(str)            if(debug_mode) printf(str)
    #define dprint2(str, val)       if(debug_mode) printf(str, val)
    #define dprint3(str, v1, v2)    if(debug_mode) printf(str, v1, v2)
    etc...

(The same could be done with #ifdef, of course.)
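
A possible #ifdef version is sketched below. It assumes a compiler
with C99 variadic macros, and SQ_MIDI_DEBUG, DPRINT and sum_bytes are
invented names for the example. With the symbol undefined, the debug
calls compile away to nothing.

```c
#include <assert.h>
#include <stdio.h>

/* Build with -DSQ_MIDI_DEBUG (a hypothetical flag) to enable messages. */
#ifdef SQ_MIDI_DEBUG
#define DPRINT(...) printf(__VA_ARGS__)
#else
#define DPRINT(...) ((void)0)
#endif

/* Example routine with a debug trace: sums the bytes of a message. */
int sum_bytes(const unsigned char *data, int len) {
    int i, total = 0;
    assert(data != NULL || len == 0);
    for (i = 0; i < len; i++) {
        DPRINT("byte %d = %x\n", i, data[i]);
        total += data[i];
    }
    return total;
}
```

Compared with the run-time debug_mode flag, this costs nothing in a
release build, but you have to recompile the VM to switch it on.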

The last line of the function returns an integer to the glue code,
which pushes it onto the stack explicitly after popping the arguments
and receiver object.

-----------------------------------------------------------
10: Add the Function Prototype for your C Function(s) to the Squeak Header file

In order for the primitive call (in interp.c) to work, you need to
provide a function prototype (at least if you use a C compiler that
requires them, which you should). In the main Squeak header
file--sq.h--I added,

    /* MIDI Prims */            // Added by STP
    #include "OMS.h"                    // OMS definitions and structs
    #include <MIDI.h>                   // Apple MIDI Libraries
    #include "sqMIDI.h"                 // Squeak MIDI Structs and Prims

and in my package's header file--sqMIDI.h--I have,

        // Read input into a MIDI packet. (614)
        // 'MIDIpacket' is interpreted as (MIDIPacket *) and is written into
        // with the time-stamp, flags, and length.
        // 'dataBuffer' is interpreted as (unsigned char *) and gets the MIDI
        // message data.
        // Answer the number of data bytes in the packet.
    int sqReadMIDIPacket(int MIDIpacket, int dataBuffer);

Note that I have to include another header file for the OMS libraries,
and to include the Apple MIDI library. This would not be necessary for
a simpler primitive that had less baggage of its own.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
11: Add Your C File to the VM Make/Project File

So, we're almost done. Depending on your platform, you'll have to add
your new C file to the VM makefile or import it into the VM build
project with the appropriate C development tool (e.g., CodeWarrior).
Depending on the complexity of your C code, you might also have to add
additional libraries to the linker command (or import them to the
interactive development tool's library list).

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
12: Recompile the VM

Now either say "make" or use the appropriate compiler/linker tool to
rebuild the VM. Make sure it recompiles the interp.c file as well as
your primitive C code, and that you link with any additional libraries
required by your C code.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
13: Test It

If all of the above steps worked, you should now have a new virtual
machine that includes your C primitive! You can start it with a virtual
image that contains the Smalltalk side of your primitive and test it
out. If this is your first foray into adding primitives, I strongly
suggest that you start with a really trivial primitive (e.g., one that
squares its argument or some such nonsense) to run through the process
from start to finish. If you're so experienced that you don't need to,
then why are you reading this note?
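
For such a warm-up, the C side could be as small as the sketch below.
The name sqSquare and the -1 error code are assumptions for this
example (the error code follows the negative-return convention from
step 3); the glue method, table entry, and Smalltalk methods would
still be written as in steps 4 through 7.

```c
#include <assert.h>

#define SMALL_INT_MAX 1073741823   /* 2^30 - 1, largest 31-bit integer */

/* Hypothetical C side of a "square" primitive.
   Answers x*x, or -1 if the result would not fit in 31 bits. */
int sqSquare(int x) {
    long long wide = (long long)x * (long long)x;  /* cannot overflow */
    if (wide > SMALL_INT_MAX)
        return -1;
    return (int)wide;
}

/* Quick self-check you could call once while debugging the new VM. */
void sqSquareSelfTest(void) {
    assert(sqSquare(12) == 144);
    assert(sqSquare(-3) == 9);
    assert(sqSquare(32768) == -1);
}
```

Since a square is never negative, -1 is unambiguous as an error value
here; the calling method can test for it just as the get: method tests
len.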

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Notes

First off, the entire preceding discussion is inexcusably C-centric.
You can, of course, write primitives in any language that adheres to
C's calling convention and can be cross-linked with C on your host
platform (so why not use FORTRAN, Pascal, or Ada?).

There's another whole note yet to be written about debugging
primitives, but on most platforms you can simply use the debugger to
put breakpoints in the C primitive methods and single-step through them
(Smalltalk will be frozen all the while, of course).

There is really no net (in terms of memory protection or "safe"
primitives) here; it's quite easy to corrupt Smalltalk's heap or other
memory with C, and to end up with a system that crashes unpredictably
some time after you call your primitive. Be really careful about memory
and stack management.

You can also trigger Smalltalk semaphores from C primitives; see John
Maloney's SoundPlayer class or Siren's PrimMIDIPort for examples. If
you're really clever, you can even create events and post them in the
event input queue.

For more examples: See the socket primitives for a simple interface to
an external API (that passes structures around and coerces between
Smalltalk objects and C structs); see the sound player primitives for
examples of asynchronous I/O; see the AbstractSound classes for
examples of automatically generated primitives.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Epilog: Wouldn't it be nice if....

    Primitives were called by name.
    Glue code was generated automagically from the Smalltalk and C
        function prototypes.
    Primitives could raise exceptions instead of failing.
    We didn't need primitives at all!

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Comments are invited.

stp -- 1998.04.14


_ Stephen Travis Pope
_ Center for Research in Electronic Art Technology (CREATE)
_ Department of Music, Univ. of California, Santa Barbara (UCSB)
_ stp at create.ucsb.edu, http://www.create.ucsb.edu/~stp/




