Creating squeak pluggable primitives - a few ideas, some confusion, looking for like minded folks

Mark Mullin mark at vibrant3d.com
Wed Nov 7 20:11:23 UTC 2001


I used Smalltalk extensively in the '80s, everything from "Methods" to
various XSIS and Parc stuff.  While I may have gotten a little sidetracked
in the last decade, I'm sure glad to find Squeak these days.

That said, I'm involved in a project to bind our virtual reality engine, V3,
to Squeak.  We've done this before via the MSFT COM interfaces, supporting
Java, JScript, and a horde of other languages, but our dependence on MSFT
wrt Java cost us a lot of money, we have no desire to repeat it, and given
what we've seen with Squeak, we have a much better option anyway.

So what we have here is a *very* rusty Smalltalker taking a very large and
complicated subsystem and binding that to Smalltalk.  I'm aware of Balloon3D
and Alice, but I'm also partial.  I like our stuff better.  Familiarity
breeds content and all that.  We've got about 15 main classes with about
200 members to bind first, and then we can consider the supporting
characters.  In short, we need to tie in a very large pile of classes and
methods.

In doing my initial research, I found a chunk of useful stuff out there,
like Greenberg's notes on Pluggable Primitives, Fuhrer's patch to support
C++ code generation, and the fact that Alan Kay seems to be an active
participant.  The latter is comforting, the former are pretty useful.  On
the other hand, I didn't see much out there about the process of converting
the MIDI stuff to the new plugin architecture, other than the initial plea
for someone, anyone, to do it  :-)

*would you please get to the point*

We're not anxious to give away our own source, but we're also anxious that
Squeak continue making nice forward progress.  We'd like to contribute our
trail of breadcrumbs, discoveries, and code back to the community to ease
the process with other possible integrations, if there is any interest in
that.  We'd also like to avoid becoming de facto experts on every aspect of
the Squeak VM, and would hope that we could rely on a few of the experts
out there to answer some of our questions.

To that end, we just finished the initial tying of Squeak to our system.
We exercised only a few functions, those that allowed us to open one of
our native files, read objects from it, and get the names of the objects
we read.  Whoo-hoo.  On the other hand, when you see your object in Squeak,
you do get a certain warm glow.

The rest of this covers what we've been working with (successfully calling
our C++ core, and exchanging data with it via parameters/selectors), some
essential mods we've made to code generation (C++ support and the ability to
get in the *very* first word with the compiler), and what we're currently
confused about.  We'd be very interested in any comments about where we're
going wrong right at the start; it's a lot easier to change directions now.

Regards all,

Mark Mullin
www.vibrant3d.com


Contents
	What we think we learned
	How we have worked to damage the code generator
	Questions that are still bewildering us
	Current change set (attachment)


Here's what we learned thus far

1)  Forget the numbered primitives, as described in Pope's paper on the
same.  This isn't to knock the paper or the primitives, they're both fine.
The real deal is that numbered primitives are for people actually extending
the native capabilities of the Squeak VM, not for linking in other
quasi-independent subsystems.

One great reason is that if you use named primitives in shared libraries,
you don't have to rebuild the Squeak machine.  Screw with the internal
primitive numbers, you get to rebuild the Squeak machine.

So, lesson learned is that if you are integrating another system into
Squeak, the best way is through named primitives and dynamically loaded
libraries.  This way you only need to touch and recompile the things that
really are of interest to you, and Squeak itself doesn't become a moving
target.



2)  Don't do built-in plugins for the same reason.  Yeah, I guess you could
build them in later, we might even do that.  But not till we exit
development, as there's a big difference between rebuilding our 2 pages of
glue code in a dynamic library or the whole system.  Me, I'd pick door #1
every time.


3) [Here's where it would be nice if some experts chimed in]  In looking at
implementers and senders of Interpreter and plugin methods, there seem to be
a few standards drifting around, some of which are pretty obviously orphaned.
I, of course, now can't find these functions since I actually do need them,
but I do remember being tricked a couple of times by methods that appeared
to be for converting an interpreter stack value to something more directly
useful, and then finding that there were no implementers.  Basically, our
own experience is that there's no real substitute for firstIndexableField
when trying to snoop about in the innards of objects.  [See further on for
the C++ side macros we use to do this.]

4) It seems that native ints are 31 bits?  I'm not positive, but several
minor discoveries about the use of int and bit flags relating to returns
from primitive functions meant you couldn't do the lazy man's integration of
treating pointers in the client system (V3 in our case) as magic integers in
Smalltalk.  If I could just return a C++ instance pointer and treat it like
an int in Smalltalk that would be nice, as I am lazy by nature.  Instead it
seems safer to convert our C++ instance pointers and the like to a 4 byte
array, and use that.
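
The pointer-in-a-byte-array idea can be sketched as follows.  This is a
minimal illustration, not code from our plugin.  It assumes the 32-bit Squeak
object format, where a SmallInteger gives up one bit to a tag and so carries
only a 31-bit signed value.  The helper names (fitsInSmallInteger,
storePointer, loadPointer) are made up for the example, and the byte buffer
stands in for the body of a Squeak ByteArray.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption: 32-bit Squeak tags SmallIntegers with one bit, leaving a
// 31-bit signed payload, i.e. the range [-2^30, 2^30 - 1].
inline bool fitsInSmallInteger(std::int64_t value) {
    const std::int64_t minValue = -(std::int64_t(1) << 30);
    const std::int64_t maxValue = (std::int64_t(1) << 30) - 1;
    return value >= minValue && value <= maxValue;
}

// Copy the pointer's bytes into a buffer that stands in for the body of a
// Squeak ByteArray.  The buffer must hold at least sizeof(void*) bytes.
inline void storePointer(const void* pointer, unsigned char* bytes) {
    assert(bytes != nullptr);
    std::memcpy(bytes, &pointer, sizeof pointer);
}

// Recover the pointer from the same bytes.
inline void* loadPointer(const unsigned char* bytes) {
    assert(bytes != nullptr);
    void* pointer = nullptr;
    std::memcpy(&pointer, bytes, sizeof pointer);
    return pointer;
}
```

Under that assumption any address at or above 0x40000000 falls outside the
positive SmallInteger range, while the byte copy has no such limit.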


5) A word I've always liked is "thunk", which I first heard when MSFT tried
to explain how 16 and 32 bit code could lie down like the lion and the lamb.
And we all know what happened there....

Nevertheless, it does appear that a generalized description of the process
of interfacing Smalltalk to another system has a very clear thunking layer,
which exists to bridge the differing operational natures of the interfaces
between the systems.  The clearest indicator of this is in the primitive
call mechanism itself, which can be generalized as a call of a procedure
which does not affect the stack but does change the execution pointer.

Dig this -
V3Object>>getName
	"Public method: hand our internal handle to the thunk."
	^self rawGetName: myInternalV3Object

and this -
V3Object>>rawGetName: anObject
	<primitive: 'rawGetName' module: 'purehell'>
	"Reached only if the primitive fails."
	self error: 'If you are here you are dead'


I believe that in many integration efforts, especially if the target system
has a clue about objects, Squeak will end up using a lot of the native
capabilities of the target, i.e. in this case Squeak *would not* maintain a
separate copy of the name.  Way too much potential for problems...  This
means that there will be this point where one needs to switch from the high
level Squeak object to an internal client system object.  I would argue that
this mechanism could benefit from a formalized generalization, as we could
then construct various helpful code fragments to aid us in these efforts.

So here's a first crack at it, constrained to those systems which have their
own internal object management functions which Squeak must interoperate with,
not ignore.

Assumptions
	A) Significant objects have class definitions in squeak and in the client
	   system.  The squeak definition primarily focuses on managing the client
	   object id and aggregating members that operate on those client ids.

	SQUEAK LAYER -  Arbitrary squeak methods, selectors, etc are used to issue
			calls on a thunking method in the private methods of the Squeak
			side object.

	THUNK LAYER  -  Simple squeak methods containing only a <primitive>
			construct and trailing error code for those rare cases when the
			subsystem misbehaves, but not so badly the OS restarts.

	CLIENT LAYER -  Squeak code, oriented towards translation, embedded in the
			plugin class.  This layer is only called by thunking members of
			the appropriate class.



6) Passing parameters -
Inside the plugin, we can recover parameters via stack manipulation.  These
parameters are all native squeak objects, even if that's a thin gloss over
some more complex structure of the clients.  As a side note, the trick of
allocating a squeak byte array to hold client side structures like MIDI
buffers and file handles is extremely cool, and very useful, but not for us.
We already have an object manager, so our focus is pretty much pointer
oriented.
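
As a rough sketch of what recovering parameters via stack manipulation
amounts to, the following models the interpreter stack with a small stand-in
class.  MockStack is hypothetical and exists only so the example runs on its
own.  Its stackValue, pop, and push members are modeled on the interpreter
proxy calls of the same names, with offset 0 being the top of the stack.  The
primitive itself (primitiveAdd) is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the interpreter stack.  Values are plain ints
// here, whereas the real VM passes object pointers (oops).
class MockStack {
public:
    void push(int value) { items_.push_back(value); }
    // Offset 0 is the top of the stack, 1 the item beneath it, and so on.
    int stackValue(std::size_t offset) const {
        assert(offset < items_.size());
        return items_[items_.size() - 1 - offset];
    }
    void pop(std::size_t count) {
        assert(count <= items_.size());
        items_.resize(items_.size() - count);
    }
    std::size_t depth() const { return items_.size(); }
private:
    std::vector<int> items_;
};

// A made-up two-argument primitive.  On entry the stack holds, from the
// top down: second argument, first argument, receiver.
inline void primitiveAdd(MockStack& stack) {
    const int second = stack.stackValue(0);
    const int first = stack.stackValue(1);
    stack.pop(3);               // drop both arguments and the receiver
    stack.push(first + second); // leave the result in the receiver's place
}
```

The point to take away is the indexing: the last argument pushed is at
offset 0, and the receiver sits just below the first argument.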

Our first approaches to recovering pointers led to the creation of a lot of
internal variables of questionable value.  If we had passed a parameter that
was a 4 byte squeak array holding our real object pointer, we ended up
having to code and manage both the squeak object pointer and our internal
pointer.  In many cases we couldn't care less about the squeak object after
we'd yanked our parameter out.  By examining the generated code, we created
some macros that should work for now and be easily changeable whenever the
codegen folks decide on a major rev.

So far, we created the following macros

	a) A general purpose macro V3_GETARG(type,basename,idx),
	   which takes the final type of the recovered data, the base name
	   for the variable, and the stack index.  It will create an internal
	   int variable to hold the squeak object pointer, and will create the
	   basename variable to hold the dereferenced value.

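/* ## pastes basename onto the OOP suffix, so every use of the macro
   declares its own pair of variables. */
#define V3_GETARG(type,basename,idx) \
	int basename##OOP = interpreterProxy->stackValue(idx);\
	if (!(interpreterProxy->isBytes(basename##OOP)))\
		throw 0;\
	type basename = (type) interpreterProxy->firstIndexableField(basename##OOP);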


	Limitations on this approach primarily involve the lack of visibility of
	the two variables, and the need to remember an extra level of indirection.
	In the first case, this isn't a problem when your goal is to be able to
	make a call into the client system such as

	self cCode: 'V3_GETARG(CUMF**,theFile,0)'.
	self var: #aReadObject declareC: 'int aReadObject = 0'.
	self cCode: 'aReadObject = (int) (*theFile)->LoadObject()'.

	In the second case, double indirection is just the nature of the beast. If
	you're embedding client object pointers inside squeak, squeak is going to
	give you a pointer to where that pointer is stored via the
	firstIndexableField method, so you'll need to dereference.  Some argument
	could be given for automatically handling one level of dereferencing
	inside the macro, but I'd like to hear what some codegen folks say first.
	Makes me too nervous right now.
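
	For discussion, here is a sketch of the variant that handles one level of
	dereferencing in a helper instead of at the call site.  It is an
	illustration under assumptions, not our plugin code: ByteObject stands in
	for a Squeak byte object, CUMFStub for the client class, and valueOf and
	loadObjectPrimitive are made-up names.  It also shows a catch around the
	body, since the caller of a primitive is C code and the "throw 0" should
	not be allowed to escape into it.

```cpp
#include <cstring>
#include <vector>

// Hypothetical stand-in for a Squeak byte object.
struct ByteObject {
    bool isBytes;
    std::vector<unsigned char> bytes;
};

// Stand-in for the client class (CUMF plays this role in the text above).
struct CUMFStub {
    int LoadObject() { return 42; }
};

// Read a T out of the object's bytes, throwing 0 in the style of V3_GETARG
// when the object is not a byte object or is too small to hold a T.
template <typename T>
T valueOf(const ByteObject& object) {
    if (!object.isBytes || object.bytes.size() < sizeof(T)) throw 0;
    T value;
    std::memcpy(&value, object.bytes.data(), sizeof(T));
    return value;
}

// A primitive body.  Returns false on failure, where a real primitive
// would fail the primitive instead.
inline bool loadObjectPrimitive(const ByteObject& argument, int& result) {
    try {
        CUMFStub* theFile = valueOf<CUMFStub*>(argument);
        result = theFile->LoadObject();
        return true;
    } catch (int) {
        return false; // keep the exception away from the C caller
    }
}
```

	Copying the value out hides the extra indirection, at the price of no
	longer being able to write back through the field.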

	b)  A shorthand utility macro for getting strings.  There's a bit of fuss
	involved in moving strings between squeak and the client, primarily because
	they're really just dumb arrays (instead of dumb strings).  When you
	combine that with the fact that our V3 system tends to use STL string
	class elements where possible, it led to this little macro,


/* ## pastes basename onto each suffix, so every use declares its own
   variables. */
#define V3_STRING(basename,idx) \
	int basename##OOP = interpreterProxy->stackValue(idx);\
	if (!(interpreterProxy->isBytes(basename##OOP)))\
		throw 0;\
	char* basename##TXT = (char*) interpreterProxy->firstIndexableField(basename##OOP);\
	int basename##LEN = interpreterProxy->byteSizeOf(basename##OOP);\
	std::string basename(basename##TXT,basename##LEN);

	b.1) Since strings tend to go in both directions, we also implemented a
	basic function in our V3Plugin to handle conversion of internal strings
	to squeak strings.  In doing this we used the basic code from the foreign
	function interface string conversion method.
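
	The round trip can be sketched without the VM.  Squeak strings carry a
	length and no terminating NUL, which is why V3_STRING builds the
	std::string from a pointer and a length.  In the sketch below SqueakBytes
	is a hypothetical stand-in for the body of a Squeak String, and toSqueak
	and fromSqueak are illustrative names.  In a real plugin the allocation
	would go through the interpreter proxy rather than a std::vector.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a Squeak String body: counted bytes, no NUL.
typedef std::vector<char> SqueakBytes;

// Client to Squeak: copy exactly size() bytes, without a terminator.
inline SqueakBytes toSqueak(const std::string& text) {
    return SqueakBytes(text.begin(), text.end());
}

// Squeak to client: build from pointer and length, never from a bare
// char*, since the bytes are not NUL-terminated.
inline std::string fromSqueak(const SqueakBytes& bytes) {
    if (bytes.empty()) return std::string();
    return std::string(&bytes[0], bytes.size());
}
```

	Because both directions are length-counted, a name with an embedded zero
	byte survives the trip intact.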

Current Modifications to the code generation process

	1)  If you use C++ to build your client system, then definitely get
	   Fuhrer's paper on how he modified the code generator to support C++.
	   It exactly met our needs.  The paper can be found in the squeak
	   mailing list repository, search for Fuhrer and "C++ code generation"

	2) We (sigh) use MFC from MSFT cause it was a lot cheaper than writing a
	   bunch of inet oriented classes ourselves and because it's really hard
	   to keep the damn thing out of your builds anyway.  Given this need,
	   and our extensive set of compiler configuration flags (try using the
	   STL without disabling function length warnings), we needed to be able
	   to emit includes before *ANY* code was written to the file.  The
	   existing class method for variable declarations (declareCVarsIn:)
	   wouldn't work, as it puts its stuff after the main squeak includes.
	   We'll put up the change set for this once we untangle it from the
	   Fuhrer modifications.


Here are our questions of the moment-

	1)  Has anyone got notes/doco/whatever on what's current and what's not in
	    the Interpreter and plugin support classes for passing and obtaining
	    selectors?  I see code that is implemented using selectors to call
	    plugin and other compiled code, such as the FFI interfaces, but when I
	    have a compiled plugin method that needs to call another compiled
	    method, I keep having to directly manipulate the stack with push
	    params, call function, pop params.  I can't seem to get selector calls
	    to work reliably.

	2) When we call translateDoInlining on our plugin, every once in a while we
	   get an error claiming 'undef objects are not indexable' (duh) at the end
	   of the codegen process (after file is created).  Rerunning the translate
	   does not reproduce the error, it just works on the second call.  The
	   problem itself seems intermittent.

	3) When compiling a number of our primitives, we have variables that have
	   to be defined cause we are using them to move values to/from squeak.  If
	   we're accomplishing this primarily through use of clever cCode: calls,
	   the method compiler keeps asking us whether we want to delete these
	   unused variables, or do we know we're referring to an uninitialized
	   variable.  We could really use a method to be able to explain to the
	   code generator that it shouldn't worry because it's clueless, we aren't.

	4)  The getModuleName function, automagically generated by the translator,
	   does not compile in C++, because the moduleName is defined as an int,
	   and the function is defined as returning a string.  A *much* better way
	   of defining the function would seem to be -

const char* getModuleName() {
	return "myModuleName";
}
	Yes? No?  Whatever, right now the translator can't produce compilable C++
	code if it generates the function as returning the moduleName int,
	cause C++ is fussy about things like that.
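
	A slightly fuller sketch of that suggestion, under two assumptions that
	are not taken from the translator's current output: the name is kept in a
	const char* rather than an int, and the function is given C linkage so
	that a loader looking the symbol up by its plain name can still find it
	in a C++ build.  The module name V3Plugin is a placeholder.

```cpp
// Placeholder name.  A real plugin would use its own module name.
static const char* const moduleName = "V3Plugin";

// C linkage keeps the exported symbol unmangled in a C++ build.
extern "C" const char* getModuleName() {
    return moduleName;
}
```

	With the types agreeing, a C++ compiler has nothing left to complain
	about in this function.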

	5) and last and least ====  for those using Squeak on MS Windows,
	   is there an easy way to remap the ALT and CTRL keys so I'm not
	   constantly ALT copying in Word and control copying in Squeak?






-------------- next part --------------
A non-text attachment was scrubbed...
Name: v3integration.1.cs
Type: application/octet-stream
Size: 18498 bytes
Desc: not available
Url : http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20011107/36f0ce05/v3integration.1.obj

