Squeaking abstractly (was Re: Embedded Squeak 1.0 released for Squeak 2.2)

Thu Dec 24 16:27:29 UTC 1998

Steve Dekorte wrote:
> 
> Paul Fernhout <pdfernhout at kurtz-fernhout.com> wrote:
> > A full port [of Squeak] under Newton OS is just at the edge of technical feasibility
> > for the MP2100 or expanded eMate. The single biggest issue with the
> > Newton port is lack of good C++ debugging support in Apple's tools.
> 
> Have you considered using the NewtonScript VM?

Steve -

This is a good idea, and was discussed briefly on the list a very long
time back.

This message is in two parts, one about the concrete issue you raise
(using the NewtonScript VM to implement Squeak) and one about
abstraction (doing Squeak VM generation / porting in a general way,
which entails adding better support for abstraction to Squeak).

There are at least two possibilities - using a Squeak VM written in
NewtonScript and using the NewtonScript VM directly as the Squeak VM.

A Squeak VM in NewtonScript would probably run 30 - 200X slower than in
C (probably making it unusable on current hardware). However, this
might be good enough for some tasks - like experimentation or learning
the Smalltalk syntax, especially if the GUI used native widgets. I think
had I taken this approach at the start as a first cut (modifying the VM
C code translator to spit out NewtonScript) at least I might have had
something working right away, even if it was very slow. I took this
approach with NewtonForth and had immediate gratification -- even though
it is not as fast a Forth as a native one written in ARM assembler.
Another issue is that all machines except the MP2100 and expanded eMate
have at most 180K of space for NewtonScript objects for all running
programs, which means the image size would still have to be radically
trimmed for such machines (the C heap is a little larger - around 350K
on earlier machines).

A Squeak VM built on the NewtonScript VM, with a compiler that
translated to NewtonScript VM bytecodes, would run faster. However, to
take advantage of the NewtonScript VM, all Squeak objects would have to
be NewtonScript objects (using the NewtonScript garbage collector, etc.).
Also, the NewtonScript VM does not support all the byte code operations
the Squeak VM does (and some operations like inheritance are done
differently), so these would have to be emulated as primitives or in
native C. Also, fundamentally the prototype frames approach taken by
NewtonScript is slower than Squeak's class based one (in terms of method
dispatch and instance variable access). Still, this could be made to
work (given Apple's releasing of the VM specs about 18 months ago).
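
To illustrate the dispatch and access cost mentioned above, here is a
toy model in Python. It is only a sketch under assumptions: the names
Frame, ClassBasedObject, lookup and field_at are hypothetical, and
neither class reflects the actual NewtonScript or Squeak internals. It
shows a slot found by name along a prototype chain versus an instance
variable read at an index fixed in advance by the class.

```python
class Frame:
    """Toy prototype-style object: named slots plus an optional prototype."""

    def __init__(self, proto=None, **slots):
        self.proto = proto
        self.slots = dict(slots)

    def lookup(self, name):
        """Return (value, frames_searched), walking the prototype chain."""
        frame, searched = self, 0
        while frame is not None:
            searched += 1
            if name in frame.slots:
                return frame.slots[name], searched
            frame = frame.proto
        raise KeyError(name)


class ClassBasedObject:
    """Toy class-style object: the class fixes each variable's index once."""

    def __init__(self, layout, values):
        self.layout = layout        # shared by all instances: name -> index
        self.fields = list(values)

    def field_at(self, index):
        # The index is resolved ahead of time, so access is one indexed read.
        return self.fields[index]


base = Frame(x=1)
leaf = Frame(proto=Frame(proto=base), y=2)
print(leaf.lookup("x"))   # (1, 3): three frames searched by name
print(leaf.lookup("y"))   # (2, 1): found in the frame itself

point = ClassBasedObject({"x": 0, "y": 1}, [1, 2])
x_index = point.layout["x"]       # done once, as a compiler would
print(point.field_at(x_index))    # 1
```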

There is a tension here. The current Squeak approach is to make Squeak 
*portable* by porting a common abstraction (the VM). Another approach is
to make Squeak *retargetable*, by having higher level representations of
Squeak and tools to work with those representations. Metacompiled
Forths, like eForth, are more in the class of being retargetable at the
lowest level (although such Forth systems do in practice also typically
port a common abstraction at a higher level on top of the retargeted
base).

Up until now, it seems all this sort of initial targeting work is done
by hand (in effect: "gee, I think it can be done", "let's write a custom
translator", work, work, work, "here it is"). That's great for a first
step, but the next step is to make tools and representations to support
this targeting effort (for retargeting).

I like Alan Kay's suggestion at OOPSLA '97 of moving Squeak/Smalltalk
development to a higher plane (the metaobject protocol). Having the VM
specification written in Smalltalk is a step towards representing Squeak
more abstractly, but perhaps not enough in terms of documenting assumed
information, implementer folklore, and possible variants. 

If the Squeak VM and base image were represented more abstractly and
modularly (as a semantic net perhaps?), and tools existed to represent
other systems (like the NewtonScript VM or JavaVM) in the same
abstraction space, we could think about developing tools to help one
target some subset (or superset) of Squeak functionality to a
fundamentally different abstraction (such as another language than C,
another VM than the bluebook, native machine code, parallel processing
tuplespaces, or as an independent component in a larger system with
different interoperability event handling needs like an ActiveX control). 

I see working at a higher level of abstraction as another twist on what
it would take to have a general purpose native code generation tool
suite for Squeak (see an earlier thread) -- ideally one would want to be
able to represent real hardware like the 80X86 (just another VM really)
in some abstraction space (concretely consisting of a simulator,
inspection tools, and semantic-net knowledge base) and then retarget the
Squeak VM bytecodes, primitives, and Compiler to work in that
abstraction space. 

The work done by the Squeak team to create the first Squeak VM and image
under another Smalltalk was in this area, although informally.
Andrew Brault's Pocket Smalltalk for the Pilot is a great example of a
system moving in this direction, by using one Smalltalk image (Dolphin)
to generate a very different Smalltalk image (Pocket / embedded Pilot).
I suspect his VM is hand coded in C though; it could move a step up in
abstraction by being written in Smalltalk and using the Squeak C
translator. Putting these two approaches together will increase our
ability to retarget Squeak (such as for memory or I/O constrained
systems, or in other fundamentally different directions).

I think Smalltalk is one of the best general purpose languages designed
for representing abstractions via building complex object sets with
Smalltalk code (such as is typically done by window builder code), so it
is quite feasible to do all this abstraction definition in Smalltalk,
with the abstractions coded as Smalltalk snippets that build the
abstractions.

However, for the past two decades or so I have worked on and off toward
another programming system (called "Pointrel", for POINTers and
RELationships, with a current prototype implementation in Python) that
lets me begin to build tuple space / semantic net like systems using a
simpler syntax without as much punctuation (although admittedly more
text, so perhaps Smalltalk is still better). I have earlier
implementations in various states of doneness of this in Smalltalk,
Forth, Lisp, Delphi, Basic, and C. This representation work is similar
to the ROSE/STAR system described in William Kent's book "Data and
Reality" http://home.earthlink.net/~billkent/ 

In brief, the Pointrel syntax supports defining relationships within
semantic subspaces by code such as:

New label John
New label book
New label Mary
New label "gave to"
John gave to Mary book
New label Fred
New label "the trumpet"
Fred gave to John the trumpet
Mary gave to Fred the box
New label "display on"
New label "the screen"
New label "last relation"
the box class "!Smalltalk at: #Box."
the screen "maps to object" "!Display."
display on "implemented by" "!self displayOn: arg1."
the box display on the screen
do last relation

The idea is to avoid smashedTogether words, by always looking up the
longest previously defined phrase. Explicit quoting overrides this
default behavior.
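
A minimal sketch of this longest-phrase lookup, in Python. The function
name tokenize, the labels set, and the rule that an unknown single word
becomes its own phrase are assumptions made for illustration, not the
actual Pointrel implementation.

```python
def tokenize(line, labels):
    """Split a line into phrases, preferring the longest known label.

    Text inside double quotes is always taken as one phrase.
    """
    # Break the line into single words and quoted runs.
    items = []  # list of (text, is_quoted)
    for i, chunk in enumerate(line.split('"')):
        if i % 2 == 1:
            items.append((chunk, True))
        else:
            items.extend((word, False) for word in chunk.split())

    phrases = []
    pos = 0
    while pos < len(items):
        text, quoted = items[pos]
        if quoted:
            phrases.append(text)
            pos += 1
            continue
        # Find where the current run of unquoted words ends.
        end = pos
        while end < len(items) and not items[end][1]:
            end += 1
        # Try the longest candidate first, shrinking one word at a time.
        # A single unknown word is accepted as an implicit new phrase.
        for stop in range(end, pos, -1):
            candidate = " ".join(word for word, _ in items[pos:stop])
            if candidate in labels or stop == pos + 1:
                phrases.append(candidate)
                pos = stop
                break
    return phrases


labels = {"John", "Mary", "book", "gave to"}
print(tokenize("John gave to Mary book", labels))
# ['John', 'gave to', 'Mary', 'book']
print(tokenize('Fred gave to John "the trumpet"', labels))
# ['Fred', 'gave to', 'John', 'the trumpet']
```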

The lines "New label XYZ" can be omitted as implicit, to produce:

John "gave to" Mary book
Fred gave to John "the trumpet"
John "gave back" Fred the trumpet
Mary gave to Fred "the box"
the box class "!Smalltalk at: #Box."
"the screen" "maps to object" "!Display."
"display on" "implemented by" "!self displayOn: arg1."
the box display on the screen
do "last relation"

Of course, many text strings like "display on" or "last relation" would
probably have been previously defined and would not need quoting the
first time used in this semantic subspace.

After definition, these relationships of text would be then processed
further to create more abstract relationships. For example the concrete
"New label John" is processed into a new object with a label
relationship connecting it to the label "John", and later references to
"John" are indirected to this new object.  All these relationships
(tuples, really, and implemented as such in Python) would be tagged by
other relationships associating them with specific input streams, times
of day, subspaces they are in, and so on, creating a rich sea of
relationships for later analysis. Each iteration of processing can be
seen as "binding" the resulting abstraction into yet another
abstraction.
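
The binding and tagging steps described above might be sketched like
this in Python. Everything here is an assumption for illustration: the
class Space, its method names, and the tag names "from stream" and
"in subspace" are hypothetical and are not taken from the Pointrel
prototype.

```python
import itertools


class Space:
    """Toy store in which every relationship is a tuple with its own id."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.relations = []   # tuples of the form (id, field, field, ...)
        self.labels = {}      # label text -> id of the object it names

    def _add(self, *fields):
        rel_id = next(self._ids)
        self.relations.append((rel_id,) + tuple(fields))
        return rel_id

    def new_label(self, text):
        # "New label John": make a new object and tie the label text to it.
        obj = next(self._ids)
        self._add(obj, "label", text)
        self.labels[text] = obj
        return obj

    def resolve(self, text):
        # Later references to a label are indirected to its object.
        if text not in self.labels:
            return self.new_label(text)
        return self.labels[text]

    def relate(self, phrases, stream, subspace):
        # Store the relationship, then tag it with further relationships.
        rel_id = self._add(*(self.resolve(p) for p in phrases))
        self._add(rel_id, "from stream", stream)
        self._add(rel_id, "in subspace", subspace)
        return rel_id

    def about(self, target):
        """Return {relation name: value} for tuples describing target."""
        return {r[2]: r[3] for r in self.relations
                if len(r) == 4 and r[1] == target and isinstance(r[2], str)}


space = Space()
john = space.new_label("John")
gift = space.relate(["John", "gave to", "Mary", "book"],
                    stream="example input", subspace="demo")
print(space.about(john))  # {'label': 'John'}
print(space.about(gift))
# {'from stream': 'example input', 'in subspace': 'demo'}
```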

To apply this to Squeak VM generation:

"Squeak VM" "has byte code" "return true"
Squeak VM has byte code "duplicate top"
Squeak VM has byte code "extended store"
return true "in Smalltalk" "| cntx val | cntx := self sender. val :=
trueObj. self returnValue: val to: cntx."
duplicate top in Smalltalk "self internalPush: self internalStackTop."
duplicate top "in NewtonScript" "vm.internalPush(vm.internalStackTop())"
return true in NewtonScript "vm.returnValueTo(vm.trueObj, vm.sender())"

You can then have programs that use this VM knowledge base, and reason
about it to generate new ports and to assist VM implementors in
maintaining ports. Of course, it would be nice to use a Smalltalk like
browser to browse and maintain this knowledge base.
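
As a small sketch of a program reasoning over such a knowledge base, the
Python below loads the relationships listed above as plain triples and
reports which byte codes still lack source in a target language. The
names FACTS, objects_of, port_report and missing are hypothetical, and
a real tool would need far more than this.

```python
# Facts copied from the relationships above: (subject, relation, object).
FACTS = [
    ("Squeak VM", "has byte code", "return true"),
    ("Squeak VM", "has byte code", "duplicate top"),
    ("Squeak VM", "has byte code", "extended store"),
    ("return true", "in Smalltalk",
     "| cntx val | cntx := self sender. val := trueObj. "
     "self returnValue: val to: cntx."),
    ("duplicate top", "in Smalltalk",
     "self internalPush: self internalStackTop."),
    ("duplicate top", "in NewtonScript",
     "vm.internalPush(vm.internalStackTop())"),
    ("return true", "in NewtonScript",
     "vm.returnValueTo(vm.trueObj, vm.sender())"),
]


def objects_of(facts, subject, relation):
    """Return every object related to subject by relation, in order."""
    return [o for s, r, o in facts if s == subject and r == relation]


def port_report(facts, vm, language):
    """Map each byte code of vm to its source in language, or None."""
    relation = "in " + language
    report = {}
    for byte_code in objects_of(facts, vm, "has byte code"):
        sources = objects_of(facts, byte_code, relation)
        report[byte_code] = sources[0] if sources else None
    return report


def missing(facts, vm, language):
    """List the byte codes of vm with no source recorded in language."""
    report = port_report(facts, vm, language)
    return [byte_code for byte_code, src in report.items() if src is None]


print(missing(FACTS, "Squeak VM", "NewtonScript"))  # ['extended store']
```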

At this point, I'm beginning to see Pointrel as a concept that
lives on top of a host programming language (whatever the combination of
syntaxes used). However, there are many ways to do this, and many people
who have worked in this field, and Bill Kent's work (predating
Gelernter's) is still for me the definitive work, just like Smalltalk
(developed around the same time) was and still is the definitive
programming system. ;-) 

It is very hard to improve on the Smalltalk syntax as a general purpose
programming language, so I think ultimately a fusion of ideas like
Kent's on relationships and semantic nets with Squeak (along with an
ability to support alternate syntaxes as needed, as described in my
earlier post to the list of "Squeak and the Babel of programming
languages 12/28/96") may prove to be a phenomenal system supporting work
at many levels of abstraction. Knowledge representation techniques such
as those looked into by Pointrel or ROSE/STAR or other AI KR work are
essential to explore if Squeak is going to move to a higher level of
abstraction, and from there, everywhere.

-Paul Fernhout
Kurtz-Fernhout Software 
=========================================================
Developers of custom software and educational simulations
Creators of the GPL Garden with Insight(TM) garden simulator
http://www.kurtz-fernhout.com/squeak