[Squeak-e] Programming the VM
Mark S. Miller
markm at caplet.com
Sun Feb 2 22:54:01 CET 2003
Even though I no longer normally cross-post my Squeak-E posts to
e-lang, I am cross-posting this one, since Colin's question finally got me to
figure out and write down something that has been kicking around in my
head for a while, and that is just as important for E as it is for Squeak-E.
At 05:09 PM 2/2/2003 Sunday, Colin Putney wrote:
>[...] The goal is for the user
>(ok, a technically sophisticated user) to be able to understand and modify
>any aspect of the system.
>For me this is one of the core virtues of Squeak.
First the bad news: In order to be secure, Squeak-E cannot achieve the level
of self-malleability that Smalltalk has always enjoyed. I am among those who
have enjoyed it, and I agree that it's one of Smalltalk's core virtues. It
will sadden me as well to sacrifice some of this.
Good news: I think we can get closer than one might think.
More bad news: To be a plausible successor to Squeak, Squeak-E must do
better at this than E has done so far. This will involve some new
exploration and research. Fortunately, since Squeak-E is built on Squeak
rather than Java, Squeak-E may have an easier time of this than E. (Notice
how glibly I blame others for my shortcomings as a language designer ;) )
More good news: The answer to many of these issues lies in another of
Smalltalk's founding premises:
"Make the computer recursive"
One of the ways Alan conceived of the object paradigm was as turning the
computer into a sort of little network of little computers. Little computers
are naturally encapsulated combinations of code and data interacting with
each other by sending messages.
Using Alan's imagery, and with thanks to Norm Hardy and E-Dean Tribble,
here's an explanation of a framework I call "refraction".
1) No machine has reflective access to other machines on the network, or to
the network as a whole.
2) A physical machine does have full reflective access to the virtual
machines it hosts (implements, simulates, evals), and to the virtual
network among them.
3) This recursive making of networks of objects can proceed to deeper levels.
4) When an object, Alice, at virtualization layer N creates a virtual
network of virtual objects at virtualization layer N+1, she can hook up the
edges of the virtual network she hosts to the network she finds herself in,
causing the messages they carry to cross the levels transparently.
(Refraction slogan: "Reify eval. Absorb apply.")
5) #4 implies that an object, Bob, at virtualization layer N+1, when
speaking on the network he finds himself in, neither knows nor cares at what
virtualization layer the objects he's speaking to reside. He can treat them as
if they were at his level.
6) In the simple case where every virtualizer does the hookup that Alice
does in #4, and nothing else, the virtualizers can be considered the
nodes of a virtualization tree. All the objects in the network a virtualizer
hosts are its children in this tree. All the non-virtualizers, at whatever
level, are then leaves of this tree.
7) Given #5 and #6, for many purposes we can ignore the virtualizers and the
virtualization levels. The leaves of the virtualization tree are the
individual fine-grained objects hooked together in an overall distributed
(level-crossing) reference graph. So long as the virtualizers don't use
their special reflective powers over the subgraph they host, the leaves and
the graph among them are the entire story.
8) A virtualizer potentially has full control over its children, and
therefore over all its descendants in the virtualization tree. Therefore,
the virtualization tree represents a hierarchy of ownership over subgraphs
of the #7 graph.
9) A virtualizer that only desires certain kinds of reflective controls over
its children -- those that its own virtual machine offers to help with --
may obtain these reflective controls without paying the cost of an
additional layer of interpretation, and often at hardly any cost at all.
This parallels the logic that makes IBM VM efficient.
Debugging is the most clearly compelling case where support from one's
own virtual machine is called for.
10) Another way to avoid the costs of an additional level of interpretation
is by source-to-source transformation. This is more expensive than #9 but
11) We can mostly stay within this model when dealing with the actual
distributed reference graph among objects distributed among actual machines
on actual networks, since that's the model we're using anyway.
12) I desperately need to draw some pictures.
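Until those pictures exist, here is a toy sketch in Python (not Squeak or E) of items #4 through #8. All names (Obj, Virtualizer, host, descendants) are hypothetical. It simplifies "hooking up the edges" to handing out direct references, so a message to a deeper object looks exactly like a message to a peer, and it models a virtualizer's reflective power as the ability to enumerate the subtree it hosts. Python does not enforce privacy, so the underscore on _children is only a convention standing in for real encapsulation.

```python
from __future__ import annotations


class Obj:
    """A leaf: it can only send messages along references it holds."""

    def __init__(self, name: str) -> None:
        self.name = name

    def receive(self, message: str) -> str:
        return f"{self.name} got {message!r}"


class Virtualizer(Obj):
    """Hosts a virtual network one level down (items #2, #4, #6).

    Senders see its children as ordinary references (item #5); only the
    virtualizer itself can enumerate them (item #8).
    """

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self._children: list[Obj] = []  # private by convention only, in Python

    def host(self, child: Obj) -> Obj:
        # Adopt an object one level down and hand back a plain reference, so
        # messages to it cross levels without the sender noticing (item #4).
        self._children.append(child)
        return child

    def descendants(self):
        # Reflective access over the hosted subgraph: children, grandchildren, ...
        for child in self._children:
            yield child
            if isinstance(child, Virtualizer):
                yield from child.descendants()


alice = Virtualizer("Alice")              # level N
bob = alice.host(Obj("Bob"))              # level N+1
inner = alice.host(Virtualizer("Inner"))  # level N+1, itself a virtualizer
carol = inner.host(Obj("Carol"))          # level N+2

# Bob talks to Carol without knowing (or caring) that she is one level deeper.
print(carol.receive("hello from Bob"))         # Carol got 'hello from Bob'
# Alice owns the whole subtree beneath her.
print([o.name for o in alice.descendants()])   # ['Bob', 'Inner', 'Carol']
```

Leaves get no descendants method at all, which is the sketch's stand-in for #1: nothing at a level can enumerate its peers.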
An example: Smalltalk has a primitive for enumerating all objects in object
memory. This obviously needs to be impossible among objects within a level
of Squeak-E, or all security is lost. However, if Victor the virtualizer
wishes to keep track of all objects allocated in the subgraph he hosts, he
can use technique #10 to rewrite all primitive object allocations in the code
he loads so that they also place the new objects on a private list of his. Even though the
rewritten objects are accessing this list, the objects don't "think" they
have any ability to access this list, since there's nothing they can say in
their pre-transformed source code that will give them this access. It's just
an internal part of their implementation.
(If Victor wants to keep track only of the non-garbage objects he hosts,
then he need only use a weak collection.)
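A rough Python sketch of Victor's use of technique #10, assuming that "primitive object allocation" can be approximated by calls to a known set of class names, and using a weakref.WeakSet for the weak-collection variant. AllocationRewriter, Victor, and __victor_alloc__ are hypothetical names. Unlike the story above, Python gives no guarantee that hosted source can't name the injected helper (or reach the list through it), so this shows the shape of the rewrite, not a security boundary.

```python
from __future__ import annotations

import ast
import gc
import weakref


class AllocationRewriter(ast.NodeTransformer):
    """Rewrites `Cls(args...)` into `__victor_alloc__(Cls, args...)`."""

    def __init__(self, class_names: set[str]) -> None:
        self.class_names = class_names

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # rewrite nested calls first
        if isinstance(node.func, ast.Name) and node.func.id in self.class_names:
            return ast.Call(
                func=ast.Name(id="__victor_alloc__", ctx=ast.Load()),
                args=[node.func, *node.args],
                keywords=node.keywords,
            )
        return node


class Victor:
    def __init__(self) -> None:
        # Weak collection: objects drop out once they become garbage.
        self._allocated: weakref.WeakSet = weakref.WeakSet()

    def _alloc(self, cls, *args, **kwargs):
        obj = cls(*args, **kwargs)
        self._allocated.add(obj)  # the "private list" the guest never mentions
        return obj

    def load(self, source: str, classes: dict[str, type]) -> dict:
        tree = AllocationRewriter(set(classes)).visit(ast.parse(source))
        ast.fix_missing_locations(tree)
        namespace = dict(classes, __victor_alloc__=self._alloc)
        exec(compile(tree, "<hosted>", "exec"), namespace)
        return namespace

    def live_count(self) -> int:
        return len(self._allocated)


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


victor = Victor()
# The guest source says nothing about tracking; the rewrite adds it.
ns = victor.load("a = Point(1, 2)\nPoint(3, 4)\n", {"Point": Point})
gc.collect()
print(victor.live_count())  # expect 1: the unnamed Point(3, 4) was collected
```

Swapping the WeakSet for a plain list would give the first variant, in which Victor's list keeps every allocated object alive.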
Debugging is the obvious example, and it does fit this story in a way
compatible with capability security. (More work, namely the KeyKOS
branding mechanism, is needed to make debugging compatible with capability
confinement, but even that mostly fits this story.)
I should say again, none of the above is currently implemented in E.
>When programming in Squeak
>you find that all of the mechanisms of the VM are reified and available for
>inspection and manipulation. Avi Bryant compared this to the level of
>control you have in C or even assembly. Squeak gives you the same level of
>control, albeit on a virtualized machine.
"all" is a lot. Is there a list somewhere? With the above story, we would
now have a choice of whether to reify at the same level, as now, or whether
to reify only to one's virtualizer. When there's a security problem with the
first, we can often do the second.
>This level of control isn't just a nice philosophy either, it lets you do
>things that are impractical or impossible in other languages. Avi's Seaside
>frame work for web applications is based on an implementation of
>continuations that Avi was able to write because he had access to the
What can I read about Seaside? Could you summarize the salient points?
Open access to one's own frame is fine. Open access to one's caller's frame
would kill security, except for access according to the above ownership
hierarchy. A debugger must be prepared to encounter frames it cannot open.
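The ownership rule for frames could be modeled roughly as follows: a debugger may open a frame only if the object running in it is the debugger's own virtualizer or is hosted somewhere beneath it, and it shows every other frame as opaque. Node, Frame, and render_stack are hypothetical names, and the model ignores how real Squeak contexts are represented.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A node of the virtualization tree (hypothetical model)."""
    name: str
    parent: Optional[Node] = None

    def owns(self, other: Node) -> bool:
        # A virtualizer owns itself and everything hosted beneath it.
        node: Optional[Node] = other
        while node is not None:
            if node is self:
                return True
            node = node.parent
        return False


@dataclass
class Frame:
    """A stack frame belonging to an object hosted somewhere in the tree."""
    method: str
    owner: Node
    variables: dict = field(default_factory=dict)


def render_stack(debugger_host: Node, stack: list[Frame]) -> list[str]:
    """Open frames the debugger's host owns; leave the rest opaque."""
    lines = []
    for frame in stack:
        if debugger_host.owns(frame.owner):
            lines.append(f"{frame.method} {frame.variables}")
        else:
            lines.append("<opaque frame>")
    return lines


root = Node("root")
victor = Node("Victor", parent=root)
guest = Node("guest", parent=victor)
stranger = Node("stranger", parent=root)

stack = [
    Frame("Guest>>step", guest, {"i": 3}),
    Frame("Stranger>>callback", stranger, {"secret": 42}),
]
for line in render_stack(victor, stack):
    print(line)
# Guest>>step {'i': 3}
# <opaque frame>
```

A debugger hosted at the root would open both frames, which matches the idea that ownership flows down the virtualization tree.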
>Nathanael Schärli was able to implement his Traits model
>for object composition because he could dynamically manipulate method
>Smalltalk traditionally doesn't have dynamic scoping, but
>Stephen Pair could implement it in RuntimeEnvironments.
>Ok, now let's talk about security. I'll be the first to agree that
>"security" is a good thing. I would love to see Squeaklets being tossed
>around the net. I'd love to be able to consider all code I didn't write
>myself "untrusted" and know that it won't have any capabilities I didn't
>My question to Mark, Rob and the other Squeak-E enthusiasts then is, can we
>reconcile these two virtues?
Some immediately. Much more eventually, but with a lot of work and some
research as vaguely sketched above.
>To this point the discussion has been centered
>around what semantics are desirable for Squeak-E. Personally, I like
>Squeak's existing semantics, but I don't insist on them. I do think it's
>vital, however, that whatever semantics the Squeak-E VM ends up having, the
>mechanisms by which it provides them be reified and available for
>manipulation. That way, people like Avi and Nathanael can continue to work
It really would be good to have a list of these for current Squeak. Then we
can try to work through them and see what stories we find plausible.
Text by me above is hereby placed in the public domain