[Squeak-e] Programming the VM

Sun Feb 2 22:54:01 CET 2003

Even though I normally no longer cross-post my Squeak-E posts to 
e-lang, I am cross-posting this one since Colin's question caused me to 
finally figure out and write down something that's been kicking around in my 
head for a while, and that's just as important for E as it is for Squeak-E.
Thanks, Colin.

At 05:09 PM 2/2/2003 Sunday, Colin Putney wrote:
>[...] The goal is for the user 
>(ok, a technically sophisticated user) to be able to understand and modify 
>any aspect of the system.
>
>For me this is one of the core virtues of Squeak. 

First the bad news: In order to be secure, Squeak-E cannot achieve the level 
of self-malleability that Smalltalk has always enjoyed. I am among those who 
have enjoyed it, and I agree that it's one of Smalltalk's core virtues. It 
will sadden me as well to sacrifice some of this.

Good news: I think we can get closer than one might think.

More bad news: To be a plausible successor to Squeak, Squeak-E must do 
better at this than E has done so far. This will involve some new 
exploration and research. Fortunately, since Squeak-E is built on Squeak 
rather than Java, Squeak-E may have an easier time of this than E. (Notice 
how glibly I blame others for my shortcomings as a language designer ;)?)

More good news: The answer to many of these issues lies in another of 
Smalltalk's founding premises:

           "Make the computer recursive" 
                                --Alan Kay.

One of the ways Alan conceived of the object paradigm is to make the 
computer into a sort of little network of little computers. Little computers 
are naturally encapsulated combinations of code and data interacting with 
each other by sending messages.

Using Alan's imagery, and with thanks to Norm Hardy and E-Dean Tribble, 
here's an explanation of a framework I call "refraction".

1) No machine has reflective access to other machines on the network, or to 
   the network as a whole. 

2) A physical machine does have full reflective access to the virtual 
   machines it hosts (implements, simulates, evals), and to the virtual 
   network among them.

3) This recursive-making of networks of objects can proceed to deeper levels 
   of nesting. 

4) When an object, Alice, at virtualization layer N creates a virtual 
   network of virtual objects at virtualization layer N+1, she can hook up the 
   edges of the virtual network she hosts to the network she finds herself in, 
   causing the messages they carry to cross the levels transparently. 
   (Refraction slogan: "Reify eval. Absorb apply.")

5) #4 implies that an object, Bob, at virtualization layer N+1, when 
   speaking on the network he finds himself in, doesn't know or care at what 
   virtualization layer the objects he's speaking to reside. He can treat them as 
   if they were at his level.

6) In the simple case where all virtualizers do the hookup described by 
   Alice in #4, and do nothing else, then virtualizers can be considered the 
   nodes of a virtualization tree. All the objects in the network they host 
   are their children in this tree. All the non-virtualizers, at whatever 
   level, are then leaves of this tree.

7) Given #5 and #6, for many purposes we can ignore the virtualizers and the 
   virtualization levels. The leaves of the virtualization tree are the 
   individual fine-grained objects hooked together in an overall distributed 
   (level-crossing) reference graph. So long as the virtualizers don't use 
   their special reflective powers over the subgraph they host, the leaves and 
   the graph among them are the entire story.

8) A virtualizer potentially has full control over its children, and 
   therefore over all its descendants in the virtualization tree. Therefore, 
   the virtualization tree represents a hierarchy of ownership over subgraphs 
   of the #7 graph.

9) A virtualizer that only desires certain kinds of reflective controls over 
   its children -- those that its own virtual machine offers to help with -- 
   may obtain these reflective controls without paying the cost of an 
   additional layer of interpretation. Often, at hardly any cost 
   at all. This parallels the logic that makes IBM VM efficient.
   Debugging is the most clearly compelling case where support from one's 
   own virtual machine is called for.

10) Another way to avoid the costs of an additional level of interpretation 
   is by source-to-source transformation. This is more expensive than #9 but 
   more flexible.

11) We can mostly stay within this model when dealing with the actual 
   distributed reference graph among objects distributed among actual machines 
   on actual networks, since that's the model we're using anyway.

12) I desperately need to draw some pictures.
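
Until there are pictures, points #4 through #8 can be sketched in a few
lines of code. What follows is a minimal illustration in Python (not Squeak
or E), under loud assumptions: the class names Leaf, Edge and Virtualizer,
and the methods host, hook_up and leaves, are all hypothetical and come from
nowhere in E. Python enforces no encapsulation, so this shows only the shape
of the idea, not its security.

```python
class Leaf:
    """A non-virtualizer object. It knows only the references handed to it."""

    def __init__(self, name):
        self.name = name
        self.peers = {}  # petname -> reference (a Leaf or an Edge)

    def receive(self, message):
        return f"{self.name} received {message!r}"

    def send(self, petname, message):
        # The sender uses the same protocol whatever level the target is at (#5).
        return self.peers[petname].receive(message)


class Edge:
    """An edge of a hosted network, hooked up to the enclosing network (#4)."""

    def __init__(self, target, crossings):
        self._target = target
        self._crossings = crossings

    def receive(self, message):
        self._crossings.append(message)       # the host may observe the crossing
        return self._target.receive(message)  # then forward it unchanged


class Virtualizer:
    """A node of the virtualization tree (#6); hosted objects are its children."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.crossings = []

    def host(self, child):
        self.children.append(child)
        return child

    def hook_up(self, outside_object):
        """Return a reference that hosted objects can use to reach outside."""
        return Edge(outside_object, self.crossings)

    def leaves(self):
        """All non-virtualizer descendants (#7), which this host owns (#8)."""
        found = []
        for child in self.children:
            if isinstance(child, Virtualizer):
                found.extend(child.leaves())
            else:
                found.append(child)
        return found


carol = Leaf("Carol")                      # lives at level N
alice = Virtualizer("Alice")               # lives at level N, hosts level N+1
bob = alice.host(Leaf("Bob"))
dave = alice.host(Leaf("Dave"))
bob.peers["dave"] = dave                   # same-level reference
bob.peers["carol"] = alice.hook_up(carol)  # level-crossing reference

print(bob.send("dave", "hello"))
print(bob.send("carol", "hello"))
print([leaf.name for leaf in alice.leaves()])
```

Bob's send looks the same in both cases; only Alice, who hosts him, can see
that one of the two messages crossed a level.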

An example: Smalltalk has a primitive for enumerating all objects in object 
memory. This obviously needs to be impossible among objects within a level 
of Squeak-E, or all security is lost. However, if Victor the virtualizer 
wishes to keep track of all objects allocated in the subgraph he hosts, he 
can use technique #10 to rewrite all primitive object allocations in code 
he loads to place these objects on a private list of his. Even though the 
rewritten objects are accessing this list, the objects don't "think" they 
have any ability to access this list, since there's nothing they can say in 
their pre-transformed source code that will give them this access. It's just 
an internal part of their implementation.
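
A rough sketch of Victor's rewrite, in Python rather than Squeak-E and under
stated assumptions: load_tracked and the hook name "@alloc" are hypothetical,
and "primitive allocation" is approximated as "calling a class". The hook
name is not a valid identifier, so the pre-transformed source has no way to
spell it. Python is not capability-secure (guest code could still reach the
hook through globals(), and zero-argument super() would not survive this
rewrite), so read this as an illustration of technique #10 only.

```python
import ast

HOOK = "@alloc"  # hypothetical name; it cannot be written in Python source


def load_tracked(source):
    """Rewrite every call in `source` to go through Victor's hook, then run it."""
    hosted = []  # Victor's private list

    def alloc(callee, /, *args, **kwargs):
        result = callee(*args, **kwargs)
        if isinstance(callee, type):  # calling a class allocates an instance
            hosted.append(result)
        return result

    class RouteCalls(ast.NodeTransformer):
        def visit_Call(self, node):
            self.generic_visit(node)  # rewrite nested calls first
            hook = ast.Name(id=HOOK, ctx=ast.Load())
            routed = ast.Call(func=hook,
                              args=[node.func] + node.args,
                              keywords=node.keywords)
            return ast.copy_location(routed, node)

    tree = RouteCalls().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    namespace = {HOOK: alloc}
    exec(compile(tree, "<guest>", "exec"), namespace)
    return namespace, hosted


guest_source = """
class Point:
    def __init__(self, x):
        self.x = x

def make():
    return [Point(1), Point(2)]

pts = make()
"""

namespace, hosted = load_tracked(guest_source)
print(len(hosted))  # number of Point instances the guest allocated
```

The guest code never mentions the list, yet every Point it allocates lands
on it.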

(If Victor wants to keep track only of the non-garbage objects he hosts, 
then he need only use a weak collection.)

Debugging is the obvious example, and does fit this story in a way 
compatible with capability security. (More work is needed (the KeyKOS 
branding mechanism) to make debugging compatible with capability 
confinement, but it still mostly fits with this story.)

I should say again, none of the above is currently implemented in E.

>When programming in Squeak 
>you find that all of the mechanisms of the VM are reified and available for 
>inspection and manipulation. Avi Bryant compared this to the level of 
>control you have in C or even assembly. Squeak gives you the same level of 
>control, albeit on a virtualized machine.

"all" is a lot. Is there a list somewhere? With the above story, we would 
now have a choice of whether to reify at the same level, as now, or whether 
to reify only to one's virtualizer. When there's a security problem with the 
first, we can often do the second.

>This level of control isn't just a nice philosophy either, it lets you do 
>things that are impractical or impossible in other languages. Avi's Seaside 
>framework for web applications is based on an implementation of 
>continuations that Avi was able to write because he had access to the 
>activation stack.

What can I read about Seaside? Could you summarize the salient points?

Open access to one's own frame is fine. Open access to one's caller's frame 
would kill security, except for access according to the above ownership 
hierarchy. A debugger must be prepared to encounter frames it cannot open.

>Nathanael Schärli was able to implement his Traits model 
>for object composition because he could dynamically manipulate method 
>dictionaries.

Traits? Summary?

>Smalltalk traditionally doesn't have dynamic scoping, but 
>Stephen Pair could implement it in RuntimeEnvironments.

RuntimeEnvironments?

>Ok, now let's talk about security. I'll be the first to agree that 
>"security" is a good thing. I would love to see Squeaklets being tossed 
>around the net. I'd love to be able to consider all code I didn't write 
>myself "untrusted" and know that it won't have any capabilities I didn't 
>grant it.
>
>My question to Mark, Rob and the other Squeak-E enthusiasts then is, can we 
>reconcile these two virtues?

Some immediately. Much more eventually, but with a lot of work and some 
research as vaguely sketched above.

>To this point the discussion has been centered 
>around what semantics are desirable for Squeak-E. Personally, I like 
>Squeak's existing semantics, but I don't insist on them. I do think it's 
>vital, however, that whatever semantics the Squeak-E VM ends up having, the 
>mechanisms by which it provides them be reified and available for 
>manipulation. That way, people like Avi and Nathanael can continue to work 
>their magic.

It really would be good to have a list of these for current Squeak. Then we 
can try to work through them and see what stories we find plausible.

----------------------------------------
Text by me above is hereby placed in the public domain

        Cheers,
        --MarkM