At 11:16 AM 1/31/2003 Friday, Anthony Hannan wrote:
Yes, exception handling needs dynamic scope, doesn't it? Do you propose handling exceptions differently?
Exception handling was an aspect of PPS2.5 that I never did learn real well. Where is Squeak's exception handling documented? As I recall, PPS2.5 had both terminating and resumable exception handlers. Let's take these separately.
Terminating Exceptions
E uses only terminating exceptions, on the same model (for present purposes) as C++ and Java. Since I don't remember PPS2.5's terminating exceptions being any different from this, I'll just assume they're the same until I hear otherwise. I will use Java as the reference for this model, as it's probably the most mutually well known.
Java try-catch and try-finally blocks do have some semantics in common with dynamic scoping. They push and pop on the stack according to nested dynamic extents, just as dynamic variable bindings do. When one is needed, the dynamically closest applicable one is looked up, presumably by looking back on the stack, corresponding to a deeply-bound implementation of dynamic scoping.
However, I claim this isn't dynamic scoping. The "throw" is not directly invoking the corresponding "catch". Rather, the stack is being unwound and "finally {..}" clauses are getting run on the way out. If one of these finally clauses itself throws, we proceed to unwind with the new Exception *instead of* the old one. These effects fit poorly into the dynamic scoping model.
Instead, what fits well is a simple extension of the continuation passing model of call-return computation (CPS). When writing in a call-return language, there's always one unstated additional parameter on all calls -- the continuation -- representing the rest of the computation the caller will perform once the callee returns. In denotational semantics, Actors, or Scheme, where CPS originated, the continuation was simply a function of one argument, where this argument was the value to be returned.
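To make the one-argument continuation concrete, here is a minimal Python sketch, with Python standing in for Scheme and function names that are illustrative only. `fact` is ordinary direct style. `fact_cps` takes the continuation as an explicit extra parameter `k`, a function that receives the value that would otherwise have been returned:

```python
def fact(n):
    """Direct style: the caller's "rest of the computation" stays implicit."""
    return 1 if n == 0 else n * fact(n - 1)


def fact_cps(n, k):
    """CPS: k is the continuation, a one-argument function that receives
    the value fact(n) would otherwise have returned."""
    if n == 0:
        return k(1)
    # The recursive call gets a new continuation: multiply by n, then pass to k.
    return fact_cps(n - 1, lambda r: k(n * r))


print(fact_cps(5, lambda v: v))  # prints 120
```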
(It's a separate matter as to whether the language allows the continuation to be reified, or treats it only as an explanatory device. Smalltalk, Scheme, and Actors do the first. C++, Java, and E do the second. For various reasons (see http://www.eros-os.org/pipermail/e-lang/2001-July/005418.html ), I recommend the second, but we can leave this argument to another time.)
As explained in section ii of http://erights.org/elib/concurrency/msg-passing.html , to account for terminating exceptions, we model the implicitly passed continuation as an object with two methods:
resolve: result
and
smash: exception
A throw just calls the smash method of its continuation. The peculiar behavior of a try-finally is just the peculiar behavior of the continuation it creates.
(Note: to model E we need three methods in the continuation, but I think we can ignore that for present purposes.)
So, clever implementations aside, the computational model is explained purely in terms of local message sending without any magic reaching up the stack (deep binding) or stateful variables being magically shared (shallow binding). I claim it's not dynamic scoping at all.
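A minimal Python sketch of the two-method continuation, under stated assumptions: `resolve` and `smash` are the method names from the text, but the classes `Top`, `Finally`, `_AfterCleanup` and the helper `throw` are hypothetical. They are written only to show that a throw is a plain message to the continuation, and that try-finally behavior, including a failing finally clause replacing the original exception, lives entirely in the continuation the try-finally creates:

```python
class Continuation:
    """The implicit extra parameter of every call, with the two methods from the text."""

    def resolve(self, result):
        raise NotImplementedError

    def smash(self, exception):
        raise NotImplementedError


class Top(Continuation):
    """Outermost continuation: records how the whole computation ended."""

    def __init__(self):
        self.outcome = None

    def resolve(self, result):
        self.outcome = ("resolved", result)

    def smash(self, exception):
        self.outcome = ("smashed", exception)


def throw(k, exception):
    """A throw is just a call to the continuation's smash method."""
    k.smash(exception)


class _AfterCleanup(Continuation):
    """Continuation handed to the finally clause itself."""

    def __init__(self, outer, deliver_original):
        self.outer = outer
        self.deliver_original = deliver_original

    def resolve(self, _ignored):
        # Cleanup finished normally: pass the original outcome outward.
        self.deliver_original()

    def smash(self, exception):
        # Cleanup threw: unwind with the new exception *instead of* the old one.
        self.outer.smash(exception)


class Finally(Continuation):
    """The continuation a try-finally creates around its body."""

    def __init__(self, cleanup, outer):
        self.cleanup = cleanup  # a CPS function: cleanup(k)
        self.outer = outer

    def resolve(self, result):
        self.cleanup(_AfterCleanup(self.outer, lambda: self.outer.resolve(result)))

    def smash(self, exception):
        self.cleanup(_AfterCleanup(self.outer, lambda: self.outer.smash(exception)))


# Bodies and cleanups are written in CPS: they receive their continuation as k.
def ok_body(k):
    k.resolve(42)

def failing_body(k):
    throw(k, ValueError("boom"))

def quiet_cleanup(k):
    k.resolve(None)

def failing_cleanup(k):
    throw(k, KeyError("cleanup failed"))


top1 = Top()
failing_body(Finally(quiet_cleanup, top1))
print(top1.outcome)  # ('smashed', ValueError('boom'))

top2 = Top()
failing_body(Finally(failing_cleanup, top2))
print(top2.outcome)  # ('smashed', KeyError('cleanup failed'))

top3 = Top()
ok_body(Finally(quiet_cleanup, top3))
print(top3.outcome)  # ('resolved', 42)
```

Nothing here searches the stack: each continuation only forwards to the one it was handed.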
Resumable Exceptions
Although I may have programmed in languages that happened to support resumable exceptions, I myself have never used them in any language. So if this is where the "Are Handlers Dynamically Scoped?" issue is, I'll wait until someone explains their semantics, or points me at documentation. If these are indeed dynamically scoped, then we need to ask whether they are a good idea.
---------------------------------------- Text by me above is hereby placed in the public domain
Cheers, --MarkM
At 12:28 PM 2/2/2003 Sunday, Allen Wirfs-Brock wrote:
The primary technical difference between Smalltalk exceptions and Java/C++ style exceptions is the sequencing of unwinding the stack and execution of the "handler block". As you know, in the Java model, the stack is unwound and then the handler block is executed. In the Smalltalk model the handler is executed before the stack is unwound. This allows resumable handlers to be trivially implemented.
Does it have any other virtue? If we didn't have resumable handlers, would there be any remaining reason to prefer this? Note that the issue isn't when they're executed, but when they're looked up. Of course, they can't be executed until they're looked up.
I'm relatively confident that your CPS model could be extended to accommodate the Smalltalk exception model although it might be some work to do so.
The only way I can imagine reveals that this semantics is indeed a case of dynamic scoping. The way I imagine:
A continuation could have an additional method available, "getHandler: exceptionTypeOrSomething" that either already knows a handler for that exceptionTypeOrSomething, or asks its continuation. This non-destructive looking up the stack is simply a deeply bound implementation model of dynamic scoping. (Alternatively, the continuation could instead have a non-destructive "handle: exception" method which gets delegated back, but it amounts to the same thing.)
The key thing about the Java alternative, unwinding to the handler, is that the continuation is only ever invoked destructively, and is otherwise opaque. So by the time the handler is invoked, it's a handler associated with the immediate continuation, and not one retrieved from further back on the stack.
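The `getHandler: exceptionTypeOrSomething` idea can be sketched as a chain of continuation frames. The `Frame` class below is hypothetical; the point is that one non-destructive walk outward serves equally for a handler and for an ordinary dynamic variable, which is exactly deep binding:

```python
class Frame:
    """One continuation in the chain, optionally carrying bindings."""

    def __init__(self, parent=None, bindings=None):
        self.parent = parent
        self.bindings = dict(bindings or {})

    def get_handler(self, key):
        """Non-destructive lookup: ask this frame, else ask its continuation."""
        frame = self
        while frame is not None:
            if key in frame.bindings:
                return frame.bindings[key]
            frame = frame.parent
        return None


outer = Frame(bindings={"ZeroDivide": "outer handler", "Transcript": "stdout"})
ensure_frame = Frame(outer)  # e.g. an unwind block: binds nothing itself
inner = Frame(ensure_frame, {"Transcript": "log file"})

print(inner.get_handler("ZeroDivide"))  # outer handler
print(inner.get_handler("Transcript"))  # log file (inner binding shadows outer)
```

Nothing was popped: `ensure_frame` and `inner` are still intact after the lookup, whereas the Java-style `smash` consumes the continuation it is sent to.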
So even without resumption, if earlier handlers get invoked before later unwind blocks, then I'd agree with Anthony that Smalltalk's exceptions are an instance of dynamic scoping. This isn't to say that it's a bad idea. But it does leave us with the following hypotheses:
1) Resumable handlers are bad.
2) Dynamically scoped resumable handlers (as in Smalltalk) are good, leaving us with at least one case where dynamic scoping is a good idea. If it's a good idea here, there are probably other cases as well.
3) Resumable handlers are good, but a resumable handler shouldn't be looked up by dynamic scoping. (Note: Joule has lexical resumable handlers called "Keepers".)
4) Terminating handler lookup should happen during unwind, like Java.
5) Terminating handler lookup should happen prior to unwind, as in Smalltalk, making this handler lookup arguably another case of dynamic scoping. (In order to make this point separate from #2, let's say "should" even in the absence of the need to support #2.)
6) Terminating handler lookup should happen prior to unwind, but not by looking up the stack. (Presumably, the alternative would be lexical. I know of no systems that do this.)
Smalltalk: #2, #5. Java & E: #1, #4. Joule: #3. (Joule has no stack, and so can't have any conventional notion of termination.)
Does this seem like a useful framework for exploring the issue? What are some arguments for #2 or #5? I'm prepared to argue for #1, #3, and #4.
---------------------------------------- Text by me above is hereby placed in the public domain
Cheers, --MarkM
Hi all,
This is a fascinating discussion, and I've been following it with interest. But there's one aspect of the whole Squeak-E idea that's been bothering me. Shane Roberts brought it up briefly, but didn't seem to get a response, so I'll expand on it a bit.
In many ways, Smalltalk and Squeak in particular are the notion of the "personal computer" carried to its logical extreme. The goal is for the user (ok, a technically sophisticated user) to be able to understand and modify any aspect of the system.
For me this is one of the core virtues of Squeak. When programming in Squeak you find that all of the mechanisms of the VM are reified and available for inspection and manipulation. Avi Bryant compared this to the level of control you have in C or even assembly. Squeak gives you the same level of control, albeit on a virtualized machine.
This level of control isn't just a nice philosophy either; it lets you do things that are impractical or impossible in other languages. Avi's Seaside framework for web applications is based on an implementation of continuations that Avi was able to write because he had access to the activation stack. Nathanael Schärli was able to implement his Traits model for object composition because he could dynamically manipulate method dictionaries. Smalltalk traditionally doesn't have dynamic scoping, but Stephen Pair could implement it in RuntimeEnvironments.
Ok, now let's talk about security. I'll be the first to agree that "security" is a good thing. I would love to see Squeaklets being tossed around the net. I'd love to be able to consider all code I didn't write myself "untrusted" and know that it won't have any capabilities I didn't grant it.
My question to Mark, Rob and the other Squeak-E enthusiasts then is, can we reconcile these two virtues? To this point the discussion has been centered around what semantics are desirable for Squeak-E. Personally, I like Squeak's existing semantics, but I don't insist on them. I do think it's vital, however, that whatever semantics the Squeak-E VM ends up having, the mechanisms by which it provides them be reified and available for manipulation. That way, people like Avi and Nathanael can continue to work their magic.
Cheers,
Colin
Even though I normally no longer cross-post my Squeak-E posts to e-lang, I am cross-posting this one since Colin's question caused me to finally figure out and write down something that's been kicking around in my head for a while, and that's just as important for E as it is for Squeak-E. Thanks, Colin.
At 05:09 PM 2/2/2003 Sunday, Colin Putney wrote:
[...] The goal is for the user (ok, a technically sophisticated user) to be able to understand and modify any aspect of the system.
For me this is one of the core virtues of Squeak.
First the bad news: In order to be secure, Squeak-E cannot achieve the level of self-malleability that Smalltalk has always enjoyed. I am among those who have enjoyed it, and I agree that it's one of Smalltalk's core virtues. It will sadden me as well to sacrifice some of this.
Good news: I think we can get closer than one might think.
More bad news: To be a plausible successor to Squeak, Squeak-E must do better at this than E has done so far. This will involve some new exploration and research. Fortunately, since Squeak-E is built on Squeak rather than Java, Squeak-E may have an easier time of this than E. (Notice how glibly I blame others for my shortcomings as a language designer. ;)
More good news: The answer to many of these issues lies in another of Smalltalk's founding premises:
"Make the computer recursive" --Alan Kay.
One of the ways Alan conceived of the object paradigm is to make the computer into sort-of a little network of little computers. Little computers are naturally encapsulated combinations of code and data interacting with each other by sending messages.
Using Alan's imagery, and with thanks to Norm Hardy and E-Dean Tribble, here's an explanation of a framework I call "refraction".
1) No machine has reflective access to other machines on the network, or to the network as a whole.
2) A physical machine does have full reflective access to the virtual machines it hosts (implements, simulates, evals), and to the virtual network among them.
3) This recursive-making of networks of objects can proceed to deeper levels of nesting.
4) When an object, Alice, at virtualization layer N creates a virtual network of virtual objects at virtualization layer N+1, she can hook up the edges of the virtual network she hosts to the network she finds herself in, causing the messages they carry to cross the levels transparently. (Refraction slogan: "Reify eval. Absorb apply.")
5) #4 implies that an object, Bob, at virtualization layer N+1, when speaking on the network he finds himself in, doesn't know or care at what virtualization layer the objects he's speaking to reside. He can treat them as if they were at his level.
6) In the simple case where all virtualizers do the hookup described by Alice in #4, and do nothing else, then virtualizers can be considered the nodes of a virtualization tree. All the objects in the network they host are their children in this tree. All the non-virtualizers, at whatever level, are then leaves of this tree.
7) Given #5 and #6, for many purposes we can ignore the virtualizers and the virtualization levels. The leaves of the virtualization tree are the individual fine-grained objects hooked together in an overall distributed (level-crossing) reference graph. So long as the virtualizers don't use their special reflective powers over the subgraph they host, the leaves and the graph among them are the entire story.
8) A virtualizer potentially has full control over its children, and therefore over all its descendants in the virtualization tree. Therefore, the virtualization tree represents a hierarchy of ownership over subgraphs of the #7 graph.
9) A virtualizer that only desires certain kinds of reflective controls over its children -- those that its own virtual machine offers to help with -- may obtain these reflective controls without paying the cost of an additional layer of interpretation, often at hardly any cost at all. This parallels the logic that makes IBM VM efficient. Debugging is the most clearly compelling case where support from one's own virtual machine is called for.
10) Another way to avoid the costs of an additional level of interpretation is by source-to-source transformation. This is more expensive than #9 but more flexible.
11) We can mostly stay within this model when dealing with the actual distributed reference graph among objects distributed among actual machines on actual networks, since that's the model we're using anyway.
12) I desperately need to draw some pictures.
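The tree structure of #6 and the ownership rule of #8 can be sketched in Python. `Virtualizer`, `Leaf`, and the `_host` slot are hypothetical names. Python cannot enforce capability confinement (any object could read `_host` or call `vars`), so this models only who is *supposed* to be granted reflective access, not how that access would be secured:

```python
class Virtualizer:
    """A node of the virtualization tree (#6): hosts objects, may host virtualizers."""

    def __init__(self, parent=None):
        self.parent = parent
        self._hosted = []

    def host(self, obj):
        obj._host = self  # hypothetical slot recording which virtualizer hosts obj
        self._hosted.append(obj)
        return obj

    def new_virtualizer(self):
        return self.host(Virtualizer(parent=self))

    def owns(self, obj):
        """True iff obj is a descendant of self in the virtualization tree (#8)."""
        v = getattr(obj, "_host", None)
        while v is not None:
            if v is self:
                return True
            v = v.parent
        return False

    def inspect(self, obj):
        """Reflective access, granted only down the ownership hierarchy (#1, #2)."""
        if not self.owns(obj):
            raise PermissionError("not hosted by this virtualizer")
        return {key: val for key, val in vars(obj).items() if key != "_host"}


class Leaf:
    def __init__(self, name):
        self.name = name


root = Virtualizer()
alice = root.new_virtualizer()
bob = alice.host(Leaf("bob"))
carol = root.host(Leaf("carol"))
print(root.owns(bob), alice.owns(bob), alice.owns(carol))  # True True False
```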
An example: Smalltalk has a primitive for enumerating all objects in object memory. This obviously needs to be impossible among objects within a level of Squeak-E, or all security is lost. However, if Victor the virtualizer wishes to keep track of all objects allocated in the subgraph he hosts, he can use technique #10 to rewrite all primitive object allocations in code he loads to place these objects on a private list of his. Even though the rewritten objects are accessing this list, the objects don't "think" they have any ability to access this list, since there's nothing they can say in their pre-transformed source code that will give them this access. It's just an internal part of their implementation.
(If Victor wants to keep track only of the non-garbage objects he hosts, he need only use a weak collection.)
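Technique #10 applied to Victor's allocation tracking can be sketched with Python's `ast` module. As a simplifying assumption, the rewrite wraps *every* call expression rather than only allocations, and keeps whatever results can be weakly referenced; the hook name `__victor_track__` is hypothetical. Unlike the design in the text, loaded Python code could still spell that name, so this shows the transformation and the weak collection, not the confinement:

```python
import ast
import gc
import weakref


class _WrapCalls(ast.NodeTransformer):
    """Rewrites every call f(...) into __victor_track__(f(...))."""

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite nested calls first
        return ast.Call(
            func=ast.Name(id="__victor_track__", ctx=ast.Load()),
            args=[node],
            keywords=[],
        )


class Victor:
    def __init__(self):
        # Weak, so objects that become garbage drop out on their own.
        self.allocated = weakref.WeakSet()

    def _track(self, value):
        try:
            self.allocated.add(value)
        except TypeError:
            pass  # ints, strings, None, lists etc. can't be weakly referenced
        return value

    def load(self, source):
        tree = _WrapCalls().visit(ast.parse(source))
        ast.fix_missing_locations(tree)
        namespace = {"__victor_track__": self._track}
        exec(compile(tree, "<virtualized>", "exec"), namespace)
        return namespace


victor = Victor()
loaded = victor.load(
    "class Point:\n"
    "    def __init__(self, x, y):\n"
    "        self.x, self.y = x, y\n"
    "keep = Point(1, 2)\n"
    "Point(3, 4)\n"  # garbage as soon as this statement finishes
)
gc.collect()
print(len(victor.allocated))  # 1: only the Point still referenced by `keep`
```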
Debugging is the obvious example, and does fit this story in a way compatible with capability security. (More work is needed (the KeyKOS branding mechanism) to make debugging compatible with capability confinement, but it still mostly fits with this story.)
I should say again, none of the above is currently implemented in E.
When programming in Squeak you find that all of the mechanisms of the VM are reified and available for inspection and manipulation. Avi Bryant compared this to the level of control you have in C or even assembly. Squeak gives you the same level of control, albeit on a virtualized machine.
"all" is a lot. Is there a list somewhere? With the above story, we would now have a choice of whether to reify at the same level, as now, or whether to reify only to one's virtualizer. When there's a security problem with the first, we can often do the second.
This level of control isn't just a nice philosophy either; it lets you do things that are impractical or impossible in other languages. Avi's Seaside framework for web applications is based on an implementation of continuations that Avi was able to write because he had access to the activation stack.
What can I read about Seaside? Could you summarize the salient points?
Open access to one's own frame is fine. Open access to one's caller's frame would kill security, except for access according to the above ownership hierarchy. A debugger must be prepared to encounter frames it cannot open.
Nathanael Schärli was able to implement his Traits model for object composition because he could dynamically manipulate method dictionaries.
Traits? Summary?
Smalltalk traditionally doesn't have dynamic scoping, but Stephen Pair could implement it in RuntimeEnvironments.
RuntimeEnvironments?
Ok, now let's talk about security. I'll be the first to agree that "security" is a good thing. I would love to see Squeaklets being tossed around the net. I'd love to be able to consider all code I didn't write myself "untrusted" and know that it won't have any capabilities I didn't grant it.
My question to Mark, Rob and the other Squeak-E enthusiasts then is, can we reconcile these two virtues?
Some immediately. Much more eventually, but with a lot of work and some research as vaguely sketched above.
To this point the discussion has been centered around what semantics are desirable for Squeak-E. Personally, I like Squeak's existing semantics, but I don't insist on them. I do think it's vital, however, that whatever semantics the Squeak-E VM ends up having, the mechanisms by which it provides them be reified and available for manipulation. That way, people like Avi and Nathanael can continue to work their magic.
It really would be good to have a list of these for current Squeak. Then we can try to work through them and see what stories we find plausible.
---------------------------------------- Text by me above is hereby placed in the public domain
Cheers, --MarkM
At 10:54 PM 2/2/2003 Sunday, Mark S. Miller wrote:
Using Alan's imagery, and with thanks to Norm Hardy and E-Dean Tribble, here's an explanation of a framework I call "refraction".
Oops, I forgot to mention the biggest credit-where-due on this: Udi Shapiro and the use of nested virtual meta-reflective interpreters in a network in Flat Concurrent Prolog. He did indeed reify eval and absorb apply (#4) and he did indeed use source-to-source transformation to simulate meta-interpretation cheaply (#10).
As I read over my description, I think it's all there in Udi's work (except for #9, which is obvious anyway). My message is only new imagery for explaining an old idea. I'm embarrassed. Well, at least the idea I almost stole is a good one. ;)
---------------------------------------- Text by me above is hereby placed in the public domain
Cheers, --MarkM
Mark S. Miller squeak-e@lists.squeakfoundation.org said:
One of the ways Alan conceived of the object paradigm is to make the computer into sort-of a little network of little computers. Little computers are naturally encapsulated combinations of code and data interacting with each other by sending messages.
Is this looking at Islands, basically, but then nested? Again the $1000 question arises: by what scope? It seems here that you are talking about machines, therefore about execution environment, therefore about some form of dynamic scopery.
What can I read about Seaside? Could you summarize the salient points?
www.beta4.com/seaside. Basically it keeps the continuation of a web request around to act on it the next time round. It also puts magic cookies in web requests so that it can find the correct continuation if you push 'back' twice, change the form data, and resubmit. Would work like a charm in this design, only needs its 'children's stacks' to play with.
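Seaside itself captures real continuations from the activation stack; the Python sketch below swaps in hand-written closures to show only the bookkeeping described above. Each rendered page registers the rest of the interaction under a fresh cookie, and because old cookies stay valid, going back and resubmitting resumes from the earlier point. `Session`, `ask_name`, and `ask_age` are hypothetical:

```python
import secrets


class Session:
    """Maps cookie -> continuation (here, a closure for the rest of the interaction)."""

    def __init__(self):
        self._continuations = {}

    def register(self, k):
        """Remember k and return the cookie to embed in the rendered page."""
        cookie = secrets.token_hex(8)
        self._continuations[cookie] = k
        return cookie

    def handle(self, cookie, form):
        """A request names its continuation by cookie; resume it with the form data."""
        return self._continuations[cookie](form)


def ask_name(session):
    return session.register(lambda form: ask_age(session, form["name"]))


def ask_age(session, name):
    return session.register(lambda form: f"{name} is {form['age']}")


session = Session()
page1 = ask_name(session)
page2 = session.handle(page1, {"name": "Avi"})
first = session.handle(page2, {"age": "30"})    # 'Avi is 30'

# "Back" to page 1, change the form data, resubmit: page1's continuation is still there.
page2b = session.handle(page1, {"name": "Colin"})
second = session.handle(page2b, {"age": "31"})  # 'Colin is 31'
```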
Traits? Summary?
A cool blend of multiple-inheritance, mix-ins, and whatnot. Mostly on class library level, the security impact would be positive if anything because it allows for a better factoring of code (and therefore capabilities)
You should read the paper(s), http://iamwww.unibe.ch/~scg/Research/Traits/
Smalltalk traditionally doesn't have dynamic scoping, but Stephen Pair could implement it in RuntimeEnvironments.
RuntimeEnvironments?
thread-local vars, basically. http://lists.squeakfoundation.org/pipermail/squeak-dev/2002-December/049938....
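As an illustration of thread-local dynamic variables (not Stephen Pair's RuntimeEnvironments API, which is not shown in this thread), here is a Python sketch using shallow binding: the current value sits in one per-thread place, and the previous value is saved on entry to a dynamic extent and restored on exit. `DynamicVariable` and `bound_to` are hypothetical names; Python's standard `contextvars` module offers a related facility:

```python
import threading
from contextlib import contextmanager


class DynamicVariable:
    """Thread-local, dynamically scoped variable using shallow binding."""

    def __init__(self, default=None):
        self._local = threading.local()
        self._default = default

    @property
    def value(self):
        return getattr(self._local, "value", self._default)

    @contextmanager
    def bound_to(self, new_value):
        old = self.value             # saved on the Python stack...
        self._local.value = new_value
        try:
            yield
        finally:
            self._local.value = old  # ...and restored when the extent exits


transcript = DynamicVariable(default="stdout")


def log(msg):
    # log receives no parameter for the destination: it sees the dynamic binding.
    return f"{transcript.value}: {msg}"


with transcript.bound_to("logfile"):
    inside = log("hi")               # 'logfile: hi'
    seen_by_other_thread = []
    t = threading.Thread(target=lambda: seen_by_other_thread.append(transcript.value))
    t.start()
    t.join()
outside = log("hi")                  # 'stdout: hi'
# seen_by_other_thread == ['stdout']: each thread has its own binding
```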
It really would be good to have a list of these for current Squeak. Then we can try to work through them and see what stories we find plausible.
Reification of context and sends seem to be a very important enabler of Smalltalk magic. If anything, I don't think we want to lose that.
At 12:21 AM 2/3/2003 Monday, cg@cdegroot.com wrote:
Is this looking at Islands, basically, but then nested? Again the $1000 question arises: by what scope? It seems here that you are talking about machines, therefore about execution environment, therefore about some form of dynamic scopery.
Let's say object A is instantiated on "machine" X and object B is instantiated on "machine" Y. Let's say X and Y allow A to call B. The act of A calling B executes on X under X's rules, observation, and control. The behavior of B being called (executing the called method) likewise executes on Y. The virtualization/hosting/ownership environment is according to the object's creation context, not its calling context.
This is lexical scoping, not dynamic scoping, just as Alan's metaphor would seem to demand. Object references stretch between "machines". Messages move between "machines" riding references. Objects stay where they're born.
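A Python sketch of hosting by creation context rather than calling context. `Machine`, `send`, and the `_machine` slot are hypothetical; the slot is assigned before `__init__` runs, so there is no window in which the object has no machine:

```python
class Machine:
    """A hypothetical host "machine" that records what it observes."""

    def __init__(self, name):
        self.name = name
        self.trace = []

    def new(self, cls, *args):
        obj = cls.__new__(cls)
        obj._machine = self  # fixed at birth, before __init__ runs
        obj.__init__(*args)
        return obj


def send(sender, receiver, selector, *args):
    """The sending half runs on the sender's machine, the receiving half on the receiver's."""
    sender._machine.trace.append(("send", selector))
    receiver._machine.trace.append(("receive", selector))
    return getattr(receiver, selector)(*args)


class Note:
    def __init__(self, text):
        self.text = text


class Notary:
    def make_note(self, text):
        # New objects are born on the creator's own machine.
        return self._machine.new(Note, text)


class Client:
    def __init__(self, notary):
        self.notary = notary

    def run(self):
        return send(self, self.notary, "make_note", "hello")


x, y = Machine("X"), Machine("Y")
b = y.new(Notary)
a = x.new(Client, b)
note = a.run()
# x.trace == [('send', 'make_note')], y.trace == [('receive', 'make_note')]
# note._machine is y: it lives where its creator lives, not where the call came from
```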
---------------------------------------- Text by me above is hereby placed in the public domain
Cheers, --MarkM
Mark S. Miller squeak-e@lists.squeakfoundation.org said:
This is lexical scoping, not dynamic scoping, just as Alan's metaphor would seem to demand. Object references stretch between "machines". Messages move between "machines" riding references. Objects stay where they're born.
Yeah, I figured that out for myself *after* I posted. Stupid me. <spank target="self"/>
Anyway, that'd probably mean, or could mean, that you'd have an extra slot in these objects pointing to their 'machine', not? When they are instantiated, they get the parent's 'machine' reference unless the parent overrides it (either by setting it after the fact, or more likely by something like 'Foo newInNewEnvironment').
SqueakVM specialists: how hard is it to add an extra reference to all objects? On Squeak level? On VM level?
At 03:22 AM 2/3/2003 Monday, cg@cdegroot.com wrote:
Anyway, that'd probably mean, or could mean, that you'd have an extra slot in these objects pointing to their 'machine', not? When they are instantiated, they get the parent's 'machine' reference unless the parent overrides it (either by setting it after the fact, or more likely by something like 'Foo newInNewEnvironment').
"after the fact" would be bad. You'd be in a very weird state until it was set.
Otherwise, yes, exactly. Or, at least, it has to point at an object which is in one-to-one correspondence with the parent 'machine', and which enables this parent to open it up (ie, obtain an instance of Lex's ObjectInspector on it) iff the object is indeed a child of this parent. I need to explain the KeyKOS/EROS branding mechanism.
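A sketch of the one-to-one "brand" object, using a closure-based sealer/unsealer construction in the spirit of E's brands (the KeyKOS/EROS mechanism itself is not reproduced here). `make_brand_pair`, `Machine.open`, and the `_machine_box` slot are hypothetical. Each machine keeps its unsealer private, so only the machine that created an object can open it up; as elsewhere, Python does not actually enforce this, since closures can be introspected:

```python
def make_brand_pair():
    """Return (seal, unseal). Only the unseal from the same pair can open a box."""
    slot = []  # private channel shared only by this pair's closures

    def seal(contents):
        def box():
            slot.append(contents)
        return box

    def unseal(box):
        slot.clear()
        box()  # a box from another pair writes into *its* slot, not ours
        if not slot:
            raise PermissionError("box was not sealed by this brand")
        return slot.pop()

    return seal, unseal


class Machine:
    def __init__(self):
        self._seal, self._unseal = make_brand_pair()

    def new(self, cls, *args):
        obj = cls.__new__(cls)
        # Set before __init__ runs: no "after the fact" window.
        obj._machine_box = self._seal(obj)
        obj.__init__(*args)
        return obj

    def open(self, obj):
        """Inspector-style access, granted iff obj was created by this machine."""
        target = self._unseal(obj._machine_box)
        return {k: v for k, v in vars(target).items() if k != "_machine_box"}


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


x_machine, y_machine = Machine(), Machine()
p = x_machine.new(Point, 1, 2)
print(x_machine.open(p))  # {'x': 1, 'y': 2}
try:
    y_machine.open(p)
except PermissionError:
    print("Y may not open an object X created")
```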
---------------------------------------- Text by me above is hereby placed in the public domain
Cheers, --MarkM
Mark S. Miller squeak-e@lists.squeakfoundation.org said:
"after the fact" would be bad. You'd be in a very weird state until it was set.
Hey, nothing that some loose references to quantum mechanics cannot solve ;-)
Otherwise, yes, exactly. Or, at least, it has to point at an object which is in one-to-one correspondence with the parent 'machine', and which enables this parent to open it up (ie, obtain an instance of Lex's ObjectInspector on it) iff the object is indeed a child of this parent. I need to explain the KeyKOS/EROS branding mechanism.
Ok. My gut feeling is indeed that this is better than dynamic scoping. So we need an extra slot. Time to bring in the VM hacking squad ;-)
(FYI: the VM object format is destined to change in Squeak 3.5, for 'real' closures or whatever. So this is the ideal time to request such changes).
On Monday, February 3, 2003, at 12:50 PM, cg@cdegroot.com wrote:
Mark S. Miller squeak-e@lists.squeakfoundation.org said:
"after the fact" would be bad. You'd be in a very weird state until it was set.
Hey, nothing that some loose references to quantum mechanics cannot solve ;-)
Otherwise, yes, exactly. Or, at least, it has to point at an object which is in one-to-one correspondence with the parent 'machine', and which enables this parent to open it up (ie, obtain an instance of Lex's ObjectInspector on it) iff the object is indeed a child of this parent. I need to explain the KeyKOS/EROS branding mechanism.
Ok. My gut feeling is indeed that this is better than dynamic scoping. So we need an extra slot. Time to bring in the VM hacking squad ;-)
I think there are quite a few VM hacking squad members with us. :-)
I'll point out that I am using one of the compactClassIndices for the MessageRedirector class. This is the mechanism that pops message lookup into my context classes, and that is how I have implemented eventual references and promises. Any message sent to an eventual reference will be eventually sent.
The next step I had in mind was to actually shove these contexts into the VM, so they weren't accessible in the image - it would require a special, MOPed Inspector to peer inside them. This is my attempt at providing opacity.
Look at MessageRedirector (or MessageRedirectorProxy for non-redirection VMs), Redirectionmanager and the ReferenceContext. To get them to somewhat behave, I have a set of immediateCall selectors.
Now I have also used this context approach to implement a Mixin pattern, and more generally, I think that wrapping references in the VM allows one to define all kinds of auxiliary services. It is a managed reference. It could include this concept of a machine and the environment and parent-child relationship it models, if I am following correctly.
Inspectors on eventual references don't work especially well. The inspector has great difficulty defining DoIt methods on the class side of the inspected object, because it keeps redirecting the #class method... :-) It's like a greased pig!
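Rob's MessageRedirector works at the VM level; the Python sketch below only approximates the observable behavior of an eventual reference using `__getattr__`, which Python calls only for names not found normally (loosely like `doesNotUnderstand:`). `Vat`, `Promise`, `EventualRef`, and `Counter` are hypothetical. Note that `__class__` is *not* redirected here, which sidesteps the #class problem described above but also makes the wrapper less transparent:

```python
from collections import deque


class Vat:
    """A minimal event loop: pending deliveries run later, in FIFO order."""

    def __init__(self):
        self.queue = deque()

    def run(self):
        while self.queue:
            self.queue.popleft()()


class Promise:
    def __init__(self):
        self.resolved = False
        self.value = None


class EventualRef:
    """Queues every message it doesn't itself understand instead of delivering it now."""

    def __init__(self, target, vat):
        self._target = target
        self._vat = vat

    def __getattr__(self, selector):
        def eventual_send(*args):
            promise = Promise()

            def deliver():
                promise.value = getattr(self._target, selector)(*args)
                promise.resolved = True

            self._vat.queue.append(deliver)
            return promise

        return eventual_send


class Counter:
    def __init__(self):
        self.count = 0

    def incr(self, by):
        self.count += by
        return self.count


vat = Vat()
counter = Counter()
ref = EventualRef(counter, vat)
p = ref.incr(5)
print(p.resolved, counter.count)  # False 0 (queued, not yet delivered)
vat.run()
print(p.resolved, p.value)        # True 5
```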
cheers, rob
Cees,
Ok. My gut feeling is indeed that this is better than dynamic scoping. So we need an extra slot. Time to bring in the VM hacking squad ;-)
(FYI: the VM object format is destined to change in Squeak 3.5, for 'real' closures or whatever. So this is the ideal time to request such changes).
I don't know where you heard that the object format is going to be changed in 3.5 but adding a pointer to every object is unlikely to be done unless you have a Very Good Reason(tm). It will instantly break all plugins and the added space and GC overhead is not to be taken lightly.
Cheers, - Andreas
At 12:41 PM 2/3/2003 Monday, Andreas Raab wrote:
Ok. My gut feeling is indeed that this is better than dynamic scoping. So we need an extra slot. Time to bring in the VM hacking squad ;-)
(FYI: the VM object format is destined to change in Squeak 3.5, for 'real' closures or whatever. So this is the ideal time to request such changes).
I don't know where you heard that the object format is going to be changed in 3.5 but adding a pointer to every object is unlikely to be done unless you have a Very Good Reason(tm). It will instantly break all plugins and the added space and GC overhead is not to be taken lightly.
Ok, here's another way to do it. Instead of modifying the format of a virtualized object, have the virtualized object point at a virtualized class instead of the real one. Make the layout changes in the virtualized class.
Each class "loaded" into a given virtual "machine", ie, each class that will be instantiated by a given virtualizer, ie, each class that is to be as-if interpreted by that virtualizer, is itself first virtualized. This means that a new Behavior object of some kind is allocated, usually sharing method dictionary with the original class object as well as wrapping the original class object. (Can two behaviors in Smalltalk share a method dictionary?) The instance is now just the regular instance, but points at the virtualized class instead of the real one.
Instances of normal classes would be unchanged, would be normal Squeak objects, and would be seen by Squeak-E as primitive objects; just as E sees Java objects.
Could this work? Might it even work without needing any VM changes?
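Python can illustrate the shape of the virtualized-class idea, though its mechanics differ: below, the per-machine class is a subclass that finds the original's methods by inheritance, rather than literally sharing a method dictionary (whether two Smalltalk Behaviors can share one is the open question above). `virtualize` and the `machine` class attribute are hypothetical. Instances keep their normal layout; only the class they point at changes:

```python
_virtualized = {}


def virtualize(cls, machine):
    """Return the per-machine class for cls, creating it on first use."""
    key = (cls, machine)
    if key not in _virtualized:
        # Methods are found in cls via inheritance, so nothing is copied;
        # the per-machine state lives on the class, not in each instance.
        _virtualized[key] = type(cls.__name__, (cls,), {"machine": machine})
    return _virtualized[key]


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def norm1(self):
        return abs(self.x) + abs(self.y)


VPoint = virtualize(Point, "machine X")
p = VPoint(3, -4)
print(type(p).machine, p.norm1())   # machine X 7
print(vars(p))                      # {'x': 3, 'y': -4} (instance layout unchanged)
print(VPoint.norm1 is Point.norm1)  # True: one method, shared
```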
---------------------------------------- Text by me above is hereby placed in the public domain
Cheers, --MarkM
Mark,
Don't get me wrong here - if adding an extra word to the object header is the best solution for the problem it might be a worthwhile change to make. I was just trying to point out that such a change really deserves a good reason to make it and from what I can tell this is currently just one of the possible design alternatives (I'm travelling so I haven't entirely caught up with everything going on).
Re: Virtualizing classes
You lost me somewhere. I don't exactly understand what the point about "virtualizing objects/classes" really is. So I can't quite comment on the overall issue, but it seems to me that what you are describing is technically feasible.
Cheers, - Andreas
-----Original Message----- From: squeak-e-bounces@lists.squeakfoundation.org [mailto:squeak-e-bounces@lists.squeakfoundation.org] On Behalf Of Mark S. Miller Sent: Monday, February 03, 2003 10:01 PM To: Squeak-E - a capability-secure Squeak Subject: RE: [Squeak-e] Programming the VM
Squeak-e mailing list Squeak-e@lists.squeakfoundation.org http://lists.squeakfoundation.org/listinfo/squeak-e
Mark S. Miller squeak-e@lists.squeakfoundation.org said:
Ok, here's another way to do it. Instead of modifying the format of a virtualized object, have the virtualized object point at a virtualized class instead of the real one. Make the layout changes in the virtualized class.
It could work, however you would be forced to instantiate a virtualized class (probably a proxy?) for every class in every environment.
I'm hoping that this stuff will be shareable, so that e.g. in setting up a new environment you can subtract capabilities from your current environment by handing the new environment the parent (your environment) and a set of deltas. But this is vague, very vague, because I still have no idea of the Something that constitutes the environment and what a parent environment would want to have for child environments in typical cases.
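One reading of "handing the new environment the parent and a set of deltas", sketched in Python under the assumption that an environment is just a name-to-capability table: a child stores only its delta and delegates everything else to the parent, so revocations are inherited by its own children. `Environment`, `lookup`, and `child` are hypothetical, and the capability values are placeholder strings:

```python
class Environment:
    def __init__(self, granted=None, parent=None, revoked=()):
        self._granted = dict(granted or {})
        self._parent = parent
        self._revoked = frozenset(revoked)

    def lookup(self, name):
        if name in self._revoked:
            raise KeyError(name)
        if name in self._granted:
            return self._granted[name]
        if self._parent is None:
            raise KeyError(name)
        return self._parent.lookup(name)  # shared, not copied

    def child(self, revoked=(), granted=None):
        """New environment = this one, minus `revoked`, plus `granted`."""
        return Environment(granted, parent=self, revoked=revoked)


root = Environment({"files": "FileDirectory", "net": "Socket", "clock": "Time"})
sandbox = root.child(revoked={"net"})
print(sandbox.lookup("files"))  # FileDirectory
try:
    sandbox.lookup("net")
except KeyError:
    print("net was subtracted")
```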
Modifying the object format is an uphill battle, for good reasons, but not one necessarily lost.
Hi Stephen,
I cc'ed you since you have recent experience adding an extra header word to Squeak object headers. The swiki is at http://squeake.net if you would like to join the list; there are mail archives too, of course.
On Monday, February 3, 2003, at 06:22 AM, cg@cdegroot.com wrote:
Mark S. Miller squeak-e@lists.squeakfoundation.org said:
This is lexical scoping, not dynamic scoping, just as Alan's metaphor would seem to demand. Object references stretch between "machines". Messages move between "machines" riding references. Objects stay where they're born.
Yeah, I figured that out for myself *after* I posted. Stupid me.
<spank target="self"/>
Anyway, that'd probably mean, or could mean, that you'd have an extra slot in these objects pointing to their 'machine', no? When they are instantiated, they get the parent's 'machine' reference unless the parent overrides it (either by setting it after the fact or, more likely, by something like 'Foo newInNewEnvironment').
SqueakVM specialists: how hard is it to add an extra reference to all objects? On Squeak level? On VM level?
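Here is a minimal sketch of the "extra slot" idea. It assumes a machine is just an object holding globals; what a machine really is remains an open question in this thread. `Machine`, `SObject`, `new` and `new_in_new_environment` are hypothetical names. The last one stands in for 'Foo newInNewEnvironment'.

```python
class Machine:
    """Hypothetical 'machine': here nothing more than a table of globals."""
    def __init__(self, globals_):
        self.globals = dict(globals_)


class SObject:
    """An object carrying the proposed extra slot that points at its machine."""
    def __init__(self, machine):
        self.machine = machine

    def new(self, cls):
        # Default: the new object inherits its creator's machine.
        return cls(self.machine)

    def new_in_new_environment(self, cls, **overrides):
        # Override: the new object gets a fresh machine derived from ours.
        return cls(Machine({**self.machine.globals, **overrides}))
```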
I would like to hear a little more about what this 'machine' actually is. Is it a lexical closure? Would all objects need to have this 'machine' reference?
Squeak object headers are of variable size. There are 3 different sizes, so it uses 2 bits in the baseHeader to define the header size. 1 word, 2 words, or 3 words are the possibilities, and the fourth value is for dead objects, I believe. Stephen added an extra word for his LOOM impl, and basically all the code that reads or writes the headers would need to be changed to deal with it.
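Below is a small decoder for the header-type field described above. The specific encoding is an assumption, recalled from the classic 32-bit Squeak ObjectMemory:

- type 0: size word, class word and base header
- type 1: class word and base header
- type 2: free chunk
- type 3: base header only (compact class)

Check the constants in your own image before relying on them.

```python
# Assumed values of the 2-bit header-type field (verify against ObjectMemory).
HEADER_TYPE_SIZE_AND_CLASS = 0  # 3 words: size word, class word, base header
HEADER_TYPE_CLASS = 1           # 2 words: class word, base header
HEADER_TYPE_FREE = 2            # free chunk, not a live object
HEADER_TYPE_SHORT = 3           # 1 word: base header only (compact class)

TYPE_MASK = 0b11


def header_words(base_header):
    """Total header words (base header included), or None for a free chunk."""
    header_type = base_header & TYPE_MASK
    return {HEADER_TYPE_SIZE_AND_CLASS: 3,
            HEADER_TYPE_CLASS: 2,
            HEADER_TYPE_SHORT: 1}.get(header_type)  # HEADER_TYPE_FREE -> None
```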
In thinking about how my eventual contexts and this machine context could be combined, it may be best to give all objects this context structure, which the VM could pack appropriately, and let it be a dynamic descriptor of the state of the reference. Near, Eventual, Persistent, and Private machine scope attributes could be packed inside of this context.
cheers, rob
Robert Withers squeak-e@lists.squeakfoundation.org said:
I would like to hear a little more about what this 'machine' actually is. Is it a lexical closure? Would all objects need to have this 'machine' reference?
As far as I can see, a pointer to a Something that provides access to globals and whatever. Somewhere between an Island and an Environment, I figure. With nesting, probably.
On Monday, February 3, 2003, at 10:37 AM, Robert Withers wrote:
On Monday, February 3, 2003, at 06:22 AM, cg@cdegroot.com wrote:
Anyway, that'd probably mean, or could mean, that you'd have an extra slot in these objects pointing to their 'machine', no? When they are instantiated, they get the parent's 'machine' reference unless the parent overrides it (either by setting it after the fact or, more likely, by something like 'Foo newInNewEnvironment').
SqueakVM specialists: how hard is it to add an extra reference to all objects? On Squeak level? On VM level?
I would like to hear a little more about what this 'machine' actually is. Is it a lexical closure? Would all objects need to have this 'machine' reference?
Squeak object headers are of variable size. There are 3 different sizes, so it uses 2 bits in the baseHeader to define the header size. 1 word, 2 words, or 3 words are the possibilities, and the fourth value is for dead objects, I believe. Stephen added an extra word for his LOOM impl, and basically all the code that reads or writes the headers would need to be changed to deal with it.
In thinking about how my eventual contexts and this machine context could be combined, it may be best to give all objects this context structure, which the VM could pack appropriately, and let it be a dynamic descriptor of the state of the reference. Near, Eventual, Persistent, and Private machine scope attributes could be packed inside of this context.
Whoa, hang on guys. I may be missing something here, but I think we're still a long way from making these kinds of decisions. I'm still trying to get my head around the idea of multiple levels of virtualization which don't involve multiple levels of interpretation. Let's talk a little more about the concepts involved here before we worry about how to lay out object headers in the VM.
Mark, it seems to me that the machines in your virtualization framework are instances of the reified mechanisms I mentioned in my first post. This would include things like object memory, interpreter, compiler, compiled methods, activation stacks etc. On the hardware level these would be actual CPUs, RAM etc. Then we'd have the abstractions of these provided by the OS: processes, virtual memory etc. Then we have the Squeak-E executable, with *its* abstractions, the things I was referring to originally: Class, Compiler, Processor, Display, thisContext, MethodDictionary, etc.
At deeper levels of virtualization, it looks like we'd need separate reifications of these abstractions, that are limited in scope to the virtualization level that we're dealing with. These would effectively be capabilities for manipulating the virtualization subtree rooted at that (virtual) machine.
Is that making sense?
Colin
Colin Putney squeak-e@lists.squeakfoundation.org said:
Mark, it seems to me that the machines in your virtualization framework are instances of the reified mechanisms I mentioned in my first post. This would include things like object memory, interpreter, compiler, compiled methods, activation stacks etc. On the hardware level these would be actual CPUs, RAM etc. Then we'd have the abstractions of these provided by the OS: processes, virtual memory etc. Then we have the Squeak-E executable, with *its* abstractions, the things I was referring to originally: Class, Compiler, Processor, Display, thisContext, MethodDictionary, etc.
Ok, we're talking about Squeak-E level, so Class, Compiler, Processor, Display, WorldMorph, ... are the things to be virtualized. Basically stuff in the Smalltalk dictionary...
At deeper levels of virtualization, it looks like we'd need separate reifications of these abstractions, that are limited in scope to the virtualization level that we're dealing with. These would effectively be capabilities for manipulating the virtualization subtree rooted at that (virtual) machine.
I think Mark's idea is that you only need them as far as you need to modify them (restrict them more) in 'deeper levels'. If you make sure that, no matter at what level, you never can peek a level up, then most of the time you can just pass on your own environment with just a couple of tweaks. By making sure you can do this, you avoid performance hits at deeper levels.
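This is a minimal sketch of "pass on your own environment with just a couple of tweaks." It assumes an environment can be modeled as a mapping from global names to objects. `derive_environment` is a hypothetical helper that applies removals and overrides. The child keeps no reference to its parent, so nothing running in it can look a level up. Python does not truly enforce this, since introspection can get around most things; the sketch only shows the shape of the idea.

```python
from types import MappingProxyType


def derive_environment(parent, remove=(), override=None):
    """Child environment = parent bindings minus `remove`, plus `override`.

    Returns a read-only view over a fresh dict, holding no link to `parent`.
    """
    removed = set(remove)
    bindings = {name: value for name, value in parent.items() if name not in removed}
    bindings.update(override or {})
    return MappingProxyType(bindings)
```

Copying is the simplest choice. A structure that shares unchanged bindings with the parent would avoid the copy cost that the performance remark is concerned with.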
Colin,
You're right, of course. Your concerns are in my first set of questions, but I am a little afraid of thinking deeply on it at the moment. I was thinking that this defines something I am calling virtualization circuits, connected when an object is created. I am thinking of VPN-like circuits here, but it is a sloppy thought. Otherwise, I was answering a VM implementation question which will become important (it is important for my eventual refs) and gives us some idea of the implementation costs of doing these things in the VM.
Now back to the originally scheduled program...what is a virtualization layer, etcetera. I thought your first post was dead on and I am glad I didn't answer it, because quite a bit has resulted from your posted concerns. I share the same concerns with you, Colin. There is no way I could have done any of the EventualSending impl without access to the vm.
Do you think Mark is drawing diagrams? He really hates doing that, from what I have heard... ;-)
cheers, rob
On Monday, February 3, 2003, at 02:15 PM, Colin Putney wrote:
Whoa, hang on guys.
At 11:15 AM 2/3/2003 Monday, Colin Putney wrote:
Whoa, hang on guys. I may be missing something here, but I think we're still a long way from making these kinds of decisions.
I agree.
I'm still trying to get my head around the idea of multiple levels of virtualization which don't involve multiple levels of interpretation. Let's talk a little more about the concepts involved here before we worry about how to lay out object headers in the VM.
Mark, it seems to me that the machines in your virtualization framework are instances of the reified mechanisms I mentioned in my first post. This would include things like object memory, interpreter, compiler, compiled methods, activation stacks etc. On the hardware level these would be actual CPUs, RAM etc. Then we'd have the abstractions of these provided by the OS: processes, virtual memory etc. Then we have the Squeak-E executable, with *its* abstractions, the things I was referring to originally: Class, Compiler, Processor, Display, thisContext, MethodDictionary, etc.
At deeper levels of virtualization, it looks like we'd need separate reifications of these abstractions, that are limited in scope to the virtualization level that we're dealing with.
You've now baked this idea more than I have, and I'm not quite following you. I hope I didn't give the impression that I knew where I was going with all this? ;)
In any case, what you say sounds plausible. Let's try fleshing things out and see.
These would effectively be capabilities for manipulating the virtualization subtree rooted at that (virtual) machine.
Yes!
Is that making sense?
The last part, yes. The previous parts I'd have to think about harder than I have time for right now, sorry. (Other, much less interesting things are taking my time right now. I'm not even drawing diagrams ;).)
---------------------------------------- Text by me above is hereby placed in the public domain
Cheers, --MarkM
SqueakVM specialists: how hard is it to add an extra reference to all objects? On Squeak level? On VM level?
I would like to hear a little more about what this 'machine' actually is. Is it a lexical closure? Would all objects need to have this 'machine' reference?
Squeak object headers are of variable size. There are 3 different sizes, so it uses 2 bits in the baseHeader to define the header size. 1 word, 2 words, or 3 words are the possibilities, and the fourth value is for dead objects, I believe. Stephen added an extra word for his LOOM impl, and basically all the code that reads or writes the headers would need to be changed to deal with it.
You are correct about header sizes and bit usages. Coaxing squeak into having extra headers is straightforward, but time consuming. You'll have an additional challenge by adding a pointer field (because you'll need GC to trace it). I'd also recommend that you bone up on the InterpreterSimulator and use SystemTracer2 (on SqueakMap) for making your new images and testing your new interpreter. The InterpreterSimulator is an invaluable debugging tool.
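A toy mark phase shows why a pointer field is harder than a non-pointer one. The collector must be told to follow the new field, or anything reachable only through it gets swept. This is an illustrative Python model, not Squeak's collector; `Obj`, `mark` and `sweep` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Obj:
    name: str
    cls: Optional["Obj"] = None        # the existing class pointer
    machine: Optional["Obj"] = None    # hypothetical new pointer header field
    fields: List["Obj"] = field(default_factory=list)
    marked: bool = False


def mark(roots, trace_machine=True):
    """Toy mark phase using an explicit stack instead of recursion."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj is None or obj.marked:
            continue
        obj.marked = True
        stack.append(obj.cls)
        if trace_machine:
            stack.append(obj.machine)  # omit this and machines get collected
        stack.extend(obj.fields)


def sweep(heap):
    """Return the survivors, i.e. the marked objects, in heap order."""
    return [o for o in heap if o.marked]
```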
Since it is a pointer field, I'd probably explore adding the word *below* the base header because that will be the least impact on GC...however, it's also something that I've never attempted. I have added non-pointer headers to every object above the base header and I've added the ability to insert a "state manager" between an object and its class (path of least resistance). If you added a pointer field above the base header, you're going to have to do some very tricky things with GC to get it to traverse that field (similar to the way GC currently deals with the class header).
In an ideal world, I'd move the class header below the base header, but this probably wasn't done because not every object has a class header and testing that fact on every inst var access would probably be expensive. If you eliminated compact classes, then it would make sense to move the class header below the base header (note, I have not done that in my Chango VM even though I did eliminate compact classes).
If absolutely every object is going to have this pointer and it is a pointer, I would start with a design that puts this pointer after the base header. If only some of the objects in the image will have this pointer, you might try a similar hack that I did which inserts your object between an object and its class. Finally, you can put a pointer above the header, but it's going to make for some ugly looking code in GC.
- Stephen
On Sun, 2 Feb 2003, Mark S. Miller wrote:
What can I read about Seaside? Could you summarize the salient points?
Open access to one's own frame is fine. Open access to one's caller's frame would kill security, except for access according to the above ownership hierarchy. A debugger must be prepared to encounter frames it cannot open.
Seaside requires access to the entire context stack, but opaquely - it just needs to copy it, not to look at or change it. To put it another way, it needs access not to its caller's frame but to its own continuation, which seems reasonable to me from a security point of view (there was a brief mention earlier of a secured Scheme - did this still have call/cc?).
Avi
Colin Putney squeak-e@lists.squeakfoundation.org said:
My question to Mark, Rob and the other Squeak-E enthusiasts then is, can we reconcile these two virtues?
That, indeed, is the most important question. "Can we integrate the security of E into Squeak without losing Squeakness?". The answer: I don't know. I think that's why we are all here - no-one is interested in throwing out the baby with the bathwater.
As I see it now, and as I have brought it up (as an image) elsewhere, we're likely to end up with a system not unlike a classical OS with three rings:
- The inner ring can do literally anything; it is the Squeak VM.
- The intermediate ring can do a lot, and can be quite harmful; it is SqueakAsWeKnowIt.
- The outer ring will be safe, probably in a language that looks a lot like Squeak but has some syntax changes (eventual sends, I hope) and semantic changes (globals, etcetera); this is where Squeaklets will live.
The goal is to move as much code to the outer ring as possible, of course. But, similarly to what E did with Java, we can bootstrap a lot.
Now, as far as UI goes, I'm confident that the ideas behind CapDesk will work in Squeak. If you haven't seen it, the short version is that it is as transparent as possible. If a program wants a capability, it can explicitly ask the user; however, user actions will often implicitly grant capabilities. A Squeaklet would ask the underlying system for a read capability on some file (e.g. in response to the user clicking 'open'), and by selecting a file, the user would grant read capability on that file to that Squeaklet. The gestures therefore are the same; it's 'just' the semantics that are different. In the 'classical' case, the user could point to 'readme.txt' and the editor could nevertheless open 'secrets.txt', because it possesses all the user's authority over the filesystem. In the 'Squeak-E' case, the user would point to 'readme.txt', and that's the only capability the editor would ever get back from the (privileged) file-choosing system.
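The following is a minimal sketch of the file-choosing flow described above. It assumes an in-memory dict as the filesystem and a callback in place of the real dialog. `Powerbox`, `ReadCap` and `Editor` are hypothetical names. The editor only ever holds the powerbox, and each user selection yields a capability for exactly one file. Python cannot enforce this encapsulation, so the sketch shows the authority flow rather than a secure implementation.

```python
class ReadCap:
    """Capability to read exactly one file and nothing else."""
    def __init__(self, fs, path):
        self.path = path
        self._read = lambda: fs[path]   # closes over just this one path

    def read(self):
        return self._read()


class Powerbox:
    """Privileged chooser: holds full filesystem authority and hands out
    one ReadCap per user selection."""
    def __init__(self, fs, ask_user):
        self._fs = fs
        self._ask_user = ask_user       # stands in for the file dialog

    def request_read(self, purpose):
        path = self._ask_user(purpose, sorted(self._fs))
        return None if path is None else ReadCap(self._fs, path)


class Editor:
    """Untrusted Squeaklet: it has the powerbox, never the filesystem."""
    def __init__(self, powerbox):
        self._powerbox = powerbox

    def open(self):
        cap = self._powerbox.request_read("open")
        return None if cap is None else cap.read()
```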
At 01:48 PM 2/2/2003 -0800, Mark S. Miller wrote:
At 12:28 PM 2/2/2003 Sunday, Allen Wirfs-Brock wrote: ...
In the Smalltalk model the handler is executed before the stack is unwound. This allows resumable handlers to be trivially implemented.
Does it have any other virtue? If we didn't have resumable handlers, would there be any remaining reason to prefer this? Note that the issue isn't when they're executed, but when they're looked up. Of course, they can't be executed until they're looked up.
The obvious advantage is that the complete state of the computation is still available during the evaluation of the handler. This is clearly essential for resumption and is certainly useful for debugging. I can speculate that there are non-debugging situations where it is useful for a non-resuming handler to have the computation state at the exception point available (or at least preserved), but I don't have a canonical example at my fingertips. There are quite a few years of experience with this mechanism in the broader Smalltalk community. Perhaps somebody out there can provide a good example.
Stepping back, this appears to me to be a classic early/late binding trade-off. The Java model binds the decision to discard the computation to an early stage in the exception processing sequence. The Smalltalk model defers that decision to a much later point. As is usually the case, late binding provides more flexibility but carries a price.
The Smalltalk model with resumable exceptions certainly feels more powerful in a way that is consistent with the more dynamic nature of Smalltalk. However, I don't know that I'm prepared to argue that resumable exceptions are essential (rather than just useful).
I'm relatively confident that your CPS model could be extended to accommodate the Smalltalk exception model although it might be some work to do so.
The only way I can imagine reveals that this semantics is indeed a case of dynamic scoping. The way I imagine:
A continuation could have an additional method available, "getHandler: exceptionTypeOrSomething" that either already knows a handler for that exceptionTypeOrSomething, or asks its continuation. This non-destructive looking up the stack is simply a deeply bound implementation model of dynamic scoping. (Alternatively, the continuation could instead have a non-destructive "handle: exception" method which gets delegated back, but it amounts to the same thing.)
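Here is a minimal sketch of the getHandler: idea. It assumes continuations are reified as objects with a link to their own continuation; `Continuation` and `get_handler` are hypothetical. The lookup walks back without discarding anything, which is the deep-binding pattern described.

```python
class Continuation:
    """Hypothetical reified continuation with optional local handlers."""
    def __init__(self, parent=None, handlers=None):
        self.parent = parent                # "its continuation"
        self.handlers = handlers or {}      # exception class -> handler

    def get_handler(self, exc_type):
        # Non-destructive walk back along the chain: deep binding.
        k = self
        while k is not None:
            for handled, handler in k.handlers.items():
                if issubclass(exc_type, handled):
                    return handler
            k = k.parent
        return None
```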
That is essentially one (well, actually two) way(s) to implement it. In fact, the ANSI Smalltalk exception mechanism doesn't need any particularly unique primitive support. All you need are closures, a primitive unwind mechanism, single-use continuations ([^]), and a single thread-local variable. The original Digitalk implementation used a shallow binding technique. A thread local pointed to a linked list of active handler states. Establishing a protected region adds a new element to the head of the list. Exiting the protected region deletes the head of the list. Unwind protection is used to ensure the integrity of the list. Each list element records the exception to be handled, the closure for the handler, and the continuation used to terminate the handler. Signaling an exception is a matter of searching the list for an entry that handles the exception and resetting the list head before evaluating the closure. I personally prefer this implementation technique over deep binding techniques that probe the call stack to find handlers, as it doesn't require reification of the stack.
A reasonably high-fidelity approximation of the Smalltalk exception system can be implemented in Java. Java has unwind protection as well as thread locals, and Java exceptions can be used as the continuation mechanism. Java doesn't have real closures, but anonymous inner classes can be used as an approximation.
If you accept that Smalltalk exceptions can be implemented using the above primitives, then I'm not sure that you even have to explicitly account for exceptions in your formal model (assuming that you do model the primitives).
The key thing about the Java alternative, unwinding to the handler, is that the continuation is only ever invoked destructively, and is otherwise opaque. So by the time the handler is invoked, it's a handler associated with the immediate continuation, and not one retrieved from further back on the stack.
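For contrast, Python's exceptions follow the same terminating, unwind-to-the-handler model as Java, so a few lines show the ordering. Intervening `finally` clauses run before the handler body, and a `finally` that raises replaces the exception in flight. The function names are made up for illustration.

```python
def unwind_then_handle():
    log = []
    try:
        try:
            raise ValueError("boom")
        finally:
            log.append("unwind")           # runs first, during unwinding...
    except ValueError:
        log.append("handler")              # ...and only then does the handler run
    return log


def finally_that_throws():
    try:
        try:
            raise ValueError("original")
        finally:
            raise KeyError("from finally")  # replaces the in-flight exception
    except Exception as exc:
        return type(exc).__name__
```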
So even without resumption, if earlier handlers get invoked before later unwind blocks, then I'd agree with Anthony that Smalltalk's exceptions are an instance of dynamic scoping. This isn't to say that it's a bad idea. But it does leave us with the following hypotheses:
Most of the legitimate uses of resumable exceptions that I know of do, indeed, seem to be specialized examples of dynamic scoping.
1. Resumable handlers are bad.
2. Dynamically scoped resumable handlers (as in Smalltalk) are good, leaving us with at least one case where dynamic scoping is a good idea. If it's a good idea here, there are probably other cases as well.
3. Resumable handlers are good, but a resumable handler shouldn't be looked up by dynamic scoping. (Note: Joule has lexical resumable handlers called "Keepers".)
4. Terminating handler lookup should happen during unwind, like Java.
5. Terminating handler lookup should happen prior to unwind, as in Smalltalk, making this handler lookup arguably another case of dynamic scoping. (In order to make this point separate from #2, let's say "should" even in the absence of the need to support #2.)
6. Terminating handler lookup should happen prior to unwind, but not by looking up the stack. (Presumably, the alternative would be lexical. I know of no systems that do this.)
Smalltalk: #2, #5. Java & E: #1, #4.
In the case of Java/C++, I'm not sure that #1 is the motivation for their design. I believe that the Smalltalk style of handler would be difficult to implement (or of very little utility) without first-class closures.
Joule: #3. (Joule has no stack, and so can't have any conventional notion of termination.)
Does this seem like a useful framework for exploring the issue?
Yes!
What are some arguments for #2 or #5? I'm prepared to argue for #1, #3, and #4.
To some degree I've touched upon #5 issues above. I'm willing to make a case for #2, but it will have to be in another message. I believe I share with you the position that static (lexical) scoping is usually preferable to dynamic scoping. However, I am interested in hearing why you may think that dynamic scoping is never useful.
Allen_Wirfs-Brock@Instantiations.com
Allen Wirfs-Brock squeak-e@lists.squeakfoundation.org said:
However, I don't know that I'm prepared to argue that resumable exceptions are essential (rather than just useful).
Which brings us back to the Turing completeness argument about what is essential ;-). I'm prepared to argue that it allows for separation of concerns in a way that is very hard otherwise (like asking the user to retry an operation: a one-liner in Smalltalk, nigh impossible in Java). This sort of 'clean code' functionality *is* important.
At 12:25 AM 2/3/2003 Monday, cg@cdegroot.com wrote:
Allen Wirfs-Brock squeak-e@lists.squeakfoundation.org said:
However, I don't know that I'm prepared to argue that resumable exceptions are essential (rather than just useful).
Which brings us back to the Turing completeness argument about what is essential ;-). I'm prepared to argue that it allows for separation of concerns in a way that is very hard otherwise (like asking the user to retry an operation: a one-liner in Smalltalk, nigh impossible in Java). This sort of 'clean code' functionality *is* important.
Could you please explain this example in enough detail that I could try rewriting it using only lexical scoping?
---------------------------------------- Text by me above is hereby placed in the public domain
Cheers, --MarkM
Mark S. Miller squeak-e@lists.squeakfoundation.org said:
Could you please explain this example in enough detail that I could try rewriting it using only lexical scoping?
The idea is along the lines of:
    someMethodInTheUserInterface
        [some operation, goes very deep]
            on: FloppyNotPresentError
            do: [:ex |
                Dialog ask: 'Please insert floppy'.
                ex restart]
The idea is that 'some operation' may contain side effects, which makes the alternative idiom, employing a loop and a retry of the whole operation, unattractive. You cannot put the dialog box in, say, the floppy formatting routine, because the floppy formatting routine shouldn't know about the UI. And splitting:
    do bit with side effects.
    while ... repeated loop containing the exception handling
is often not good either, because it is a single logical operation (say 'dumpLogsToFloppy', a domain-level method that gathers logfiles, zips them, formats a floppy, and dumps them to it) that the UI doesn't need to know about to such an extent that it can split off the side effects in order to have the failure-prone bit in a separate non-restartable exception context.