multiple versions of same package vs. mini-images (Was: Re: Guaging & Squeak/JVM)

Mon Feb 11 10:04:01 UTC 2008

paul

cut your emails into chunks, else really few people will read them.

Stef

On Feb 11, 2008, at 7:14 AM, Paul D. Fernhout wrote:

> Igor-
>
> You suggested "enable multiple versions of same package in same  
> image and
> keep track of package dependency". That's been an inspirational  
> suggestion
> for me, and I've been thinking about how to implement it for a  
> Squeak/JVM.
>
> I don't have a definite solution yet, but here are some thoughts on  
> it.
>
> I feel it may come down to picking one of two paths.
>
> We could make a complex system for supporting multiple global system
> dictionaries (or the equivalent) to allow multiple applications with
> different dependencies to live together in one memory image. That's  
> really
> just an extension of the status-quo in some ways, packing ever more  
> stuff
> into one bigger and bigger image.
>
> Or, we can break the monolithic image into small images which each just
> support one application well (call them "mini-images"). Each mini-image
> might in turn depend upon some other common mini-images for defining
> common classes. This alternative would probably require Spoon-like
>  http://netjam.org/spoon/
> remote development and remote-debugging support to work best (but it
> doesn't absolutely have to, as there easily could be a development tools
> mini-image included by reference even in the tiniest mini-image).
>
> Personally, I think the second approach is ultimately simpler and more
> elegant, and does a better job of bringing Smalltalk forward in a now
> network-oriented world. See:
>  "Principles of Design -- Tim Berners-Lee "
>  http://www.w3.org/DesignIssues/Principles.html
> "Principles such as simplicity and modularity are the stuff of  
> software
> engineering; decentralization and tolerance are the life and breath of
> Internet."
>
> You may well know all these issues, but I just thought I'd put it down
> for others' comments as I understand it (in case I was wrong or missed
> something). Probably I'll have outlined some approaches people here know
> about that were already created for Squeak or other systems, and anyone
> should feel free to point me to them.
>
> Anyway, feel free to stop reading here, but what follows is more  
> details on
> how I came to think about this and arrive at those two possible paths.
>
> =============== how it is now, and a simple approach
>
> The biggest aspect of this is resolving globals. For review, if I  
> recall
> correctly this is traditionally done in Squeak by the VM knowing  
> about a
> SystemDictionary called "Smalltalk" (the VM needs to know about it
> absolutely to resolve a circular dependency of not being able to  
> look up the
> global "Smalltalk". :-).  When a CompiledMethod being executed does
> something like make a new instance of a class, it fetches the current
> instance (typically of a class) associated with the name of the  
> global and
> sends it a message or stores it in a variable. Using named globals  
> allows
> late binding of classes by the compiled method.
>
> If you didn't care about late binding, like in Forth referring to a
> previously defined word, you could just make a hard link as a  
> pointer to the
> class at compile time in the compiled method. But then you could not  
> replace
> or remove the class in its entirety later.
>
> There is room for only one version of a class at a time this normal  
> way --
> just one key in the Smalltalk system dictionary with one value.
>
> The simplest way around this might be to have system dictionary  
> values for
> keys be dictionaries. Then you could tag each item with a version.  
> But the
> executing code would still need to resolve which one it wanted. And  
> I don't
> see how that would be easy. But maybe it might be?
>
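A rough sketch of that "dictionary of dictionaries" idea, in Python rather than Smalltalk. The `VersionedGlobals` class and the version strings are made up for illustration; nothing here is Squeak API. Note that the caller still has to name the version it wants, which is exactly the unresolved part.

```python
class VersionedGlobals:
    """Hypothetical stand-in for a system dictionary whose values are
    themselves dictionaries keyed by version."""

    def __init__(self):
        self._entries = {}  # name -> {version: value}

    def define(self, name, version, value):
        self._entries.setdefault(name, {})[version] = value

    def lookup(self, name, version):
        # The executing code must say which version it wants.
        return self._entries[name][version]

    def versions(self, name):
        return sorted(self._entries[name])


class ClockV1:
    def describe(self):
        return "clock 1.0"


class ClockV2:
    def describe(self):
        return "clock 2.0"


smalltalk = VersionedGlobals()
smalltalk.define("Clock", "1.0", ClockV1)
smalltalk.define("Clock", "2.0", ClockV2)

# Two versions of "Clock" live side by side under one global name.
old_clock = smalltalk.lookup("Clock", "1.0")()
new_clock = smalltalk.lookup("Clock", "2.0")()
```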
> And then there is a deeper problem related to composites of objects  
> which
> might include instances pointing to two or more different versions  
> of the
> same class. But we can ignore that for now. :-)
>
> == A deeper analysis (or, "owww, my brain hurts". :-)
>
> Python has a straightforward way to resolve this -- it supports a  
> sea of
> objects, and when you load code, the old classes get overridden in the
> equivalent of a system dictionary with new classes, but the existing
> instances still point to the old classes so those still hang around  
> but are
> not accessible by name. This makes it difficult to do development in  
> a live
> system, and you end up issuing special code to load things in  
> differently
> (not making new classes) if you want to do Smalltalk-style dynamic
> development. But there is no reason you cannot simply load two versions
> of the same module (source file) and hang on to them somehow. Squeak
> could certainly do something similar if it had modules or classes which
> could exist without names.
>
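The Python behavior described above can be seen in a few lines. This is only a sketch: redefining the class in place stands in for reloading a module, and `OldPoint` is a name kept purely so the two classes can be compared.

```python
class Point:
    def describe(self):
        return "old Point"


p = Point()
OldPoint = Point  # keep a handle on the old class for comparison


# "Reloading" amounts to rebinding the global name to a new class object.
class Point:
    def describe(self):
        return "new Point"


q = Point()

# p still points at the old class, which is no longer reachable by name
# (except through the handle we deliberately kept).
```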
> When I try to generalize this global idea, there are other  
> approaches. In
> PataPata (in Python/Jython, trying to retrofit them with Squeak-like
> capacities) I gave each object (typically Morphic-like GUI  
> components) a
> "world" instance variable. That pointed to what was essentially the
> equivalent of a Smalltalk system dictionary to store globals or key
> functionality. In practice, each major window was in its own world,  
> although
> that wasn't strictly required. Then I could have several worlds in  
> the same
> process, where each was somewhat self-consistent.
>
> But objects could still slip from one to another, typically when  
> opening an
> inspector/browser tool (itself in its own world) on another world
> and maybe
> copying an object from one place to another. Beyond globals, another  
> reason
> for each object to have a pointer to its "world" was that when I  
> serialized
> a world I just wanted the objects from that world to be written out  
> and no
> others, so I could check that pointer to make sure the serialization  
> wasn't
> wandering into writing out objects from other worlds (I didn't  
> pursue the
> concept of nested worlds, which might have been possible).
>
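A minimal sketch of the per-object "world" pointer, assuming hypothetical `World` and `Thing` classes (this is not PataPata's actual code). It shows both uses mentioned above: looking up globals through the object's world, and stopping serialization at the world boundary.

```python
class World:
    """Hypothetical per-world globals table."""

    def __init__(self, name):
        self.name = name
        self.globals = {}


class Thing:
    def __init__(self, world, label, children=()):
        self.world = world            # every object knows its world
        self.label = label
        self.children = list(children)

    def lookup_global(self, key):
        return self.world.globals[key]


def serialize(obj, world, out=None):
    """Collect labels of objects reachable from obj that belong to world.
    No cycle detection; this sketch assumes a tree of children."""
    if out is None:
        out = []
    if obj.world is not world:
        return out                    # do not wander into another world
    out.append(obj.label)
    for child in obj.children:
        serialize(child, world, out)
    return out


garden = World("garden")
tools = World("tools")
garden.globals["color"] = "green"

inspector = Thing(tools, "inspector")   # slipped in from another world
flower = Thing(garden, "flower")
window = Thing(garden, "window", [flower, inspector])
```

Serializing `window` for the `garden` world writes out the window and the flower but skips the inspector.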
> I was planning to use unnamed references to parents from prototypes  
> (for
> inherited behavior and constants) in PataPata, based on how Self did
> prototypes and links, but I decided in the end to reference prototypes
> representing parents by name, for the purpose of documenting
> intent. But
> that left a global lookup problem, resolved by having *every*  
> prototype have
> a "World" pointer. And there were predictable problems when worlds  
> pointed
> to themselves which I had to work around (especially when loading  
> worlds).
> [Self has a fancier way of getting names for unnamed prototypes I  
> did not
> want to try pursuing based on determining paths from a root.]
>
> Anyway, generalizing on this "object-focused late binding lookup"  
> approach,
> objects can point to a global system dictionary, or they can point  
> to other
> objects in some consistently structured way (typically "parent" or
> "container" or "class") which might in turn allow a path to find a  
> global
> (that process might even percolate up and then back down, say to  
> *search*
> for an object with a certain value; I supported this in PataPata to  
> find
> widgets with a certain name in the same window as a widget executing  
> some
> behavior).
>
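The "percolate up and then back down" search might look roughly like this. The `Widget` class and its `container`/`contents` fields are assumptions made for the sketch, not PataPata's real structure.

```python
class Widget:
    def __init__(self, name, container=None):
        self.name = name
        self.container = container
        self.contents = []
        if container is not None:
            container.contents.append(self)

    def root(self):
        # Percolate up the container links to the enclosing window.
        node = self
        while node.container is not None:
            node = node.container
        return node

    def find_in_same_window(self, name):
        # Then search back down from the root for a matching name.
        stack = [self.root()]
        while stack:
            node = stack.pop()
            if node.name == name:
                return node
            stack.extend(node.contents)
        return None


window = Widget("window")
panel = Widget("panel", window)
button = Widget("okButton", panel)
field = Widget("nameField", window)

other_window = Widget("otherWindow")
stray = Widget("stray", other_window)
```

A behavior running in `button` can find `nameField` in its own window, but not `stray`, which lives in a different window.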
> But there is another way to do this, which is to have the thread,  
> process,
> stack frame, or virtual machine hold onto a global system dictionary  
> object
> somehow. This is closer to how Squeak does it with a system  
> dictionary,
> except there might be one system dictionary per process or thread or  
> stack
> frame. The difference is that the entity executing the code knows  
> where to
> look for globals even if the objects being used for executing code  
> do not
> (which presumably saves on memory, and provides a more consistent  
> notion of
> what versions of classes a process wants to see, assuming that is a
> good idea
> :-). In a most extreme case, the user running the program might know  
> the
> object ID or memory location of the global system dictionary and  
> pass it in
> as needed (this might happen in a debugger session). I might call  
> this an
> "execution-focused late binding lookup" approach.
>
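One way to picture the execution-focused approach is with Python's `threading.local`: each thread carries its own system dictionary, and the objects themselves hold no pointer to it. The helper names (`run_with_globals`, `lookup_global`) are invented for this sketch.

```python
import threading

_context = threading.local()


def lookup_global(name):
    # The executing thread, not the object, knows where globals live.
    return _context.globals[name]


def run_with_globals(system_dictionary, func, results, key):
    def body():
        _context.globals = system_dictionary   # set once per thread
        results[key] = func()

    thread = threading.Thread(target=body)
    thread.start()
    thread.join()


def make_greeting():
    # Same code, different result depending on which thread runs it.
    return lookup_global("Greeter")()


old_world = {"Greeter": lambda: "hello from v1"}
new_world = {"Greeter": lambda: "hello from v2"}

results = {}
run_with_globals(old_world, make_greeting, results, "old")
run_with_globals(new_world, make_greeting, results, "new")
```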
> For completeness, there is another approach which is to have globals  
> stored
> in relation to the memory where the objects are stored (or processes
> executed) if memory is partitioned somehow. So if you have an object  
> or
> process memory location, you can find the global system dictionary  
> that goes
> with it by looking somewhere special in that memory chunk  
> (beginning, end,
> standard offset). Deep in the reality of a virtual machine, it might  
> even be
> using this approach in various ways (like making sure the pointer to  
> the
> system dictionary is, say, the first handle in an object memory  
> table).
>
> Probably someone who has a PhD in computer science could tell me the  
> proper
> terms for these approaches towards late binding? :-)
>
> And of course, you can use more than one at a time. NewtonScript, for
> example, found variables by having two different types of lookup,  
> based on a
> parent slot and visual containment. Maybe you could use all of the
> approaches at once in some system just for fun. I don't think I'd  
> want to
> debug anything in it though. :-)
>
> Anyway, this doesn't answer how specifically to do what you propose,  
> but it
> does suggest some possible points of intervention -- mainly  
> instances or
> processes.
>
> But this leads to a deeper point. A Smalltalk VM (or any OO VM  
> system like
> it, like the JVM objects or Python objects) has a problem with
> multiple global
> objects if objects sharing the same VM in different global spaces  
> can point
> at each other directly.
>
> Essentially, if you can have multiple global system dictionaries, you
> end up in a situation where an object from a "module" in one set of
> interconnected versions of modules can be referenced by an object in a
> "module" in another interconnected set of different module versions. At
> that point, what governs the object's behavior, especially late binding
> lookup of globals? Should it be governed by the module the object came
> from? Or should it be governed by the module which it is now connected
> to? Or should it be governed by the process executing and calling a
> method of the object (and that process might look up its globals in yet
> another way)? And similarly, when you absorb an instance from another
> module, should its class still point to the old class or should it
> point to the class in the new module?
>
> In general, this issue is a variant of a deeper problem related to OO:
>  http://mail.python.org/pipermail/edu-sig/2007-April/007852.html
> as I feel the idea, that objects can stand alone and be somehow  
> meaningful,
> is at the root of a lot of evil in the Smalltalk universe (e.g.  
> "bitrot". :-)
>
> Anyway, just from random comments here over the years, I get the  
> feeling
> that in their hearts the original Squeak Central people (Dan Ingalls
> especially) understand this and use heavily customized images in  
> practice as
> coherent wholes, but perhaps they have never had the time to  
> generalize this
> idea to a philosophical principle. Certainly just fighting for  
> objects at
> all, as well as messages and VMs and good tools must have taken up  
> lots of
> energy.
>
> Part of this issue may depend on whether you think of an object as a
> single-celled creature like an amoeba, or whether instead you think of
> an object as part of a biological entirety, like a protein molecule in
> a cell, or a highly regulated cell in a large multicellular entity. If
> objects can't meaningfully stand alone, then it seems like we need some
> coherent philosophical approach to how they fit together into modules
> or images.
>
> Loading multiple versions of the same classes seems to strain this  
> possible
> coherence, as useful as it might be. It's not that it won't work,
> it's just
> that the mental complexity starts increasing to the point where you  
> may have
> to be really clever (and really alert) to keep track of it all. :-)
>
> === two competing approaches
>
> Because of all these difficulties and complexity, I'm inclined to lean
> towards suggesting that images should be smaller, :-) and VMs could
> either be lightweight or perhaps could support multiple open images at
> once. Then you can load one version of a module into a larger set of
> other modules, and maintain that set for one application. This total
> image defines an ecology of objects, and the objects and their classes
> all make sense in relation to each other (as well as whatever I/O they
> choose to do through the VM to the rest of the world). This is sort of
> like a living cell. And you could then load a different version of code
> modules into another *different* image and maintain that set for a
> different application.
>
> And when these applications want to communicate, it will be from one
> image to another, through their different VMs, presumably via sockets or
> shared memory or files or whatever, via some common serialization
> process. There are already several approaches for distributed objects in
> Smalltalk, so I doubt this will be much of a problem, and the JVM and
> Java offer other possibilities for remote procedure calls and such.
>
> I think that a minimal image ("mini-image") approach might come closest
> to bringing some sanity to the idea of personal images (like Dan Ingalls
> seems to like). Every image would be a custom mix of module versions and
> hacked up base class code. The image would know with a little developer
> help which objects belonged to which modules. To help with this, one
> would need easy tools to export module versions and configurations. An
> important aspect of such an approach might be Spoon-like remote
> debugging, and remote development of minimal images so you could have,
> say, one image open with your favorite debugging tools and over a socket
> just plug those tools into other images you wanted to modify or debug;
> this isn't strictly necessary -- but conceptually it makes things more
> elegant, especially since then the development tools can have different
> versions of base classes than the system being debugged or developed. I
> get the feeling the Squeak ecosystem has most of the parts of all of
> this, they just haven't been all put together and polished toward this
> end.
>
> Still, for the JVM, which is what interests me right now, all the  
> objects do
> live in one world, and the JVM has a big memory footprint. So, given  
> memory
> footprint and startup time, even with the newer JVMs sharing some
> memory
> across VM instances, I think we might have to end up living with  
> multiple
> system dictionaries in one JVM unless JVMs improve further? Or maybe  
> if we
> discover they are good enough now? In that case, I end up wondering  
> if a
> "world" instance variable added to every underlying Java object is  
> such a
> bad idea after all. :-) Or the alternative of a "world" instance  
> variable
> stored in each thread (or process) is also possible. Of course,  
> globals are
> rarely looked up, so more indirect ways of storing them might be more
> efficient trading off time for memory. So this is a second alternative
> approach which is closer to the direction you outline.
>
> == best solution long term?
>
> After considering two paths in the previous two paragraphs, I think
> using lightweight images with only one system dictionary is a better
> way to go long term. They are just simpler and already well understood.
>
> If you, say, want a little clock up on your screen implemented in  
> Squeak
> (instead of Lively Kernel :-), you just have a clock image. Ideally,  
> that's
> all it does -- it's a clock. If you want to inspect the clock, you  
> fire up
> your development image in another JVM and connect to that clock JVM  
> (maybe
> using a universal debugging registry service). Maybe your development
> image even gives you a copy of the image of the clock window with
> drag-and-drop overlays on another screen. Or it might put annotations
> over the original window by temporarily inserting a "glasspane" if the
> clock application was using Swing widgets, or by the usual Squeak ways
> if the Clock application used Morphic widgets.
>
> To save space and maybe help with upgrades, perhaps the Clock  
> application
> image depends on another larger base image. I did that in PataPata  
> where
> worlds could require other worlds to be loaded first. Since I stored  
> images
> as textual Python code which could rebuild a world of objects  
> procedurally,
> that worked out OK. Here is an example of a simple PataPata world; I
> would expect a Squeak clock image built in a similar fashion would be
> about the same tiny size and also written out as textual source:
> http://patapata.svn.sourceforge.net/viewvc/patapata/tags/PataPata_v204/WorldDandelionGarden.py?revision=315&view=markup
> (One fudge, the bitmap was stored outside the image in a file.)
> Note the line:
>  world.worldLibraries = [world.newWorldFromFile("WorldCommon.py")]
> which is what defines the other worlds this world depends on. So, for
> Squeak, this would be like saying your small image depends on other  
> images
> which load first.
>
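A sketch of how a small world can lean on a library world, built in memory here instead of loaded from files. The `World` class below is hypothetical and much simpler than the real PataPata one; only the idea of a `worldLibraries`-style list comes from the line quoted above.

```python
class World:
    """Hypothetical world; not the actual PataPata class."""

    def __init__(self, name, libraries=()):
        self.name = name
        self.definitions = {}
        # Libraries must already be loaded before this world is built.
        self.world_libraries = list(libraries)

    def lookup(self, key):
        # Local definitions win; otherwise ask each library in order.
        if key in self.definitions:
            return self.definitions[key]
        for library in self.world_libraries:
            try:
                return library.lookup(key)
            except KeyError:
                continue
        raise KeyError(key)


common = World("WorldCommon")
common.definitions["Button"] = "common button prototype"
common.definitions["Window"] = "common window prototype"

garden = World("WorldDandelionGarden", libraries=[common])
garden.definitions["Window"] = "garden window prototype"  # local override
```

The small world only stores what it changes or adds; everything else is found in the library world it depends on.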
> Obviously you have to have any supporting images around or you can't  
> load
> your dependent one, but for the most part you just typically depend on
> common downloaded images. If images are stored as text (essentially, a
> Smalltalk program needed to rebuild the image) dependencies are a  
> lot less
> scary since you could always just go in and start cutting and  
> pasting in a
> text editor (but hopefully there would be better tools for this).
>
> How to track and merge changes to base classes in supporting images is
> obviously an issue, and it is not one PataPata tried to solve  
> (beyond the
> fact that prototypes made it easy to override base class behavior  
> for most
> things). But, since at runtime the supporting packages will be  
> loaded, you
> can easily modify it in the live image and then write out a modified  
> version
> of the base image again with a different version number, and hope  
> somebody
> down the road can reconcile your changes if you want them to move  
> forward
> with the supporting image.
>
> In this lightweight approach, images might also become modules  
> stored in
> some source code repository if desired, or really, they might become  
> more
> like (ENVY-ish?) configuration maps on top of available stored  
> modules. So,
> to try to provide an example, you might save your running Clock  
> image as
> module Clock-1.1.4 which also depends on BaseClasses-3.4.2. (This  
> would
> require a worldwide way to identify Squeak modules uniquely.) Of  
> course you
> might not store Clock-1.1.4 on a server; it might be stored on a  
> local drive
> (perhaps in a Jar file, leading to Java classpath problems, but  
> nothing is
> perfect :-). You might open up Clock-1.1.4, modify it using Spoon-like
> remote tools, and maybe even save it back under the same version  
> number if
> no one else depended on it (perhaps with an automatic minor sequential
> internal revision number bump just in case). These names and version  
> numbers
> might also be more like human readable suggestions than absolutes --  
> for
> example each "image" "module" could have a unique UUID (plus perhaps  
> save
> sequence) and dependencies could be expressed as lists of acceptable  
> UUIDs
> as well as names, with some sort of sophisticated matching algorithm to
> try to resolve dependency issues and search for modules in various
> places.
>
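A deliberately unsophisticated version of such a matching step, just to make the idea concrete. The `Module` and `Dependency` types, the placeholder UUID strings, and the "prefer a listed UUID, else newest matching name" rule are all assumptions for this sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Module:
    uuid: str
    name: str
    version: tuple


@dataclass
class Dependency:
    name: str
    min_version: tuple
    acceptable_uuids: set = field(default_factory=set)


def resolve(dep, available):
    # First choice: a module whose UUID is explicitly listed as acceptable.
    for module in available:
        if module.uuid in dep.acceptable_uuids:
            return module
    # Otherwise treat name and version as a hint and take the newest fit.
    candidates = [m for m in available
                  if m.name == dep.name and m.version >= dep.min_version]
    return max(candidates, key=lambda m: m.version, default=None)


available = [
    Module("uuid-a", "BaseClasses", (3, 4, 2)),
    Module("uuid-b", "BaseClasses", (3, 5, 0)),
    Module("uuid-c", "BaseClasses", (3, 3, 9)),
]

pinned = Dependency("BaseClasses", (3, 4, 2), {"uuid-a"})
loose = Dependency("BaseClasses", (3, 4, 2))
```

With these inputs, the pinned dependency resolves to the exact 3.4.2 module, while the loose one picks up 3.5.0.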
> For this Clock example, when you work on the clock you might pull up
> another image of development tools (browser, debugger, inspector, and so
> on). But the versions of these (or the base classes they depend on)
> don't really matter to the clock application. All that matters is that
> somehow the two JVMs (or JVM processes) agree on how to talk to each
> other to add new methods, return results, single step code, follow
> object references, and so on. Presumably one could have a fairly
> standard protocol for that -- maybe even an extensible one (perhaps
> Spoon has this?).
>
> Let's say something odd is happening with the Clock. You want to see how
> an older version works. Well, you just open up that older clock image.
> Then you might even open up an "image comparing" utility image :-) which
> lets you connect to both the running Clock images simultaneously and
> compare versions of all the classes looking for differences. Still
> unsatisfied, maybe you clone the older image (to start a third clock
> running) and bit by bit copy classes or modules from the new image to
> the copy of the old until you find where the clock starts to behave
> oddly. Then you make a change (remotely) to the first clock image and
> see if it fixes the problem.
>
> Perhaps it turns out your code is perfect but the anomaly is due to a
> really deep problem in code supporting Squeak/JVM -- so you drop down a
> level conceptually and pull up a JVM debugger image, or maybe even just
> Eclipse, :-)
>  http://www.eclipsezone.com/eclipse/forums/t53459.html
> connect to the JVM supporting that Clock image directly, and start
> swearing as you try to figure out what the Squeak/JVM maintainers did
> wrong this time. :-) If you wish, all of your actions with the multiple
> Squeak-ish VMs could have been logged to some common history repository
> somewhere to replay the entire multi-VM development session back to
> everyone who doesn't believe you that it's a JVM level issue. :-)
> Presumably one could build testing tools for this architecture as well.
>
> And Squeak in C could go down this mini-image route too.
>
> As I think a little more about this, I am still perhaps stuck with the
> problem that even in these mini-images, there would need to be some  
> way to
> link specific objects back to specific modules so a modified module  
> could be
> written back out with all its related objects. This is because a  
> mini-image
> is not just code, it is code plus live objects. And so when objects  
> are
> created, they would have to be assigned somehow to a specific module  
> or
> source mini-image. So, perhaps this mini-image solution needs to  
> have a
> "world" field (or "module" or "segment") in every object anyway,  
> just so the
> modified objects can be written back out into the right mini-image or
> module?  Or, if this was implemented in C, the image would be carved  
> up into
> memory segments, with new objects allocated to the chunk of memory  
> going
> with the specific mini-image that was loaded.
>
> Squeak already has an image segment effort:
> http://wiki.squeak.org/squeak/1213
> "ImageSegments and project swapping are still in the experimental  
> stage"
> But it is binary, not textual source. And it is based on specific  
> roots, not
> some sort of tag for each object. I guess both might take about the  
> same
> amount of space -- instead of tagging each item with its segment  
> (world),
> you have a big array which points to each object in the segment.  
> Maybe you
> might want both? So objects know their segment and segments know their
> objects? And I find it a little amusing I am putting up windows in  
> PataPata
> defined by textual mini-image files of 3688 bytes (assuming a bitmap  
> loads
> off the network or from a local file :-) while they are talking  
> about binary
> image segments of 10s of megabytes.
>
> And as I read more on modular Squeak, I'm realizing that with
> mini-images the idea of a "project" would probably go away entirely.
>
> And any tool which compared mini-images would have to have some way of
> representing objects in two different mini-images so it could look for
> similarities and differences. At the very least, maybe like Les  
> Tyrrell's
> OASIS project:
>  http://wiki.squeak.org/squeak/1056
> But there is a big difference between loading representations of  
> objects
> (instances or their classes) to look at them and loading objects to  
> use them.
>
> Anyway, no easy solution. But I still think this second mini-image  
> approach
> is simpler conceptually than attempting to keep different versions  
> of the
> same things in the same VM. Both are possible, of course.
>
> ===
>
> Anyway, maybe someone reading this might have a better suggestion or a
> better (simpler, clearer) way of looking at this issue.
>
> --Paul Fernhout
>
> Igor Stasenko wrote:
>> Ken Causey wrote:
>>> [snip]
>>> Within this community I've come to feel that the only day to day
>>> practical solution is to do it and then ask for forgiveness when  
>>> it goes
>>> all pear shaped (badly).  Of course when that happens it really  
>>> helps
>>> when it is something that can be readily reversed with no harm done.
>>> And that's where it seems we have a problem because the current  
>>> release
>>> management schemes don't well-support removing something readily  
>>> and in
>>> such a way that few if any are inconvenienced.  I don't have a ready
>>> solution to that, it is something I find myself thinking about  
>>> more and
>>> more.
>>> [snip]
>>
>> There is a solution: enable multiple versions of same package in same
>> image and keep track of package dependency.
>> So, when you load an updated package, all code which worked before
>> continues to work in the same way as it did before.
>> We need a way for the developer to choose which parts of the system
>> can use the new version and which should use the older version for
>> incompatibility reasons, by simply checking dependencies and updating
>> dependency links.
>>
>> Also, this would help a lot in maintaining packages: a package author
>> can easily keep track of his package dependencies, and may or may not
>> wish to release his package with updated dependencies, which use
>> the latest versions of the packages his package depends on.
>>
>> Of course, this is somewhat idealistic, and there are many caveats, but
>> if done well, it will allow us to mix things without fear that something
>> will not work due to incompatibilities.
>
>