[squeak-dev] second call for feedback on Naiad design

Thu Nov 20 20:47:44 UTC 2008

I can only say I look forward to testing this.
One question:
When this system works, won't image size be an issue, like an
ever-growing web browser cache that has no size limit?

Karl

On 11/19/08, Craig Latta <craig at netjam.org> wrote:
>
> Hi--
>
>       This is another call for feedback on the design of Naiad[1], a
> Smalltalk module system I'm writing for Squeak as part of the Spoon
> project[2].
>
>       On the theory that I'll get more of a response by including the
> whole text rather than a link to it, here it is... :)
>
> ***
>
> 2008-10-20, 1946 GMT
>
> Copyright (c) 2008 Craig Latta. All rights reserved.
>
>
> Hi--
>
>       I've been on a quest to make Squeak smaller and more modular, the
> Spoon project[2]. Part one was making the object memory small. Part
> three is about making the virtual machine small. This message is about
> part two, making a module system suitable for adding new behavior to a
> minimal system in an organized way, and for transferring behavior
> accurately between running systems.
>
>       Spoon's module system is called "Naiad", which is an acronym for
> "Name And Identity Are Distinct". It keeps track of the development
> history of a system (what the "sources" and "changes" files are for
> now), and makes it available for exchange with other systems. I think
> keeping classes' names and identities separate is critical for
> this. Following are some notes on its design and use, including the
> object model[1].
>
>       At this point I'd like to emphasize I am the author of this
> design, that I intend to release its implementation under an MIT-style
> license, and that I'd like to pursue a graduate degree with it (I'm
> open to invitations :).
>
> ***
>
> motivation
>
>       A traditional Smalltalk system uses source code to express both
> development history and changes exchanged between systems. The precise
> meaning of source code depends on the current state of the system
> compiling it. Since a Smalltalk system is dynamic, source code is an
> inherently ambiguous medium across time.
>
>       The most problematic system artifacts in light of this ambiguity
> are classes. All activity in a Smalltalk system is the result of
> sending messages to objects. The sending of a message invokes the
> execution of a method, a sequence of instructions for a virtual
> processor. Some of these instructions manipulate the state of the
> object receiving the message. Classes define the structure of that
> state. Therefore, when those class definitions change, the source code
> for the methods of those classes may become meaningless.
>
>       One may confront this situation when trying to recompile source
> code for an old version of a method whose class definition has changed
> in the meantime. Similarly, source code from one system may not be
> meaningful on another, since corresponding class definitions on each
> system may change independently (or be removed entirely).
>
>       This means that the accurate exchange of behavior requires manual
> labor, hindering the propagation of useful fixes and new code. It also
> means that interpretation and use of historical code is more difficult
> than necessary. So we pay twice for this problem: when learning the
> system, and when trying to share our work with others. By separating
> class name from identity, Naiad makes Smalltalk more approachable for
> newcomers, and more productive for developer and user communities.
>
>
> editions
>
>
>       Using Naiad, each development system consists of two object
> memories: one containing developed code, and another containing
> "editions" which describe that code. I'll call the first one the
> "subject memory" and the other the "history memory".
>
>       An Edition is a description of some artifact in the subject
> memory at some point in time, currently an author, comment, tag,
> class, method, module, checkpoint, or edit. Each edition has a
> reference to that artifact's next state in the future (the next
> edition) and in the past (the previous edition), as well as an author
> edition, a collection of licenses, and a timestamp.
>
>       An Edit represents the activation of some edition at a point in
> time. For example, there may be a method created in 2005 that is
> removed in 2006 and reactivated in 2007. There would be an Edit for
> each of those three events, but only two method editions (one
> representing the method becoming active, and one representing it being
> removed).
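
The distinction between edits and editions in the example above can be sketched in a few lines. This is only an illustration in Python, not Naiad code: the class names mirror the prose, but the fields (`selector`, `source`, `year`) and the convention that a removal edition has no source are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MethodEdition:
    """Hypothetical stand-in for a method edition; source=None marks removal."""
    selector: str
    source: Optional[str]


@dataclass
class Edit:
    """Hypothetical stand-in for an Edit: one edition activated at one time."""
    year: int
    edition: MethodEdition


# Two editions: the method being present, and the method being removed.
present = MethodEdition("printOn:", "printOn: aStream aStream nextPutAll: 'x'")
removed = MethodEdition("printOn:", None)

# Three edits: created in 2005, removed in 2006, reactivated in 2007.
# The 2007 edit reuses the 2005 edition rather than creating a new one.
edits = [Edit(2005, present), Edit(2006, removed), Edit(2007, present)]

distinct_editions = {id(e.edition) for e in edits}
print(len(edits), "edits,", len(distinct_editions), "editions")
```

Reactivation costs one more Edit but no new edition, which is why the history grows by events rather than by copies of the code.
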
>
>       The history memory replaces the current changes and sources
> files. It has an instance of EditHistory corresponding to the subject
> memory, which records the active (current) editions for the classes,
> methods, modules, and authors in the subject memory. It also keeps the
> subject memory's id and the last Edit made to the subject memory.
>
>       Every time the subject memory adds, changes, or removes a class
> definition, method, author, comment, tag, or module, or makes a
> checkpoint (i.e., makes an edit), it adds the appropriate editions to
> the history memory via remote messages. The history memory snapshots
> itself after every edit, so as to provide crash recovery support.
>
>       The subject memory keeps a remote reference to the history
> memory's instance of EditHistory as a class variable of the local
> EditHistory class, and interacts with it using utility messages sent
> to the local EditHistory class. The history memory also keeps that
> EditHistory instance as a class variable of its local EditHistory
> class, but as a local reference.
>
>       An edition typically elides some of its references when it is
> transferred out of a history memory. For example, a transferred
> edition will usually omit the references to its next and previous
> editions. The requesting subject memory can calculate the ID of those
> editions and obtain them with a separate request, if necessary.
>
>       A subject memory may elect to keep its EditHistory instance as a
> local object, such as in a situation where one wants some limited
> immutable history for debugging purposes, and no crash recovery
> support. Whether in this scenario or in normal development, the same
> EditHistory utility messages suffice, since no special code need be
> written to support remote objects. If no edits will be made during
> deployment, and no history retrieval is required, one may simply
> jettison the history memory. One may always reconnect the subject and
> history memories at a later time and continue development.
>
>       The subject memory has tools for browsing and activating the
> editions, wherever they are located. This means that no special tools
> are needed to browse the artifacts of multiple subject systems; one
> uses the same tools as for browsing the artifacts of the local subject
> memory. Each subject memory may connect to multiple history memories
> concurrently (if allowed).
>
>       For that matter, the history memories of multiple systems may
> connect to each other directly, to aggregate editions from multiple
> people, for example.
>
>
> class and method IDs
>
>
>       Each class in the subject memory has a universally-unique
> identifier[3], or UUID. The classes in the minimal subject memory are
> assigned UUIDs before the initial release, and all subsequent classes
> are assigned UUIDs when created. Rather than use the single word
> "class" to refer to either a metaclass or to its sole instance, Spoon
> introduces the term "protoclass". For example, (Array class) is a
> metaclass, and its sole instance, Array, is a protoclass. Each
> metaclass and protoclass has its own UUID, called a "base ID". This is
> supported by a new instance variable in ClassDescription.
>
>       Each version of each class is identified by a ClassID, a byte
> array with segments for the class's baseID, author UUID, and a
> sixteen-bit version. This means we can uniquely identify, for each
> author, 65,535 versions of each class in the system. Since we identify
> authors by UUID, the number of possible authors is very large.
>
>       Each version of each method is identified by a MethodID, a byte
> array which contains a ClassID and segments for the method's selector,
> author UUID, and a sixteen-bit version. This means we can uniquely
> identify, for each author, 65,535 versions of each method in each
> version of each class in the system.
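
To make the ID layout concrete, here is a rough sketch in Python of how such byte arrays could be packed. The segment order, the big-endian encoding, and the two-byte length prefix on the selector are all assumptions for illustration; the text above only says which segments each ID contains.

```python
import struct
import uuid


def class_id(base_id: uuid.UUID, author: uuid.UUID, version: int) -> bytes:
    """Pack base ID, author UUID and a sixteen-bit version (assumed order)."""
    if not 0 <= version <= 0xFFFF:
        raise ValueError("version must fit in sixteen bits")
    return base_id.bytes + author.bytes + struct.pack(">H", version)


def method_id(cid: bytes, selector: str, author: uuid.UUID, version: int) -> bytes:
    """Pack a ClassID, the selector, author UUID and a sixteen-bit version.

    The length prefix on the selector is an assumption; it just makes the
    variable-length segment unambiguous in this sketch.
    """
    if not 0 <= version <= 0xFFFF:
        raise ValueError("version must fit in sixteen bits")
    sel = selector.encode("utf-8")
    if len(sel) > 0xFFFF:
        raise ValueError("selector too long for a two-byte length prefix")
    return (cid + struct.pack(">H", len(sel)) + sel
            + author.bytes + struct.pack(">H", version))


# Fixed UUIDs keep the example reproducible; real ones would be generated.
BASE = uuid.UUID(int=1)
AUTHOR = uuid.UUID(int=2)

cid = class_id(BASE, AUTHOR, 3)
mid = method_id(cid, "at:put:", AUTHOR, 1)
print(len(cid), len(mid))
```

Two classes that share a name still get different ClassIDs here, because the name never enters the byte array.
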
>
>
> method editions and method literal markers
>
>
>       Each MethodEdition holds a reference to the corresponding
> ClassEdition, the method source code, and the information needed to
> reconstruct the corresponding CompiledMethod directly, without need of
> the compiler (the method header, initial and final program-counter
> values, method literal markers, and instructions). If one will never
> use the history memory to install methods in a subject memory that
> lacks a compiler, one could drop the compiled method information to
> save space.
>
>       Method literal markers are used to transmit a compiled method's
> literal frame values between object memories. There are method literal
> marker classes to support references to classes, class variables,
> other pool variables, and literal objects, and to support methods
> which perform class-side super-sends. Each method literal marker
> instance knows how to serialize itself as part of Spoon's remote
> messaging system. In particular, when a method literal that refers to
> a class transmits itself, it transmits the ClassID of that class, not
> the name of the class.
>
>       This gets at the namesake concept of Naiad, "Name And Identity
> Are Distinct". When referring to a class, we never need to use its
> name. Each version of each class is an object with a distinct
> identity. By using ClassIDs to refer to each of them, we can avoid
> using class names at all when storing history or distributing
> code. This means that the name of each class can be anything, as far as
> the system is concerned.
>
>       With every class name unconstrained, there is no need for
> "namespaces" to distinguish between classes which happen to have the same
> name at some point in time. Each class effectively has its own
> namespace, since it is uniquely identifiable regardless of its
> name.
>
>       Developer tools armed with this information can resolve ambiguity
> for humans browsing and changing the system. If a developer writes a
> method which uses a name shared by multiple classes, the system can
> present more information about each of those classes (such as the
> author, time of creation, version, and module association), so that
> the developer can choose the intended one. When browsing such a
> method, the system can distinguish the aliased class name visually,
> indicating that there is disambiguating information available.
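
One way a tool might support that choice is sketched below in Python. The `Registry` and `ClassRecord` names and their fields are hypothetical; the only idea taken from the text is that classes are keyed by ID, and a name is just a lookup hint that may match several of them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassRecord:
    """Hypothetical summary a tool could show for one class version."""
    class_id: bytes
    name: str
    author: str
    module: str


class Registry:
    """Classes are stored by ID; names map to zero or more candidates."""

    def __init__(self):
        self._by_id = {}
        self._by_name = defaultdict(list)

    def add(self, record: ClassRecord) -> None:
        self._by_id[record.class_id] = record
        self._by_name[record.name].append(record)

    def candidates(self, name: str) -> list:
        # More than one result means the name is aliased and the
        # developer should be asked which class was intended.
        return list(self._by_name.get(name, []))

    def resolve(self, class_id: bytes) -> ClassRecord:
        return self._by_id[class_id]


registry = Registry()
registry.add(ClassRecord(b"\x01", "Point", "alice", "Graphics"))
registry.add(ClassRecord(b"\x02", "Point", "bob", "Geometry"))

choices = registry.candidates("Point")
chosen = registry.resolve(b"\x02")
print([c.module for c in choices], chosen.author)
```

Once the developer has picked a candidate, only its ID needs to be stored in the method, so a later rename of either class does not change what the method refers to.
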
>
>
> class editions and shared variables
>
>
>       Each ClassEdition holds the editions for all the method versions
> currently active in the corresponding class in the subject
> memory. Since every edition keeps a reference to its previous and next
> editions, one can trace the history of any method by starting at the
> active edition. Removed methods are represented by method editions
> which have the same MethodID as a normal previous method edition, but
> with the rest of the fields set to nil.
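
Walking a method's history through the "previous" links might look like the following Python sketch. The field names are assumptions, the "next" link is left out for brevity, and the string method IDs stand in for the byte-array MethodIDs described earlier.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class MethodEdition:
    """Hypothetical edition: a removal keeps the MethodID, other fields nil."""
    method_id: str
    source: Optional[str]
    previous: Optional["MethodEdition"] = None

    @property
    def is_removal(self) -> bool:
        return self.source is None


def history(active: MethodEdition) -> Iterator[MethodEdition]:
    """Yield editions from the active one back to the oldest."""
    edition = active
    while edition is not None:
        yield edition
        edition = edition.previous


first = MethodEdition("foo v1", "foo ^1")
gone = MethodEdition("foo v1", None, previous=first)   # same ID as 'first'
again = MethodEdition("foo v2", "foo ^2", previous=gone)

trail = list(history(again))
print([(e.method_id, e.is_removal) for e in trail])
```
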
>
>       Each ClassEdition also holds the information needed to
> reconstruct the corresponding class directly, without need of the
> class builder. For all classes, this includes the format, instance
> variable names, and superclass ID. For protoclasses, it also includes
> the class pool keys, class name, and received pool IDs.
>
>       In Spoon, every shared variable pool is the responsibility of
> some class in the system. There is no global variables pool ("system
> dictionary"). Each class that defines a pool is said to "publish" that
> pool; classes which use that pool "receive" it. Spoon adds an instance
> variable to Class to map published pools to their names. Each
> ReceivedPoolID that a protoclass edition uses is a byte array which
> contains a class ID and a published pool name.
>
>
> checkpoints and modules
>
>
>       A Checkpoint edition is simply a named marker of a particular
> point in time. A developer may use checkpoints to indicate various
> interesting states of development, and use the tools to regress or
> replay edits made before or after that time.
>
>       The largest unit of work is represented by module editions. They
> are named collections of method IDs, indicating the specific versions
> of methods which comprise a module, along with sets of child, parent,
> prerequisite, and postrequisite module editions. When a module edition
> is transferred out of a history memory, those edition references are
> transmitted as ModuleIDs. Each module edition also has an
> "antimodule", a module edition calculated at installation time by a
> receiving system which, if applied, would undo the changes made by
> installing the original module. Finally, each module edition has a URI
> by which someone at a remote site may install the module.
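
The antimodule idea can be illustrated with a deliberately simplified Python sketch. Here a system's state is just a mapping from a method key to a version label, and a module is a mapping of the versions it installs; both representations, and the use of `None` to mean "remove", are assumptions for the example rather than Naiad's actual structures.

```python
from typing import Dict, Optional

State = Dict[str, str]                # method key -> active version
Changes = Dict[str, Optional[str]]    # method key -> version, None = remove


def antimodule(state: State, module: Changes) -> Changes:
    """Record, for each key the module touches, what the receiver has now."""
    return {key: state.get(key) for key in module}


def apply(state: State, changes: Changes) -> State:
    """Return a new state with the changes installed."""
    result = dict(state)
    for key, version in changes.items():
        if version is None:
            result.pop(key, None)
        else:
            result[key] = version
    return result


before = {"Point>>x": "v1", "Point>>y": "v1"}
module = {"Point>>x": "v2", "Point>>dist:": "v1"}

undo = antimodule(before, module)     # must be computed before installing
after = apply(before, module)
restored = apply(after, undo)
print(undo, restored == before)
```

The antimodule has to be calculated by the receiving system at installation time because it depends on what that particular system had active beforehand.
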
>
>       That URI represents a command to a Spoon system running on a
> requestor's local machine; it refers to a standard port on
> localhost. Its path is a text-encoded action, containing an
> instruction (in this case "install a module"), the hostname and port
> of a Spoon system providing the module, and the module's ID. The
> receiving system uses this information to request the module from a
> providing history memory, which then transmits editions as
> necessary. Exactly which editions are transmitted depends on the state
> of the receiving system; this is a two-way conversation between the
> providing and receiving systems. This is often more time- and
> space-efficient than simply providing all of a module's code, which is
> what happens with traditional static representations like change sets.
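
As a rough picture of such a URI, the Python sketch below builds and parses one. Everything about the format is invented for illustration: the port number 8080, the `install` keyword, the slash-separated layout, and the hex encoding of the module ID are assumptions, since the text does not specify the encoding.

```python
from urllib.parse import quote, unquote, urlsplit

LOCAL_PORT = 8080  # hypothetical "standard port" on localhost


def install_uri(provider_host: str, provider_port: int, module_id: bytes) -> str:
    """Encode an 'install a module' action in the path of a localhost URI."""
    parts = ["install", provider_host, str(provider_port), module_id.hex()]
    path = "/".join(quote(p, safe="") for p in parts)
    return "http://localhost:%d/%s" % (LOCAL_PORT, path)


def parse_action(uri: str):
    """Recover (instruction, provider host, provider port, module ID)."""
    path = urlsplit(uri).path
    instruction, host, port, module_hex = (
        unquote(p) for p in path.strip("/").split("/")
    )
    return instruction, host, int(port), bytes.fromhex(module_hex)


uri = install_uri("modules.example.org", 9000, b"\x01\x02")
action = parse_action(uri)
print(uri)
print(action)
```

The local system would then open its own connection to the provider named in the action and negotiate which editions it still needs.
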
>
>       The URIs may be cited on ordinary webpages, which are indexed by
> search engines like Google. A person in search of a module for a
> particular purpose can search for it with a web browser, using those
> search engines. Having found a module's URI, the person can click on
> it, establishing a connection to an embedded webserver in their local
> Spoon system, which carries out the URI's command.
>
>       This mechanism for code distribution avoids storing code in
> static files. It's a departure from Smalltalk's traditional "fileout"
> mechanism.
>
>       The encoded URIs can serve other functions as well, such as
> listing a system's installed modules, removing an installed module,
> making a snapshot, and quitting the system. In this way one can use a
> web browser to interact with a Spoon system for several basic tasks;
> this is especially useful when the system is headless (e.g., in its
> initial minimal state).
>
>
> comments and tags
>
>
>       Editions for authors, classes, methods, checkpoints, edits, and
> modules each have their own comment and tag editions. This means each
> one of those artifacts has a comment and tags, and the changes in both
> are recorded over time. Comments are as we've already been using them:
> they're explanatory prose about the artifacts. Tags may be familiar to
> you from the web; they are short semantic markers used for grouping
> similar artifacts.
>
>       I intend for tags to replace class and method
> categories. Nominally, we've been using class and method categories to
> establish semantic hierarchies, but the hierarchies have turned out to
> be quite shallow. Although we can form hierarchies with tags as well,
> I think we would do better to apply the sorts of algorithms that
> search engines use, and not concern ourselves with memorizing an
> artifact's semantic markers. The computational cost this incurs for
> the tools might have been high in the early days of Smalltalk, but it
> is quite modest now.
>
>
>       Thanks for reading! Please let me know of any questions or other
> feedback, and feel free to discuss this on the Spoon and Squeak-dev
> mailing lists.
>
>
> -C
>
> [1] http://netjam.org/spoon/naiad
> [2] http://netjam.org/spoon
> [3] http://en.wikipedia.org/wiki/Universally_Unique_Identifier
>
> --
> Craig Latta
> improvisational musical informaticist
> www.netjam.org
> Smalltalkers do: [:it | All with: Class, (And love: it)]