Feedback on Naiad documentation

Tue Dec 30 01:36:47 UTC 2008

Hi Jerome--

 > > By separating class name from identity, Naiad makes Smalltalk more
 > > approachable for newcomers, and more productive for developer and
 > > user communities.
 >
 > Expand on this.

      Well, that's what I attempt to do in the rest of the document. :) 
The first four paragraphs are an abstract, not a conclusion. I don't 
expect anyone to simply believe what I've said by that point; I'm 
summarizing what I intend to describe.

      But, expanding... By separating class name from identity, we 
remove a critical source of ambiguity, and we can transfer methods more 
accurately, with less manual labor. We can do this transfer both between 
an author's system and other people's systems, and from an author's 
system to itself at another point in time. This makes every person's 
work more amenable to study by anyone else in the community. It also 
makes it easier for each person in the community to express their ideas 
clearly through their work. I think this combination of transparency and 
expressiveness would make Smalltalk more approachable for newcomers, and 
more productive for developer and user communities.

 > What has identity?

      In this paper I'm referring to the identity of classes.

 > On what does the identity depend.

      I'm taking advantage of the fact that every class, simply by being
a Smalltalk object, has its own identity by definition.

 > How does the Class name get detached from identity.

      I detach the name of a class from the identity of that class by
offering an alternative to source code as the medium of code transfer.

 > What does the class name represent?

      The name of a class represents a class, but exactly which one is
ambiguous in source code. One needs additional context to identify the
class to which a name refers at some point in time. The mechanisms
described in the rest of the paper provide that context, and go so far
as to make source code optional when transferring code.

 > > An Edit represents the activation of some edition at a point in
 > > time.
 >
 > Describe activation.

      When an edition is activated, the compiled method to which it
corresponds becomes active in the object memory whose history we are
tracking.

 > > The history memory replaces the current changes and sources
 > > files
 >
 > I need diagrams.

      Okay, I'll make diagrams. (I'm not against them, but my time is 
limited and I think the text is a higher priority. Or, put another way, 
I think diagrams without text would be harder to understand than text 
with no diagrams, and getting one out before both are done is better 
than waiting for both.)

 > What does subject memory id represent? When in time do subject memory
 > id's change?

      A subject memory's system ID identifies the subject memory. A
subject memory's ID is set when the memory is considered distinct from
other memories. I set the system ID of the minimal memory before
releasing it, and I imagine that memory will change its ID when next 
saved by someone receiving it. Traditionally, the identity of an object 
memory remains the same after a snapshot via "save", and it changes 
after a snapshot via "save as...". I expect the system ID to reflect 
that; generally, the system ID for a memory doesn't change.

 > I.e. The subject memory with two subject ids, how do they differ?

      There's no such thing as a subject memory with two system IDs
(please let me know if I slipped up and wrote otherwise somewhere).

 > > The subject memory keeps a remote reference to the history
 > > memory's instance of EditHistory as a class variable of the local
 > > EditHistory class, and interacts with it using utility messages sent
 > > to the local EditHistory class. The history memory also keeps that
 > > EditHistory instance as a class variable of its local EditHistory
 > > class, but as a local reference.
 >
 > Explain Remote messaging as it applies here.

      I'm using the term in the usual sense: an object in one memory is 
sending messages to an object in another memory, by interacting with a 
special proxy object that represents the remote object. Since everything 
in Smalltalk happens by sending messages to objects, the proxy object is 
indistinguishable from the remote object under normal circumstances.
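
      As a rough illustration of that proxy idea (a sketch in Python, 
not Spoon's actual remote messaging code; the record method, the 
RemoteProxy class, and the loopback transport are all made up for this 
example), a proxy can forward every message it doesn't understand to an 
object living elsewhere:

```python
class EditHistory:
    """Stand-in for the history memory's EditHistory (hypothetical API)."""

    def __init__(self):
        self.edits = []

    def record(self, edit):
        self.edits.append(edit)
        return len(self.edits)


class RemoteProxy:
    """Forwards every message to a remote object via a transport function."""

    def __init__(self, send):
        self._send = send  # callable(selector, args) -> result

    def __getattr__(self, selector):
        # Only reached for names the proxy lacks, i.e. every forwarded message.
        def forward(*args):
            return self._send(selector, args)
        return forward


def loopback_transport(target):
    """Assumed transport; a real one would serialize over a network link."""
    def send(selector, args):
        return getattr(target, selector)(*args)
    return send


remote_history = EditHistory()  # lives in the "history memory"
history = RemoteProxy(loopback_transport(remote_history))  # held by the "subject memory"
count = history.record("Foo>>bar, edition 1")
```

      The caller uses the proxy exactly as it would use a local 
EditHistory, which is why the same utility messages work in both cases.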

 > Who or what receives the messages?

      An instance of EditHistory in the history memory receives the 
messages.

 > Why are [the messages] Remote?

      By design, all the historical information for the subject memory 
is kept in a separate history memory. This is both to enable crash 
recovery, and to ease separation of a deployed system from its 
historical information when the time comes.

 > What is a History memory snapshot?

      The history memory is a bunch of objects describing all the 
editions of a subject memory's classes, methods, authors, modules, 
checkpoints, comments, and tags.

 > Where is the History memory located?

      The history memory is usually located alongside the subject 
memory, but it can be on any net-accessible machine.

 > Where is the snapshot located?

      The subject memory is located wherever the developer wants to put 
it, as now.

 > > An edition typically elides some of its references when it is
 > > transferred out of a history memory. For example, a transferred
 > > edition will usually omit the references to its next and previous
 > > editions.
 > >
 > > The requesting subject memory can calculate the ID of those
 > > editions and obtain them with a separate request, if necessary.
 >
 > How?

      In general, a next or previous edition's ID differs only in 
version number, and version numbers form a simple linear sequence over time.
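
      A minimal sketch of that calculation, in Python. The EditionID 
fields, the starting version of 1, and the helper names are assumptions 
for illustration, not the actual Naiad ID layout:

```python
from dataclasses import dataclass, replace
from typing import Optional
from uuid import UUID

MAX_VERSION = 65_535  # per-author version limit mentioned in this message


@dataclass(frozen=True)
class EditionID:
    class_uuid: UUID   # assumed field: identifies the class
    author_uuid: UUID  # assumed field: identifies the author
    version: int       # assumed to run from 1 to MAX_VERSION


def next_id(eid: EditionID) -> Optional[EditionID]:
    """ID of the following edition, or None at the end of the sequence."""
    if eid.version >= MAX_VERSION:
        return None
    return replace(eid, version=eid.version + 1)


def previous_id(eid: EditionID) -> Optional[EditionID]:
    """ID of the preceding edition, or None for the first edition."""
    if eid.version <= 1:
        return None
    return replace(eid, version=eid.version - 1)


current = EditionID(UUID(int=1), UUID(int=2), version=7)
following = next_id(current)
preceding = previous_id(current)
```

      Only the version field changes, so the requester never needs the 
elided references to work out which edition to ask for next.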

 > Also if it can do that then you haven't really elided the
 > references only made them implicit?

      I'm referring to how the system leaves the reconstructed edition's 
nextEdition and previousEdition fields set to nil, rather than going to 
the trouble of serializing further editions. This is because they're 
often not relevant to the original edition request. Following the 
nextEdition and previousEdition references exhaustively during 
serialization would take a significant amount of time and net traffic, 
for no good reason in most cases.

 > > A subject memory may elect to keep its EditHistory instance as a
 > > local object, such as in a situation where one wants some limited
 > > immutable history for debugging purposes, and no crash recovery
 > > support. Whether in this scenario or in normal development the same
 > > EditHistory utility messages suffice, since no special code need be
 > > written to support remote objects.
 >
 > How do histories know about each other?

      Someone using a history memory can decide to connect it to another 
history memory, if the person using the latter memory approves. 
Typically, someone who has discovered a module (using mechanisms 
described in the "checkpoints and modules" section) requests a 
connection with a history memory that has the module. (Once connected, 
the history memories synchronize so that the receiving system has the 
module, with a minimum of traffic.)

 > Can histories move about in space and time?

      You can put them in any net-accessible place. I'm not sure what 
you mean about time here...

 > Clone themselves?

      Yes (you might want to do this when discarding rarely-used 
editions, for example).

 > How is the mesh kept track of?

      Each system keeps track of the other systems to which it is 
connected, and can report this information if permitted by the person 
using the system. Typically, one can ask any system in a connected set 
for the IDs of the whole set.

 > What is [the] minimal subject memory?

      The minimal subject memory contains the objects needed to start 
and extend the system (and nothing else).

 > How (and who) releases it? initially?

      I'm preparing it for release, to be available from websites and 
other download locations. I expect this work to be easier, doable by 
more people, and less frequent as the module distribution mechanism 
matures. Over time, I envision nearly all activity taking place in 
modules which are added to and removed from the minimal memory, not in 
the minimal memory itself. At some point, I expect that analysis will 
prove it minimal, and that further work on it will be rare.

 > How easily (frequently) do versions get created?

      One may create them as easily as before, and the system can record 
them faster than one is ever likely to create them through interactive 
use. Human activity will be the rate-determining step. I don't expect 
developers' traditional artifact-creation rates to change radically with 
the introduction of this system (it's the quality of the artifacts that 
I expect to change, for the better :).

      The traditional maximum UUID allocation rate (as cited by the 
Leach/Salz UUID specification, for example) is ten million per second 
per machine. Since UUIDs are used to identify classes and authors, I 
don't think we'll have a problem distinguishing one class from another, 
or one author from another[1]. Each author may create 65,535 versions of 
each class in the system, as fast as remote messages can be sent between 
the subject and history memories on a single machine. That rate varies 
by hardware; for the sake of argument, let's say it's something very 
slow, yielding 100 classes created per second. I think that's far more 
than was ever likely to occur from interactive use in the past, and even 
acceptable as part of the automated installation of an application.

      For each version of each class that each author has created, the 
author may create 65,535 versions of each method. This means that, for 
example, if you add an instance variable to class Foo, you can now 
create 65,535 more versions of every single method that existed for Foo 
before (because you've made a new version of class Foo). So for each 
class/selector pair, you can create 4,294,836,225 distinct method 
versions (four billion versions of Array>>new, for example). And each 
other developer could create their own four billion total versions of 
that one method.
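
      The arithmetic behind that figure can be checked in a couple of 
lines of Python (the constant names are just labels for this example):

```python
CLASS_VERSIONS_PER_AUTHOR = 65_535           # versions of each class
METHOD_VERSIONS_PER_CLASS_VERSION = 65_535   # versions of each method, per class version

# Every class version gets its own full range of method versions.
method_versions = CLASS_VERSIONS_PER_AUTHOR * METHOD_VERSIONS_PER_CLASS_VERSION

print(f"{method_versions:,}")  # 4,294,836,225
```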

      I think it's very unlikely that anyone has created even ten 
thousand versions of any particular method across *all* versions of the 
class that holds it, much less for only one version of that class. It 
also seems unlikely that any class in Smalltalk has gone through even a 
thousand versions from the same author in the entire history of the 
system so far.

      I understand it's tempting to draw a comparison between these 
claims and previous predictions which turned out to be wrong. (I suppose 
the most infamous example is Gates' apocryphal[2] exclamation that 640K 
of memory should be enough for anyone.) But I think there's an important 
difference here: we're estimating the capacity of people to produce, not 
to consume. Also, the medium is relatively obscure; we're talking about 
development artifact indices here, not the size of the artifacts 
themselves (although we have some practical limits imposed on us there, 
too).

      All that said, one could easily make variable-length versions. I 
just think it's space overhead on every ID (and therefore time overhead, 
during transmission) that's not worth paying.

 > How big is a MethodEdition? a ClassEdition?

      The size of a MethodEdition varies depending on whether the 
method's instructions or source are included, and, if so, how big they 
are. The size of a ClassEdition depends on how many methods are 
associated with the class it describes. I assume you're asking for 
aggregate sizes, following all references. Otherwise, they're on the 
order of 32 bytes each.

 > When in time and space do they reside?

      They occupy long-term space in the history memory, and briefly 
occupy space in the subject memory (in response to queries by 
development tools).

 > How does one construct the compiled method directly?

      You can create a compiled method given a desired header value and 
number of instructions. Then you set the literal values appropriately 
(in this case, from a method edition's literal markers), and set the 
instructions. After that you can install it in a class and run it. It's 
the literal markers which make this all work. Just as one can create 
compiled methods directly, one can also create each kind of method 
literal directly. Literal markers contain the information needed for 
that construction, and can carry it out.

 > What is missing is the basic "How do I start from a seed and build a
 > sunflower". <How do things start and grow the metaphor refers to a
 > current squeak project of mine>

      Well, I don't think that's missing. You start with a system that 
has enough methods in it to start and accept more methods, then add 
methods written by yourself or others. The essence of it is not complicated.

 > > Method literal markers are used to transmit a compiled method's
 > > literal frame values between object memories. There are method
 > > literal marker classes to support references to classes, class
 > > variables, other pool variables, and literal objects, and to support
 > > methods which perform class-side super-sends. Each method literal
 > > marker instance knows how to serialize itself as part of Spoon's
 > > remote messaging system. In particular, when a method literal that
 > > refers to a class transmits itself, it transmits the ClassID of that
 > > class, not the name of the class.
 >
 > I still at this point don't understand why this is a good thing. Why
 > would I, as a human, not want to know the difference between, say,
 > True and False?

      I'm not trying to obscure the identity of anything. On the 
contrary, I want to introduce identifiers which are less ambiguous than 
the textual ones we've used before. When a person says "class Foo", it's 
not clear to another person, much less a machine, which version of class 
Foo they mean, and by which author. Also, I don't intend for developers 
to ever see things like ClassIDs unless they want to. At any given 
moment, the browsers we've been using before can display an appropriate 
textual equivalent for an ID.
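
      To make the idea concrete, here is a small Python sketch of a 
browser turning IDs back into text. The ClassEdition fields and the 
Browser class are hypothetical, chosen only to show two classes that 
share a name but not an identity:

```python
from dataclasses import dataclass
from uuid import UUID


@dataclass(frozen=True)
class ClassEdition:
    class_id: UUID  # stands in for a ClassID
    name: str
    author: str
    version: int


class Browser:
    """Maps IDs to the text a developer actually sees."""

    def __init__(self, editions):
        self._by_id = {edition.class_id: edition for edition in editions}

    def display_name(self, class_id):
        return self._by_id[class_id].name

    def describe(self, class_id):
        edition = self._by_id[class_id]
        return f"{edition.name} (version {edition.version}, by {edition.author})"


foo_a = ClassEdition(UUID(int=1), "Foo", "alice", 3)
foo_b = ClassEdition(UUID(int=2), "Foo", "bob", 1)
browser = Browser([foo_a, foo_b])

print(browser.describe(foo_a.class_id))  # Foo (version 3, by alice)
print(browser.describe(foo_b.class_id))  # Foo (version 1, by bob)
```

      Both editions display as "Foo", yet the IDs keep them apart, 
which the name alone cannot do.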

 > > This gets at the namesake concept of Naiad, "Name And Identity
 > > Are Distinct". When referring to a class, we never need to use its
 > > name. Each version of each class is an object with a distinct
 > > identity. By using ClassIDs to refer to each of them, we can avoid
 > > using class names at all when storing history or distributing
 > > code. This means that name of each class can be anything, as far as
 > > the system is concerned.
 >
 > Ok then. For human understanding how do I get back to names?

      You will never use or see ClassIDs in method sources (the browsers 
see to that, aided by the information in class editions). ClassIDs are 
what machines use to refer to classes when transferring compiled methods.

 > > ...every shared variable pool is the responsibility of
 > > some class in the system.
 >
 > If I wanted to refer to a pool variable how would I do so?

      When composing method source, the same rules as before would 
apply. The benefit I propose is always being able to select a shared 
variable that you see in method source, then having the browsers tell 
you which class is responsible for it (and, in turn, which *authors* are 
responsible for it).

 > Where in time and space do Checkpoints live?

      Checkpoints are the same as other editions in that regard. They 
occupy long-term space in the history memory, and briefly occupy space 
in the subject memory (in response to queries by development tools).

 > What are postrequisite [modules]?

      A postrequisite module is a module that must be loaded as a 
consequence of loading some other module. Conceptually, it's the 
complement of a prerequisite.

 > How does a module know them?

      Each module has an instance variable which is a collection of 
postrequisite modules (see the object model summary in the Naiad design 
document). There's another instance variable for prerequisites, too.
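
      A rough Python sketch of how those two collections could drive 
loading. The Module class here and the load_order function are 
illustrative assumptions, not the object model from the design 
document:

```python
class Module:
    def __init__(self, name):
        self.name = name
        self.prerequisites = []   # modules that must be loaded first
        self.postrequisites = []  # modules that must be loaded afterward


def load_order(module, seen=None):
    """Names of modules to load, prerequisites first, postrequisites last."""
    if seen is None:
        seen = set()
    if module.name in seen:  # already scheduled; also guards against cycles
        return []
    seen.add(module.name)

    order = []
    for pre in module.prerequisites:
        order += load_order(pre, seen)
    order.append(module.name)
    for post in module.postrequisites:
        order += load_order(post, seen)
    return order


core, app, docs = Module("core"), Module("app"), Module("docs")
app.prerequisites.append(core)
app.postrequisites.append(docs)

print(load_order(app))  # ['core', 'app', 'docs']
```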

 > Who transfers a module out of a history memory? <What dialogue is
 > taking place at this point and among which parties?>

      A module is transferred from a providing history memory to a 
consuming history memory, when the consuming history memory requests it. 
The transfer is done via remote message-sending.

 > > ...each module edition has a URI by which someone at a remote site
 > > may install the module. That URI represents a command to a Spoon
 > > system running on a requestor's local machine; it refers to a
 > > standard port on localhost. Its path is a text-encoded action,
 > > containing an instruction ([for example] "install a module"), the
 > > hostname and port of a Spoon system providing the module, and the
 > > module's ID.
 > >
 > > ...
 > >
 > > The encoded URIs can serve other functions as well, such as
 > > listing a system's installed modules, removing an installed module,
 > > making a snapshot, and quitting the system.
 >
 > How?

      The instruction part of a command URI can indicate one of several 
commands, such as the ones mentioned above. For example, an instruction 
value of one could mean "install a module", two could mean "list the 
local system's installed modules", and so on. For each instruction, 
there are parameters one needs to carry out the instruction (for 
installing a module, those parameters are the hostname and port of a 
Spoon system providing the module, and the module's ID). The instruction 
and all the parameters are concatenated and encoded as text to form the 
path of the command URI. Any system that implements the specification 
can interpret the command URI and carry out its instruction.
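
      Here is one way that could look, sketched in Python. The http 
scheme, the local port number, the slash-separated percent-encoded 
layout, and the placeholder host and module ID are all assumptions for 
the sake of the example; the actual encoding is whatever the 
specification defines:

```python
from urllib.parse import quote, unquote, urlsplit

LOCAL_PORT = 8080    # hypothetical "standard port" on localhost
INSTALL_MODULE = 1   # instruction values as in the example above
LIST_MODULES = 2


def make_command_uri(instruction, *params):
    """Join the instruction and its parameters into a text-encoded path."""
    fields = [str(instruction), *map(str, params)]
    path = "/".join(quote(field, safe="") for field in fields)
    return f"http://localhost:{LOCAL_PORT}/{path}"


def parse_command_uri(uri):
    """Recover (instruction, parameters) from a command URI."""
    path = urlsplit(uri).path.lstrip("/")
    fields = [unquote(field) for field in path.split("/")]
    return int(fields[0]), fields[1:]


# Install a module: provider hostname, provider port, module ID (placeholders).
uri = make_command_uri(INSTALL_MODULE, "modules.example.org", 9090,
                       "module-id-placeholder")
instruction, params = parse_command_uri(uri)
```

      The receiving system would dispatch on the instruction value and 
hand the decoded parameters to whatever carries out that command.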


      thanks for the comments!

-C

***

[1] Consider, for example, the number of humans who have ever lived. Two
     hundred billion seems to be a generous guess currently; see, e.g.,
     http://tinyurl.com/9f7s5x (wikipedia).

[2] www.itbusiness.ca/it/client/en/CDN/News.asp?id=48924

--
Craig Latta
improvisational musical informaticist
www.netjam.org
Smalltalkers do: [:it | All with: Class, (And love: it)]
