[e-lang] CapTP serialization format (was Re: Alan Kay in the News)

Tue Apr 30 17:47:41 UTC 2002

Mark,

    As the CORBA apologist, I just have to chime in...  :-/

> 1) By "platform neutral" I don't mean that it can't favor or be based on one
> platform (such as Squeak), but rather that it not impose a level of pain on
> other platforms significantly beyond what a truly neutral format would
> impose.  For example, Java's RMI, despite all its faults, is a better
> "platform neutral" system than Corba.

I think our assumptions about "better" weight different factors...

> While RMI is built to support Java
> specifically, it is less painful for other platforms than is Corba.

I'd appreciate references to experiments that demonstrate this. It could
well be true. It is also worth noting that RMI is lighter-weight and
therefore less general than CORBA. Depending on system constraints, this
will favor one or the other.

> Corba
> imposes similar pain on everyone, and that level of pain is higher for
> everyone.  I'm more interested in standards which reduce the general level
> of pain than in ones which allocate pain "fairly".

I object to the use of the word "standard" in this context. I think you mean
"platform" here.

> A platform independent
> system, like Corba, is just covertly yet another platform,

Nothing covert - OMG has recognized CORBA as an independent platform at
least since I started participating in 1996!

> and usually one
> less well designed than the platforms it's trying to be independent of.

Of Java I would agree, too :-D  At least CORBA is designed - Java is cobbled
together under unreasonably tight deadlines, which leads to it being timely,
but not necessarily well built. All of these complaints (except timeliness)
are often aimed at CORBA and MS as well.

The statement as spoken is true; however, as a long-time observer of the
CORBA development process, I would claim that the implicit assertion that
CORBA is (intentionally) less well-designed is untrue. I would agree that it
may not be as effective as more recent equivalent protocols (RMI is NOT an
equivalent protocol - merely similar). But this is due more to the fact that
GIOP is older (~1993) and doesn't benefit from more recent research.

In general, I have the impression that you're not making a sufficient
distinction between the run-time CORBA infrastructure (the ORB) - which
isn't particularly relevant to this discussion - and the Common Data
Representation used as the normalized format in CORBA messages - which is
VERY relevant to this discussion. Whether or not you choose it, I think that
you should consider the CDR as a possible serialization format. More
specifically, after looking at the requirements below, I think that you
should consider CDR-encoded OMG Objects-by-Value (IDL valuetypes) as an
option. The URL for this aspect of the CORBA spec is
http://www.omg.org/cgi-bin/doc?formal/01-12-43
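
For anyone who has not looked at CDR, the following is a rough Python sketch
of the style of encoding it uses: primitives are padded to their natural
alignment, and a string goes out as a length (counting the terminating NUL)
followed by its bytes. This is a simplified illustration only, not the text
of the spec. The class CdrWriter and its method names are hypothetical, and
encapsulations, wide characters and valuetype headers are left out entirely.

```python
import struct


class CdrWriter:
    """Toy CDR-style writer (hypothetical helper, not an ORB API)."""

    def __init__(self, little_endian=True):
        self.prefix = "<" if little_endian else ">"
        self.buf = bytearray()

    def _align(self, size):
        # Pad with zero bytes so the next primitive starts on a
        # multiple of its own size, measured from the stream start.
        self.buf.extend(b"\x00" * (-len(self.buf) % size))

    def write_octet(self, value):
        self.buf.append(value & 0xFF)

    def write_ulong(self, value):
        self._align(4)
        self.buf.extend(struct.pack(self.prefix + "I", value))

    def write_string(self, text):
        # Length counts the terminating NUL, then the bytes, then NUL.
        data = text.encode("ascii") + b"\x00"
        self.write_ulong(len(data))
        self.buf.extend(data)


w = CdrWriter(little_endian=False)
w.write_octet(1)      # one byte, so the next ulong needs 3 pad bytes
w.write_ulong(7)
w.write_string("hi")
encoded = bytes(w.buf)
```

The point of the sketch is that a reader needs nothing beyond the byte order
and the expected types to walk the buffer, which is why the format is
reasonably easy to document.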

> So don't be shy about proposing Squeak-based or Parcels-based serialization
> formats.
>
> 2) From previous discussion on e-lang, I feel we must have both a textual
> and binary serialization format, designed together, or one derived from the
> other, such that conversion in either direction is meaning preserving.  A
> connection will start out textual, and will switch to binary only after a
> text-based negotiation about how to proceed.  That way we enable text-only
> processors (humans on a telnet) to play.

With the possible exception of the telnet requirement, I think that
valuetypes would support this, since there is the standard CDR encoding, as
well as a valuetype-to-XML mapping specified :-)
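
To make the "meaning preserving in either direction" requirement concrete,
here is a small Python sketch of a paired textual and binary encoding over
the same value model (integers, strings and lists). It is an assumption-laden
toy: the textual side borrows JSON and the binary side is an invented tagged
format, so it is neither CDR nor the valuetype-to-XML mapping. All function
names are hypothetical.

```python
import json
import struct


def to_binary(value):
    """Encode ints, strings and lists with a one-byte tag each."""
    if isinstance(value, bool):
        raise TypeError("booleans are not part of this toy model")
    if isinstance(value, int):
        return b"i" + struct.pack(">q", value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return b"s" + struct.pack(">I", len(data)) + data
    if isinstance(value, list):
        body = b"".join(to_binary(item) for item in value)
        return b"l" + struct.pack(">I", len(value)) + body
    raise TypeError(f"unsupported type: {type(value).__name__}")


def _read(buf, pos):
    # Returns (value, next_position).
    tag = buf[pos:pos + 1]
    pos += 1
    if tag == b"i":
        return struct.unpack_from(">q", buf, pos)[0], pos + 8
    if tag == b"s":
        (n,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        return buf[pos:pos + n].decode("utf-8"), pos + n
    if tag == b"l":
        (n,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        items = []
        for _ in range(n):
            item, pos = _read(buf, pos)
            items.append(item)
        return items, pos
    raise ValueError(f"unknown tag {tag!r}")


def from_binary(buf):
    value, end = _read(buf, 0)
    if end != len(buf):
        raise ValueError("trailing bytes")
    return value


def to_text(value):
    return json.dumps(value)


def from_text(text):
    return json.loads(text)


# Text -> binary -> text, as a logger or a human editor might do.
original_text = '["deliver", 42, ["nested", -1]]'
wire = to_binary(from_text(original_text))
round_tripped_text = to_text(from_binary(wire))
```

Both encodings are defined over one shared value model rather than over each
other, which is one way to keep conversion lossless in both directions.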

> 3) Both formats must provide good support for upgrade (or class evolution).
> Java serialization, for all its awful complexity, does this, and I'd like
> one that does at least this well.  My understanding is that Parcels does an
> excellent job at upgrade, but I'm much less familiar with it.

If this is code evolution, then the valuetype spec has the concept of being
able to identify the "codebase" for which the valuetype is designed. If the
issue is state evolution, valuetypes include inheritance semantics;
therefore, they can evolve in a backwards-compatible way.
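
As a hedged illustration of state evolution through inheritance, the Python
sketch below has a newer sender ship a derived type while an older reader,
which only knows the base type, keeps just the base-level state. It is
loosely modelled on the idea and is not the valuetype wire format; the record
layout, the type names and read_with_truncation are all hypothetical.

```python
# Each record lists its type chain, most derived first, and keeps the
# state contributed by each level under that level's name.
KNOWN_BY_OLD_READER = {"Account"}


def read_with_truncation(record, known_types):
    """Return (type_name, state) for the most derived type we know."""
    chain = record["types"]
    for index, type_name in enumerate(chain):
        if type_name in known_types:
            # Merge state from this level and its ancestors only,
            # base first so derived levels could override.
            state = {}
            for level in reversed(chain[index:]):
                state.update(record["state"][level])
            return type_name, state
    raise ValueError("no known type in chain: " + ", ".join(chain))


new_record = {
    "types": ["SavingsAccount", "Account"],
    "state": {
        "Account": {"owner": "alice", "balance": 100},
        "SavingsAccount": {"rate": 0.02},
    },
}

seen_by_old = read_with_truncation(new_record, KNOWN_BY_OLD_READER)
seen_by_new = read_with_truncation(
    new_record, {"Account", "SavingsAccount"}
)
```

The old reader ends up with a plain Account and the new reader with the full
SavingsAccount, so adding a subtype does not break existing receivers.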

> 4) The textual format must be human readable and human editable.  This
> combined with the "meaning preserving in both directions" requirement means
> that one can edit a binary serialization graph by translating to text,
> editing, and translating back.  Likewise, it enables a logger to log binary
> network traffic as binary, and then later present it as textual if there's
> interest.  An edited textual log can then be used as a basis of test
> cases.

This is probably hard in the CDR case, but certainly would be possible in an
XML-marshalled message.

> 5) The binary format must be able to perform reasonably.  My understanding
> is that Parcels is unmatched here on reading speed, but I have no idea what
> price it pays on writing.  For CapTP's use, each graph will be written once
> in order to be read once.  For persistence use, many more graphs will be
> written (one per checkpoint) than will be read (one per revival).  Both of
> these are different than the tradeoffs Parcels was optimized for.

I'll admit that I can't speak to this.

> 6) Both formats must be simple, and easy to document.  Java serialization
> fails this test.

The CDR format is well specified and straightforward. XML should be as easy
as it ever is (opinions seem to vary in this forum).

> 7) The format must be compatible with our security requirements.  In
> particular, since E is only prepared to trust mobile E code, any need for
> executable code in the serialization format must be E code.  Parcels as is
> fails this test of course.

The valuetype serialization carries no code, but does provide a slot to
identify downloadable code that implements the valuetype.
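
A minimal Python sketch of that separation follows: the serialized record
holds only state plus a string naming where an implementation could be
fetched, and the receiver decides what to do with it before anything is
loaded. The ValueRecord fields, the choose_handling function and the
example.org URLs are hypothetical, and no actual download or execution is
modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class ValueRecord:
    type_id: str
    codebase: str  # where an implementation could be fetched; no code here
    state: dict = field(default_factory=dict)


def choose_handling(record, trusted_codebases):
    """Decide what to do with a record without ever executing it."""
    if record.codebase in trusted_codebases:
        return "load-implementation"
    # Unknown origin: keep the state as plain data only.
    return "data-only"


# Hypothetical allowlist; a receiver such as E would presumably admit
# only sources of code it is prepared to trust.
TRUSTED = {"https://example.org/trusted-code"}

trusted_record = ValueRecord(
    "Point", "https://example.org/trusted-code", {"x": 1, "y": 2}
)
foreign_record = ValueRecord(
    "Point", "https://example.org/unknown", {"x": 1, "y": 2}
)

trusted_choice = choose_handling(trusted_record, TRUSTED)
foreign_choice = choose_handling(foreign_record, TRUSTED)
```

Because the record itself is inert, the security decision sits wholly with
the receiver, which is what requirement 7 seems to need.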

-DMC