[e-lang] [RFP] cross-language object serialization (E <---> Squeak-E)

Tyler Close tyler at waterken.com
Tue Jan 21 12:26:32 UTC 2003


On Monday 20 January 2003 22:33, Mark S. Miller wrote:
> Tyler, I would like to hear your list of technical advantages.

I only recently studied CDR in depth, so this is not an exhaustive
list.

1.  CDR does not support a streaming protocol.

Any <value> in a CDR stream can be a <value_ref> that refers to a
previously serialized <value>. This means that the receiving side
must buffer all previously received <value>s, in case a future
<value_ref> refers to it. This makes it impossible to model a
messaging connection as a single stream of objects, since the
receiving side would be unable to garbage collect any of the
received objects.

Consequently, CDR is really only usable for one-shot
transmissions. A CDR messaging connection must be modeled as a
series of these one-shot transmissions. This effectively
eliminates the "compact" feature of a binary encoding, as all the
metadata must be repeated in each transmission. As a result, I
suspect that a typical CDR message is no more compact than an XML
representation would be.
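
The buffering problem in point 1 can be sketched with a toy decoder.
This is only an illustration under stated assumptions: the
("value", obj) / ("ref", index) item format and the BackRefDecoder
name are hypothetical and are not the CDR wire format.

```python
# Toy model only: NOT the CDR wire format.
# A stream item is ("value", obj) or ("ref", index), where index
# points at a value decoded earlier on the same stream.

class BackRefDecoder:
    def __init__(self):
        # Every value decoded so far. Any later ref may name any
        # earlier index, so nothing here is ever safe to drop.
        self.seen = []

    def feed(self, kind, payload):
        """Decode one stream item and return the resulting object."""
        if kind == "value":
            self.seen.append(payload)
            return payload
        if kind == "ref":
            return self.seen[payload]
        raise ValueError(f"unknown item kind: {kind!r}")


decoder = BackRefDecoder()
stream = [("value", "alice"), ("value", "bob"), ("ref", 0)]
decoded = [decoder.feed(kind, payload) for kind, payload in stream]
```

On a long-lived connection, decoder.seen grows with every value
received, which is why the stream has to be cut into one-shot
transmissions.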

The doc-code encoding supports a streaming protocol. The moral
equivalent of a <value_ref> is a
file://waterken.com/doc/pointer/Embed node referring to an object
buffered on the sender side, not the receiver side.

2.  The "chunking" support in CDR is a hack.

In CDR, a transmitted value is prefixed by the length of its
encoding. After encoding a value, an encoder must therefore
backtrack to fill in the length prefix for the value, which means
that the encoder must buffer the entire serialization until it is
completed.
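
A minimal sketch of why a length prefix forces buffering. The 4-byte
big-endian prefix and the function name are assumptions chosen for
illustration, not CDR's actual layout.

```python
import io
import struct


def encode_length_prefixed(parts):
    """Encode a value whose body arrives as an iterable of byte chunks.

    The length must precede the body on the wire, so the whole body
    is collected in memory before the first byte can be sent.
    """
    body = io.BytesIO()
    for part in parts:
        body.write(part)  # nothing can be transmitted yet
    data = body.getvalue()
    # Hypothetical prefix: 4-byte unsigned big-endian length.
    return struct.pack(">I", len(data)) + data


encoded = encode_length_prefixed([b"hello, ", b"world"])
```

Even though the input arrives in pieces, the encoder cannot emit
anything until it has seen the last piece.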

To alleviate this buffering requirement, you can wrap chunking
syntax around the encoding. However, a chunk must terminate at a
value encoding boundary. It is not possible to end a chunk in the
middle of a primitive value, such as a string. This makes it
impossible to chunk a large primitive, such as a long text or an
image.

The doc-code encoding does not require backtracking during
encoding. You can break and transmit a doc-code encoding at any
arbitrary point.

3.  In CDR, Unicode strings can only be encoded in a fixed-length,
2-bytes-per-char encoding. UTF-8 is not supported.

doc-code supports arbitrary charsets.
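
To give a rough sense of the size difference in point 3, here is a
small comparison using Python's standard codecs. The sample string is
arbitrary, and "utf-16-be" is used here as a stand-in for a
2-bytes-per-char encoding.

```python
text = "E to Squeak"

# 2 bytes per char for characters in the Basic Multilingual Plane,
# with no byte-order mark.
two_byte = text.encode("utf-16-be")

# 1 byte per char for ASCII, more for other characters.
utf8 = text.encode("utf-8")

print(len(text), len(two_byte), len(utf8))
```

For mostly-ASCII text the 2-byte form is twice the size of UTF-8. The
ratio differs for other scripts, which is an argument for letting the
application pick the charset.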

4.  CDR has a large number of built-in primitive types, most of
which are not applicable to communications between E and Squeak.
Nevertheless, you'd still have to implement them in order to be
CDR compatible.

doc-code has no hard-coded types. An application need only support
the types it actually uses.

5.  CDR doesn't provide a standard for transmitting many of the
types that E and Squeak do use, such as BigInteger.

The Doc Schema specification provides standard encodings for a
base set of types that are useful for E to Squeak communication.
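
As an illustration of what a shared big-integer encoding has to pin
down, here is one possible byte layout. This is an assumption made
for the sake of example and is not the encoding defined by the Doc
Schema specification; the function names are hypothetical.

```python
def encode_bigint(n: int) -> bytes:
    """Encode an arbitrary-precision integer as big-endian two's complement."""
    # One byte more than bit_length // 8 always leaves room for the sign bit.
    length = n.bit_length() // 8 + 1
    return n.to_bytes(length, "big", signed=True)


def decode_bigint(data: bytes) -> int:
    """Inverse of encode_bigint."""
    return int.from_bytes(data, "big", signed=True)


samples = [0, 255, -128, 2**100, -(2**100)]
round_tripped = [decode_bigint(encode_bigint(n)) for n in samples]
```

Both ends have to agree on byte order, sign representation and
length, which is exactly the kind of detail a standard encoding
settles.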

I am going to stop studying CDR now, but I think the list to date
is already very damning.

Tyler


