[squeak-dev] [OT] What is this called?

Colin Putney colin at wiresong.com
Tue Dec 29 20:46:32 UTC 2015


Hi folks,

Sorry for the off-topic post. I'm posting it here because I know there are
lots of high-powered comp-sci folks lurking, and I hope to benefit from
their wisdom.

I have in mind a simple problem. Imagine we have a data structure in the
memory of some program. The exact structure it has doesn't matter, but does
have structure; it's not just a buffer full of bytes. For argument's sake,
we'll assume it's a tree.

Our task it to copy this tree. We want to make another tree in memory that
is logically equivalent to the first. This is pretty straightforward. We
need to walk the tree, allocating new nodes and copying over any internal
values they contain, then recursing into the children. So far so good. But
what if we generalize the problem?

One way to do that would be to make the copy more distant from the
original. We could copy into a different address space, in another process
or on another machine. We could make the copy more distant in time. Perhaps
we need to reclaim the memory that our tree uses, and the reconstruct it
later.

This means we'll have to do IO of some kind. The simplest adaptation of our
tree-walking algorithm might be to allocate space for the nodes on disk,
rather than in memory. This is really simple if we have access to raw disk
storage, but gets a little more complicated (and slooow) if we're using
files in a file system. We can gain back some simplicity by putting all the
nodes in one file, and having some kind of internal structure to the file
that lets us recover the nodes and links between them. We could also
allocate storage space via communicating with another process. That might
be over a network, or using some kind or IPC mechanism supplied by the
operating system. Once again we'll need to transmit the data within the
nodes, along with some metadata describing the connections between them.

We can also loosen our definition of "logically equivalent". We might want
to construct an equivalent tree in a process running a different version of
our program. Or another program entirely, potentially written in another
programming language. This forces us to raise the level of abstraction. We
can't just copy each node as a blob of raw memory, and assume that the
"receiving" program will know how to interpret it. We need some semantic
definitions agreed upon by the two programs, and some means of representing
those semantics as byte sequences that can be copied between them.

Now, this is not an intractable problem. We do it all the time. In fact,
I'd say a large fraction of the code I've written in my career has been a
solution to some version of this problem. I'm sure many of you can relate.
:-)

So what is this problem called? What theory describes the possible
solutions? Are there classes of solutions that have similar trade-offs?
Where can I learn more?

Colin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20151229/0265c071/attachment.htm


More information about the Squeak-dev mailing list