<div dir="ltr">Hi folks,<div><br></div><div>Sorry for the off-topic post. I&#39;m posting it here because I know there are lots of high-powered comp-sci folks lurking, and I hope to benefit from their wisdom.</div><div><br></div><div>I have in mind a simple problem. Imagine we have a data structure in the memory of some program. The exact structure it has doesn&#39;t matter, but does have structure; it&#39;s not just a buffer full of bytes. For argument&#39;s sake, we&#39;ll assume it&#39;s a tree.</div><div><br></div><div>Our task it to copy this tree. We want to make another tree in memory that is logically equivalent to the first. This is pretty straightforward. We need to walk the tree, allocating new nodes and copying over any internal values they contain, then recursing into the children. So far so good. But what if we generalize the problem?</div><div><br></div><div>One way to do that would be to make the copy more distant from the original. We could copy into a different address space, in another process or on another machine. We could make the copy more distant in time. Perhaps we need to reclaim the memory that our tree uses, and the reconstruct it later. </div><div><br></div><div>This means we&#39;ll have to do IO of some kind. The simplest adaptation of our tree-walking algorithm might be to allocate space for the nodes on disk, rather than in memory. This is really simple if we have access to raw disk storage, but gets a little more complicated (and slooow) if we&#39;re using files in a file system. We can gain back some simplicity by putting all the nodes in one file, and having some kind of internal structure to the file that lets us recover the nodes and links between them. We could also allocate storage space via communicating with another process. That might be over a network, or using some kind or IPC mechanism supplied by the operating system. Once again we&#39;ll need to transmit the data within the nodes, along with some metadata describing the connections between them. </div><div><br></div><div><div>We can also loosen our definition of &quot;logically equivalent&quot;. We might want to construct an equivalent tree in a process running a different version of our program. Or another program entirely, potentially written in another programming language. This forces us to raise the level of abstraction. We can&#39;t just copy each node as a blob of raw memory, and assume that the &quot;receiving&quot; program will know how to interpret it. We need some semantic definitions agreed upon by the two programs, and some means of representing those semantics as byte sequences that can be copied between them.</div></div><div><br></div><div>Now, this is not an intractable problem. We do it all the time. In fact, I&#39;d say a large fraction of the code I&#39;ve written in my career has been a solution to some version of this problem. I&#39;m sure many of you can relate. :-)</div><div><br></div><div>So what is this problem called? What theory describes the possible solutions? Are there classes of solutions that have similar trade-offs? Where can I learn more?</div><div><br></div><div>Colin</div></div>