[OT] Insight on distributed computing wanted

Wed Jun 16 10:08:41 UTC 2004

Hi Martin.

This is what I worked on for my master's project. It's at 
dpon.sourceforge.org (big mess, will clean up one day) and is written in 
Squeak. I never got the whole thing working, but I'm still trying to get 
some of the basics going.

On 14 May 2004 17:55:37 +0200, Martin Drautzburg 
<martin.drautzburg at web.de> wrote:

> I've been pondering over the question:
>
>         "information is so much easier to transport than material
>         objects, so why is distributed computing so difficult".

Because the context of information is very specific, whereas the context 
of material objects (i.e. a physical world, 3 dimensions plus time, plus 
all the rules of physics) is quite big. Material objects can be moved 
anywhere in 3-D space because it's still the same context.

> 	Objects ultimately reference their worlds. A Canvas object
> 	eventually references my graphics card, my monitor and the
> 	eyes and the brain of me, the user. None of these objects can
> 	be transported easily, so the boundary between a Canvas object
> 	and its world is definitely above the graphics card.  The
> 	graphics card and all other material objects are
> 	"immobile". Maybe even the Canvas object itself is immobile.

My solution was to couple each object with a ReplicationAlgorithm object. 
All messages sent to the object are captured and processed first by its 
ReplicationAlgorithm. This meant that the distributed "aspect" of that 
object (i.e. behaviour and state of the replication) is separated from the 
normal.. er.. behaviour of the object.
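
A rough sketch of that coupling, in Python rather than Squeak to keep it 
short. This is not the DPON code: the names Replicated, dispatch and 
Counter are made up for illustration, and the default policy here just 
records the message and delivers it locally.

```python
class ReplicationAlgorithm:
    """Hypothetical base: decides how a captured message reaches the object."""

    def __init__(self, target):
        self.target = target
        self.log = []  # record of captured selectors, for illustration only

    def dispatch(self, selector, args, kwargs):
        # Default policy: note the message, then deliver it locally.
        self.log.append(selector)
        return getattr(self.target, selector)(*args, **kwargs)


class Replicated:
    """Stand-in that routes every message send through the algorithm."""

    def __init__(self, target, algorithm_class=ReplicationAlgorithm):
        self._algorithm = algorithm_class(target)

    def __getattr__(self, selector):
        # Only reached for names the stand-in does not define itself.
        if selector.startswith("_"):
            raise AttributeError(selector)

        def send(*args, **kwargs):
            return self._algorithm.dispatch(selector, args, kwargs)

        return send


class Counter:
    """Ordinary object with no knowledge of distribution."""

    def __init__(self):
        self.value = 0

    def increment(self, by=1):
        self.value += by
        return self.value


counter = Replicated(Counter())
counter.increment()
counter.increment(by=4)
```

Counter stays free of any distribution code; swapping in a different 
subclass of ReplicationAlgorithm would change how messages travel without 
touching it.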

I'm not very familiar with the Canvas class yet... I haven't done any 
graphics programming in Squeak. If the Canvas object is the wrapper around 
the physical device, then yes, you'll need to use remote invocation. If 
it's just another layer of abstraction and it doesn't need any access to 
plug-ins or hardware, you could use another replication algorithm, such as 
having replicas placed on any computer that wants to use that object, and 
using some consistency protocol to keep the replicas up to date with each 
other.

It does get hairy quickly though.

> 	In any case there are mobile and immobile Objects.

Well, there can be replicated objects, where there's usually one replica 
per machine. Examples of replication algorithms are:
- remote invocation / call-by-reference (one central replica, pack the 
message up, send it off, wait for reply).
- migration / call-by-value (fetch the object, send it a message locally)
- master/slaves (read from a local replica, write to a central master - 
for state only).
- broadcast (send all messages to all replicas).

and then of course your imagination is the limit.
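
Two of the algorithms listed above, sketched in Python under heavy 
assumptions: the replicas are plain in-process objects standing in for 
machines, and the class names (Broadcast, MasterSlaves, Account) and the 
"copy the master's state" step are invented for illustration, not a real 
consistency protocol.

```python
class Account:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount
        return self.balance

    def read(self):
        return self.balance


class Broadcast:
    """Send every message to every replica; answer with the first reply."""

    def __init__(self, replicas):
        self.replicas = replicas

    def send(self, selector, *args):
        replies = [getattr(r, selector)(*args) for r in self.replicas]
        return replies[0]


class MasterSlaves:
    """Read from the local replica; write to the master, then copy state."""

    def __init__(self, master, local, writes):
        self.master = master
        self.local = local
        self.writes = writes  # selectors that change state

    def send(self, selector, *args):
        if selector not in self.writes:
            return getattr(self.local, selector)(*args)
        reply = getattr(self.master, selector)(*args)
        # State-only update, standing in for a real consistency protocol.
        self.local.__dict__.update(self.master.__dict__)
        return reply


a, b = Account(), Account()
everyone = Broadcast([a, b])
everyone.send("deposit", 10)

master, local = Account(), Account()
replicated = MasterSlaves(master, local, writes={"deposit"})
replicated.send("deposit", 7)
```

Remote invocation and migration would fit the same send interface; they 
are left out here because they need a transport to mean anything.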

> 	When the world of an object is replaced the object needs to
> 	attach itself to a new world. If the object should expose a
> 	"similar" behaviour in the new world, then the two worlds must
> 	be reasonably alike. The same is true for material objects.

Yea, this is the hairy bit. Essentially, if you migrate an object, then 
everything it has a relationship with must also be replicated. At the 
object's destination, every reference the object owns must be converted to 
some form of remote reference.
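
To make the reference conversion concrete, here is a small Python sketch. 
Everything in it is hypothetical (RemoteRef, Node, migrate), machines are 
reduced to dictionaries, and for simplicity the references are converted 
before the move rather than on arrival.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteRef:
    """Hypothetical remote reference: where an object lives and its id there."""
    host: str
    object_id: int


class Node:
    """A machine, reduced to a name and a table of the objects it holds."""

    def __init__(self, name):
        self.name = name
        self.objects = {}

    def register(self, obj):
        self.objects[id(obj)] = obj
        return RemoteRef(self.name, id(obj))


def migrate(obj, source, destination):
    """Move obj to destination, leaving everything it references behind."""
    for attr, value in list(vars(obj).items()):
        if isinstance(value, (int, float, str, bool, type(None), RemoteRef)):
            continue  # immediate values and existing remote refs travel as-is
        # The referenced object stays on the source; obj keeps a remote ref.
        setattr(obj, attr, source.register(value))
    source.objects.pop(id(obj), None)
    return destination.register(obj)


class Engine:
    pass


class Car:
    def __init__(self, engine):
        self.engine = engine
        self.wheels = 4


node_a, node_b = Node("a"), Node("b")
engine = Engine()
car = Car(engine)
car_ref = migrate(car, node_a, node_b)
```

After the call, car.engine is a RemoteRef pointing back at node "a", while 
the plain number in car.wheels has simply travelled with the object.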

Be careful with your 'physical objects' analogy. The word "Object" was 
perhaps a poor choice of words in Smalltalk. A better name for the 
entities in Smalltalk would be "Concept". The physical "Object" could be 
considered a sub-class of a "Concept" which can only exist in a 3-D 
physical world. Concepts themselves consist only of relationships with 
other concepts, with exceptions for things like numbers and characters. It 
is those relationships which form the context of that concept. If you take 
the concept out of that context, it becomes meaningless because its 
essence - the relationships - no longer has meaning.

The solution is to ensure that a Concept's (or Object's) relationships 
remain valid after moving or migrating it.

> * HIDING INFORMATION
>
> 	You typically don't want to expose too much information. An
> 	object on the sender side may "know more" than is relevant for
> 	the receiver. Likewise the object on the receiver side may
> 	know more about the receiver than is relevant for the sender.

Good point. This is one of the hard parts of distributed computing - it 
can be difficult to encapsulate the distributed nature of an object. Lots 
of bad things happen in distributed systems - networks aren't as reliable 
as local processes and memory. Simple things like message sends aren't 
guaranteed to happen, meaning that every invocation could cause a 
distributed exception. Also, like you say, an object sometimes needs to be 
aware of its migration, and adjust itself accordingly. It needs to be able 
to serialize itself and unserialize itself, storing only the relevant 
information it needs to remake itself, and be able to reconstruct its 
context at the destination. This gets quite involved. Serialization is 
hard - what do you do with a 32-bit local reference to another object? My 
solution was to replicate each object that is referenced, and then ask the 
coupled replication algorithm for a serialized reference to it.
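
A minimal Python sketch of that idea, with assumptions stated up front: 
the registry dictionary, the "$ref" key and the serialized_reference 
helper are all invented here and merely stand in for asking the coupled 
replication algorithm. Only the relevant state goes on the wire, and the 
local reference is replaced by a serialized one.

```python
import json

# Hypothetical registry: serialized reference -> object on this machine.
registry = {}


class Printer:
    def __init__(self, name):
        self.name = name


class Document:
    def __init__(self, title, printer):
        self.title = title
        self.printer = printer  # local reference to another object
        self._cache = None      # not worth sending to the destination


def serialized_reference(obj):
    # Stands in for asking the object's replication algorithm for a reference.
    key = f"{type(obj).__name__}:{obj.name}"
    registry[key] = obj
    return {"$ref": key}


def serialize(doc):
    # Store only what is needed to remake the object at the destination.
    return json.dumps({"title": doc.title,
                       "printer": serialized_reference(doc.printer)})


def unserialize(data):
    state = json.loads(data)
    # A real system would build a remote proxy here; this just looks it up.
    printer = registry[state["printer"]["$ref"]]
    return Document(state["title"], printer)


printer = Printer("laser")
wire = serialize(Document("report", printer))
copy = unserialize(wire)
```

The pointer itself never leaves the machine; what travels is a name that 
the other side can resolve.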

> 	This means that objects have to "mutate". They can be in one
> 	of three states:
>
> 	- objects attached to the sender's world
> 	- object detached
> 	- object attached to the receiver's world

This is essentially serializing and deserializing an object.
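
Read that way, the three states might look like the following Python 
sketch. Display, attach, detach and from_state are hypothetical names, and 
this Canvas is a toy, not Squeak's Canvas class.

```python
class Display:
    """Stand-in for an immobile part of a world, such as a graphics device."""

    def __init__(self, host):
        self.host = host


class Canvas:
    def __init__(self, width, height):
        self.width, self.height = width, height
        self.display = None  # None means the canvas is detached

    def attach(self, display):
        self.display = display

    def detach(self):
        # Drop the link to the world; return only the state worth sending.
        self.display = None
        return {"width": self.width, "height": self.height}

    @classmethod
    def from_state(cls, state, display):
        # Rebuild the object and attach it to the receiver's world.
        canvas = cls(state["width"], state["height"])
        canvas.attach(display)
        return canvas


sender_canvas = Canvas(640, 480)
sender_canvas.attach(Display("sender"))    # attached to the sender's world
state = sender_canvas.detach()             # detached: plain data only
receiver_canvas = Canvas.from_state(state, Display("receiver"))
```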

Michael.