Squid plan

Fri May 16 19:47:54 UTC 2003

>
>
>>In a previous design I had "segment owners", as suggested by Michael. My 
>>policy was:
>>
>> - if the owner of a read-only (for others) segment was not currently 
>>logged in the local network, then the segment is replicated
>>
>> - if the owner is logged in and tries to access an object, the segment 
>>is moved to his node(s) [(replicas are removed)]
>>
>> - if the owner is logged in and another user tries to access an
>>object, the message is forwarded to the current location of the segment
>>
I missed this email. I'll be studying replication in lots of detail for 
virtual machines / programming languages "real soon now". I'm currently 
discovering E and how amazing it is.
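The owner-based policy quoted above could be prototyped as a small decision function. This is a minimal sketch in Python (not Squeak) that assumes a hypothetical `logged_in` table mapping each user to the nodes they are logged in on. `Segment`, `Action`, `decide` and `perform` are illustrative names, not part of any existing system. For simplicity, the sketch moves the segment to the requesting node rather than to all of the owner's nodes.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Action(Enum):
    REPLICATE = auto()  # owner not logged in: copy the segment to the requester
    MIGRATE = auto()    # owner accesses it: move the segment, drop the replicas
    FORWARD = auto()    # someone else accesses it: route the message to the segment


@dataclass
class Segment:
    owner: str
    location: str  # node holding the master copy
    replicas: set = field(default_factory=set)


def decide(segment, requester, logged_in):
    """Pick an action for an access; logged_in maps user -> set of nodes."""
    if not logged_in.get(segment.owner):
        return Action.REPLICATE
    if requester == segment.owner:
        return Action.MIGRATE
    return Action.FORWARD


def perform(segment, action, requester_node):
    """Apply the action and return the node that will serve the access."""
    if action is Action.REPLICATE:
        segment.replicas.add(requester_node)
        return requester_node  # served from the new local replica
    if action is Action.MIGRATE:
        segment.location = requester_node
        segment.replicas.clear()
    # FORWARD (and MIGRATE, after the move): served at the segment's location
    return segment.location


logged_in = {"alice": {"n1"}, "bob": {"n2"}}
shared = Segment(owner="carol", location="server")  # carol is offline
print(perform(shared, decide(shared, "bob", logged_in), "n2"))  # n2
```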

I don't think it's a good idea to commit to "policies" just yet. That
is done, e.g., in Orca (Henri Bal et al.?? umm.. www.cs.vu.nl/orca, I
think), which I have a bit of experience with. It locks you into one
form of replication, which admittedly is quite well done, but only
suitable for low-latency, high-speed networks like the DAS cluster at our
uni (Gigabit Myrinet is *all* good :-) ). If you have high-latency
networks like the Internet, you need a different form of replication.

Migrating read-only segments may be a bad thing. For example, you may
want to specify a 'home location' for a segment, such as a trustworthy
server. If that segment migrated to the PHB's (boss's) notebook just
before he or she took it home, that would be a problem. There are
also other reasons. Maybe that segment contains sensitive information,
and you don't want it migrating to your computer over a public wireless
network at a cracker's festival.

Anyway, to make a long story short, just give the user a choice of 
replication algorithms and enough room to make his own. I'm busy working 
out how to do this, and I hope to write something up soon. You need,
for starters:

- a way for the user to catch reads and writes to instance variables. 
This is used for most replication algorithms. You've already got this.

- a way for the user to catch messages to a particular object (e.g.
using a 'proxy' object inheriting from ProtoObject and overriding
doesNotUnderstand:, or by hacking the kernel, a.k.a. Squeak-E). This is
used for active replication.

- something like a "ClassServer" which allows a user to download 
classes. This may also use the same replication framework (after all, 
Classes are Objects, which is quite handy when you want to replicate 
them :-) ).
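The first two hooks can be illustrated with a rough Python analogue; all names here are hypothetical. `__setattr__` stands in for catching instance-variable writes, and `__getattr__`, which Python calls only when normal attribute lookup fails, plays the role of `doesNotUnderstand:` on a ProtoObject proxy. The proxy does naive active replication by sending every message to each replica and returning the first result. It assumes deterministic replicas and ignores message ordering and failures.

```python
writes = []  # log of (instance variable, new value), e.g. for pushing updates


class TrackedCounter:
    def __init__(self):
        self.count = 0

    def __setattr__(self, name, value):
        # Catch every instance-variable write before performing it.
        writes.append((name, value))
        object.__setattr__(self, name, value)

    def increment(self, by=1):
        self.count += by
        return self.count


class ReplicatedProxy:
    """Forwards any message it does not understand to all replicas."""

    def __init__(self, replicas):
        self._replicas = list(replicas)

    def __getattr__(self, selector):
        # Only reached for names the proxy itself lacks, like doesNotUnderstand:
        def send(*args, **kwargs):
            results = [getattr(r, selector)(*args, **kwargs) for r in self._replicas]
            return results[0]
        return send


a, b = TrackedCounter(), TrackedCounter()
counter = ReplicatedProxy([a, b])
counter.increment(2)  # executed on both replicas
```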

I think this is what you need as base language support, although I'm not 
entirely sure yet. I have to try it first. A proper replication pattern 
(e.g. the one used in Globe) is probably needed.

>I like it.  But I'm leaning towards replicating an object and its free
>children, not whole segments.  I know an object and its free children
>could actually define a segment, but I am defining a segment more
>broadly as any cut/region/subgraph of the global object graph.  Each
>segment resides on a specific machine.  Instead of segments being
>magically distributed/replicated, objects are.  This allows us to
>implement the magic in the object domain instead of at a lower-level
>(another example of moving things out of the VM and into the image).  If
>we still want to group objects, we can add owner fields to the objects
>like you are suggesting.
>
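One way to pin down "an object and its free children" is as the objects reachable from the root whose every referrer lies inside the group. This is only one interpretation and may not be the one intended above. The following minimal Python sketch assumes the object graph is given as a hypothetical dict from each object to the objects it references. It computes the group as a fixed point by repeatedly pruning anything with a referrer outside the group.

```python
def free_children(graph, root):
    """Return root plus the objects reachable only through it.

    graph maps each object to a list of the objects it references.
    """
    # 1. Everything reachable from the root.
    reachable, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node not in reachable:
            reachable.add(node)
            stack.extend(graph.get(node, ()))

    # 2. Who references whom, over the whole graph.
    referrers = {}
    for src, targets in graph.items():
        for dst in targets:
            referrers.setdefault(dst, set()).add(src)

    # 3. Prune anything referenced from outside the group until nothing changes.
    group, changed = set(reachable), True
    while changed:
        changed = False
        for node in list(group):
            if node != root and not referrers.get(node, set()) <= group:
                group.discard(node)
                changed = True
    return group


heap = {"root": ["a", "b"], "a": ["c"], "b": ["c"], "other": ["b"], "c": []}
print(sorted(free_children(heap, "root")))  # ['a', 'root']
```

Here `b` is shared with `other`, so it stays behind, and `c` is reachable through `b` as well, so it stays too.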
Hmm.. owner fields. That would mean that *every* object has an owner
field. Would it be fair to say that a typical Smalltalk object is quite
small, i.e. a few instance variables, a class pointer, and a few other
things? An ownership field would be a lot of overhead. I admit I don't
know a way to get rid of it. Maybe an optimisation would be possible: a
segment ("a group of objects with a common owner") could live in one
contiguous block of memory and be moved around as such. That
would make migration and persistence dead easy: just slam that block of
memory to disk or across a network (after GCing and compacting, resolving
remote references and checking endianness). That requires help from
whatever memory manager you use. Oh, and this would also enable
yet another form of replication: just pause a segment, update its
underlying data and restart the segment.

Another advantage of replicating segments rather than individual objects
is performance. Objects are small, too small to migrate one at a time.
Your application would thrash across the network, migrating object after
object as they are needed. By collecting and migrating objects together,
you can reduce the time lost to network latency. Segments are also
easier to replicate. If most communication happens within a
segment (think list traversals), then only a small fraction of
communication involves objects outside that segment's graph. Thus, fewer
replica updates are needed.
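A back-of-the-envelope cost model makes the latency argument concrete. It assumes one round trip per fetch and ignores protocol overhead. The figures are made up for illustration: 1000 objects of 40 bytes, a 50 ms round-trip time and 10 Mbit/s of bandwidth.

```python
def transfer_time(n_objects, object_bytes, rtt_s, bandwidth_bps, batched):
    """Seconds to fetch n_objects: one round trip each, or one for the batch."""
    payload_s = n_objects * object_bytes * 8 / bandwidth_bps
    round_trips = 1 if batched else n_objects
    return round_trips * rtt_s + payload_s


# Illustrative figures only: 1000 objects of 40 bytes, 50 ms RTT, 10 Mbit/s.
one_by_one = transfer_time(1000, 40, 0.050, 10_000_000, batched=False)
as_segment = transfer_time(1000, 40, 0.050, 10_000_000, batched=True)
print(f"{one_by_one:.3f} s vs {as_segment:.3f} s")  # 50.032 s vs 0.082 s
```

Both transfers move the same 32 ms worth of payload. The per-object version pays the round trip 1000 times, and the segment pays it once.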

And a small observation I just made: migration is a form of replication.

>Instead of a single roots array, there is one per outside segment that
>references it.  So each segment knows every other segment that has
>references to its objects.  When an object is no longer referenced
>locally but is still referenced from the outside it is moved to one of
>these segments and the others are updated to point to it.  If there are
>no more outside references then it is simply garbage collected.  Hence
>we have distributed garbage collection.
>
Cool! Gooey segments :-). Might need to work it all out on paper first 
though...
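On paper, the quoted collection step might look like the Python sketch below. It simplifies heavily, and all names are hypothetical. Each segment keeps one roots set per outside segment. "Locally referenced" is given as a set rather than traced. Collected objects are assumed to be leaves (no outgoing references), so moving one never creates new inbound entries back in the old segment. An object with no local referrer but with outside referrers moves to the first such segment (alphabetically, for determinism). The remaining referrers are then recorded in the new home's roots table.

```python
class Segment:
    def __init__(self, name):
        self.name = name
        self.objects = set()
        self.locally_referenced = set()  # objects with a local referrer or root
        self.roots = {}  # outside segment name -> set of our objects it references


def outside_referrers(seg, obj):
    return sorted(name for name, objs in seg.roots.items() if obj in objs)


def collect(segments, name):
    """Move or free every object in segment `name` that has no local referrer.

    Simplification: collected objects are assumed to be leaves.
    """
    seg = segments[name]
    moved, freed = {}, []
    for obj in sorted(seg.objects - seg.locally_referenced):
        referrers = outside_referrers(seg, obj)
        seg.objects.discard(obj)
        for objs in seg.roots.values():
            objs.discard(obj)
        if not referrers:
            freed.append(obj)  # nobody anywhere points at it: plain garbage
            continue
        home = segments[referrers[0]]
        home.objects.add(obj)
        home.locally_referenced.add(obj)  # its referrer there is now local
        for other in referrers[1:]:
            # the other referrers now point at the object's new home
            home.roots.setdefault(other, set()).add(obj)
        moved[obj] = home.name
    return moved, freed


segments = {n: Segment(n) for n in "ABC"}
a = segments["A"]
a.objects = {"x", "y", "z"}
a.locally_referenced = {"x"}
a.roots = {"B": {"y"}, "C": {"y"}}
print(collect(segments, "A"))  # ({'y': 'B'}, ['z'])
```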

>>  http://www.merlintec.com:8080/software/8
>>
Interesting. Don't know if compressing segments is worth the effort 
though. Computers have lots of memory and big hard disks; if you're not 
being overly wasteful then a bit of overhead doesn't hurt anybody.

Michael / Mikevdg.