MC, P2P, distributed collections

Cees de Groot cg at cdegroot.com
Thu Mar 17 08:10:21 UTC 2005


On Wed, 16 Mar 2005 13:01:11 -0500, Colin Putney <cputney at wiresong.ca>  
wrote:
[zooming in on the meat like a fly going for a nice fresh heap of ....]:
> Instead of a UUID, we use a combination of SHA1 hash and timestamp to  
> identify each version. Each version also has an ancestry, which is a Set  
> of all the hashstamps of the versions that precede it.
>
So, you carry the whole ancestry around, like in MC1.

> For example, if we assume that each node has a consistent clock, (though  
> not necessarily synchronized with other nodes), we can trim ancestry  
> that is older than say, one hour.
>
Something like that might work.

Ok, looks like my list was fairly complete, then. Does anyone mind my  
thinking aloud? No? Good ;)

Situation:
- S1: user U asserts on node N that A is true ("U is a member of community  
X")
- S2: user U asserts on node N that A is not true ("U is not a member of  
community X")
- (user drinks coffee, reconsiders, logs in on the coffeeroom computer)
- S3: user U asserts on node M that A is true ("U is a member of community  
X")
These assertions are broadcast on the network.

Problem:
- On node K, the assertions can arrive in any random order.

Solution 1: the 'mc solution'
- S1 comes with H(S1,timestamp)=VS1
- S2 comes with VS1,H(S2,timestamp)=VS2
- S3 comes with VS1,VS2,H(S3,timestamp)=VS3
This is enough information to let node K find out which assertion is the  
latest one. However, the 'freak' situation:
- S1 and S2 didn't arrive to the coffeeroom computer in time. So user  
wasn't a member on the coffeeroom computer and thus the coffeeroom  
computer accepted the assertion. This is less an exception if you consider  
the coffeeroom computer being off-line - one of the requirements, of  
course, is that you can work while you're off-line.
So:
- S1 - VS1
- S2 - VS1,VS2
- S3 - VS3
Now, how does node K find out what the latest assertion was? S2 or S3?  
This can only be broken either by human intervention of having some global  
notion of time (regarding both S1 and S3 as having a 'null' parent, this  
becomes a regular merge conflict). Some mechanism of bringing the various  
node's clocks together to something "close enough" might be "good enough"  
to help out in solving these conflicts (exchange your local time with your  
peers; note average deviation; make "Vx=H(x,timestamp),timestamp" and let  
timestamp be corrected for this deviation. As it is only for coarse timing  
and exceptional cases plus inter-node time comparisions only (intro-node  
goes with the ancestry mechanism), we could let clocks run backwards when  
correcting, etcetera, which makes synchronisation a whole lot easier).

The solution that you keep a tab on the time per node doesn't seem to make  
a difference, either. And people aren't hopping in and out of communities  
regularly, so carrying the whole version history for just a single  
assertion around is probably not too much trouble - the usual case is that  
you have at most 2 assertions (an initial postive assertion and a later  
retraction; after some time, both will probably be cleaned up). Ditto for  
MC, probably: you assert "method M has body B", then "method M has body  
C", ... but for the vast majority of methods this'll only go throuh a four  
or five revisions.

Lots of verbosity and thinking aloud - but did I understand you correctly  
here, Colin?



More information about the Squeak-dev mailing list