MC, P2P, distributed collections
Cees de Groot
cg at cdegroot.com
Thu Mar 17 08:10:21 UTC 2005
On Wed, 16 Mar 2005 13:01:11 -0500, Colin Putney <cputney at wiresong.ca>
wrote:
[zooming in on the meat like a fly going for a nice fresh heap of ....]:
> Instead of a UUID, we use a combination of SHA1 hash and timestamp to
> identify each version. Each version also has an ancestry, which is a Set
> of all the hashstamps of the versions that precede it.
>
So, you carry the whole ancestry around, like in MC1.
> For example, if we assume that each node has a consistent clock, (though
> not necessarily synchronized with other nodes), we can trim ancestry
> that is older than say, one hour.
>
Something like that might work.
Ok, looks like my list was fairly complete, then. Does anyone mind my
thinking aloud? No? Good ;)
Situation:
- S1: user U asserts on node N that A is true ("U is a member of community
X")
- S2: user U asserts on node N that A is not true ("U is not a member of
community X")
- (user drinks coffee, reconsiders, logs in on the coffeeroom computer)
- S3: user U asserts on node M that A is true ("U is a member of community
X")
These assertions are broadcast on the network.
Problem:
- On node K, the assertions can arrive in any random order.
Solution 1: the 'mc solution'
- S1 comes with H(S1,timestamp)=VS1
- S2 comes with VS1,H(S2,timestamp)=VS2
- S3 comes with VS1,VS2,H(S3,timestamp)=VS3
This is enough information to let node K find out which assertion is the
latest one. However, the 'freak' situation:
- S1 and S2 didn't arrive to the coffeeroom computer in time. So user
wasn't a member on the coffeeroom computer and thus the coffeeroom
computer accepted the assertion. This is less an exception if you consider
the coffeeroom computer being off-line - one of the requirements, of
course, is that you can work while you're off-line.
So:
- S1 - VS1
- S2 - VS1,VS2
- S3 - VS3
Now, how does node K find out what the latest assertion was? S2 or S3?
This can only be broken either by human intervention of having some global
notion of time (regarding both S1 and S3 as having a 'null' parent, this
becomes a regular merge conflict). Some mechanism of bringing the various
node's clocks together to something "close enough" might be "good enough"
to help out in solving these conflicts (exchange your local time with your
peers; note average deviation; make "Vx=H(x,timestamp),timestamp" and let
timestamp be corrected for this deviation. As it is only for coarse timing
and exceptional cases plus inter-node time comparisions only (intro-node
goes with the ancestry mechanism), we could let clocks run backwards when
correcting, etcetera, which makes synchronisation a whole lot easier).
The solution that you keep a tab on the time per node doesn't seem to make
a difference, either. And people aren't hopping in and out of communities
regularly, so carrying the whole version history for just a single
assertion around is probably not too much trouble - the usual case is that
you have at most 2 assertions (an initial postive assertion and a later
retraction; after some time, both will probably be cleaned up). Ditto for
MC, probably: you assert "method M has body B", then "method M has body
C", ... but for the vast majority of methods this'll only go throuh a four
or five revisions.
Lots of verbosity and thinking aloud - but did I understand you correctly
here, Colin?
More information about the Squeak-dev
mailing list
|