[Box-Admins] RE: Losing rsync mirror of squeak.org

Ken Causey ken at kencausey.com
Thu Apr 29 18:39:09 UTC 2010


> -------- Original Message --------
> Subject: Re: Losing rsync mirror of squeak.org
> From: Göran_Krampe <goran at krampe.se>
> Date: Wed, April 28, 2010 3:45 pm
> To: Ken Causey <ken at kencausey.com>
> Cc: box-admins Support <box-admins at lists.squeakfoundation.org>,  Squeak
> Oversight Board <board at lists.squeakfoundation.org>
> 
> 
> On 04/28/2010 06:32 PM, Ken Causey wrote:
> > OK, I understand.  Thanks for providing this service for so long.  So
> > this eliminates our remote backup.  Does anyone have any ideas for a
> > replacement?  An rsync/rsnapshot setup would be nicest (easiest).
> 
> I wonder how much traffic this is, I mean, I do have a server here at 
> home that could sync it at night time. And so could probably someone 
> else too.
> 
> regards, Göran

Related to this I've been thinking more and more that rsync/rsnapshot
are wasteful, especially for us with images that are often saved daily
if not more often.  The problem is that the granularity of rsnapshot is
at the file level.
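
To make the granularity point concrete, here is a minimal sketch comparing what must be stored when a large file changes by a single byte. It is an illustration only: the function names, the 4096-byte chunk size and the made-up 1 MiB "image" are all assumptions, and how much a real image changes between saves has not been measured here.

```python
import hashlib


def file_level_cost(old: bytes, new: bytes) -> int:
    """Bytes stored when any change means keeping a full new copy."""
    return 0 if old == new else len(new)


def chunk_level_cost(old: bytes, new: bytes, chunk_size: int = 4096) -> int:
    """Bytes stored when only chunks not already seen in the old version are kept."""
    def chunks(data):
        return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    known = {hashlib.sha256(c).digest() for c in chunks(old)}
    return sum(len(c) for c in chunks(new)
               if hashlib.sha256(c).digest() not in known)


# A made-up 1 MiB "image" with a single byte changed in the middle.
old_image = bytes(1024 * 1024)
changed = bytearray(old_image)
changed[500_000] = 1
new_image = bytes(changed)

print(file_level_cost(old_image, new_image))   # the whole file is stored again
print(chunk_level_cost(old_image, new_image))  # only the one changed chunk
```

Fixed-size chunks are the simplest possible choice and cope badly with insertions that shift the rest of the file, so this is a lower bound on effort rather than a recommendation.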

Another problem is that rsnapshot continues to retain files after they
have been deleted.  At first glance this is very desirable.  But in
practice this means that the snapshots continuously grow and contain lots
of files which we don't expect to ever see again.

I've looked into the idea of using something like git instead (see
eigenclass for example) which should be significantly better.  But this
is complex and I, like you, only have limited time to look into this. 
If anyone has related experience I would love to hear about it.

Regarding size...  Some quick estimates:

I believe as a rough estimate the entire installation, excepting the
backups themselves, comes to about 40GB currently.  So that is something
of an upper estimate, say if the entire thing was transferred each day.

As another estimate the sizes of the local backups (once a day, 7 days
worth) are:

box2:~# rsnapshot du
49G     /var/cache/rsnapshot/daily.0/
2.0G    /var/cache/rsnapshot/daily.1/
2.0G    /var/cache/rsnapshot/daily.2/
2.6G    /var/cache/rsnapshot/daily.3/
2.1G    /var/cache/rsnapshot/daily.4/
2.0G    /var/cache/rsnapshot/daily.5/
2.0G    /var/cache/rsnapshot/daily.6/
61G     total

The .0 is the most recent and represents a complete backup, along with
the accumulated history of past backups.  The others represent the
files that have changed from the previous day.  So in theory if only
changed files are transferred in whole, it should average less than 3GB
per day.  If instead we had a system which transferred only the changes,
it would be far far less.
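
As a rough check on the "less than 3GB per day" figure, this small sketch averages the six incremental sizes from the rsnapshot du listing above. The numbers are copied from that listing; the variable names are illustrative, and the result is only as precise as du's rounding.

```python
# Sizes in GB of daily.1 .. daily.6, copied from the rsnapshot du listing.
incremental_gb = [2.0, 2.0, 2.6, 2.1, 2.0, 2.0]

average_gb = sum(incremental_gb) / len(incremental_gb)
peak_gb = max(incremental_gb)

print(f"average changed data per day: {average_gb:.2f} GB")
print(f"largest single day: {peak_gb:.1f} GB")
```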

As far as using rsnapshot itself goes, I'm not clear on what is
transferred.  Each of these backups represents a complete backup; it's
simply that when a file is not modified a hard link is created.  For a
remote backup I'm not certain on which side that decision is made.
Hopefully it is not the case that every file is transferred and the
decision to hard link is made only on the backup server.
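
The hard-link idea can be sketched as follows. This is not rsnapshot's code, only a minimal illustration of file-level snapshots: `snapshot` is a hypothetical helper, it handles a flat directory only, and it compares file contents locally, so it says nothing about which side makes the decision in a remote setup.

```python
import filecmp
import os
import shutil


def snapshot(source_dir, previous_dir, new_dir):
    """Copy source_dir into new_dir, hard-linking files unchanged since previous_dir.

    Hypothetical helper for illustration; handles a flat directory only.
    Returns the lists of file names that were linked and copied.
    """
    os.makedirs(new_dir, exist_ok=True)
    linked, copied = [], []
    for name in sorted(os.listdir(source_dir)):
        src = os.path.join(source_dir, name)
        if not os.path.isfile(src):
            continue  # subdirectories are skipped to keep the sketch short
        old = os.path.join(previous_dir, name)
        new = os.path.join(new_dir, name)
        # shallow=False compares the contents byte by byte.
        if os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
            os.link(old, new)       # unchanged: no extra disk space is used
            linked.append(name)
        else:
            shutil.copy2(src, new)  # new or changed: the whole file is stored again
            copied.append(name)
    return linked, copied
```

Every snapshot directory looks complete, but an unchanged file shares its inode with the previous snapshot, which is why the older daily directories appear so small in the du output.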

You will note that the .0 backup is quite a bit larger than my estimate of
the actual size of the current content on disk.  This shows the problem
with the continued collection of deleted files.  My research into git-based
solutions indicates that this is a problem there as well.  The
only solution is to clean out files that really should be forgotten.  I
would expect the problem to be far less significant with a system that
stores inter-file diffs though.

Ken


