-------- Original Message --------
Subject: Re: Losing rsync mirror of squeak.org
From: Göran Krampe <goran@krampe.se>
Date: Wed, April 28, 2010 3:45 pm
To: Ken Causey <ken@kencausey.com>
Cc: box-admins Support <box-admins@lists.squeakfoundation.org>, Squeak Oversight Board <board@lists.squeakfoundation.org>
On 04/28/2010 06:32 PM, Ken Causey wrote:
> OK, I understand. Thanks for providing this service for so long. So this eliminates our remote backup. Does anyone have any ideas for a replacement? An rsync/rsnapshot setup would be nicest (easiest).
I wonder how much traffic this is. I do have a server here at home that could sync it at night, and probably someone else could too.
regards, Göran
Related to this, I've been thinking more and more that rsync/rsnapshot is wasteful for us, since our images are often saved daily if not more often. The problem is that the granularity of rsnapshot is at the file level: when a large image changes even slightly, a complete new copy of the file is stored.
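To make the granularity point concrete, here is a minimal Python sketch comparing how much new data has to be stored when one byte of a large file changes. It is only an illustration: the 4 KiB block size, the function names and the random stand-in for an image file are all assumptions, and fixed-size blocks are a simplification (an insertion that shifts the rest of the file would defeat them).

```python
import hashlib
import os

BLOCK = 4096  # hypothetical block size, chosen only for illustration


def new_bytes_file_level(old, new):
    """Bytes stored when any change at all means keeping a whole new copy."""
    return 0 if old == new else len(new)


def new_bytes_block_level(old, new, block=BLOCK):
    """Bytes stored when only blocks not already present in `old` are kept."""
    known = {
        hashlib.sha256(old[i:i + block]).hexdigest()
        for i in range(0, len(old), block)
    }
    stored = 0
    for i in range(0, len(new), block):
        chunk = new[i:i + block]
        if hashlib.sha256(chunk).hexdigest() not in known:
            stored += len(chunk)
    return stored


# Stand-in for an image file: 1 MiB of random data with a single byte flipped.
old_image = os.urandom(1024 * 1024)
changed = bytearray(old_image)
changed[500_000] ^= 0xFF
new_image = bytes(changed)

print("file level :", new_bytes_file_level(old_image, new_image), "bytes")
print("block level:", new_bytes_block_level(old_image, new_image), "bytes")
```

At the file level the whole 1 MiB is stored again; at the block level only the one block containing the changed byte is.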
Another problem is that rsnapshot continues to retain files after they have been deleted. At first glance this is very desirable, but in practice it means that the snapshots grow continuously and contain lots of files which we don't expect to ever see again.
I've looked into the idea of using something like git instead (see eigenclass for example) which should be significantly better. But this is complex and I, like you, only have limited time to look into this. If anyone has related experience I would love to hear about it.
Regarding size... Some quick estimates:
I believe, as a rough estimate, the entire installation, excepting the backups themselves, comes to about 40GB currently. That is something of an upper bound on the daily traffic, reached only if the entire thing were transferred each day.
As another estimate, the sizes of the local backups (once a day, 7 days' worth) are:
box2:~# rsnapshot du
49G     /var/cache/rsnapshot/daily.0/
2.0G    /var/cache/rsnapshot/daily.1/
2.0G    /var/cache/rsnapshot/daily.2/
2.6G    /var/cache/rsnapshot/daily.3/
2.1G    /var/cache/rsnapshot/daily.4/
2.0G    /var/cache/rsnapshot/daily.5/
2.0G    /var/cache/rsnapshot/daily.6/
61G     total
The .0 is the most recent and represents a complete backup, along with the accumulated history of past backups. The others represent the files that have changed from the previous day. So in theory, if only changed files are transferred in whole, it should average less than 3GB per day. If instead we had a system which transferred only the changes within files, it would be far less.
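As a quick check of that figure, the daily.1 through daily.6 sizes from the rsnapshot du output can be averaged. The sketch below also turns the average into a sustained transfer rate for a nightly sync; the 8-hour window is purely an assumption for illustration, and du's "G" is treated as roughly 10^9 bytes.

```python
# Sizes in GB reported by `rsnapshot du` for daily.1 .. daily.6
daily_changes_gb = [2.0, 2.0, 2.6, 2.1, 2.0, 2.0]

average_gb = sum(daily_changes_gb) / len(daily_changes_gb)


def average_rate_mbit_per_s(gb_per_day, window_hours):
    """Sustained rate needed to move gb_per_day within window_hours."""
    bits = gb_per_day * 8 * 1000**3
    seconds = window_hours * 3600
    return bits / seconds / 1e6


# Hypothetical 8-hour night-time window; adjust to taste.
rate = average_rate_mbit_per_s(average_gb, window_hours=8)

print(f"average changed data per day: {average_gb:.2f} GB")
print(f"sustained rate over 8 hours:  {rate:.2f} Mbit/s")
```

That comes to a little over 2GB per day on average, with the largest single day at 2.6GB, so "less than 3GB per day" holds for this week's numbers.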
As far as using rsnapshot itself goes, I'm not clear on what is transferred. Each of these backups represents a complete backup; it's simply that when a file has not been modified, a hard link is created instead of a new copy. What I'm not certain of is on which side that decision is made when a remote backup is taken: whether every file is transferred and only the backup server decides to hard link. Hopefully not.
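The hard-link idea itself can be sketched in a few lines of Python. This is not rsnapshot's code, only an illustration of the technique: the snapshot function and the directory names are made up, and it compares file contents, whereas rsync by default compares size and modification time. As far as I know, rsync's --link-dest option exists for exactly this kind of snapshot.

```python
import filecmp
import os
import shutil
import tempfile


def snapshot(source, previous, target):
    """Copy `source` into `target`, hard-linking files unchanged since `previous`."""
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.normpath(os.path.join(target, rel)), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.normpath(os.path.join(target, rel, name))
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)       # unchanged: share the data with the old snapshot
            else:
                shutil.copy2(src, dst)  # new or changed: store a full copy


def write(path, text):
    with open(path, "w") as handle:
        handle.write(text)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    older = os.path.join(tmp, "daily.1")
    newer = os.path.join(tmp, "daily.0")
    os.makedirs(src)
    write(os.path.join(src, "unchanged.txt"), "same every day\n")
    write(os.path.join(src, "image.txt"), "saved on day 1\n")

    snapshot(src, None, older)                  # first backup: everything is copied
    write(os.path.join(src, "image.txt"), "saved on day 2\n")
    snapshot(src, older, newer)                 # second backup: link what did not change

    unchanged_is_linked = os.path.samefile(
        os.path.join(older, "unchanged.txt"), os.path.join(newer, "unchanged.txt"))
    changed_is_linked = os.path.samefile(
        os.path.join(older, "image.txt"), os.path.join(newer, "image.txt"))

print("unchanged file shared between snapshots:", unchanged_is_linked)
print("changed file shared between snapshots:  ", changed_is_linked)
```

Both snapshot directories look like complete backups, but only the changed file takes up new space in the newer one.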
You will note that the .0 backup is quite a bit larger than my estimate of the actual size of the current content on disk. This shows the problem with the continued collection of deleted files. My research into git-based solutions indicates that this is a problem there as well. The only solution is to clean out files that really should be forgotten. I would expect the problem to be far less significant with a system that stores diffs between file versions, though.
Ken
As an addendum: what is actually more useful from a remote backup standpoint is weekly backups. We have a week's worth of daily backups locally. In fact I think Göran was previously keeping something like 4 weeks' worth of weekly backups. I expect the changes from one week to the next, at the level of complete files, are not much more than the daily changes, although a large update to the FTP site could of course change that.
Ken
On Thu, 2010-04-29 at 11:39 -0700, Ken Causey wrote:
[snip]
box-admins@lists.squeakfoundation.org