So it was not as bad as I had feared.
An incremental backup from July 6, 2010 to today: approximately 5 GB of additional filespace used; backup time over a 10 Mbit cable connection (DE to US) was about 2 hrs 40 min.
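As a quick sanity check on the throughput: 5 GB is roughly 40,000 Mbit, and 2 hrs 40 min is about 9,600 seconds, so that works out to a bit over 4 Mbit/s of effective transfer - plausible for this connection.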
Again, this is not the entire filesystem but it covers nearly all custom content.
Ken
-------- Original Message --------
Subject: RE: [Box-Admins] Permissions
From: "Ken Causey" <ken@kencausey.com>
Date: Sun, February 06, 2011 10:33 am
To: "Squeak Hosting Support" <box-admins@lists.squeakfoundation.org>
Forget the figures below regarding the time it has taken me to do an incremental. I screwed up and did not do an incremental but a complete copy (a pathing mistake). I'm going to start it again (correctly, I hope) and I'll let you know.
Ken
-------- Original Message --------
Subject: RE: [Box-Admins] Permissions
From: "Ken Causey" <ken@kencausey.com>
Date: Sun, February 06, 2011 10:27 am
To: "Squeak Hosting Support" <box-admins@lists.squeakfoundation.org>
-------- Original Message --------
Subject: Re: [Box-Admins] Permissions
From: Göran Krampe <goran@krampe.se>
Date: Sun, February 06, 2011 2:15 am
To: box-admins@lists.squeakfoundation.org
> Hi all!
> Yes, we used rsnapshot earlier, but although it uses rsync (good for partial file changes) it ends up using hard links etc. on the target and thus consumes quite a bit of space there.
> For my local needs (laptop onto a USB vfat drive) I ended up picking Duplicity the other day. But either way we want a solution that transfers only modifications.
> I would be happy with a solution that simply keeps an offsite "mirror". Rsnapshot gave us "history" too, and sure, if we can use it then it is probably a good choice. Ken suspected transfers were large, but I am unsure; I hope they are not, and that it uses rsync for the actual transfer.
There is no doubt that it is using rsync; it's really just a fancy shell script, and it logs the major commands it executes in its log file (/var/log/rsnapshot.log). As such, my earlier speculation that whole files are being sent is probably wrong, unless some special invocation of rsync is required for deltas to be used (I can't claim to be an rsync expert).
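For what it's worth, my understanding is that rsync's delta-transfer algorithm is the default whenever source and destination are on different machines; the -W/--whole-file option disables it, and whole-file transfer is only the default for local-to-local copies. The exact invocations rsnapshot has been using are easy to check in that log, e.g.:

    # show the most recent rsync commands rsnapshot logged
    grep rsync /var/log/rsnapshot.log | tail -20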
However, as I mentioned separately, I'm trying to update my local backup of the primary server content in much the same way rsnapshot does, but manually (using rsync). I have an, in theory, 10 Mbit Internet connection, of which I really get more like 5-7 Mbit on downloads. The incremental (since last July) of /home/ took over 6 hours, and it doesn't look right to me, so I'm going to have to do it again. I've started a sync of the other stuff now; we'll see how that goes, and then I'll get back to /home/. My connection is asymmetric like most home connections (upstream probably 3/4 Mbit), so I wonder if there is a lot of two-way traffic involved, which could explain the time it is taking. I haven't done the math, though, to see what is reasonable.
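For reference, the manual sync I'm running is essentially of this shape (host and paths illustrative, not my exact ones):

    # mirror the server's /home/ locally, compressing the stream (-z)
    # and deleting local files that no longer exist on the server
    rsync -avz --delete box.squeak.org:/home/ /backup/box/home/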
> Currently I don't have proper broadband, but if we set up something that can run at night, shipping data off to people's home computers, then that should be enough I think. The reason we used a server that my former employer had was that it was also a Hetzner box, so the connection was really good.
> regards, Göran

-------- Original Message --------
Subject: Re: [Box-Admins] Permissions
From: Göran Krampe <goran@krampe.se>
Date: Sun, February 06, 2011 4:26 pm
To: box-admins@lists.squeakfoundation.org

Hi all!
Been investigating this part, and it seems to me that the three most interesting and mature tools for this are rdiff-backup, rsnapshot and duplicity.
Rsnapshot
=========
Rsnapshot is what we have used so far. It is a bit tricky to configure IMHO, and I must say slightly confusing with the "backup levels". It needs a working rsnapshot on the destination, and since it uses hard links to create the illusion of "full snapshots" it will consume much more space if we have common changes to large files, like say Squeak images - or hey, even worse, VirtualBox/VMware images :) It also has issues with "file selection quirks" based on oddities in rsync - like sensitivity to trailing slashes etc.
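To make the configuration point concrete, a minimal rsnapshot.conf would look roughly like this (paths purely illustrative; note that rsnapshot insists on tabs, not spaces, between fields, and that the trailing slash on the source means "the contents of /home/" rather than the directory itself):

    # where snapshots are kept on the backup machine
    snapshot_root   /backup/snapshots/
    # keep 7 daily and 4 weekly snapshots (the "backup levels")
    interval        daily   7
    interval        weekly  4
    # pull /home/ from the server over ssh/rsync
    backup  root@box.squeak.org:/home/     box/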
Rdiff-backup
============
Also relies on being installed on the destination, like rsnapshot. Saves metadata separately from the mirror; does not use hard links but instead maintains the mirror and stores the increments from older snapshots on the side. Some report it is rather slow. Does not have the "file selection quirks", but also does not have "backup levels". Seems simpler to use, though. Consumes less space on the destination since it does not rely on hard links. Can be used with archfs (http://code.google.com/p/archfs/) to create a FUSE illusion of full snapshots like rsnapshot has.
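Typical usage, for comparison (user, host and paths illustrative):

    # push /home/ to the mirror; increments accumulate on the side
    rdiff-backup /home/ backup@mirrorhost::/backup/box/home/
    # see what increments exist
    rdiff-backup --list-increments backup@mirrorhost::/backup/box/home/
    # restore a file as it looked three days ago
    rdiff-backup -r 3D backup@mirrorhost::/backup/box/home/squeak.image /tmp/squeak.image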
Duplicity
=========
Does NOT rely on being installed on the destination, thus can use "dumb servers" through scp/ftp etc. Does NOT maintain a mirror on the destination, but instead keeps file data in compressed tar files, typically taking less space than rsnapshot/rdiff-backup. Can use encryption, which the above tools do not offer. Very simple to use and understand, although restores are of course not just an "scp yadda". Speed should IMHO be good compared to rsnapshot/rdiff-backup since it keeps signatures etc. on the source and should not need to do ANY round tripping - not verified though, we would need to test.
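Basic usage looks like this (destination illustrative; the first run automatically becomes a full backup, later runs incrementals):

    # passphrase for the (symmetric) GPG encryption of the volumes
    export PASSPHRASE=some-secret
    duplicity /home scp://backup@mirrorhost//backup/box/home
    # restoring is one command too, just in the other direction
    duplicity restore scp://backup@mirrorhost//backup/box/home /tmp/restored-home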
I picked Duplicity because I wanted to use an external USB drive with vfat on it as the destination, and Duplicity splits very large files, so it can easily back up my 12 GB VirtualBox files onto it - which rdiff-backup of course failed to do.
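The splitting is controlled by --volsize (in MB), so each volume can be kept well under vfat's 4 GB file size limit, e.g. (paths illustrative):

    # write 250 MB volumes onto the vfat USB drive
    duplicity --volsize 250 /home/me/VirtualBox file:///media/usbdisk/vbox-backup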
Conclusion
==========
If we want people to give us a bit of their hard drives for nightly backups, it seems to me that Duplicity might fit the bill best:
- Smaller size on destination (compressed)
- Does not need Duplicity on the destination, just a dumb server
- Good speed (we would need to test and compare)
- We could use encryption to make it less "scary" to replicate stuff like "/etc" to boxes not under our direct control.
It does not handle hard links though, but perhaps that is not a big issue for us?
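As a sketch, a nightly run on the server could then be as small as one cron entry per volunteer box (key ID, user and host purely illustrative); with --encrypt-key the volunteer machine only ever stores GPG-encrypted volumes it cannot read:

    # /etc/cron.d/offsite-backup - runs as root at 03:00 every night
    0 3 * * * root duplicity --encrypt-key 1A2B3C4D /etc scp://backup@volunteer-box//backup/box/etc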
regards, Göran
I hope I'm not adding too much noise to the conversation, but what do folks think about using something like S3 for backups? We'd have to find funding for it somewhere, but it might be cheaper than buying terabyte drives, etc., and the thing is pretty darned reliable for backup.
It also makes sure that the bottleneck for data transfer isn't a home connection.
Just a thought.
On 02/07/2011 07:41 PM, Casey Ransberger wrote:
> I hope I'm not adding too much noise to the conversation, but what do folks think about using something like S3 for backups? We'd have to find funding for it somewhere, but it might be cheaper than buying terabyte drives, etc., and the thing is pretty darned reliable for backup.
We don't need terabyte drives, although those don't cost much these days :). But sure, Duplicity supports S3 as a target, so that would work.
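The S3 backend is just another target URL (bucket name illustrative; the AWS credentials go in the environment):

    export AWS_ACCESS_KEY_ID=...
    export AWS_SECRET_ACCESS_KEY=...
    duplicity /home s3+http://squeak-backups/home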
> It also makes sure that the bottleneck for data transfer isn't a home connection.
That is no big deal if running at night, and IMHO we would then set up 2-3 different ones, so if one fails one night it wouldn't matter.
The upside is of course that it would cost nothing.
regards, Göran
I thought I saw a message from Ken saying the total data footprint was ~500 GB. My memory may be failing, though.