
Friday, 6 March 2015


Occasional Rsnapshot v1.3.1

In the previous post I wrote about Occasional Rsnapshot and how I ended up writing it.

Just before releasing v1.2.1 I realized it would make sense to use Semantic Versioning which, in just a few words, means:
Given a version number MAJOR.MINOR.PATCH, increment the:
MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards-compatible manner, and
PATCH version when you make backwards-compatible bug fixes.
I just released Occasional Rsnapshot v1.3.1 which mainly fixed issue 4:
When deciding to do a backup for interval INT, there should be a check that the oldest snapshot in the INT-1 interval is older than the threshold for the INT interval. Otherwise the INT interval will be populated with backups more frequent than desired, and it is possible for older backups in the INT interval to be completely lost.
The condition should be:
ts(oldest(INT-1)) - ts(newest(INT)) >= threshold(INT)
For example:
  • if weekly.0 is from 15th of February and daily.6 is from 17th of February, weekly should not be triggered, but
  • if weekly.0 is from 15th of February and daily.6 is from 23rd of February, weekly shall be triggered
This extra check should probably be added in can_backup_interval.
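
Here is a rough bash sketch of that check, just to illustrate the condition above; the variable names, the stat-based timestamps and the snapshot layout are my assumptions, not the actual occasional_rsnapshot code:

  # Illustrative only: trigger interval INT (here weekly) only if the oldest
  # snapshot of the previous interval (daily) is old enough compared to the
  # newest snapshot of INT.
  snapshot_root=/mnt/backup/snapshots   # assumed location
  interval=weekly prev=daily
  threshold=$((7 * 24 * 3600))          # threshold(weekly), in seconds

  newest_ts=0
  [ -d "$snapshot_root/${interval}.0" ] && newest_ts=$(stat -c %Y "$snapshot_root/${interval}.0")

  # oldest snapshot of the previous interval = smallest mtime among daily.*
  oldest_prev_ts=$(stat -c %Y "$snapshot_root/${prev}."* | sort -n | head -n 1)

  if [ $(( oldest_prev_ts - newest_ts )) -ge "$threshold" ]; then
    rsnapshot "$interval"
  fi
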
It was a small bug, but it might have led to losing important older backups, because newer and more frequent backups would be pushed from hourly, interval by interval, up to the yearly interval, even though the distance between backups would not have respected the minimum time distance of the upper interval.

There was also a small syntax bugfix, but functionally nothing has changed because bash was doing the right thing even with that small error.

If you have started using Occasional Rsnapshot, you definitely want Occasional Rsnapshot v1.3.1 now. If you haven't started and you don't have backups, please start doing backups, and while you're at it, you might want to try Occasional Rsnapshot (with Rsnapshot).

Wednesday, 25 February 2015


Occasional Rsnapshot v1.3.0

It is almost exactly a year and a half since I came up with the idea of having a way of making backups using Rsnapshot automatically triggered by my laptop when I have the backup media connected to it. This could mean connecting a USB drive directly to the laptop or mounting an NFS/sshfs share in my home network. Today I tagged Occasional Rsnapshot v1.3.0, the first released version that makes sure that, even when you connect your backup media only occasionally, your Rsnapshot backups are done if and when it makes sense to do them, according to the rsnapshot.conf file and the status of the existing backups on the backup media.

Quoting from the README, here is what Occasional Rsnapshot does:

This is a tool that allows automatic backups using rsnapshot when the external backup drive or remote backup media is connected.

Although the ideal setup would be to have periodic backups on a system that is always online, this is not always possible. But when the connection is done, the backup should start fairly quickly and should respect the daily/weekly/... schedules of rsnapshot so that it accurately represents history.

In other words, if you back up to an external drive or to some network/internet connected storage that you don't expect to have always connected (which is the case with laptops), you can use occasional_rsnapshot to make sure your data is backed up when the backup storage is connected.

occasional_rsnapshot is appropriate for:
  • laptops backing up on:
    • a NAS on the home LAN or
    • a remote or an internet hosted storage location
  • systems making backups online (storage mounted locally somehow)
  • systems doing backups on an external drive that is not always connected to the system
The only caveat is that all of these must be mounted in the local file system tree somehow, by any arbitrary tool; occasional_rsnapshot and rsnapshot do not care how, as long as the files are mounted.
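
The mounting itself can be done with whatever fits your setup; for instance (illustrative hosts and paths, not something occasional_rsnapshot prescribes):

  # sshfs-mounted share from a NAS on the home LAN
  sshfs backup@nas.local:/volume1/backups /mnt/backup

  # or an external USB HDD identified by its filesystem label
  mount /dev/disk/by-label/BACKUP /mnt/backup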

So if you find yourself in a similar situation, this script might help you to easily do backups in spite of the occasional availability of the backup media, instead of having no backups at all. You can even trigger backups semi-automatically when you remember to, or decide it is time to back up, simply by plugging in your USB backup HDD.

But how did I end up here, you might ask?

In December 2012 I was asking for suggestions for backup solutions that would work for my very modest setup with Linux and Windows, so I could back up my and my wife's systems without worrying about loss of data.

One month later I was explaining my concept of a backup solution that would not trust the backup server and would leave, as much as possible, the decision to start the backup at the desired time to the clients. I was also pondering the problems I might encounter.

From a security PoV, what I wanted was that:
  1. clients would be isolated from each other
  2. even in the case of a server compromise:
    • the data would not be accessible, since it would already be encrypted before leaving the client
    • the clients could not be compromised


The general concept was sane and supplemental security measures such as port knocking and initiation of backups only during specific time frames could be added.

The problem I ran into was that, when I set this up in my home network, a single backup cycle would take more than a day, due to the fact that I wanted to back up all of my data and my server was a humble Linksys NSLU2 with 3TB of storage attached over USB.

Even when the initial copy was done by attaching the USB media directly to the laptop, so that the backup would only have to copy changed data, a backup with the HDD attached to the NSLU2 was not finished even after more than 6 hours.

The bottlenecks were the CPU speed and the USB speed. I even tried mounting the storage media over sshfs so the tiny XScale processor in the NSLU2 would not be bothered by any of the rsync computation. This proved to be an exercise in futility: any attempt to put the NSLU2 anywhere in the loop resulted in an unacceptably and impractically long backup time.

All these attempts, of course, took time, and all the while I was aware I still didn't have appropriate backups and wasn't getting any closer to the desired result.

So this brings us to August 2013, when I realized I was trying to manually trigger Rsnapshot backups from time to time, but had to do all sorts of mental gymnastics and manual listings to evaluate whether I needed to do monthly, weekly and daily backups, or whether only weekly and daily were due.

This had to stop.
Triggering a backup should happen automatically as soon as the backup media is available, without any intervention from the user.
I said.

Then I came up with the basic concept for Occasional Rsnapshot: a very silent script that would be called from cron every 5 minutes and would check if the backup media is mounted; if it is not, it exits silently so as not to generate all sorts of noise in cron emails, but if it is mounted, it computes which backup intervals should be triggered, and triggers them if the appropriate amount of time has passed since the most recent backup in that backup interval.
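
As a rough illustration of that concept, and nothing more (this is not the actual occasional_rsnapshot code; the thresholds, paths and the use of snapshot directory mtimes are my assumptions), the script could be called from a crontab line such as "*/5 * * * * /usr/local/bin/occasional_rsnapshot", and its core could look roughly like this:

  #!/bin/bash
  # Illustrative sketch only: exit quietly if the media is absent, otherwise
  # trigger each rsnapshot interval whose newest snapshot is older than the
  # interval's threshold.
  CONF=/etc/rsnapshot.conf
  SNAPSHOT_ROOT=$(awk '$1 == "snapshot_root" { print $2 }' "$CONF")

  # Backup media not mounted: exit silently so the cron emails stay quiet.
  mountpoint -q "$SNAPSHOT_ROOT" || exit 0

  now=$(date +%s)
  declare -A threshold=( [daily]=86400 [weekly]=604800 [monthly]=2592000 )

  for interval in monthly weekly daily; do
    newest="$SNAPSHOT_ROOT/${interval}.0"
    last=0
    [ -d "$newest" ] && last=$(stat -c %Y "$newest")
    if (( now - last >= ${threshold[$interval]} )); then
      rsnapshot -c "$CONF" "$interval"
    fi
  done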

Occasional Rsnapshot v1.3.0 is the 8th and most recent release of the script. Even though I have been using Occasional Rsnapshot since day 1, v1.3.0 is the first one I can recommend to others without fearing they might lose data because of it.

The backup media can be anything, from your regular USB-mounted HDD or your sshfs-mounted backup partition on the home NAS server to remote storage such as Amazon S3, and there are even brief instructions on how to do encrypted backups for the cases where you don't trust the remote storage.

So if you think you might find anything I described remotely interesting, I recommend downloading the latest release of Occasional Rsnapshot, going through the README and trying it out.

Feedback and bug reports are welcome.
Patches are welcomed with a 'thank you'.
Pull requests are eagerly awaited :) .

Tuesday, 22 January 2013


(Rsnapshot) backup and security - I see problems

In my previous post I was asking for suggestions for backup solutions that would be open/free software, do backups over the network to a local HDD, be cross-platform to allow Windows and Linux clients, and not be too CPU/memory hungry (on the server).

Several people suggested rsnapshot, BackupPC, areca-backup, and rsync. Thank you for all your suggestions, you have been a tremendous help. I have decided to give rsnapshot a try since it was suggested to me that it would actually do what it is supposed to do for Windows clients, too (which had initially been my perceived show stopper for rsnapshot).

Still, when getting to the implementation, I was a little disappointed by the very permissive access that needs to be provided on the client machines, since the backup is initiated from the backup server. Even the so-called more secure suggested solutions seem way too permissive for my taste, since losing control over the backup system basically means giving away total access to the data on all client machines, which is quite a big problem in my opinion.

The data-transfer mechanism employed by rsnapshot is simply
  1. S ==(connects and reads all data)==> C
  2. S stores data in the final storage area
Am I the only one seeing a problem with this idea? If the server can connect to all your client machines and read all areas as it pleases, even if you restrict it to some directories, the data is already compromised when the backup server is compromised (think .ssh private keys, files with wireless network passwords and so on; I won't say card information - you don't keep credit/debit card information on your computer, or at least not in plain text, do you?).

What I would consider a better alternative would be a server-initiated dialogue which goes a little like this (S is the server, C is the client, '=' represents connections via ssh):
  1. S ---(requests backup initiation procedure)---> C
  2. S waits for a defined period of time for C to connect back and send the (already encrypted) data; if nothing arrives, it aborts
  3. S <===(sends encrypted data to be backed up)=== C
  4. S <-(signals the completion of data transfer)-- C
  5. S stores the data in the final storage area
This way, the server can allow access from designated clients only to designated areas (even a chroot is possible), access can even be granted only after a port-knocking procedure and only during the backup time frame (since the server initiates the negotiation, it can expect the knocks then, and only then), so the server is quite well secured. The connection to the server can even be done through an unprivileged account; it can even be one account per client machine, which can be limited to an scponly shell, if you care for that level of security.

On the other hand, the client information is secure since it can be encrypted directly on the client machine and sent only after encryption; the client machine can decide and control what it sends, while the backup server can only store what the client provides. Also, if the server is compromised, the clients' data and systems aren't compromised at all, since the data is on the backup machine, but is encrypted with a key known only on the client (and a backup copy of it can be stored somewhere safe).

I am aware this approach can be problematic for permissions (user/group preservation), but that is not an issue if there is a local <-> remote user mapping or if the numeric IDs are simply kept.

I am also aware this means smarter clients and might mean the Windows machine might not be able to implement this completely, but a little more security than "here is all my data" can still be achieved, can't it?

What do other people think? Am I insane or paranoid?

I think I can implement this type of protocol in some scripts (at least one for the server and one for the clients) and use the backup_script feature of rsnapshot to keep this clean and nice within rsnapshot, as sketched below.
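
For reference, hooking such a client script into rsnapshot would be done with a backup_script line in rsnapshot.conf (fields separated by tabs); the script name and destination directory below are of course made up:

  # rsnapshot.conf excerpt: run the hypothetical client_backup.sh, which would
  # implement the client-initiated, encrypted transfer, and store its output
  # under client1/ in the snapshot root
  backup_script	/usr/local/bin/client_backup.sh	client1/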

What might prove problematic with this approach is that the rsync speedup is lost (might it be?) because the copy is done to a temporary directory which, I assume, is empty, so tough luck. Another problem seems to be that every time the backup is done, the client has to encrypt each of the files to back up, which seems to be a real performance penalty, especially if the data to be backed up is quite large.

Is there an encryption layer that does this automatically at file level in the same/similar manner that LUKS does for entire block devices? Having the right file names, but with scrambled/encrypted contents seems to be the ideal solution from this PoV.

Thanks for reading, and for any suggestions you might point me to.

P.S.: I just thought of this: if there was an encryption layer implemented with FUSE which is mounted in some directory on the client machine, the default rsnapshot mechanism could actually work, and this would mitigate both the data accessibility issue and the performance issue, since that file system could be contained within a chroot and the encryption/scrambling would be done transparently on the client, so no data is plainly accessible. Does anybody know of such a FUSE implementation that does on-the-fly file encryption?

P.P.S.: EncFS does exactly what I want with its --reverse option which is exactly designed for this purpose:
Normally EncFS provides a plaintext view of data on demand. Normally it stores enciphered data and displays plaintext data. With --reverse it takes as source plaintext data and produces enciphered data on-demand. This can be useful for creating remote encrypted backups, where you do not wish to keep the local files unencrypted.
Great!
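
Just to sketch how this could be used (the paths are illustrative, not taken from the EncFS or rsnapshot documentation): expose an encrypted on-demand view of the plaintext data with --reverse, then point the backup tool at that view, so only ciphertext ever leaves the client.

  # present an on-the-fly encrypted view of the plaintext home directory
  encfs --reverse /home/user /mnt/encrypted-view

  # back up the encrypted view instead of the plaintext data
  rsync -a /mnt/encrypted-view/ backupserver:/backups/client1/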