Synchronize files between a MacBook and an Ubuntu machine using rsync via ssh

Have you ever considered mirroring your important files from your Mac to your Linux machine to reduce the risk of losing data in a hardware crash? One very effective way to do this is to use rsync and ssh, two open source tools that are available on both your Mac and your Linux machine. This document will lead you through the necessary steps. I mainly wrote it as a reference for myself, as the setup wasn't trivial. Maybe it can be helpful for others.

As a precondition, it is assumed that you have two computers connected via a network. I verified all the steps with a MacBook running Mac OS X 10.4 Tiger and a PC running Ubuntu 7.10 Gutsy, connected via a WLAN. Some of the steps described are Mac / Ubuntu specific. Most of the information should be valid for other operating systems on which ssh and rsync can be installed (e.g. other Linux distributions), but check the specifics of your distribution if something does not work. Many ideas came from (6).

Make sure that the ssh client and server are installed and running

For general information about SSH see (1). If you don't mind sending your data unencrypted (e.g. if you are using a private cable network connection), you don't need SSH and can skip this step.


Start on the destination machine (Ubuntu = server). Set the Power Management Preferences so that the computer is never put to sleep when inactive (hard drive and display can go to sleep).

Change to the source machine (Mac OS = client). As your user ("Steffen" in my case), try to log in to the destination machine (Ubuntu) using ssh, specifying either an IP address or a hostname (if the hostname appears in /etc/hosts).


Note: You can add a host name with the command:

sudo nano /etc/hosts

You'll be asked for your password (Mac OS). Add a line with the IP address followed by the host name to the list. Of course, instead of nano you can use your preferred editor (e.g. Emacs or Vim).


To log in to a remote computer (destination machine) running an ssh server, open a terminal and log in with ssh <user>@<server> like this:

ssh steffen@myubuntu

You'll be asked for a password. This is the password for the user on the destination machine (ubuntu), not the local password.

Note: Replace "steffen@myubuntu" with whatever your username@hostname/ip is. It would be a good idea to use the same login name on both machines, because then you could log in with just ssh myubuntu.

Type "yes" to the authentication message. Enter your password. If this works, then ssh works and you should type "exit" to return to the source machine. If not, test ssh on the server machine (Ubuntu) with the following command:

ssh localhost

If you can login, ssh-server is running.


If you get the error message "ssh: connect to host localhost port 22: Connection refused", no ssh server is running. Install the OpenSSH server and client with the command:

sudo apt-get install ssh

Note: This is Ubuntu specific. You might need a different command to install ssh if you have a different server system.


Try again. It should work now. If you can log in with ssh from your client to your server, you can proceed with the next step.
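Whether an ssh server is listening at all can also be checked without logging in. The following sketch is written for Python 3; the function name port_is_open and the 3 second timeout are my own choices and not part of ssh. It only tests that a TCP connection to port 22 can be opened and says nothing about whether a login would succeed.

```python
import socket


def port_is_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and connects with a timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and unknown host names
        return False
```

Called as port_is_open("myubuntu"), it returns False in the "Connection refused" situation described above, because nothing is listening on port 22.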

Set up a public/private key pair

On the client (Mac OS), generate a private and a public key by typing:

ssh-keygen -t rsa

Follow the prompts. This will yield the id_rsa.pub and id_rsa files (the public and private key pair):

...Generating public/private rsa key pair.
...Enter file in which to save the key (/Users/Steffen/.ssh/id_rsa): [Enter]
...Created directory '/Users/Steffen/.ssh'. [you might not see this message]
...Enter passphrase (empty for no passphrase): [Enter a passphrase]
...Enter same passphrase again: [Enter the passphrase again]
...Your identification has been saved in /Users/Steffen/.ssh/id_rsa.
...Your public key has been saved in /Users/Steffen/.ssh/id_rsa.pub.


Password and passphrase do different things. The password belongs to your user account on the target system and is checked there (Ubuntu stores it as a hash in /etc/shadow). The passphrase is used to decrypt your private key on your own system. The security advantage of public key authentication over password authentication is that two things are needed to get access:

  • your (encrypted) private key

  • your passphrase (which is needed to decrypt the private key)

So if you chose no passphrase at all (which is possible and is used when ssh transfers run in scheduled scripts, see (8)), anyone who obtains your private key file could log in, which is arguably less secure than using a password alone. Therefore, and because there are other options (see SSH Key Management), I decided to use a passphrase.

Copy the public key to the destination machine

On the Mac type:

scp ~/.ssh/id_rsa.pub steffenadmin@myubuntu:~/.ssh/authorized_keys

Note: Replace "steffenadmin@myubuntu" with whatever your username@hostname/ip is.


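scp asks for the password of the remote user, copies the file and returns to the local prompt; it does not leave you logged in on the remote machine. Be aware that this command overwrites an existing authorized_keys file on the destination and that the directory ~/.ssh must already exist there.

If other accounts on the Ubuntu machine should accept the key as well, log in to Ubuntu as administrator and copy the file to those accounts, e.g.:

sudo cp ~/.ssh/authorized_keys /home/steffen/.ssh/authorized_keys

Note: Replace "/home/steffen/" with whatever your accounts are.

Then log off with

exit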

An alternative to using scp and cp could have been:

ssh-copy-id -i ~/.ssh/id_rsa.pub steffen@myubuntu

This alternative didn't work for me (permission denied, I guess, because I didn't use an administrator account to log into Ubuntu).


Another alternative instead of using scp could be (but I haven't tried that yet):


cat ~/.ssh/id_rsa.pub | ssh steffenadmin@myubuntu "cat >> ~/.ssh/authorized_keys"
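The cat >> variant appends the key instead of replacing the file. What a careful append has to do on the destination can be sketched in Python 3 as follows. This is only an illustration under my own assumptions: the function name add_public_key is made up, the code would have to run on the destination machine, and it is not what scp or ssh-copy-id do internally.

```python
import os


def add_public_key(key_line, ssh_dir):
    """Append key_line to ssh_dir/authorized_keys unless it is already there.

    Returns True if the key was added, False if it was already present.
    """
    key_line = key_line.strip()
    # commonly recommended permissions: directory 700, file 600
    os.makedirs(ssh_dir, mode=0o700, exist_ok=True)
    path = os.path.join(ssh_dir, "authorized_keys")

    content = ""
    if os.path.exists(path):
        with open(path) as f:
            content = f.read()
    if key_line in [line.strip() for line in content.splitlines()]:
        return False

    with open(path, "a") as f:
        # make sure the new key starts on its own line
        if content and not content.endswith("\n"):
            f.write("\n")
        f.write(key_line + "\n")
    os.chmod(path, 0o600)
    return True
```

Running it twice with the same key leaves only one copy in the file, which is the main difference to a plain cat >>.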

Use the new keys with ssh

On the Mac type:

ssh steffen@myubuntu

Note: Replace "steffen@myubuntu" with whatever your username@hostname/ip is.

You should no longer be asked for the password but for the passphrase. Enter your passphrase, and provided your Ubuntu machine is configured to allow key-based logins, you should then be logged in. If it works, it will work for all ssh connections for that user. If not, take a look at (2) (also helpful for making your Ubuntu machine's ssh server more secure) or search for the error message on Google.


Password-based authentication is enabled by default in Ubuntu. If you want to stop users from logging in remotely with passwords, disable password authentication manually by setting "PasswordAuthentication no" in the file /etc/ssh/sshd_config. Do not forget to restart your ssh server after changing the configuration (sudo /etc/init.d/ssh restart).
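To check which value is in effect after editing, the configuration file can be read with a few lines of Python 3. The sketch below is a simplification under stated assumptions: it ignores Match blocks and Include directives, the function name password_auth_enabled is my own, and it relies on sshd using the first value it finds for a keyword and on "yes" being the default.

```python
def password_auth_enabled(config_text):
    """Return True if the given sshd_config text allows password logins."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip empty lines and comments
        parts = line.replace("=", " ").split()
        # keywords are case-insensitive; the first occurrence wins
        if parts[0].lower() == "passwordauthentication" and len(parts) > 1:
            return parts[1].lower() == "yes"
    return True  # keyword not set: assume the default "yes"
```

On the Ubuntu machine it could be fed with the contents of /etc/ssh/sshd_config, e.g. password_auth_enabled(open("/etc/ssh/sshd_config").read()).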

Using rsync via ssh to backup files from Mac OS (source) to Ubuntu (destination)

Rsync is a free file transfer program, distributed under the GNU General Public License, that is capable of efficient remote updates via a fast differencing algorithm. In order to use rsync to mirror files from a source machine to a destination machine via ssh, both ssh and rsync must be available on both machines and ssh must be configured correctly (see the description above). As with all good command line tools, the power to bend rsync to your will lies in the switches you provide in the rsync call (e.g. "rsync -avz"). Notice that you can only use switches that are supported by both the rsync on the source machine and the rsync on the destination machine. To see all the available options, type "rsync -h" or "man rsync" in the terminal.


A few of the (for me) most interesting switches are:


  • -a, --archive: archive mode (recurse into directories, copy symlinks as symlinks, preserve permissions, owner, group, times and devices); equivalent to -rlptgoD

  • -e, --rsh=COMMAND: specify the remote shell; -e ssh tunnels the file transfer over an encrypted ssh connection

  • -n, --dry-run: show what would have been transferred without doing any file transfers

  • -u, --update: update only (don't overwrite files that are newer on the receiver)

  • -v, --verbose: increase verbosity

  • -z, --compress: compress data during transfer using gzip; saves bandwidth but needs more CPU power, so use it for slow/expensive connections only

  • -E, --extended-attributes: Apple specific option to copy extended attributes, resource forks, and ACLs. Requires at least Mac OS X 10.4 or a suitably patched rsync. This switch doesn't work for copying files from my MacBook to Ubuntu Gutsy (7.10) because it is not available in the rsync on my Ubuntu machine

  • -P: equivalent to --partial --progress (keep partially transferred files, show progress during transfer)

  • -S, --sparse: try to handle sparse files efficiently so they take up less space on the destination. NOTE: Don't use this option when the destination is a Solaris "tmpfs" filesystem. It ends up corrupting the files.

  • -x, --one-file-system: don't cross filesystem boundaries (ignore mounted volumes)

  • --delay-updates: put all updated files into place at transfer's end, very useful for live systems (is not available on my Mac OS rsync version)

  • --delete-after: delete files in the target folder that are not in the source folder

  • --exclude=PATTERN: exclude files matching PATTERN, e.g. --exclude "*.bak" --exclude "*~" to ignore "*.bak" and "*~" files (--exclude=".*/" skips hidden files and directories)

  • --exclude-from=FILE: exclude patterns listed in FILE (one per line)

  • --include=PATTERN: don't exclude files matching PATTERN

  • --include-from=FILE: don't exclude patterns listed in FILE

  • --stats: give some file transfer stats

If you're just getting started with rsync, the -n ("dry run") switch together with -v (verbose) is a great way to see which files would get copied without actually copying them. Use these switches to test your rsync command before you run it, as in the following:

rsync -e ssh -nvauxPS --stats --exclude '.DS_Store' --exclude "*bak" --exclude "*~" ~/Documents/ steffen@myubuntu:'/media/sda1/Dokumente\ und\ Einstellungen/Steffen/Eigene\ Dateien'

The above command logs in as user "steffen" via ssh. As we have not set up passphrase-less keys, the command will pause and ask for the passphrase for the key '/Users/Steffen/.ssh/id_rsa'. Because of the -n switch nothing is transferred yet; without it, the command updates the directory '/media/sda1/Dokumente und Einstellungen/Steffen/Eigene Dateien' on my Ubuntu machine ("myubuntu") with the contents of ~/Documents/ on my MacBook. '.DS_Store' files and files whose names end in "bak" or "~" are ignored.

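Notice the backslash in front of each space in the remote directory name (Dokumente\ und\ Einstellungen, Eigene\ Dateien): the single quotes are removed by the local shell, and the backslashes keep the remote shell from splitting the path at the spaces. The closing slash of the source directory also matters. If it were left out, a subdirectory 'Documents' would be created in the destination directory. Furthermore, the single colon after the host name tells rsync to reach the destination through a remote shell, here ssh because of -e ssh. Two colons (host::module) would address an rsync daemon on the destination instead of a directory in your account.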
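The effect of the closing slash can be written down as a small rule. The Python 3 sketch below is only a model of that rule for illustration: remote_path is a hypothetical helper, it does not call rsync, and "/backup" in the example is a made-up destination. It computes where a file from the source directory ends up below the destination.

```python
import posixpath


def remote_path(source, destination, relative_file):
    """Model where source/relative_file lands below destination."""
    if source.endswith("/"):
        # trailing slash: the *contents* of the source directory are copied
        return posixpath.join(destination, relative_file)
    # no trailing slash: the directory itself is created in the destination
    name = posixpath.basename(source)
    return posixpath.join(destination, name, relative_file)
```

With the source "~/Documents/" a file letter.txt is modelled as /backup/letter.txt, while with "~/Documents" it becomes /backup/Documents/letter.txt.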

Check the output of the dry-run command. If everything seems to be OK, leave out the -n switch to actually do the transfer.

rsync -e ssh -vauxPS --stats --exclude '.DS_Store' --exclude "*bak" --exclude "*~" ~/Documents/ steffen@myubuntu:'/media/sda1/Dokumente\ und\ Einstellungen/Steffen/Eigene\ Dateien'
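When the same command is started from a program instead of being typed into a terminal, it can be passed as an argument list so that no local shell quoting is needed. The following Python 3 sketch assembles such a list. build_rsync_command and its parameters are my own illustrative names, and the escaping of spaces for the remote shell follows the backslash rule described above for the rsync versions used here.

```python
def build_rsync_command(source, remote, remote_dir, excludes=(), dry_run=True):
    """Return the rsync call as an argument list (nothing is executed)."""
    options = "-vauxPS"
    if dry_run:
        options = "-nvauxPS"  # -n: only show what would be done
    cmd = ["rsync", "-e", "ssh", options, "--stats"]
    for pattern in excludes:
        cmd += ["--exclude", pattern]
    # spaces must still be protected from the shell on the remote side
    escaped_dir = remote_dir.replace(" ", "\\ ")
    cmd += [source, remote + ":" + escaped_dir]
    return cmd
```

The resulting list can be handed to subprocess.call(cmd). Note that ~ is not expanded without a shell, so the source path has to be written out in full (e.g. /Users/Steffen/Documents/).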

SSH Key Management

If entering the key passphrase each time you use ssh bothers you, consider using SSHKeychain (for Mac OS 10.4 Tiger). It stores the passphrase and acts as a gateway to the ssh-agent, so you will only be asked for your passphrase once per session, no matter how many commands use ssh afterwards. SSHKeychain also has an option to store the passphrase in the Apple Keychain, so the key can be used just by unlocking the Keychain, which also makes usage within scripts much easier. The easy installation procedure and the usage are described at the SSHKeychain Homepage. I have installed SSHKeychain 0.8.2 and it is working nicely.

Mac OS X 10.5 (Leopard) seems to have built-in ssh-agent support for SSH Key Management, but as I don't have Mac OS 10.5, I did not try this.

If you are using another system as client, you might find similar tools (e.g. Keyring in Ubuntu), but I haven't checked whether they provide similar functionality.
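Whether an ssh-agent is reachable from the current session can be checked by looking at the SSH_AUTH_SOCK environment variable, which ssh uses to find the agent's socket. A minimal Python 3 sketch follows. The function name agent_socket is my own, and the check only confirms that the variable points to an existing path, not that the agent holds your key.

```python
import os


def agent_socket(environ=None):
    """Return the ssh-agent socket path from SSH_AUTH_SOCK, or None."""
    if environ is None:
        environ = os.environ
    path = environ.get("SSH_AUTH_SOCK", "")
    if path and os.path.exists(path):
        return path
    return None
```

A script that depends on the agent could call agent_socket() first and stop with a clear message if it returns None.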

Make a backup script with rsync

So that you don't have to type the above rsync command in the shell every time, you can use a shell script (which should work on other UNIX-based operating systems too). Such scripts using rsync can easily be found on the Web. The following describes one way to make a simple bash script.


nano backup.bash


Copy the rsync command into the file, save it (CTRL+O), exit nano (CTRL+X) and make backup.bash executable with:


chmod 744 backup.bash


From the directory in which backup.bash was saved, type:

./backup.bash


to run the backup.


You can add some logging functionality to the shell script:


#!/bin/bash

# Log file to which all output of the backup run is appended
LOG=~/Documents/Programming/Shell_scripts/rsync.log

echo "================================ rsync Backup script =================================" >> "$LOG"
date >> "$LOG"
echo "==start rsync logging==" >> "$LOG"

# -n makes this a dry run; remove it to actually transfer files
rsync -e ssh -nvauxPS --stats --exclude '.DS_Store' --exclude "*bak" --exclude "*~" ~/Documents/ steffen@myubuntu:'/media/sda1/Dokumente\ und\ Einstellungen/Steffen/Eigene\ Dateien' >> "$LOG"

echo "==rsync Backup Ended==" >> "$LOG"

# Wait two minutes; given in seconds because suffixes like "2m" are a GNU extension
sleep 120

echo "===== Backup Complete =====" >> "$LOG"

# Show the log file in the default application (Mac OS)
open "$LOG"


If desired, more functionality could be added in the shell script. Examples (e.g. for an incremental backup) can be found in (3).


But instead of perfecting the shell script, I decided to make a Python script (as I like Python) that wraps the rsync shell command and reads an XML file (which supplies the paths that I want to copy via rsync). The core of the Python script is: subprocess.call("rsync -e ssh -options source destination", shell=True). As starting rsync -e ssh via a shell call from Python doesn't prompt the user for a passphrase, this only works if ssh-agent is properly set up (see the chapter about SSH Key Management above). Otherwise the shell call will terminate with an error message (see the messages in the Console for details).
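The pattern of calling a command and logging its output can be shown in isolation. The following is a rough sketch for a current Python 3 (not the Python 2 used by the script in this post); run_and_log is a hypothetical helper, the marker texts are arbitrary, and the command is passed as an argument list instead of a shell string.

```python
import subprocess
import time


def run_and_log(command, logfile):
    """Run command, append its output to logfile, return the exit code."""
    with open(logfile, "a") as log:
        log.write(time.strftime("%Y-%m-%d %H:%M:%S") + " == start ==\n")
        log.flush()  # keep our header in front of the child's output
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
        log.write("== ended with exit code %d ==\n" % result.returncode)
    return result.returncode
```

A call such as run_and_log(["rsync", "-e", "ssh", "-nvauxPS", source, destination], "rsync.log") would then leave a time stamp, the rsync output and the exit code in the log file.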

The following assumes that you have at least Python 2.5 installed on your client (the script uses the with statement, which is not available in Python 2.4) and that you know how to start Python scripts. I recommend installing the newest version of Mac Python from (4), as the Python version that comes installed with Mac OS 10.4 doesn't support all the Python commands that I have used in the following script.

#!/usr/local/bin/python
# Filename: RsyncMacWithUbuntu.py
#
# I confirm that, to the best of my knowledge and belief, this contribution is free of any claims of third parties under
# copyright, patent or other rights or interests ("claims").
#
# Copyright 2008 Steffen Hellmich Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
"""Copies important data from my Mac to my Ubuntu machine with Rsync via SSH.

This script uses rsync to copy source folders to destination folders via ssh.

Required: XML-file "FoldersForRsync.xml" in the form of:

<Folders>
    <options>-e ssh -nvauxPS --stats --exclude '*.DS_Store' --exclude '*bak' --exclude '*~'</options>
    <rsyncTransfer use="yes">
        <source>/Users/Steffen/Documents/</source>
        <destination>steffen@myubuntu:/media/sda1/Dokumente und Einstellungen/Steffen/Eigene Dateien</destination>
        <description>Important Documents</description>
    </rsyncTransfer>
    <rsyncTransfer use="no">
        <source>/Users/Steffen/Documents/</source>
        <destination></destination>
        <description> ... </description>
    </rsyncTransfer>
</Folders>

Status messages indicate progress
Error messages indicate failure
The transfer is logged in "/Users/Steffen/Documents/Programming/Python_scripts/rsync.log"
"""

from __future__ import with_statement

import subprocess
import sys
import time
import xml.sax  # XML handling module
import xml.sax.handler


def escapeSpaces(text):
    """Quote paths that contain spaces so that the shell keeps them together."""
    if text.find(' ') == -1:
        return text
    elif text.find(':') == -1:
        # local path: single quotes are enough
        return ''.join(["'", text, "'"])
    else:
        # remote path (user@host:path): add a backslash in front of each space as well
        return ''.join(["'", text.replace(" ", "\\ "), "'"])


class RsyncTransferHandler(xml.sax.handler.ContentHandler):
    """Collects the rsync options and runs rsync for every enabled rsyncTransfer element."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.inOptions = 0
        self.options = ""
        self.inDestination = 0
        self.inDescription = 0
        self.inSource = 0
        self.destination = ""
        self.description = ""
        self.source = ""
        self.use = ""

    def startElement(self, name, attributes):
        if name == "options":
            self.inOptions = 1
            self.options = ""
        elif name == "rsyncTransfer":
            # a new transfer starts: forget the values of the previous one
            self.source = ""
            self.destination = ""
            self.description = ""
            self.use = attributes["use"]
        elif name == "source":
            self.inSource = 1
        elif name == "destination":
            self.inDestination = 1
        elif name == "description":
            self.inDescription = 1

    def characters(self, data):
        if self.inOptions:
            self.options += data
        elif self.inDestination:
            self.destination += data
        elif self.inDescription:
            self.description += data
        elif self.inSource:
            self.source += data

    def endElement(self, name):
        if name == "options":
            self.inOptions = 0
        elif name == "destination":
            self.inDestination = 0
        elif name == "description":
            self.inDescription = 0
        elif name == "source":
            self.inSource = 0
        elif name == "rsyncTransfer":
            if self.use == "yes":
                self.runTransfer()

    def runTransfer(self):
        """Build the rsync command for the current transfer, run it and log it."""
        print 'Rsync', self.description, 'with the following command via standard shell.'
        cmd = "rsync"
        logfile = "/Users/Steffen/Documents/Programming/Python_scripts/rsync.log"
        rsyncCommand = " ".join([cmd, self.options, escapeSpaces(self.source),
                                 escapeSpaces(self.destination), '>>', logfile])
        print rsyncCommand
        with open(logfile, 'a') as f:
            f.write("======================== rsync backup script =============================\n")
            timestamp = time.strftime('%Y-%m-%d %H:%M:%S')
            f.write(" ".join([timestamp, "== start rsync logging ==\n"]))
            f.write(" ".join(["Rsync", self.description, "with\n", rsyncCommand, "\n"]))
        try:
            retcode = subprocess.call(rsyncCommand, shell=True)
            if retcode < 0:
                print >>sys.stderr, "Child was terminated by signal", -retcode
            else:
                print >>sys.stderr, "Child returned", retcode
        except OSError, e:
            print >>sys.stderr, "Execution failed:", e
        with open(logfile, 'a') as f:
            f.write("======================== rsync backup ended ==============================\n")


try:
    shellCmd = "ping -c 1 myubuntu"
    retcode = subprocess.call(shellCmd, shell=True)  # check if Ubuntu machine is reachable
    if retcode == 0:  # ping ok
        print "Server myubuntu seems to be reachable. Return code for:", shellCmd, "=", retcode, "\n"
        shellCmd = "ssh steffen@myubuntu ls"
        retcode = subprocess.call(shellCmd, shell=True)  # check ssh login to Ubuntu machine
        if retcode == 0:  # ssh login ok --> start parsing XML-file for rsync transfer
            print "SSH login to myubuntu works. Return code for:", shellCmd, "=", retcode, "\n"
            parser = xml.sax.make_parser()
            handler = RsyncTransferHandler()
            parser.setContentHandler(handler)
            parser.parse("FoldersForRsync.xml")  # name of XML-file to be parsed
            secondsTimeout = 3
            print '\nRsync finished.\n\nExiting in', secondsTimeout, 'seconds.'
            time.sleep(secondsTimeout)
        else:  # ping ok, but ssh login not --> destination machine is probably running Windows
            print "SSH login to myubuntu failed. Return code for:", shellCmd, "=", retcode, "\n"
            print '\nErrors happened. Check in detail the messages above.\n'
            i = raw_input("Press enter to finish.")  # wait until input
    else:  # Ubuntu machine not reachable
        print "Server myubuntu is not reachable. Return code for:", shellCmd, "=", retcode, "\n"
        print '\nErrors happened. Check in detail the messages above.\n'
        i = raw_input("Press enter to finish.")  # wait until input
except OSError, e:
    print >>sys.stderr, "Execution shell call", shellCmd, "failed. Error:", e
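For readers on a current Python 3, the XML handling can be written more compactly with xml.etree.ElementTree instead of a SAX handler. This is an alternative sketch, not the script above: read_transfers is a name I made up, and it only extracts the enabled transfers without running rsync.

```python
import xml.etree.ElementTree as ET


def read_transfers(xml_text):
    """Return (options, transfers) from a FoldersForRsync-style document.

    transfers is a list of dicts, one for every rsyncTransfer with use="yes".
    """
    root = ET.fromstring(xml_text)
    options = (root.findtext("options") or "").strip()
    transfers = []
    for node in root.findall("rsyncTransfer"):
        if node.get("use") != "yes":
            continue  # skip transfers that are switched off
        transfers.append({
            "source": (node.findtext("source") or "").strip(),
            "destination": (node.findtext("destination") or "").strip(),
            "description": (node.findtext("description") or "").strip(),
        })
    return options, transfers
```

The returned options string and the source/destination pairs could then be combined into rsync calls in the same way as in the handler above.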

Run the Python script via double-clicking from the desktop

To make it fast and easy to run the Python script, I created a shell script called backup.command that is placed directly on my desktop with the following content:


cd ~/Documents/Programming/Python_scripts/

python RsyncMacWithUbuntu.py


Double-clicking it will open a Terminal and run the shell script.

Further ideas

Automate the backup process by scheduling it with cron

You could automate the backup process by creating a cron job (scheduled task) or a repeating alarm in iCal (see (5)) that calls the backup script, e.g. every night. That might be especially helpful if you have to back up a lot of data and both machines are running during the night.
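A crontab entry consists of five time fields (minute, hour, day of month, month, day of week) followed by the command. As a small illustration in Python 3, the helper below formats a daily entry. cron_line is a hypothetical name and the script path used in the example is an assumption, not a path from this post.

```python
def cron_line(hour, minute, command):
    """Format a crontab entry that runs command every day at hour:minute."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    # field order: minute hour day-of-month month day-of-week command
    return "%d %d * * * %s" % (minute, hour, command)
```

cron_line(2, 30, "/Users/Steffen/backup.bash") gives the line "30 2 * * * /Users/Steffen/backup.bash", which could be added with crontab -e. Keep in mind that a cron job cannot type a passphrase, so this needs the ssh-agent setup or a passphrase-less key as discussed earlier.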


As I usually switch off my machines at night and there is no fixed time when both machines are running, I prefer to start the backup manually.

Create multiple copies of anything (similar to a real backup)

I use the Python script above only to mirror or synchronize data files/directories between my two machines. I don't use it to make multiple copies (as a backup scheme would). Others might have different needs (see page 2 of (5) or (7)).

A final word

The steps described above are only one way of using rsync and ssh. If you follow them, try to understand what you are doing and check carefully with the -n ("dry run") switch that rsync will do what you want before actually doing the transfer.

Rsync is perfectly good for synchronizing / backing up data files. It should be enough to recover most of my important data (e.g. my music files and documents) if one of my hard drives crashes. If you need a real backup of your whole system (e.g. entire bootable filesystem images), you might want to consider other ways.

Furthermore, you may have better or more efficient ways of doing this. Please post them so others can see what options there are. And, of course, I may have made mistakes that I have not found yet. Please help me to correct them.

References

(1) http://en.wikipedia.org/wiki/Secure_Shell
(2) https://help.ubuntu.com/community/AdvancedOpenSSH
(3) http://rsync.samba.org/examples.html
(4) http://www.python.org/download/
(5) http://www.macdevcenter.com/pub/a/mac/2005/07/22/backup.html?page=1
(6) http://ubuntuforums.org/showthread.php?t=15082
(7) http://www.egg-tech.com/mac_backup/ (backup to external FireWire, USB and network drives using rsync)
(8) http://troy.jdmz.net/rsync/index.html
