Wednesday, July 9, 2008

(rsync tool) rsync summary

by lar3ry (grab @ http://forums.macosxhints.com/archive/index.php/t-68550.html)

Would a checksum mode have to download the old backup to do a checksum on it? Or does it store the checksums in a separate file?

Caution: LONG ANSWER HERE!

Rsync doesn't download anything it doesn't have to download. It simply runs as two processes (a "sender" process that looks at the stuff you are backing up, and a "receiver" process that looks at where you are backing things up to). These processes run at the same time, and use logic that looks more or less like the simplified logic listed below:


Sender creates a list of files on the source system

Receiver creates a list of files on the destination system

The two processes communicate their lists to each other, skip over "identical" files, and update "modified" files.

If a file exists on the source, but not the destination, it's considered an added file, and it's sent to the destination system.

If a file exists on the destination, but not the source, it's considered a deleted file, and is deleted from the destination system if the --delete option is given to rsync.

If a file has any differences between the version on the source and destination, it's considered a modified file, and the differences between the source and destination are communicated.
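
The steps above can be sketched in a few lines of Python. This is only an illustration of the simplified logic, not rsync's actual code: the function name classify and the data layout (a dictionary mapping each path to a (size, mtime) pair) are made up for the example.

```python
def classify(source, dest):
    """Compare two file lists, each a dict of path -> (size, mtime).

    Returns (added, deleted, modified) as sorted lists of paths.
    Hypothetical helper for illustration; not part of rsync.
    """
    # On the source only: the whole file has to be sent.
    added = sorted(p for p in source if p not in dest)
    # On the destination only: removed if --delete was given.
    deleted = sorted(p for p in dest if p not in source)
    # On both sides but with different meta-data: send the differences.
    modified = sorted(p for p in source if p in dest and source[p] != dest[p])
    return added, deleted, modified


source = {"a.txt": (10, 1000), "b.txt": (20, 2000), "c.txt": (30, 3000)}
dest = {"b.txt": (20, 2000), "c.txt": (30, 2500), "old.txt": (5, 500)}

added, deleted, modified = classify(source, dest)
print("send whole file:", added)
print("delete (only with --delete):", deleted)
print("send differences:", modified)
```

Here b.txt is "identical" (same size and time on both sides), so it appears in none of the three lists and is skipped.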


It's this last bit that changes depending on the -c option. Normally, rsync will just look at a file's meta-data (specifically, the modification time and size). If the meta-data is different between the source and destination files, then the file is considered modified. With the -c option, the modification time is ignored: for every file whose size matches on both sides, rsync computes a checksum of the whole file's contents and compares that instead. This is slower, since every file has to be read in full on both sides to compute its checksum, but more accurate, since meta information can be "spoofed" (I can change a single byte in a file and then use a utility like "touch" to restore the file's modification time so it appears that the file hasn't changed, which would defeat rsync's default checks). The checksums are computed on the fly by the process on each side, so nothing is downloaded just to do the comparison, and no separate checksum file is kept.
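
To make the "touch" trick concrete, here is a small Python sketch that imitates the two kinds of check on a pair of local files. The helper names quick_check_same and checksum_same are invented for the example, and hashlib.sha256 is only a stand-in digest (rsync chooses its own checksum algorithm, which depends on the version), so treat this as a model of the idea rather than of rsync's internals.

```python
import hashlib
import os
import tempfile


def quick_check_same(a, b):
    """Meta-data comparison: size and modification time (whole seconds)."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def checksum_same(a, b):
    """Content comparison: sizes first, then a whole-file digest."""
    if os.path.getsize(a) != os.path.getsize(b):
        return False
    digests = []
    for path in (a, b):
        with open(path, "rb") as fh:
            # sha256 is a stand-in; rsync uses its own choice of digest.
            digests.append(hashlib.sha256(fh.read()).hexdigest())
    return digests[0] == digests[1]


workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "src.txt")
dst = os.path.join(workdir, "dst.txt")
for path in (src, dst):
    with open(path, "wb") as fh:
        fh.write(b"hello world\n")
    os.utime(path, (1000000000, 1000000000))

# Change one byte in the source, then put the old time stamp back,
# which is what a utility like "touch" can do.
with open(src, "r+b") as fh:
    fh.write(b"J")
os.utime(src, (1000000000, 1000000000))

print("quick check says identical:", quick_check_same(src, dst))
print("checksum says identical:", checksum_same(src, dst))
```

The meta-data comparison is fooled because size and time still match, while the content comparison notices the changed byte.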

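Rsync has its own internal "remote-update" protocol that allows the program to transfer just the differences between two sets of files. The protocol uses a rolling checksum to find the blocks that the two versions of a file already have in common, so only the parts that don't match need to be sent. The actual protocol is documented at the main rsync web site (http://samba.anu.edu.au/rsync/tech_report/).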
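
For the curious, the "rolling" part of that checksum can be sketched as follows. This follows the weak checksum as I read it in the tech report (two 16-bit sums, a and b, combined into one 32-bit value); the function names and the sample data are my own, and the real program adds a second, stronger checksum on top that is not shown here.

```python
M = 1 << 16  # both sums are kept modulo 2^16


def weak_checksum(block):
    """Compute the (a, b) pair for a block of bytes from scratch."""
    n = len(block)
    a = sum(block) % M
    # Each byte is weighted by its distance from the end of the block.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte: drop out_byte on the left, add in_byte on the right."""
    a_new = (a - out_byte + in_byte) % M
    b_new = (b - block_len * out_byte + a_new) % M
    return a_new, b_new


data = b"the quick brown fox jumps over the lazy dog"
block_len = 8

a, b = weak_checksum(data[:block_len])
for k in range(len(data) - block_len):
    a, b = roll(a, b, data[k], data[k + block_len], block_len)
    # The cheap update must agree with a full recomputation of the new window.
    assert (a, b) == weak_checksum(data[k + 1:k + 1 + block_len])

print("combined checksum of last window:", a + (b << 16))
```

The point of the rolling form is that moving the window by one byte costs a couple of additions instead of re-reading the whole block, which is what makes it practical to test every byte offset of a file for a matching block.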

So, why would using -c improve things on a Samba share? Well, the FAT file system that Windows inherited from MS-DOS stores modification times with a 2-second granularity. OS X, on the other hand, uses standard Unix time stamps, which have a 1-second granularity. If the share is backed by a FAT volume, the times on OS X don't map exactly to what the destination can store, and some information is lost, which can affect the reliability of using the meta-data to determine whether two files are identical.
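
A tiny Python model shows the mismatch. It assumes, for the sake of the example, that the FAT side simply drops odd seconds (real implementations may round differently), and the helper names to_fat_time and same_mtime are invented here.

```python
def to_fat_time(unix_seconds):
    """Assumed model of FAT storage: only even seconds can be kept."""
    return unix_seconds - (unix_seconds % 2)


def same_mtime(src, dst, window=0):
    """Treat two times as equal if they differ by no more than `window` seconds."""
    return abs(src - dst) <= window


mac_mtime = 1215590401                # an odd second on the OS X side
share_mtime = to_fat_time(mac_mtime)  # what the FAT volume ends up storing

print("exact comparison matches:", same_mtime(mac_mtime, share_mtime))
print("1-second tolerance matches:", same_mtime(mac_mtime, share_mtime, window=1))
```

With an exact comparison the file looks modified on every run even though nothing changed. Besides -c, rsync has a --modify-window=NUM option that compares time stamps with this kind of tolerance, and its man page suggests --modify-window=1 for FAT file systems.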