If you have several similar large files to upload over a slow link you can use ‘rdiff’ to optimise the transfer. ‘rdiff’ compares files block by block and produces a delta file that contains only the blocks that differ.
It is therefore most useful when two files are largely identical and differ only in places, for example PowerPoint product presentations tailored to two different customers.
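To make the block-by-block idea concrete, here is a toy Python sketch of the signature, delta and patch steps. It is only an illustration under simplifying assumptions, not how rdiff is implemented: rdiff is built on librsync, which uses a rolling checksum so that it can match blocks at any byte offset, whereas this sketch only compares blocks at the same fixed position. The function names and the 4-byte block size are made up for the example.

```python
import hashlib

BLOCK = 4  # tiny block size so the example is easy to follow (an assumption)

def blocks(data, size=BLOCK):
    """Split bytes into consecutive fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def signature(base):
    """Hash every block of the base file."""
    return [hashlib.md5(b).hexdigest() for b in blocks(base)]

def delta(sig, new):
    """Describe the new file as ('copy', index) or ('data', bytes) entries."""
    out = []
    for i, b in enumerate(blocks(new)):
        if i < len(sig) and hashlib.md5(b).hexdigest() == sig[i]:
            out.append(("copy", i))   # block i of the base file is unchanged
        else:
            out.append(("data", b))   # block differs, so its bytes are shipped
    return out

def patch(base, d):
    """Rebuild the new file from the base file plus the delta."""
    base_blocks = blocks(base)
    parts = []
    for op, value in d:
        parts.append(base_blocks[value] if op == "copy" else value)
    return b"".join(parts)

original = b"AAAABBBBCCCCDDDD"
edited = b"AAAAXXXXCCCCDDDDEE"
d = delta(signature(original), edited)
print(d)
print(patch(original, d) == edited)
```

Only the changed block and the new trailing bytes travel in the delta; everything else is a reference back to the base file, which is why the delta stays small when the two files are similar.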
In this example we have two similar but different documents:
$ dir *.doc
643,584 document-edition-two.doc
634,880 document-original.doc
Generate an MD5 hash of the second document, to be used later to verify the file's integrity.
$ md5 document-edition-two.doc
8280AEAFABC0833D5FEC64CE5FEF6237 document-edition-two.doc
Prepare a “signature” file which contains hash codes of each block in the base file.
$ rdiff signature document-original.doc document.sig
Next I use that signature file to see which blocks are different in the second file and extract them to a delta file.
$ rdiff delta document.sig document-edition-two.doc document.delta
$ dir document.delta
78,504 document.delta
Note that the “delta” file is only 12% of the size of “document-edition-two.doc”. The relative size depends on how similar the two documents are.
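The 12% figure follows directly from the file sizes in the listings above. A small Python sketch of the arithmetic (the helper name `percent_of` is mine) also shows what the total upload looks like when the original plus the delta is sent instead of both full documents:

```python
def percent_of(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

original_size = 634_880   # document-original.doc
revised_size = 643_584    # document-edition-two.doc
delta_size = 78_504       # document.delta

# Delta compared with the document it stands in for
print(f"delta vs second document: {percent_of(delta_size, revised_size):.1f}%")

# Whole upload: original + delta, compared with original + second document
with_rdiff = original_size + delta_size
without_rdiff = original_size + revised_size
print(f"total upload with rdiff: {percent_of(with_rdiff, without_rdiff):.1f}%")
```

The saving grows with every further similar document, since the original only has to be uploaded once.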
Now I upload the files “document-original.doc” and “document.delta”.
On the server I (or the recipients of the files) run ‘rdiff’ to generate the second document from the first and the delta.
$ rdiff patch document-original.doc document.delta document-edition-two-reconstructed.doc
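If you repeat this for many documents, the three commands can be scripted. The following Python sketch only assembles the same `rdiff signature`, `rdiff delta` and `rdiff patch` command lines used above. The function names and the default file names are my own choices, and `run` assumes an `rdiff` executable is on the PATH of whichever machine calls it.

```python
import subprocess

def local_commands(original, revised, sig="document.sig", delta="document.delta"):
    """Commands for the sending machine, which holds both documents."""
    return [
        ["rdiff", "signature", original, sig],
        ["rdiff", "delta", sig, revised, delta],
    ]

def remote_command(original, delta, rebuilt):
    """Command for the receiving machine, run after the upload."""
    return ["rdiff", "patch", original, delta, rebuilt]

def run(commands):
    """Run each command in order; raises CalledProcessError on failure."""
    for argv in commands:
        subprocess.run(argv, check=True)

# Print the commands rather than running them, so the sketch works
# even where rdiff is not installed.
for argv in local_commands("document-original.doc", "document-edition-two.doc"):
    print(" ".join(argv))
print(" ".join(remote_command("document-original.doc", "document.delta",
                              "document-edition-two-reconstructed.doc")))
```

To actually execute the local steps, call `run(local_commands(...))` on the sending side and `run([remote_command(...)])` on the server.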
Check the MD5 hash to confirm that the second document has been faithfully reproduced.
$ md5 document-edition-two-reconstructed.doc
8280AEAFABC0833D5FEC64CE5FEF6237 document-edition-two-reconstructed.doc
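If no `md5` command is available on one of the machines, the same check can be done with Python's standard `hashlib` module. This is a minimal sketch: the helper names `md5_hex` and `same_file` are mine, and the demo runs on throwaway files rather than the documents above.

```python
import hashlib
import os
import tempfile

def md5_hex(path, chunk_size=1 << 20):
    """Return the uppercase MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest().upper()

def same_file(path_a, path_b):
    """True if both files have the same MD5 digest."""
    return md5_hex(path_a) == md5_hex(path_b)

# Demo on two temporary files with identical content
with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.bin")
    b = os.path.join(tmp, "b.bin")
    for p in (a, b):
        with open(p, "wb") as f:
            f.write(b"same bytes in both files")
    print(md5_hex(a))
    print(same_file(a, b))
```

When the two files live on different machines, run `md5_hex` on each side and compare the printed digests, just as with the `md5` output above.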
Download rdiff for Windows, compiled with Cygwin