Tuesday, December 10, 2013

Manually Checking File Integrity

This post will go over a quick and dirty way to check file integrity on a *nix system.  This is useful for catching things such as bit rot or malicious tampering, although in this example only backups are being checked for bit rot.  There are dedicated tools that do this better, and some of them will be the topic of future posts.

Step 1:
Specify the hashing algorithm you want to use.  While there are pros and cons to the different hashing algorithms, MD5 is used in this example for one main reason: it is faster than SHA-256.
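
If you want to verify that claim on your own hardware, timing both over the same large file gives a rough comparison (the path below is just a placeholder; run each command twice so the second run reads from cache and measures hashing speed rather than disk speed):

time md5sum /path/to/some/large/file
time sha256sum /path/to/some/large/file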

Step 2:
Specify the files you want to ensure are still readable.  In this example, the folders "Documents" and "Images", along with all of their sub-folders, are under scrutiny.  They weigh in at 1.7 GB and 178 MB, respectively.
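
The commands in the following steps assume a shell variable $DIR pointing at the parent of these two folders; something like the following sets it and confirms the sizes (using $HOME as the parent is just an assumption, adjust to taste):

DIR=$HOME
du -sh "$DIR/Documents" "$DIR/Images"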

Step 3:
Calculate the hashes of the files and store them in a file.  This can be done with the find and xargs commands; the -print0 and -0 flags keep file names containing spaces or newlines intact.

find $DIR/Images -type f -print0 | xargs -0 md5sum > images.md5
find $DIR/Documents -type f -print0 | xargs -0 md5sum > documents.md5
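
Each line of the resulting files pairs a checksum with a path, which is exactly the format md5sum -c expects back.  The hash and path here are made up for illustration:

4c6426ac7ef186464ecbb0d81cbfcb1e  /home/user/Images/example.jpg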

Step 4:
Force the system to re-read the files from disk and calculate the hashes again.  Any mismatch means a file has been tampered with or is suffering from bit rot.  There is a catch, though: the files were just read in Step 3, so they are still cached in main memory, and an immediate re-check never touches the disk, as the short run times below show.  Since the whole point is to verify that the on-disk data is still good, the cache has to be cleared before the check means anything.

time md5sum -c images.md5 | grep FAIL
real    0m0.542s
time md5sum -c documents.md5 | grep FAIL
real    0m5.346s

To force the system to clear the cached files, first write any data buffered in memory to disk:

sync

Then, clear the page cache (this write requires root):

echo 3 > /proc/sys/vm/drop_caches
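
Note that a plain sudo in front of the echo will not carry the redirection, so from an unprivileged account the whole command needs to run in a root shell:

sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'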

Re-check the hashes.  This time every byte comes off the disk, which shows in the run times.

time md5sum -c images.md5 | grep FAIL
real    0m19.415s
time md5sum -c documents.md5 | grep FAIL
real    1m6.774s
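
Putting the steps together, here is a minimal sketch of the re-check as a script.  The manifest names match the ones created above, and it has to run as root because of drop_caches:

#!/bin/sh
# verify-backups.sh -- re-read backups from disk and verify their checksums.

# Flush any dirty pages to disk, then drop the page cache so the
# reads below hit the physical disk rather than memory.
sync
echo 3 > /proc/sys/vm/drop_caches

# --quiet suppresses the per-file OK lines, so only failures are printed.
md5sum -c --quiet images.md5
md5sum -c --quiet documents.md5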
