Step 1:
Specify the hashing algorithm you want to use. While there are pros and cons to different hashing algorithms, md5 is used in this example for one main reason: it is faster than sha256.
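If you want to verify the speed difference on your own hardware, you can time both coreutils tools against the same large file (the path below is just a placeholder):

time md5sum /path/to/a/large/file
time sha256sum /path/to/a/large/file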
Step 2:
Specify the files you want to ensure are still readable. In this example, the folders "Documents" and "Images", including all of their sub-folders, are under scrutiny. These folders are 1.7GB and 178MB respectively.
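Assuming both folders live under a common parent directory pointed to by $DIR (the $HOME default below is an assumption), their sizes can be confirmed with du:

DIR=$HOME
du -sh "$DIR/Documents" "$DIR/Images"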
Step 3:
Calculate the hashes of the files and store them in a file. This can be done with the find and xargs commands.
find $DIR/Images -type f -print0 | xargs -0 md5sum > images.md5
find $DIR/Documents -type f -print0 | xargs -0 md5sum > documents.md5
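The -print0 and -0 flags delimit the file names with a null byte instead of whitespace, so files with spaces or newlines in their names are hashed correctly.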
Step 4:
Force the system to re-read the files from disk and calculate the hashes again. This will tell you if a file has been tampered with or is suffering from bit-rot. In this example, the files have already been cached in main memory and the checksums are being calculated shortly thereafter, so the reads never hit the disk. The whole point of the exercise is to verify that the on-disk data is still good.
time md5sum -c images.md5 | grep FAIL
real 0m0.542s
time md5sum -c documents.md5 | grep FAIL
real 0m5.346s
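If you want to confirm the reads were served from the page cache, the fincore utility from util-linux reports how many of a file's pages are currently resident in memory (availability varies by distribution):

find $DIR/Images -type f -print0 | xargs -0 fincore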
To force the system to clear the cached files, first write any data buffered in memory to disk:
sync
Then, clear the cache:
echo 3 > /proc/sys/vm/drop_caches
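Note that writing to drop_caches requires root, and prefixing the echo with sudo is not enough, because the redirection is performed by your unprivileged shell. A common workaround is tee:

echo 3 | sudo tee /proc/sys/vm/drop_caches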
Re-check the hashes.
time md5sum -c images.md5 | grep FAIL
real 0m19.415s
time md5sum -c documents.md5 | grep FAIL
real 1m6.774s
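The whole procedure can be wrapped in a small script and run periodically, e.g. from cron. This sketch reuses the checksum file names from the examples above and must run as root so it can drop the cache:

#!/bin/sh
# Flush dirty pages, then drop the page cache so md5sum reads from disk
sync
echo 3 > /proc/sys/vm/drop_caches
# Verify the stored checksums, printing only failures
md5sum -c images.md5 | grep FAIL
md5sum -c documents.md5 | grep FAIL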