An evaluation of forensic similarity hashes

Authors:
Vassil Roussev
Affiliations:
Department of Computer Science, University of New Orleans, New Orleans, LA 70148, USA
Venue:
Digital Investigation: The International Journal of Digital Forensics & Incident Response
Year:
2011

Abstract

The rapid growth in the average size of digital forensic targets demands new automated means to correlate digital artifacts quickly, accurately, and reliably. Such tools need to offer more flexibility than routine known-file filtering based on cryptographic hashes. Currently, there are two tools for which NIST has produced reference hash sets: ssdeep and sdhash. The former provides a fixed-size fuzzy hash based on random polynomials, whereas the latter produces a variable-length similarity digest based on statistically identified features packed into Bloom filters. This study provides a baseline evaluation of the capabilities of these tools, both in a controlled environment and on real-world data. The results show that the similarity digest approach significantly outperforms the fuzzy hash approach in terms of recall and precision in all tested scenarios, and that it demonstrates robust and scalable behavior.
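
To make the idea of features packed into Bloom filters more concrete, the following is a minimal Python sketch. It is not the sdhash algorithm. It skips the statistical feature selection entirely and simply treats every non-overlapping 64-byte window as a feature, it uses a single filter rather than a variable-length sequence of them, and every name and constant in it (FILTER_BITS, NUM_HASHES, FEATURE_LEN, toy_digest, toy_similarity) is a hypothetical choice made for illustration.

```python
import hashlib

# All three constants are assumptions chosen for illustration only.
FILTER_BITS = 2048   # size of the toy Bloom filter in bits
NUM_HASHES = 5       # bit positions set per feature
FEATURE_LEN = 64     # feature length in bytes


def bit_positions(feature: bytes) -> list[int]:
    """Derive NUM_HASHES bit positions from one SHA-1 digest of the feature."""
    digest = hashlib.sha1(feature).digest()  # 20 bytes = 5 chunks of 4 bytes
    return [
        int.from_bytes(digest[4 * i:4 * i + 4], "big") % FILTER_BITS
        for i in range(NUM_HASHES)
    ]


def toy_digest(data: bytes) -> int:
    """Pack fixed-offset features into one Bloom filter, stored as an int bitmask.

    Simplification: a real similarity digest would pick features statistically
    rather than at fixed offsets, so this toy is not robust to inserted bytes.
    """
    bloom = 0
    for start in range(0, len(data) - FEATURE_LEN + 1, FEATURE_LEN):
        for pos in bit_positions(data[start:start + FEATURE_LEN]):
            bloom |= 1 << pos
    return bloom


def toy_similarity(a: int, b: int) -> float:
    """Shared set bits as a fraction of the sparser filter's set bits (0.0 to 1.0)."""
    smaller = min(bin(a).count("1"), bin(b).count("1"))
    if smaller == 0:
        return 0.0
    return bin(a & b).count("1") / smaller
```

Under these assumptions, two inputs that share many 64-byte blocks at the same offsets set many of the same bits and score close to 1.0, while unrelated inputs overlap only through chance bit collisions and score close to 0.0.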