The rapid growth in the average size of digital forensic targets demands new automated means to correlate digital artifacts quickly, accurately, and reliably. Such tools need to offer more flexibility than routine known-file filtering based on cryptographic hashes. Currently, there are two tools for which NIST has produced reference hash sets: ssdeep and sdhash. The former provides a fixed-size fuzzy hash based on random polynomials, whereas the latter produces a variable-length similarity digest based on statistically identified features packed into Bloom filters. This study provides a baseline evaluation of the capabilities of these tools both in a controlled environment and on real-world data. The results show that the similarity digest approach significantly outperforms the fuzzy hash approach in terms of recall and precision in all tested scenarios and demonstrates robust and scalable behavior.
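To make the idea of "features packed into Bloom filters" concrete, the following is a minimal Python sketch of a Bloom filter holding a set of byte-string features, plus a crude bit-overlap comparison between two filters. It is not sdhash's implementation: the filter size (`FILTER_BITS`), the number of bit positions per feature (`NUM_HASHES`), the use of SHA-1 to derive positions, and the `bit_overlap` score are all assumptions chosen for illustration, and the statistical feature selection step is omitted entirely.

```python
import hashlib

# Illustrative parameters only; these are assumptions, not sdhash's settings.
FILTER_BITS = 2048  # size of each Bloom filter in bits
NUM_HASHES = 5      # bit positions set per feature


def bit_positions(feature):
    """Derive NUM_HASHES bit positions from one SHA-1 digest of the feature."""
    digest = hashlib.sha1(feature).digest()  # 20 bytes, split into 5 words
    return [
        int.from_bytes(digest[4 * i:4 * i + 4], "big") % FILTER_BITS
        for i in range(NUM_HASHES)
    ]


def build_filter(features):
    """Pack an iterable of byte-string features into one filter (a Python int)."""
    bits = 0
    for feature in features:
        for pos in bit_positions(feature):
            bits |= 1 << pos
    return bits


def might_contain(bits, feature):
    """True if the feature may be present; false positives are possible."""
    return all((bits >> pos) & 1 for pos in bit_positions(feature))


def bit_overlap(a, b):
    """Hypothetical score: shared set bits relative to the sparser filter."""
    smaller = min(bin(a).count("1"), bin(b).count("1"))
    if smaller == 0:
        return 0.0
    return bin(a & b).count("1") / smaller


# Usage: two artifacts that share half of their (made-up) features.
features_a = [b"feature-%d" % i for i in range(40)]
features_b = features_a[:20] + [b"other-%d" % i for i in range(20)]
filter_a = build_filter(features_a)
filter_b = build_filter(features_b)
score = bit_overlap(filter_a, filter_b)
```

A Bloom filter never reports a stored feature as absent, but it may report an absent feature as present, which is the "allowable error" traded for compact storage. Because a digest built this way grows with the number of features, its length varies with the input, in contrast to a fixed-size fuzzy hash.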