An Empirical Analysis of Disk Sector Hashes for Data Carving

  • Authors:
  • Yoginder Singh Dandass; Nathan Joseph Necaise; Sherry Reede Thomas

  • Affiliations:
  • Computer Science and Engineering, Mississippi State University, Mississippi State, Mississippi, USA (all three authors)

  • Venue:
  • Journal of Digital Forensic Practice
  • Year:
  • 2008

Abstract

Discovering known illicit material on digital storage devices is an important component of a digital forensic investigation. With existing data carving techniques and tools, it is typically difficult to recover the remaining fragments of deleted illicit files whose file system metadata and file headers have been overwritten by newer files. In such cases, a sector-based scan can be used to locate sectors whose content matches that of sectors from known illicit files. However, brute-force sector-by-sector comparison is prohibitively slow. Accelerating the process requires techniques that compute and compare hash-based signatures of sectors, filtering out sectors whose signatures do not match those of sectors from known illicit files. This article reports the results of a case study in which the hashes of over 528 million sectors extracted from over 433,000 files of different types were analyzed. The hashes were computed using the SHA1, MD5, CRC64, and CRC32 algorithms, and hash collisions between sectors from JPEG and WAV files and other sectors were recorded. The analysis shows that MD5 and SHA1 produce no false-positive indications, and that the occurrence of false positives is also relatively low for CRC32 and especially for CRC64. Furthermore, the CRC-based algorithms produce considerably smaller hashes than SHA1 and MD5 and therefore require less storage. CRC64 provides a good compromise between the number of collisions and the storage capacity required for practical implementations of sector-scanning forensic tools.
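
The sector-signature filtering described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The 512-byte sector size, the function names, and the choice to skip a trailing partial sector are all assumptions made for illustration. CRC64 is omitted because the Python standard library does not provide it.

```python
import hashlib
import zlib

# Assumption: 512-byte sectors. The abstract does not state the sector size.
SECTOR_SIZE = 512


def iter_sectors(data: bytes, size: int = SECTOR_SIZE):
    """Yield (sector_index, sector_bytes) for every complete sector in data.

    A trailing partial sector is skipped (an illustrative simplification).
    """
    for offset in range(0, len(data) - size + 1, size):
        yield offset // size, data[offset:offset + size]


def sector_signature(sector: bytes, algorithm: str) -> bytes:
    """Return the signature of one sector as raw bytes.

    algorithm is "crc32", "md5", or "sha1" (hypothetical labels for this sketch).
    """
    if algorithm == "crc32":
        return zlib.crc32(sector).to_bytes(4, "big")
    return hashlib.new(algorithm, sector).digest()


def build_known_set(known_files, algorithm: str) -> set:
    """Collect the signatures of all complete sectors of the known files."""
    return {
        sector_signature(sector, algorithm)
        for data in known_files
        for _, sector in iter_sectors(data)
    }


def scan_image(image: bytes, known: set, algorithm: str) -> list:
    """Return indices of image sectors whose signature is in the known set.

    A hit is only a candidate: with a short signature such as CRC32 it may
    be a false positive and should be confirmed by comparing the bytes.
    """
    return [
        index
        for index, sector in iter_sectors(image)
        if sector_signature(sector, algorithm) in known
    ]
```

For a sense of scale, the digest sizes are 4 bytes for CRC32, 8 for CRC64, 16 for MD5, and 20 for SHA1. Storing one signature for each of 528 million sectors would therefore take roughly 2.1 GB, 4.2 GB, 8.4 GB, and 10.6 GB respectively, before any index overhead.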