SÁDI - Statistical Analysis for Data Type Identification

Authors:
Sarah J. Moody;Robert F. Erbacher
Venue:
SADFE '08 Proceedings of the 2008 Third International Workshop on Systematic Approaches to Digital Forensic Engineering
Year:
2008



Abstract

A key task in digital forensic analysis is locating relevant information within a computer system. Whether data is relevant often depends on identifying what type of data is being examined. File type identification is typically based on file extensions or magic keys. These techniques fail in many common forensic scenarios, such as embedded data (for example, within Microsoft Word files) or file fragments. The SÁDI (Statistical Analysis Data Identification) technique applies statistical analysis to the byte values of the data, so that its accuracy does not rely on potentially misleading metadata but on the values of the data itself. SÁDI makes it possible to identify what digitally stored data actually represents and also allows portions of the data, such as embedded data, to be selectively extracted for additional investigation. Thus, our research provides a more effective type identification technique that does not fail on file fragments, embedded data types, or obfuscated data.
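
To make the idea of statistical analysis over byte values concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the abstract does not state which statistics SÁDI computes, so the choice of mean, standard deviation, Shannon entropy, and distinct-value count is an assumption, as are the function names `byte_stats` and `windowed_stats` and the 256-byte window size.

```python
import math
from collections import Counter


def byte_stats(block: bytes) -> dict:
    """Return simple statistics over the byte values of one block.

    The statistics chosen here are illustrative assumptions, not the
    set used by SADI.
    """
    n = len(block)
    if n == 0:
        raise ValueError("block must not be empty")
    mean = sum(block) / n
    # Population variance of the byte values (each in 0..255).
    variance = sum((b - mean) ** 2 for b in block) / n
    counts = Counter(block)
    # Shannon entropy in bits per byte (0 = constant, 8 = uniform).
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "mean": mean,
        "stddev": math.sqrt(variance),
        "entropy": entropy,
        "distinct": len(counts),
    }


def windowed_stats(data: bytes, window: int = 256):
    """Yield (offset, stats) for consecutive non-overlapping windows.

    Working per window rather than per file means no header, extension,
    or other metadata is needed, so fragments can be examined too.
    The window size is a hypothetical default.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    for offset in range(0, len(data), window):
        yield offset, byte_stats(data[offset:offset + window])
```

Under these assumptions, a region of plain text would tend to show a narrow range of byte values and moderate entropy, while compressed or encrypted regions would tend toward high entropy. A sharp change in the statistics between adjacent windows is one possible hint that an embedded object of a different type begins there.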