File Fragment Classification-The Case for Specialized Approaches

Authors:
Vassil Roussev; Simson L. Garfinkel
Venue:
SADFE '09 Proceedings of the 2009 Fourth International IEEE Workshop on Systematic Approaches to Digital Forensic Engineering
Year:
2009

Cited by 6:

An intelligent technique to detect file formats and e-mail spam. Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in India.
Machine learning in computer forensics (and the lessons learned from machine learning in computer security). Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence.
Automated mapping of large binary objects using primitive fragment type classification. Digital Investigation: The International Journal of Digital Forensics & Incident Response.
Using purpose-built functions and block hashes to enable small block and sub-file forensics. Digital Investigation: The International Journal of Digital Forensics & Incident Response.
Locating executable fragments with Concordia, a scalable, semantics-based architecture. Proceedings of the Eighth Annual Cyber Security and Information Intelligence Research Workshop.
Classification and Recovery of Fragmented Multimedia Files using the File Carving Approach. International Journal of Mobile Computing and Multimedia Communications.

Abstract

Increasingly, advances in file carving, memory analysis, and network forensics require the ability to identify the underlying type of a file given only a file fragment. Work to date on this problem has relied on identifying specific byte sequences in file headers and footers, and on statistical analysis and machine learning algorithms applied to data taken from the middle of the file. We argue that these approaches are fundamentally flawed because they fail to consider the inherent internal structure of widely used file types such as PDF, DOC, and ZIP. We support our argument with a bottom-up examination of some popular formats and an analysis of TK PDF files. Based on our analysis, we argue that specialized methods targeted to each specific file type will be necessary to make progress in this area.