Content Based File Type Detection Algorithms

Authors:
Mason McDaniel;M. Hossain Heydari
Affiliations:
-;-
Venue:
HICSS '03 Proceedings of the 36th Annual Hawaii International Conference on System Sciences (HICSS'03) - Track 9 - Volume 9
Year:
2003

Citing: 0
Cited by: 13

A Study of Malcode-Bearing Documents
DIMVA '07 Proceedings of the 4th international conference on Detection of Intrusions and Malware, and Vulnerability Assessment

On Improving the Accuracy and Performance of Content-Based File Type Identification
ACISP '09 Proceedings of the 14th Australasian Conference on Information Security and Privacy

Fast file-type identification
Proceedings of the 2010 ACM Symposium on Applied Computing

An intelligent technique to detect file formats and e-mail spam
Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in India

Machine learning in computer forensics (and the lessons learned from machine learning in computer security)
Proceedings of the 4th ACM workshop on Security and artificial intelligence

Classification of packet contents for malware detection
Journal in Computer Virology

GP-Fileprints: file types detection using genetic programming
EuroGP'10 Proceedings of the 13th European conference on Genetic Programming

Predicting the types of file fragments
Digital Investigation: The International Journal of Digital Forensics & Incident Response

Automated mapping of large binary objects using primitive fragment type classification
Digital Investigation: The International Journal of Digital Forensics & Incident Response

Code type revealing using experiments framework
DBSec'12 Proceedings of the 26th Annual IFIP WG 11.3 conference on Data and Applications Security and Privacy

Feature-based Type Identification of File Fragments
Security and Communication Networks

Classification and Recovery of Fragmented Multimedia Files using the File Carving Approach
International Journal of Mobile Computing and Multimedia Communications

An information-theoretical approach to high-speed flow nature identification
IEEE/ACM Transactions on Networking (TON)

Abstract

Identifying the true type of a computer file can be a difficult problem. Previous methods of file type recognition include fixed file extensions, fixed "magic numbers" stored with the files, and proprietary descriptive file wrappers. All of these methods have significant limitations. This paper proposes algorithms for automatically generating "fingerprints" of file types based on a set of known input files, then using the fingerprints to recognize the true type of unknown files based on their content rather than on metadata associated with them. Recognition is performed by three different algorithms based on byte frequency analysis, byte frequency cross-correlation analysis, and file header/trailer analysis. Tests were run to measure the accuracy of these algorithms. The accuracy varied from 23% to 96% depending upon which algorithm was used. These algorithms could be used by virus scanning packages, firewalls, intrusion detection systems, forensic analyses of computer hard drives, web browsers, or any other program that needs to identify the types of files for proper operation. File type detection is also important to operating systems for correct identification and handling of files regardless of file extension.
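
To make the byte frequency idea from the abstract concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the normalization or scoring details, so the choices here are assumptions made for illustration: fingerprints are plain averages of relative byte frequencies, and the score is one minus the total variation distance. The function names (byte_frequencies, build_fingerprint, similarity, classify) are hypothetical. Python 3.9 or later is assumed for the built-in generic type hints.

```python
from collections import Counter


def byte_frequencies(data: bytes) -> list[float]:
    """Return the relative frequency of each of the 256 possible byte values."""
    if not data:
        return [0.0] * 256
    counts = Counter(data)  # iterating over bytes yields ints in 0..255
    total = len(data)
    return [counts.get(value, 0) / total for value in range(256)]


def build_fingerprint(samples: list[bytes]) -> list[float]:
    """Average the byte-frequency vectors of known samples of one file type.

    Assumption: a plain arithmetic mean; the paper may weight or
    transform the frequencies differently.
    """
    if not samples:
        raise ValueError("need at least one sample file")
    vectors = [byte_frequencies(sample) for sample in samples]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def similarity(fingerprint: list[float], data: bytes) -> float:
    """Score an unknown file against a fingerprint.

    Assumption: one minus the total variation distance (half the L1
    distance). For non-empty inputs the score lies in [0, 1], and 1.0
    means the byte distributions are identical.
    """
    observed = byte_frequencies(data)
    distance = sum(abs(f - o) for f, o in zip(fingerprint, observed)) / 2
    return 1.0 - distance


def classify(fingerprints: dict[str, list[float]], data: bytes) -> str:
    """Return the name of the file type whose fingerprint scores highest."""
    return max(fingerprints, key=lambda name: similarity(fingerprints[name], data))
```

In use, one fingerprint would be built per file type from a set of known files, and classify would then be called on the raw bytes of an unknown file. A score based only on byte frequencies ignores byte order, which is one reason an approach like this would be combined with other evidence such as header and trailer patterns.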