Identifying the true type of a computer file can be a difficult problem. Previous methods of file type recognition include fixed file extensions, fixed "magic numbers" stored with the files, and proprietary descriptive file wrappers. All of these methods have significant limitations. This paper proposes algorithms for automatically generating "fingerprints" of file types from a set of known input files, then using the fingerprints to recognize the true type of unknown files based on their content rather than on metadata associated with them. Recognition is performed by three algorithms, based on byte frequency analysis, byte frequency cross-correlation analysis, and file header/trailer analysis. Tests were run to measure the accuracy of these algorithms. The accuracy varied from 23% to 96% depending upon which algorithm was used. These algorithms could be used by virus scanning packages, firewalls, intrusion detection systems, forensic analyses of computer hard drives, web browsers, or any other program that needs to identify the types of files for proper operation. File type detection is also important to operating systems for correct identification and handling of files regardless of file extension.
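As an illustration of the first of the three approaches, the following is a minimal Python sketch of fingerprinting by byte frequency. It is not the paper's implementation: the function names (`byte_frequency`, `build_fingerprint`, `score`, `classify`), the normalization by the most frequent byte, and the mean-absolute-difference score are all assumptions chosen to keep the example short.

```python
from collections import Counter


def byte_frequency(data: bytes) -> list[float]:
    """Return a 256-entry histogram scaled so the most common byte is 1.0.

    The scaling is an assumption of this sketch; it makes files of
    different sizes comparable.
    """
    counts = Counter(data)
    peak = max(counts.values(), default=0)
    if peak == 0:
        return [0.0] * 256
    return [counts.get(b, 0) / peak for b in range(256)]


def build_fingerprint(samples: list[bytes]) -> list[float]:
    """Average the scaled histograms of known files of a single type."""
    hists = [byte_frequency(s) for s in samples]
    return [sum(column) / len(hists) for column in zip(*hists)]


def score(fingerprint: list[float], data: bytes) -> float:
    """Similarity in [0, 1]: one minus the mean absolute difference.

    This scoring rule is hypothetical and stands in for whatever
    measure a real system would use.
    """
    hist = byte_frequency(data)
    diff = sum(abs(f - h) for f, h in zip(fingerprint, hist)) / 256
    return 1.0 - diff


def classify(fingerprints: dict[str, list[float]], data: bytes) -> str:
    """Return the name of the fingerprint that best matches the data."""
    return max(fingerprints, key=lambda name: score(fingerprints[name], data))
```

A caller would build one fingerprint per known type from sample files, then pass the resulting dictionary and the bytes of an unknown file to `classify`. Because the decision depends only on content, renaming a file or changing its extension does not affect the result.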