Predicting the types of file fragments

Authors:
William C. Calhoun;Drue Coles
Affiliations:
Department of Mathematics, Computer Science and Statistics, Bloomsburg University of Pennsylvania, Bloomsburg, PA 17815, USA;Department of Mathematics, Computer Science and Statistics, Bloomsburg University of Pennsylvania, Bloomsburg, PA 17815, USA
Venue:
Digital Investigation: The International Journal of Digital Forensics & Incident Response
Year:
2008

Abstract

A problem that arises in computer forensics is determining the type of a file fragment. The file name extension, which indicates the type, is stored in the disk directory, but when a file is deleted, its directory entry may be overwritten. The problem is easily solved when the fragment includes the initial header, which contains explicit type-identifying information, but it is more difficult to determine the type of a fragment from the middle of a file. We investigate two algorithms for predicting the type of a fragment: one based on Fisher's linear discriminant and the other based on longest common subsequences of the fragment with various sets of test files. We test the ability of the algorithms to predict a variety of common file types. Algorithms of this kind may be useful in designing the next generation of file carvers, programs that reconstruct files when directory information is lost or deleted. These methods may also be useful in designing virus scanners, firewalls, and search engines that find files similar to a given file.
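The longest-common-subsequence idea mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the authors' exact procedure, so everything here is an assumption for illustration: the function names `lcs_length` and `predict_type`, the use of the mean LCS length as the score, and the toy sample data are all hypothetical.

```python
from typing import Dict, List


def lcs_length(a: bytes, b: bytes) -> int:
    """Length of the longest common subsequence of two byte strings.

    Standard dynamic programme, O(len(a) * len(b)) time, O(len(b)) space.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def predict_type(fragment: bytes, samples: Dict[str, List[bytes]]) -> str:
    """Return the label whose sample fragments share, on average,
    the longest common subsequence with `fragment`.

    Assumption: mean LCS length is the score; the paper may score differently.
    Each label must map to a non-empty list of samples.
    """
    def mean_lcs(label: str) -> float:
        frags = samples[label]
        return sum(lcs_length(fragment, s) for s in frags) / len(frags)

    return max(samples, key=mean_lcs)


if __name__ == "__main__":
    # Toy, hand-made samples (hypothetical; real use would read fragments
    # from the middle of known files of each type).
    samples = {
        "text": [b"the quick brown fox", b"hello world example"],
        "binary": [bytes(range(200, 220)), bytes([0, 255] * 10)],
    }
    print(predict_type(b"quick hello", samples))
```

Because the dynamic programme is quadratic in fragment length, a sketch like this is only practical for short fragments and small sample sets.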