Fast file-type identification

  • Authors: Irfan Ahmed; Kyung-suk Lhee; Hyunjung Shin; ManPyo Hong

  • Affiliations: Ajou University, South Korea (all authors)

  • Venue: Proceedings of the 2010 ACM Symposium on Applied Computing
  • Year: 2010

Abstract

This paper proposes two techniques to reduce the classification time of content-based file type identification. The first is a feature selection technique, which uses only a subset of frequently occurring byte patterns both to build the representative model of a file type and to classify files. The second is a content sampling technique, which computes a file's byte-frequency distribution from a subset of its content rather than from the whole file. Our initial experiments show that the proposed approaches are promising even when only simple 1-gram features are used for classification.
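
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the block size, number of sampled blocks, number of selected features `k`, the Manhattan distance, and every function name (`sample_blocks`, `byte_frequency`, `top_features`, `classify`) are assumptions chosen only for illustration.

```python
import random
from collections import Counter


def sample_blocks(data: bytes, block_size: int = 256, n_blocks: int = 4,
                  seed: int = 0) -> bytes:
    """Content sampling: keep only a few fixed-size blocks of the file.

    Block size, block count, and random placement are illustrative
    assumptions, not values taken from the paper.
    """
    if len(data) <= block_size * n_blocks:
        return data  # file is already small; use all of it
    rng = random.Random(seed)
    n_total = len(data) // block_size
    picks = sorted(rng.sample(range(n_total), n_blocks))
    return b"".join(data[i * block_size:(i + 1) * block_size] for i in picks)


def byte_frequency(data: bytes) -> list[float]:
    """1-gram feature vector: relative frequency of each byte value 0..255."""
    counts = Counter(data)
    total = len(data) or 1  # avoid division by zero on empty input
    return [counts.get(b, 0) / total for b in range(256)]


def top_features(model: list[float], k: int) -> list[int]:
    """Feature selection: the k byte values that occur most often in a model."""
    return sorted(range(256), key=lambda b: model[b], reverse=True)[:k]


def classify(data: bytes, models: dict[str, list[float]], k: int = 16) -> str:
    """Return the file type whose model is closest to the sampled content.

    Manhattan distance restricted to each model's top-k bytes is an
    assumed comparison metric.
    """
    dist = byte_frequency(sample_blocks(data))

    def distance(name: str) -> float:
        model = models[name]
        return sum(abs(dist[b] - model[b]) for b in top_features(model, k))

    return min(models, key=distance)
```

For example, given `models = {"text": byte_frequency(text_bytes), "bin": byte_frequency(binary_bytes)}` built from hypothetical training content, `classify(unknown_bytes, models)` returns the name of the nearest model. Both shortcuts reduce work in the same way: sampling shortens the input that must be scanned, and feature selection shortens the vector that must be compared.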