File classification using byte sub-stream kernels

  • Authors:
  • Olivier De Vel

  • Affiliations:
  • Information Assurance Branch, Information Networks Division, Defence Science and Technology Organisation, P.O. Box 1500, Edinburgh SA 5111, Australia

  • Venue:
  • Digital Investigation: The International Journal of Digital Forensics & Incident Response
  • Year:
  • 2004

Abstract

The ability to automatically classify files based on their low-level, short-range structures is of particular importance in computer forensics. We report a study on automatically learning file classifiers using byte sub-stream kernels that capture these low-level structures. We automatically discover byte-level patterns in a file by extracting a byte sequence feature map, and we use a suffix trie data structure to efficiently store and manipulate the feature map. From the feature map we compute the spectrum kernel and, together with a support vector machine (SVM) classifier, we are able to efficiently categorize a variety of system and application file types. Experiments show good file classification performance.
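The pipeline described in the abstract (byte sequence feature map, trie storage, spectrum kernel, SVM) can be illustrated with a minimal sketch. It assumes the standard k-spectrum kernel, where the feature map counts every length-k byte substring and K(x, y) is the dot product of those count vectors. The k-mers are stored in a depth-k trie (a suffix trie truncated at depth k). The value of k, the function names (`build_kmer_trie`, `spectrum_kernel_trie`, `normalized_spectrum_kernel`, `gram_matrix`), and the cosine normalization are hypothetical choices for illustration; they are not taken from the paper.

```python
import math


class TrieNode:
    """Node of a depth-k byte trie; leaves hold k-mer occurrence counts."""
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}  # maps a byte value (0..255) to a child TrieNode
        self.count = 0      # only meaningful at depth k (leaf nodes)


def build_kmer_trie(data: bytes, k: int) -> TrieNode:
    """Insert every length-k byte substring of `data` (hypothetical helper)."""
    if k < 1:
        raise ValueError("k must be >= 1")
    root = TrieNode()
    for i in range(len(data) - k + 1):
        node = root
        for b in data[i:i + k]:  # iterating over bytes yields ints
            node = node.children.setdefault(b, TrieNode())
        node.count += 1
    return root


def spectrum_kernel_trie(a: TrieNode, b: TrieNode) -> int:
    """k-spectrum kernel: walk both tries in parallel.

    Only paths present in both tries are visited, so k-mers that occur in
    just one file contribute nothing, as in the dot product of count vectors.
    """
    if not a.children and not b.children:
        return a.count * b.count  # both at depth k: multiply the counts
    total = 0
    for byte, child_a in a.children.items():
        child_b = b.children.get(byte)
        if child_b is not None:
            total += spectrum_kernel_trie(child_a, child_b)
    return total


def normalized_spectrum_kernel(x: bytes, y: bytes, k: int) -> float:
    """Cosine-normalized kernel, so that files of different sizes are comparable."""
    tx, ty = build_kmer_trie(x, k), build_kmer_trie(y, k)
    denom = math.sqrt(spectrum_kernel_trie(tx, tx) * spectrum_kernel_trie(ty, ty))
    return spectrum_kernel_trie(tx, ty) / denom if denom else 0.0


def gram_matrix(files: list, k: int) -> list:
    """Unnormalized kernel matrix over a list of byte strings."""
    tries = [build_kmer_trie(f, k) for f in files]
    return [[spectrum_kernel_trie(a, b) for b in tries] for a in tries]


if __name__ == "__main__":
    x, y = b"abab", b"abba"
    print(gram_matrix([x, y], 2))
    print(normalized_spectrum_kernel(x, y, 2))
```

For b"abab" and b"abba" with k = 2, the shared 2-mers are "ab" (counts 2 and 1) and "ba" (counts 1 and 1), so K = 2·1 + 1·1 = 3. A Gram matrix computed this way over training files could then be passed to an SVM implementation that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`. That library choice is an assumption and is not necessarily what the study used.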