Parsing Without a Grammar: Making Sense of Unknown File Formats

  • Authors:
  • Levon Lloyd;Steven Skiena

  • Venue:
  • ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
  • Year:
  • 2003

Abstract

The thousands of specialized structured file formats in use today present a substantial barrier to freely exchanging information between applications programs. We consider the problem of deducing such basic features as the whitespace characters, bracketing delimiter symbols, and self-delimiter characters of a given file format from one or more example files. We demonstrate that for sufficiently large example files, we can typically identify the basic features of interest.
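
To make the bracketing-delimiter part of the problem concrete, here is a minimal sketch of one naive way to guess bracket pairs from an example file. It is not the method of the paper, which the abstract does not describe; the function name `guess_bracket_pairs`, the `min_count` parameter, and the balance heuristic are all assumptions made for illustration. The check is deliberately simple, so on real files unrelated symbols that happen to occur equally often can also pass it.

```python
from collections import Counter
from itertools import permutations


def guess_bracket_pairs(text, min_count=2):
    """Return (open, close) character pairs that look like bracketing delimiters.

    Hypothetical heuristic (an assumption of this sketch, not the paper's
    method): a pair qualifies when both characters are non-alphanumeric,
    non-space symbols, occur equally often, and a running depth counter
    never goes negative and ends at zero.
    """
    # Count only symbol characters; letters, digits and whitespace are ignored.
    counts = Counter(ch for ch in text if not ch.isalnum() and not ch.isspace())
    symbols = [ch for ch, n in counts.items() if n >= min_count]

    pairs = []
    for opener, closer in permutations(symbols, 2):
        if counts[opener] != counts[closer]:
            continue
        depth = 0
        for ch in text:
            if ch == opener:
                depth += 1
            elif ch == closer:
                depth -= 1
                if depth < 0:
                    # A closer appeared before its opener: not a bracket pair.
                    break
        if depth == 0:
            pairs.append((opener, closer))
    return sorted(pairs)


if __name__ == "__main__":
    sample = "(a (b c) (d (e f)))\n(g (h i))"
    print(guess_bracket_pairs(sample))
```

The balance test runs once per candidate pair, so the cost grows with the square of the number of distinct symbols. That is acceptable for a sketch, but a practical tool would likely need a stronger statistical criterion to separate true delimiters from coincidental matches.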