A Similarity Measure Using Smallest Context-Free Grammars

  • Authors:
  • Daniele Cerra;Mihai Datcu

  • Venue:
  • DCC '10 Proceedings of the 2010 Data Compression Conference
  • Year:
  • 2010

Abstract

This work presents a new approximation of the Kolmogorov complexity of strings, based on compression with smallest Context-Free Grammars (CFGs). If a dictionary containing the relevant patterns of a given string can be regarded as a model of that string, then a Context-Free Grammar can be regarded as a generative model in which every rule, and consequently the size of the grammar itself, is meaningful. We therefore define a new complexity approximation that takes the size of the string model into account, in a representation similar to that of the Minimum Description Length (MDL) principle. These considerations lead to the definition of a new compression-based similarity measure. Its novelty is that the impact of complexity overestimation, which stems from the limitations of any real compressor, can be accounted for and reduced.
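The idea of counting the grammar's own size as part of the complexity can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract gives no formula, so everything here is an assumption made for illustration. The hypothetical `build_grammar` uses a greedy pair-replacement heuristic (in the spirit of Re-Pair) rather than a true smallest grammar, the hypothetical `two_part_size` simply counts symbols in the start sequence plus symbols in the rules, and `ncd` applies the standard Normalized Compression Distance formula to that count.

```python
from collections import Counter


def build_grammar(text):
    """Greedy pair replacement (a heuristic, NOT a true smallest grammar).

    Returns (start_sequence, rules), where rules maps a nonterminal
    to the pair of symbols it expands to.
    """
    seq = list(text)
    rules = {}
    while True:
        pair_counts = Counter(zip(seq, seq[1:]))
        if not pair_counts:
            break
        pair, count = pair_counts.most_common(1)[0]
        if count < 2:
            break  # no adjacent pair repeats, so nothing left to factor out
        nonterminal = ("N", len(rules))  # tuples cannot collide with characters
        rules[nonterminal] = pair
        rewritten, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                rewritten.append(nonterminal)
                i += 2
            else:
                rewritten.append(seq[i])
                i += 1
        seq = rewritten
    return seq, rules


def expand(seq, rules):
    """Regenerate the original string from the grammar."""
    out = []
    for symbol in seq:
        if symbol in rules:
            out.append(expand(rules[symbol], rules))
        else:
            out.append(symbol)
    return "".join(out)


def two_part_size(text):
    """Hypothetical two-part cost: start-sequence symbols plus rule symbols."""
    seq, rules = build_grammar(text)
    return len(seq) + 2 * len(rules)  # each rule has two right-hand symbols


def ncd(x, y, complexity=two_part_size):
    """Standard Normalized Compression Distance with a pluggable estimate."""
    cx, cy, cxy = complexity(x), complexity(y), complexity(x + y)
    if max(cx, cy) == 0:
        return 0.0
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    x, y = "abababab", "cdefghij"
    print(two_part_size(x), ncd(x, x), ncd(x, y))
```

Because the rule symbols are charged explicitly, a string with no repeated structure gains nothing from the grammar, while a regular string is described by a short start sequence and a few rules. Any other complexity estimate can be passed to `ncd` through the `complexity` parameter.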