Smoothing and compression with stochastic k-testable tree languages

  • Authors:
  • Juan Ramón Rico-Juan;Jorge Calera-Rubio;Rafael C. Carrasco

  • Affiliations:
  • Departament de Llenguatges i Sistemes Informítics, Universitat d'Alacant, E-03071 Alacant, Spain;Departament de Llenguatges i Sistemes Informítics, Universitat d'Alacant, E-03071 Alacant, Spain;Departament de Llenguatges i Sistemes Informítics, Universitat d'Alacant, E-03071 Alacant, Spain

  • Venue:
  • Pattern Recognition
  • Year:
  • 2005

Abstract

In this paper, we describe techniques for learning probabilistic k-testable tree models, a generalization of the well-known k-gram models, which can be used to compress or classify structured data. These models are easy to infer from samples and allow for incremental updates. Moreover, as shown here, back-off schemes can be defined to address data sparseness, a problem that often arises when trees are used to represent the data. These features make the models suitable for compressing structured data files at a better rate than string-based methods.
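
As a rough illustration of the idea, and not the authors' implementation, the sketch below estimates a k = 2 tree model by relative frequencies: the probability of a node's sequence of child labels depends only on the node's own label, in the same way that a bigram model conditions a symbol on its predecessor. The class name `TwoTestableTreeModel`, the nested-tuple tree encoding, and the method names are all assumptions made for this example. Smoothing and back-off are deliberately omitted, so unseen expansions receive probability zero, which is exactly the sparseness problem the abstract refers to.

```python
import math
from collections import Counter


class TwoTestableTreeModel:
    """Hypothetical k = 2 model: children labels are conditioned on the parent label only."""

    def __init__(self):
        self.root_counts = Counter()   # root label -> count
        self.rule_counts = Counter()   # (label, child labels) -> count
        self.label_counts = Counter()  # label -> number of expansions seen

    def update(self, tree):
        """Incrementally add one tree, encoded as (label, [subtrees])."""
        self.root_counts[tree[0]] += 1
        stack = [tree]
        while stack:
            label, children = stack.pop()
            child_labels = tuple(child[0] for child in children)
            self.rule_counts[(label, child_labels)] += 1
            self.label_counts[label] += 1
            stack.extend(children)

    def probability(self, tree):
        """Unsmoothed relative-frequency estimate of the tree probability."""
        total_roots = sum(self.root_counts.values())
        if total_roots == 0:
            return 0.0
        p = self.root_counts[tree[0]] / total_roots
        stack = [tree]
        while stack and p > 0.0:
            label, children = stack.pop()
            child_labels = tuple(child[0] for child in children)
            seen = self.label_counts[label]
            p *= self.rule_counts[(label, child_labels)] / seen if seen else 0.0
            stack.extend(children)
        return p

    def code_length_bits(self, tree):
        """Ideal code length, -log2 p(tree), that an entropy coder could approach."""
        p = self.probability(tree)
        return math.inf if p == 0.0 else -math.log2(p)


# Two small sample trees; leaves have an empty list of subtrees.
t1 = ("a", [("b", []), ("c", [])])
t2 = ("a", [("b", [])])
unseen = ("a", [("c", [])])

model = TwoTestableTreeModel()
model.update(t1)
model.update(t2)  # counts are simply accumulated, so updates are incremental

bits_t1 = model.code_length_bits(t1)
bits_unseen = model.code_length_bits(unseen)
```

In this toy sample the label `a` is expanded in two different ways, so each of the two training trees costs one bit, while the unseen tree has an infinite code length. A back-off scheme of the kind the abstract mentions would reserve some probability mass for such unseen expansions and estimate it from a lower-order model.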