Prediction suffix trees for supervised classification of sequences

  • Authors:
  • Christine Largeron-Leténo

  • Affiliations:
  • EURISE-Université Jean Monnet, 23 rue du Dr Michelon, 42023 Saint-Etienne Cedex 2, France

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2003

Quantified Score

Hi-index 0.10

Visualization

Abstract

This paper presents a statistical test and algorithms for patterns extraction and supervised classification of sequential data. First it defines the notion of prediction suffix tree (PST). This type of tree can be used to efficiently describe variable order chain. It performs better than the Markov chain of order L and at a lower storage cost. We propose an improvement of this model, based on a statistical test. This test enables us to control the risk of encountering different patterns in the model of the sequence to classify and in the model of its class. Applications to biological sequences are presented to illustrate this procedure. We compare the results obtained with different models (Markov chain of order L, Variable order model and the statistical test, with or without smoothing). We set out to show how the choice of the parameters of the models influences performance in these applications. Obviously these algorithms can be used in other fields in which the data are naturally ordered.