Prediction suffix trees for supervised classification of sequences

Authors:
Christine Largeron-Leténo
Affiliations:
EURISE-Université Jean Monnet, 23 rue du Dr Michelon, 42023 Saint-Etienne Cedex 2, France
Venue:
Pattern Recognition Letters
Year:
2003

Abstract

This paper presents a statistical test and algorithms for pattern extraction and supervised classification of sequential data. It first defines the notion of a prediction suffix tree (PST), a type of tree that can efficiently describe variable-order Markov chains. A PST performs better than a Markov chain of order L, at a lower storage cost. We propose an improvement of this model based on a statistical test, which enables us to control the risk of encountering different patterns in the model of the sequence to classify and in the model of its class. Applications to biological sequences illustrate the procedure. We compare the results obtained with different models (Markov chain of order L, variable-order model, and the statistical test, with or without smoothing), and we show how the choice of model parameters influences performance in these applications. These algorithms can also be used in other fields in which the data are naturally ordered.
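
To make the idea of classifying with a variable-order model concrete, here is a minimal Python sketch. It is not the paper's algorithm: the class name SimplePST, the min_count pruning threshold, and the add-one smoothing are all assumptions chosen for illustration, and the paper's statistical test is not reproduced. The sketch only shows the general pattern: store next-symbol counts for contexts up to length L, predict from the longest sufficiently frequent suffix, and assign a sequence to the class whose model gives it the highest log-likelihood.

```python
import math
from collections import defaultdict


class SimplePST:
    """Toy variable-order model: next-symbol counts indexed by context (suffix)."""

    def __init__(self, alphabet, max_depth=3, min_count=2):
        self.alphabet = sorted(alphabet)
        self.max_depth = max_depth  # L: longest context kept
        self.min_count = min_count  # hypothetical pruning threshold (assumption)
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        for seq in sequences:
            for i, symbol in enumerate(seq):
                # Record the symbol after every context of length 0..max_depth.
                for k in range(min(i, self.max_depth) + 1):
                    self.counts[seq[i - k:i]][symbol] += 1
        return self

    def _context(self, history):
        # Longest suffix of the history seen at least min_count times.
        for k in range(min(len(history), self.max_depth), 0, -1):
            ctx = history[len(history) - k:]
            if ctx in self.counts and sum(self.counts[ctx].values()) >= self.min_count:
                return ctx
        return ""  # fall back to the empty context (order 0)

    def prob(self, symbol, history):
        ctx_counts = self.counts.get(self._context(history), {})
        total = sum(ctx_counts.values())
        # Add-one smoothing (assumption) so unseen symbols keep nonzero probability.
        return (ctx_counts.get(symbol, 0) + 1) / (total + len(self.alphabet))

    def log_likelihood(self, seq):
        return sum(math.log(self.prob(s, seq[:i])) for i, s in enumerate(seq))


def classify(seq, models):
    """Return the label whose model gives seq the highest log-likelihood."""
    return max(models, key=lambda label: models[label].log_likelihood(seq))


# Usage: one model per class, trained on made-up example sequences.
models = {
    "alternating": SimplePST("ab").fit(["abababab", "babababa"]),
    "paired": SimplePST("ab").fit(["aabbaabb", "bbaabbaa"]),
}
print(classify("ababab", models))
```

Falling back to shorter contexts when a long one is rare is what keeps storage below that of a full order-L Markov chain, which would need an entry for every possible context of length L.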