Extraction of Recurrent Patterns from Stratified Ordered Trees

Authors:
Jean-Gabriel Ganascia
Affiliations:
-
Venue:
ECML '01 Proceedings of the 12th European Conference on Machine Learning
Year:
2001

Abstract

This paper proposes a new algorithm for extracting patterns from Stratified Ordered Trees (SOT). It first describes the SOT data structure, which makes it possible to represent structured sequential data. It then shows how clusters of similar recurrent patterns can be extracted from any SOT. The similarity measure on which our clustering algorithm is based is a generalized edit distance, also described in the paper. The algorithms have been tested on text mining, where the aim was to detect recurrent syntactic motifs in texts drawn from classical literature. We hope the algorithm can also be applied to many other fields where data are naturally sequential (e.g., financial data, molecular biology, traces of computation).
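
For readers unfamiliar with edit distances, the following is a minimal sketch of the classic (Levenshtein-style) edit distance between two flat sequences, written in Python. It is not the generalized edit distance of the paper, whose definition the abstract does not give, and it does not operate on Stratified Ordered Trees. The function name `edit_distance` and the parameters `sub_cost` and `indel_cost` are hypothetical names chosen for this illustration only.

```python
from typing import Callable, Hashable, Sequence


def unit_sub_cost(x: Hashable, y: Hashable) -> float:
    """Substitution cost: 0 for identical symbols, 1 otherwise (an assumption)."""
    return 0.0 if x == y else 1.0


def edit_distance(
    a: Sequence[Hashable],
    b: Sequence[Hashable],
    sub_cost: Callable[[Hashable, Hashable], float] = unit_sub_cost,
    indel_cost: float = 1.0,
) -> float:
    """Classic dynamic-programming edit distance between two sequences.

    Illustrative only: a flat-sequence distance, not the paper's
    generalized edit distance on Stratified Ordered Trees.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, y in enumerate(b, start=1):
            curr.append(
                min(
                    prev[j] + indel_cost,          # delete x from a
                    curr[j - 1] + indel_cost,      # insert y into a
                    prev[j - 1] + sub_cost(x, y),  # substitute x by y
                )
            )
        prev = curr
    return prev[-1]


# Two made-up part-of-speech sequences that differ by one deleted tag.
print(edit_distance(["DET", "ADJ", "NOUN", "VERB"], ["DET", "NOUN", "VERB"]))  # 1.0
```

Passing a different `sub_cost` function (for example, one that charges less for substituting related syntactic categories) is one simple way such a distance can be generalized; whether the paper does this is not stated in the abstract.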