FlExPat: Flexible Extraction of Sequential Patterns

Authors:
Pierre-Yves Rolland
Affiliations:
-
Venue:
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Year:
2001

Citing 0
Cited 7

Pattern Detection and Discovery: The Case of Music Data Mining
Proceedings of the ESF Exploratory Workshop on Pattern Detection and Discovery

An Efficient Algorithm for Mining Frequent Sequences by a New Strategy without Support Counting
ICDE '04 Proceedings of the 20th International Conference on Data Engineering

Efficient mining of sequential patterns with time constraints by delimited pattern growth
Knowledge and Information Systems

Efficient frequent sequence mining by a dynamic strategy switching algorithm
The VLDB Journal — The International Journal on Very Large Data Bases

Mining sequential patterns across multiple sequence databases
Data & Knowledge Engineering

Analysis on repeat-buying patterns
Knowledge-Based Systems

An efficient approach to extracting approximate repeating patterns in music databases
DASFAA'05 Proceedings of the 10th international conference on Database Systems for Advanced Applications

Quantified Score

Hi-index	0.01

Abstract

This paper addresses sequential data mining, a sub-area of data mining where the data to be analyzed is organized in sequences. In many problem domains a natural ordering exists over data. Examples of sequential databases (SDBs) include: (a) collections of temporal data sequences, such as chronological series of daily stock indices or multimedia data (sound, music, video, etc.); and (b) macromolecule banks, where amino acid or protein sequences are represented as strings. In an SDB it is often valuable to detect regularities within one sequence or across several. In particular, finding exact or approximate repetitions of segments can be utilized directly (e.g., for determining the biochemical activity of a protein region) or indirectly (e.g., for prediction in finance). To this end, we present concepts and an algorithm for automatically extracting sequential patterns from a sequential database. Such a pattern is defined as a group of significantly similar segments from one or several sequences. Appropriate functions for measuring similarity between sequence segments are proposed, generalizing the edit distance framework. There is a trade-off here between flexibility, particularly in sequence data representation and in associated similarity metrics, and computational efficiency. We designed the FlExPat algorithm to cope satisfactorily with this trade-off. FlExPat's complexity is in practice less than quadratic in the total length of the SDB analyzed, while allowing high flexibility. Some experimental results obtained with FlExPat on music data are presented and discussed.
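
The abstract mentions similarity functions that generalize the edit distance framework, but it does not specify them. As a rough illustration of what such a generalization can look like, the following is a minimal sketch of a standard dynamic-programming edit distance whose substitution and insertion/deletion costs are supplied by the caller. It is not FlExPat's actual similarity measure; the names generalized_edit_distance, pitch_sub_cost and unit_indel, and the MIDI-style pitch encoding, are assumptions made for this example only.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def generalized_edit_distance(
    a: Sequence[T],
    b: Sequence[T],
    sub_cost: Callable[[T, T], float],
    indel_cost: Callable[[T], float],
) -> float:
    """Edit distance with caller-supplied costs (illustrative sketch only)."""
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    # First column: delete every element of the prefix of a.
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + indel_cost(a[i - 1])
    # First row: insert every element of the prefix of b.
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + indel_cost(b[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost(a[i - 1]),              # deletion
                dist[i][j - 1] + indel_cost(b[j - 1]),              # insertion
                dist[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # substitution
            )
    return dist[-1][-1]


def pitch_sub_cost(x: int, y: int) -> float:
    # Hypothetical cost: 0 for equal pitches, 0.5 for a semitone, 1 otherwise.
    return min(abs(x - y), 2) / 2.0


def unit_indel(_: object) -> float:
    # Hypothetical cost: every insertion or deletion costs 1.
    return 1.0


# Three short melodies encoded as MIDI pitch numbers (an assumed encoding).
melody_a = [60, 62, 64, 65]
melody_b = [60, 62, 63, 65]  # one note lowered by a semitone
melody_c = [60, 64, 65]      # one note removed

near = generalized_edit_distance(melody_a, melody_b, pitch_sub_cost, unit_indel)
far = generalized_edit_distance(melody_a, melody_c, pitch_sub_cost, unit_indel)
```

With a 0/1 substitution cost and unit insertion/deletion cost, the same function reduces to the classic Levenshtein distance. Domain-specific cost functions such as the pitch-based one above are one way to trade flexibility in the similarity metric against the quadratic cost of filling the table for each pair of segments.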