Finding relevant patterns in bursty sequences

Authors:
Alexander Lachmann; Mirek Riedewald
Affiliations:
RWTH, Aachen, Germany; Cornell University, Ithaca, New York
Venue:
Proceedings of the VLDB Endowment
Year:
2008

Abstract

Sequence data is ubiquitous, and finding frequent sequences in a large database is one of the most common problems in analyzing such data. Unfortunately, many sources of sequence data, e.g., sensor networks for data-driven science, RFID-based supply chain monitoring, and computing system monitoring infrastructure, produce a challenging workload for sequence mining, because they commonly emit bursts of events of the same type. Such bursts result in high mining cost, because input sequences are longer. An even greater challenge is that these bursts tend to produce an overwhelming number of irrelevant, repetitive sequence patterns with high support. Simply raising the support threshold is not a solution, because at some point interesting sequences will be eliminated as well. As an alternative, we propose a novel transformation of the input sequences. We show that this transformation has several desirable properties. First, the transformed data can still be mined with existing sequence mining algorithms. Second, for a given support threshold, the mining result can often be obtained much faster, and it is usually much smaller and easier to interpret. Third, and most importantly, we show that the result sequences retain the important characteristics of the sequences that would have been found in the original (untransformed) data. We validate our technique with an experimental study using synthetic and real data.
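
To make the burst problem concrete, the following is a minimal Python sketch. It is not the transformation proposed in the paper, which the abstract does not specify. The toy database, the helper names (`is_subsequence`, `support`, `collapse_bursts`), and the run-collapsing step are all assumptions chosen for illustration. The sketch only shows how bursts of one event type give repetitive patterns high support, and how some burst-reducing preprocessing could shorten the input while keeping a pattern of interest.

```python
from itertools import groupby


def is_subsequence(pattern, sequence):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so matches must occur in order.
    return all(item in it for item in pattern)


def support(pattern, database):
    """Fraction of input sequences that contain pattern as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in database) / len(database)


def collapse_bursts(sequence):
    """Hypothetical stand-in for a burst-reducing step (NOT the paper's
    transformation): replace each run of identical events by one event."""
    return [event for event, _ in groupby(sequence)]


# Toy database (an assumption): bursts of 'a' and 'b' precede a single 'c'.
database = [list("aaaabbbc"), list("aaabbc"), list("aabbbbc"), list("abc")]
collapsed = [collapse_bursts(s) for s in database]

# Repetitive patterns have high support in the bursty data ...
print(support(list("aa"), database))    # 0.75
print(support(list("bb"), database))    # 0.75
# ... and the pattern of interest is present as well.
print(support(list("abc"), database))   # 1.0

# After collapsing runs, the repetitive pattern disappears, the pattern of
# interest keeps its support, and the total input length is halved.
print(support(list("aa"), collapsed))   # 0.0
print(support(list("abc"), collapsed))  # 1.0
print(sum(map(len, database)), sum(map(len, collapsed)))  # 24 12
```

Collapsing runs discards all information about burst length, so it is only a crude illustration of the trade-off the abstract describes between shorter, less repetitive input and retaining the important characteristics of the original patterns.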