CAMLS: a constraint-based apriori algorithm for mining long sequences

Authors:
Yaron Gonen;Nurit Gal-Oz;Ran Yahalom;Ehud Gudes
Affiliations:
Department of Computer Science, Ben Gurion University of the Negev, Israel;Department of Computer Science, Ben Gurion University of the Negev, Israel;Department of Computer Science, Ben Gurion University of the Negev, Israel;Department of Computer Science, Ben Gurion University of the Negev, Israel
Venue:
DASFAA'10 Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part I
Year:
2010

Abstract

Mining sequential patterns is a key objective in the field of data mining due to its wide range of applications. Given a database of sequences, the challenge is to identify patterns that appear frequently across different sequences. Well-known algorithms have proved to be efficient; however, they do not perform well when mining databases that contain long frequent sequences. We present CAMLS, Constraint-based Apriori Mining of Long Sequences, an efficient algorithm for mining long sequential patterns under constraints. CAMLS is based on the apriori property and consists of two phases, event-wise and sequence-wise, each of which employs an iterative process of candidate generation followed by frequency testing. The separation into these two phases allows us to: (i) introduce a novel candidate pruning strategy that increases the efficiency of the mining process and (ii) easily incorporate intra-event and inter-event constraints. Experiments on both synthetic and real datasets show that CAMLS outperforms previous algorithms when mining long sequences.
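
To make the idea of "candidate generation followed by frequency testing" concrete, the following is a minimal sketch of a generic level-wise, apriori-style sequence miner. It is not the CAMLS algorithm: it assumes every event is a single item, ignores constraints, and has neither the event-wise/sequence-wise split nor the pruning strategy described in the abstract. All names (`is_subsequence`, `support`, `mine_frequent_sequences`, `min_support`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Pattern, sequence: Sequence[str]) -> bool:
    """Return True if the pattern's items occur in the sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is preserved.
    return all(item in remaining for item in pattern)


def support(pattern: Pattern, database: List[Sequence[str]]) -> int:
    """Count the database sequences that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def mine_frequent_sequences(
    database: List[Sequence[str]], min_support: int
) -> Dict[Pattern, int]:
    """Level-wise mining: generate length-(k+1) candidates from frequent length-k patterns."""
    frequent: Dict[Pattern, int] = {}

    # Level 1: frequent single items.
    items = sorted({item for seq in database for item in seq})
    level: Dict[Pattern, int] = {}
    for item in items:
        count = support((item,), database)
        if count >= min_support:
            level[(item,)] = count

    while level:
        frequent.update(level)
        # Candidate generation: join p and q when p without its first item
        # equals q without its last item. By the apriori property, every
        # frequent longer pattern arises from such a pair.
        candidates = {
            p + (q[-1],) for p in level for q in level if p[1:] == q[:-1]
        }
        # Frequency testing: keep only candidates that meet the threshold.
        next_level: Dict[Pattern, int] = {}
        for candidate in sorted(candidates):
            count = support(candidate, database)
            if count >= min_support:
                next_level[candidate] = count
        level = next_level

    return frequent


if __name__ == "__main__":
    # Toy database, invented for this illustration only.
    toy_db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "c")]
    for pattern, count in sorted(mine_frequent_sequences(toy_db, 2).items()):
        print(pattern, count)
```

The sketch rescans the whole database for every candidate, which is the cost that grows quickly as frequent sequences get longer; this is the setting in which the abstract reports that existing algorithms perform poorly.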