Data preparation for data mining
Data preparation for data mining
Sequence mining in categorical domains: incorporating constraints
Proceedings of the ninth international conference on Information and knowledge management
SPADE: an efficient algorithm for mining frequent sequences
Machine Learning
Discovery of Frequent Episodes in Event Sequences
Data Mining and Knowledge Discovery
Mining Sequential Patterns: Generalizations and Performance Improvements
EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology
ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach
Data Mining and Knowledge Discovery
A new algorithm for gap constrained sequence mining
Proceedings of the 2004 ACM symposium on Applied computing
Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach
IEEE Transactions on Knowledge and Data Engineering
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Constraint-based sequential pattern mining: the pattern-growth methods
Journal of Intelligent Information Systems
Users tracking and roles mining in web-based applications
Proceedings of the 2011 Joint EDBT/ICDT Ph.D. Workshop
Hi-index | 0.00 |
Mining sequential patterns is a key objective in the field of data mining due to its wide range of applications. Given a database of sequences, the challenge is to identify patterns which appear frequently in different sequences. Well known algorithms have proved to be efficient, however these algorithms do not perform well when mining databases that have long frequent sequences. We present CAMLS, Constraint-based Apriori Mining of Long Sequences, an efficient algorithm for mining long sequential patterns under constraints. CAMLS is based on the apriori property and consists of two phases, event-wise and sequence-wise, which employ an iterative process of candidate-generation followed by frequency-testing. The separation into these two phases allows us to: (i) introduce a novel candidate pruning strategy that increases the efficiency of the mining process and (ii) easily incorporate considerations of intra-event and inter-event constraints. Experiments on both synthetic and real datasets show that CAMLS outperforms previous algorithms when mining long sequences.