Frequent Closed Sequence Mining without Candidate Maintenance

Authors: Jianyong Wang; Jiawei Han; Chun Li
Affiliations: IEEE; IEEE; -
Venue: IEEE Transactions on Knowledge and Data Engineering
Year: 2007


Abstract

Previous studies have argued convincingly that a frequent pattern mining algorithm should mine only the closed frequent patterns rather than all frequent patterns, because the closed set is more compact yet still complete and can be mined more efficiently. However, most previously developed closed pattern mining algorithms follow the candidate maintenance-and-test paradigm, which is inherently costly in both runtime and space when the support threshold is low or the patterns are long. In this paper, we present BIDE, an efficient algorithm for mining frequent closed sequences without candidate maintenance. It adopts a novel sequence closure checking scheme called BI-Directional Extension and uses the BackScan pruning method to prune the search space more deeply than previous algorithms. A thorough performance study on sparse and dense, real and synthetic data sets demonstrates that BIDE significantly outperforms the previous algorithm: it consumes one or more orders of magnitude less memory and can be more than an order of magnitude faster. It also scales linearly with database size.
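
To make the notion of a frequent closed sequence concrete, the following minimal Python sketch enumerates all frequent sequences of a toy database and then keeps only those with no proper super-sequence of equal support. It is not BIDE: it deliberately holds every frequent pattern in memory and tests closure afterwards, which is the kind of candidate maintenance the abstract describes as costly. The function names, the four-sequence database, and the threshold of 2 are assumptions chosen for illustration, and sequences are simplified to single items per position.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, database):
    """Number of database sequences that contain pattern as a subsequence."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def frequent_sequences(database, min_sup):
    """Depth-first enumeration of every frequent sequence (min_sup must be >= 1)."""
    items = sorted({item for seq in database for item in seq})
    found = {}

    def grow(prefix):
        for item in items:
            candidate = prefix + (item,)
            sup = support(candidate, database)
            if sup >= min_sup:  # extend only frequent prefixes
                found[candidate] = sup
                grow(candidate)

    grow(())
    return found


def closed_sequences(frequent):
    """Keep patterns that have no proper super-sequence with the same support."""
    closed = {}
    for pattern, sup in frequent.items():
        absorbed = any(
            len(other) > len(pattern)
            and other_sup == sup
            and is_subsequence(pattern, other)
            for other, other_sup in frequent.items()
        )
        if not absorbed:
            closed[pattern] = sup
    return closed


# Hypothetical toy database; each string is one sequence of single-item events.
database = ["CAABC", "ABCB", "CABC", "ABBCA"]
frequent = frequent_sequences(database, min_sup=2)
closed = closed_sequences(frequent)
print(len(frequent), "frequent;", len(closed), "closed")
for pattern, sup in sorted(closed.items()):
    print("".join(pattern), sup)
```

On this toy input the closed set is considerably smaller than the full frequent set, yet the support of any frequent sequence can still be recovered from it as the largest support among the closed sequences that contain it. The post-hoc filter compares every pattern against every other, so its cost grows quadratically with the number of frequent patterns; avoiding that stored pattern set is the motivation stated in the abstract.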