Mining frequent subsequence patterns is a typical data-mining problem, and various efficient sequential pattern mining algorithms have been proposed. In many application domains (e.g., biology), frequent subsequences that satisfy predefined gap requirements are more meaningful than general sequential patterns. In this article, we propose two algorithms: Gap-BIDE, for mining closed gap-constrained subsequences from a set of input sequences, and Gap-Connect, for mining repetitive gap-constrained subsequences from a single input sequence. Inspired by state-of-the-art closed and constrained sequential pattern mining algorithms, Gap-BIDE adopts an efficient approach to finding the complete set of closed sequential patterns with gap constraints, while Gap-Connect efficiently mines an approximate set of long patterns by connecting short patterns. We also present several methods for selecting features from the set of gap-constrained patterns for the purpose of classification and clustering. Our extensive performance study shows that our approaches are very efficient in mining frequent subsequences with gap constraints, and that classification and clustering approaches based on gap-constrained patterns can achieve high-quality results.
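To make the notion of a gap constraint concrete, the following is a minimal sketch of how the support of a single pattern could be counted over a set of sequences. It is not Gap-BIDE or Gap-Connect, and it performs no closedness checking or pattern growth. It assumes that a gap is the number of items skipped between two consecutive pattern items and that the same bounds apply to every pair; the names `occurs_with_gaps`, `support`, `min_gap`, and `max_gap` are hypothetical and chosen only for illustration.

```python
from typing import Iterable


def occurs_with_gaps(pattern: str, seq: str, min_gap: int, max_gap: int) -> bool:
    """Return True if `pattern` occurs in `seq` such that the number of
    skipped items between each pair of consecutive pattern items lies in
    [min_gap, max_gap] (an assumed definition of the gap constraint)."""
    if not pattern:
        return True
    # Positions in seq where a match of the current pattern prefix can end.
    ends = {i for i, item in enumerate(seq) if item == pattern[0]}
    for item in pattern[1:]:
        next_ends = set()
        for end in ends:
            lo = end + 1 + min_gap
            hi = min(end + 1 + max_gap, len(seq) - 1)
            for j in range(lo, hi + 1):
                if seq[j] == item:
                    next_ends.add(j)
        if not next_ends:
            return False  # no way to extend any partial match
        ends = next_ends
    return bool(ends)


def support(pattern: str, sequences: Iterable[str], min_gap: int, max_gap: int) -> int:
    """Count how many sequences contain at least one gap-constrained occurrence."""
    return sum(occurs_with_gaps(pattern, s, min_gap, max_gap) for s in sequences)


if __name__ == "__main__":
    toy_sequences = ["ACGTAC", "AGC", "ACC", "CA"]  # illustrative data only
    print(support("AC", toy_sequences, 0, 1))  # gaps of 0 or 1 skipped items
    print(support("AC", toy_sequences, 1, 1))  # exactly 1 skipped item
```

Tracking the set of possible end positions, rather than only the first match, matters under gap constraints: a greedy leftmost match can fail where a later occurrence of the same item would satisfy the bounds.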