Mining periodic patterns with gap requirement from sequences
ACM Transactions on Knowledge Discovery from Data (TKDD)
We study the problem of mining frequently occurring periodic patterns with a gap requirement from sequences. Given a character sequence S of length L and a pattern P of length l, we consider P a frequently occurring pattern in S if the probability of observing P in a randomly picked length-l subsequence of S exceeds a given threshold. In many applications, particularly those related to bioinformatics, interesting patterns are periodic with a gap requirement. That is, the characters of P should match a subsequence of S in such a way that the matching characters in S are separated by gaps of roughly the same size. We analyze the complexity of the mining problem and discuss why traditional mining algorithms are computationally infeasible. We propose practical algorithms for solving the problem and study their characteristics. We also present a case study in which we apply our algorithms to DNA sequences and discuss some interesting patterns obtained from it.