Biosequences typically have a small alphabet, a long length, and patterns containing gaps (i.e., "don't care" positions) of arbitrary size. Mining frequent patterns in such sequences faces a different type of explosion than mining transaction sequences, the setting primarily motivated by market-basket analysis. In this paper, we study how this explosion affects classic sequential pattern mining, and present a scalable two-phase algorithm to deal with it. The Segment Phase first searches for short patterns containing no gaps, called segments. This phase is efficient. The Pattern Phase then searches for long patterns containing multiple segments separated by variable-length gaps. This phase is time-consuming. The purpose of the two phases is to exploit the information obtained from the first phase to speed up pattern growth and matching and to prune the search space in the second phase. We evaluate this approach on synthetic and real-life data sets.
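The two-phase idea can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names `segment_phase` and `pattern_phase`, the fixed segment length, the bounded gap range, and the definition of support as the number of sequences containing a pattern are all assumptions made for illustration. It shows only how segment occurrences found in the first phase can be reused in the second, so that patterns are grown from stored positions without rescanning the sequences.

```python
from collections import defaultdict

def segment_phase(seqs, seg_len, min_sup):
    """Phase 1 (illustrative): find gap-free substrings of length seg_len
    that occur in at least min_sup sequences.
    Returns {segment: {seq_id: [start positions]}}."""
    occ = defaultdict(lambda: defaultdict(list))
    for sid, s in enumerate(seqs):
        for i in range(len(s) - seg_len + 1):
            occ[s[i:i + seg_len]][sid].append(i)
    return {seg: dict(by_seq) for seg, by_seq in occ.items()
            if len(by_seq) >= min_sup}

def pattern_phase(segments, min_sup, min_gap, max_gap, max_segments):
    """Phase 2 (illustrative): grow patterns by appending a frequent segment
    after a gap of min_gap..max_gap wildcard positions.
    Returns {tuple of segments: support}."""
    # For each pattern, keep per sequence the set of end offsets of its matches.
    frontier = {
        (seg,): {sid: {p + len(seg) for p in pos} for sid, pos in by_seq.items()}
        for seg, by_seq in segments.items()
    }
    results = {}
    for _ in range(max_segments - 1):
        grown = {}
        for pat, ends in frontier.items():
            for seg, by_seq in segments.items():
                new_ends = {}
                # Only sequences containing both the pattern and the segment can match.
                for sid in ends.keys() & by_seq.keys():
                    hits = {p + len(seg) for p in by_seq[sid]
                            if any(min_gap <= p - e <= max_gap for e in ends[sid])}
                    if hits:
                        new_ends[sid] = hits
                # Infrequent extensions are dropped and never grown further.
                if len(new_ends) >= min_sup:
                    grown[pat + (seg,)] = new_ends
        if not grown:
            break
        results.update({pat: len(e) for pat, e in grown.items()})
        frontier = grown
    return results

# Hypothetical toy input, not data from the paper.
seqs = ["ACGTTTACG", "ACGAAACG", "TTACGCCACG"]
segs = segment_phase(seqs, seg_len=3, min_sup=3)
print(pattern_phase(segs, min_sup=3, min_gap=1, max_gap=4, max_segments=2))
# → {('ACG', 'ACG'): 3}
```

In this toy run, only the segment `ACG` is frequent, and the pattern `ACG`, gap, `ACG` is found in all three sequences with gaps of 3, 2, and 2 positions. Narrowing the gap range to 1..2 drops the support to two sequences, so the pattern would be pruned.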