Frequent patterns mining in multiple biological sequences

Authors:
Ling Chen; Wei Liu
Venue:
Computers in Biology and Medicine
Year:
2013

Abstract

Existing algorithms for mining frequent patterns in multiple biosequences may generate multiple projected databases and many short candidate patterns, which can increase computation time and memory requirements. To overcome these shortcomings, we propose MSPM, a fast and efficient algorithm for mining frequent patterns in multiple biological sequences. We first introduce the concept of a primary pattern, which can be extended to form larger patterns in a sequence. To detect frequent primary patterns, a prefix tree is constructed. Based on this prefix tree, a pattern-extending approach mines frequent patterns without producing a large number of irrelevant candidate patterns. Experimental results show that MSPM runs faster and produces higher-quality results than the other methods compared.
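The abstract does not give the details of MSPM, so the following is only a minimal sketch of the general idea it describes: collect short seed patterns in a prefix tree, keep the frequent ones, then grow them instead of enumerating candidates. Everything specific here is an assumption made for illustration: a "primary pattern" is taken to be a contiguous substring of fixed length k, support is the number of sequences containing the pattern, extension adds one character to the right, and the names build_prefix_tree, frequent_primary_patterns, extend_patterns and mine are hypothetical.

```python
from collections import defaultdict


class TrieNode:
    """Node of a prefix tree over sequence characters."""

    def __init__(self):
        self.children = {}
        # sequence index -> start positions of the k-mer that ends at this node
        self.occurrences = defaultdict(list)


def build_prefix_tree(sequences, k):
    """Insert every length-k substring of every sequence into a prefix tree."""
    root = TrieNode()
    for seq_id, seq in enumerate(sequences):
        for start in range(len(seq) - k + 1):
            node = root
            for ch in seq[start:start + k]:
                node = node.children.setdefault(ch, TrieNode())
            node.occurrences[seq_id].append(start)
    return root


def frequent_primary_patterns(root, min_support):
    """Return {pattern: occurrences} for k-mers found in >= min_support sequences."""
    found = {}
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        if node.occurrences and len(node.occurrences) >= min_support:
            found[prefix] = node.occurrences
        for ch, child in node.children.items():
            stack.append((child, prefix + ch))
    return found


def extend_patterns(sequences, primary, min_support):
    """Grow frequent patterns one character to the right while they stay frequent.

    Assumption: only occurrences of an already frequent pattern are inspected,
    so no candidate is generated that does not appear in the data.
    """
    result = dict(primary)
    frontier = dict(primary)
    while frontier:
        next_frontier = {}
        for pattern, occ in frontier.items():
            # next character -> sequence index -> start positions
            grouped = defaultdict(lambda: defaultdict(list))
            for seq_id, starts in occ.items():
                seq = sequences[seq_id]
                for start in starts:
                    end = start + len(pattern)
                    if end < len(seq):
                        grouped[seq[end]][seq_id].append(start)
            for ch, new_occ in grouped.items():
                if len(new_occ) >= min_support:
                    next_frontier[pattern + ch] = new_occ
        result.update(next_frontier)
        frontier = next_frontier
    return result


def mine(sequences, k, min_support):
    """Return {pattern: support} for all frequent contiguous patterns of length >= k."""
    root = build_prefix_tree(sequences, k)
    primary = frequent_primary_patterns(root, min_support)
    extended = extend_patterns(sequences, primary, min_support)
    return {pattern: len(occ) for pattern, occ in extended.items()}


if __name__ == "__main__":
    toy = ["ACGTAC", "TACGTA", "GGACGT"]
    for pattern, support in sorted(mine(toy, k=2, min_support=3).items()):
        print(pattern, support)
```

Because each extension step reuses the stored occurrence positions of a pattern that is already frequent, the sketch never builds a projected copy of the sequence set. Whether MSPM handles gaps, approximate matches, or a different notion of support is not stated in the abstract, and this sketch covers none of those.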