Discovering consensus patterns in biological databases

Authors:
Mohamed Y. ElTabakh; Walid G. Aref; Mourad Ouzzani; Mohamed H. Ali
Affiliations:
Dept. of Computer Science, Purdue University, West Lafayette, IN (ElTabakh, Aref, Ali); Cyber Center, Purdue University, West Lafayette, IN (Ouzzani)
Venue:
VDMB'06: Proceedings of the First International Conference on Data Mining and Bioinformatics
Year:
2006

Abstract

Consensus patterns, such as motifs and tandem repeats, are highly conserved patterns that contain very few substitutions and no gaps. In this paper, we present a progressive hierarchical clustering technique for discovering consensus patterns whose lengths fall within a given range in biological databases. By applying a post-processing phase, the technique can discover consensus patterns that satisfy various requirements. The progressive nature of the hierarchical clustering algorithm makes it scalable and efficient. Experiments on discovering motifs and tandem repeats in real biological databases show a significant performance gain over non-progressive clustering techniques.
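To make the definition concrete, the following Python sketch treats a consensus pattern as a group of gap-free, equal-length occurrences that each differ from a column-wise majority string by at most `max_subs` substitutions. It is a naive, non-progressive single-link clustering over all windows of one length, not the authors' progressive hierarchical algorithm. The function names, the majority-vote consensus, and the `max_subs`/`min_support` parameters are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Substitutions between two equal-length strings; gaps are not allowed."""
    if len(a) != len(b):
        raise ValueError("gap-free comparison requires equal lengths")
    return sum(x != y for x, y in zip(a, b))


def consensus(occurrences: list[str]) -> str:
    """Column-wise majority symbol across equal-length occurrences (assumed rule)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*occurrences))


def find_consensus_patterns(seq: str, length: int, max_subs: int,
                            min_support: int) -> list[tuple[str, list[int]]]:
    """Hypothetical non-progressive baseline: cluster every window of one length."""
    windows = [seq[i:i + length] for i in range(len(seq) - length + 1)]
    parent = list(range(len(windows)))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Single-link merge: join any two windows within max_subs substitutions.
    for i, j in combinations(range(len(windows)), 2):
        if hamming(windows[i], windows[j]) <= max_subs:
            parent[find(i)] = find(j)

    # Collect start positions per cluster, in order of first occurrence.
    groups: dict[int, list[int]] = {}
    for i in range(len(windows)):
        groups.setdefault(find(i), []).append(i)

    results = []
    for starts in groups.values():
        members = [windows[i] for i in starts]
        pattern = consensus(members)
        # Assumed post-processing filter: enough occurrences, each close to the consensus
        # (this also discards clusters that single-link chaining stretched too far).
        if len(starts) >= min_support and all(hamming(m, pattern) <= max_subs for m in members):
            results.append((pattern, starts))
    return results


if __name__ == "__main__":
    # A perfect tandem repeat of ACG: the length-3 pattern occurs at 0, 3 and 6.
    print(find_consensus_patterns("ACGACGACG", length=3, max_subs=0, min_support=3))
```

Calling `find_consensus_patterns` once per length in the desired range (for example, `for L in range(min_len, max_len + 1)`) approximates a length-range search, but it redoes an all-pairs comparison for every length. That repeated quadratic work is the kind of cost a progressive scheme is presumably designed to reduce.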