Clustering sequences by overlap

Authors:
Dietmar H. Dorr; Anne M. Denton
Affiliations:
Department of Computer Science, North Dakota State University, Fargo, ND, 58105, USA; Department of Computer Science, North Dakota State University, Fargo, ND, 58105, USA
Venue:
International Journal of Data Mining and Bioinformatics
Year:
2009


Abstract

A clustering algorithm is introduced that combines the strengths of clustering and motif-finding techniques. Clusters are identified based on unambiguously defined sequence sections, as in motif-finding algorithms. The definition of similarity within clusters allows transitive matches and thereby enables the discovery of remote homologies that cannot be found through motif-finding algorithms. Directed Acyclic Graph (DAG) structures are constructed that link short clusters to longer ones. We compare the clustering results to the corresponding domains in the InterPro database. A second comparison shows that annotations based on our domains are inherently more consistent than those based on InterPro domains.
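The idea of transitive matches can be illustrated with a small sketch. It is not the authors' algorithm: it assumes fixed-length sequence windows, uses Hamming distance with a hypothetical `max_mismatches` threshold as the direct-match criterion, and merges matches with a union-find structure. The function names `hamming` and `transitive_clusters` are illustrative only. The point it shows is that two windows can land in the same cluster through an intermediate window even when they do not match each other directly.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def transitive_clusters(windows: list[str], max_mismatches: int) -> list[set[str]]:
    """Group windows so that any chain of direct matches shares one cluster.

    Assumption for illustration: a direct match means Hamming distance
    <= max_mismatches. The abstract does not specify the actual criterion.
    """
    parent = {w: w for w in windows}

    def find(w: str) -> str:
        # Follow parent links to the cluster representative (path halving).
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    # Merge the clusters of every directly matching pair.
    for a, b in combinations(windows, 2):
        if hamming(a, b) <= max_mismatches:
            parent[find(a)] = find(b)

    # Collect windows by their representative.
    groups: dict[str, set[str]] = {}
    for w in windows:
        groups.setdefault(find(w), set()).add(w)
    return list(groups.values())


windows = ["ACDEF", "ACDEY", "ACWEY", "KLMNP"]
clusters = transitive_clusters(windows, max_mismatches=1)
```

Here `ACDEF` and `ACWEY` differ in two positions, so they are not a direct match at a threshold of one, yet both match `ACDEY` and therefore end up in the same cluster. `KLMNP` matches nothing and stays alone.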