An efficient hybrid clustering algorithm for molecular sequences classification

Authors:
Wei-Bang Chen
Affiliations:
University of Alabama at Birmingham, Birmingham, Alabama
Venue:
Proceedings of the 44th annual Southeast regional conference
Year:
2006

Abstract

The k-means and hierarchical agglomerative clustering algorithms are two popular methods for partitioning data into groups. The k-means algorithm heavily favors spherical clusters and does not handle noise adequately. To overcome these problems, profile hidden Markov models (HMMs) have been used to build a model for each cluster. However, this mixture method still assigns the training data randomly to one of the k clusters, which can produce empty groups. To solve this problem, we propose a hybrid clustering method that uses an agglomerative hierarchical clustering algorithm to pre-cluster molecular sequences into k clusters and then uses the pre-clustered data to generate the profile HMM for each group. Experiments on a molecular sequence dataset demonstrate the effectiveness and efficiency of the proposed clustering algorithm.
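
The pre-clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which distance measure or linkage criterion the paper uses, so the edit distance and average linkage below are assumptions, and the function names (edit_distance, agglomerative_precluster) are hypothetical. Training a profile HMM on each resulting group is omitted. The sketch only shows why bottom-up merging cannot leave a cluster empty: every cluster starts with one sequence and clusters are only ever merged.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def agglomerative_precluster(seqs, k):
    """Merge the two closest clusters until k remain.

    Assumptions (not from the paper): edit distance between sequences,
    average linkage between clusters. Returns lists of sequence indices.
    """
    if not 1 <= k <= len(seqs):
        raise ValueError("k must be between 1 and the number of sequences")

    # Precompute pairwise distances once, keyed by (smaller, larger) index.
    dist = {(i, j): edit_distance(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}

    def linkage(c1, c2):
        total = sum(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    # Start with one singleton cluster per sequence, so none is ever empty.
    clusters = [[i] for i in range(len(seqs))]
    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters


if __name__ == "__main__":
    toy = ["ACGTACGT", "ACGTACGA", "ACGTTCGT",
           "TTTTGGGG", "TTTTGGGC", "TTATGGGG"]
    for group in agglomerative_precluster(toy, k=2):
        print([toy[i] for i in group])
```

In a full pipeline, each returned group would serve as the initial training set for one profile HMM, replacing the random initial assignment. This quadratic-memory, pure-Python version is meant for illustration on small inputs only.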