An efficient hybrid clustering algorithm for molecular sequences classification

  • Authors:
  • Wei-Bang Chen

  • Affiliations:
  • University of Alabama at Birmingham, Birmingham, Alabama

  • Venue:
  • Proceedings of the 44th annual Southeast regional conference
  • Year:
  • 2006

Abstract

The k-means and agglomerative hierarchical clustering algorithms are two popular methods for partitioning data into groups. The k-means algorithm strongly favors spherical clusters and does not handle noise adequately. To overcome these problems, profile Hidden Markov Models (HMMs) have been used to build a model for each cluster. However, this mixture method still assigns the training data randomly to one of the k clusters, which can produce empty groups. To solve this problem, we propose a hybrid clustering method that uses an agglomerative hierarchical clustering algorithm to pre-cluster molecular sequences into k clusters and then uses the pre-clustered data to build a profile HMM for each group. Experiments on a molecular sequence dataset demonstrate the effectiveness and efficiency of the proposed clustering algorithm.
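The pre-clustering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation. The abstract does not specify the distance measure or linkage criterion, so Levenshtein edit distance and average linkage are assumptions here. The function names `edit_distance` and `agglomerative_precluster` and the toy sequences are hypothetical. Training a profile HMM on each resulting group is omitted.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two sequences (unit costs; assumed measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def agglomerative_precluster(seqs, k):
    """Average-linkage agglomerative clustering, stopped at k clusters.

    Hypothetical helper: returns k non-empty clusters of sequence indices.
    """
    if not 1 <= k <= len(seqs):
        raise ValueError("k must be between 1 and the number of sequences")
    # Compute all pairwise distances once.
    dist = {}
    for i, j in combinations(range(len(seqs)), 2):
        dist[(i, j)] = dist[(j, i)] = edit_distance(seqs[i], seqs[j])

    # Start with every sequence in its own singleton cluster.
    clusters = [[i] for i in range(len(seqs))]

    def avg_link(c1, c2):
        # Mean distance over all cross-cluster pairs (average linkage).
        return sum(dist[(i, j)] for i in c1 for j in c2) / (len(c1) * len(c2))

    # Repeatedly merge the closest pair until exactly k clusters remain.
    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_link(clusters[p[0]], clusters[p[1]]))
        clusters[a].extend(clusters[b])  # a < b, so index a stays valid
        del clusters[b]
    return clusters


seqs = ["ACGTACGT", "ACGTACGA", "ACGTTCGT", "TTTTGGGG", "TTTTGGGC", "TTATGGGG"]
groups = agglomerative_precluster(seqs, k=2)
# Each group would then seed one profile HMM (training not shown).
print(sorted(sorted(g) for g in groups))  # → [[0, 1, 2], [3, 4, 5]]
```

Because agglomerative clustering only ever merges existing non-empty clusters, stopping at k clusters always yields k non-empty groups. This is the property that lets the pre-clustered data replace the random initial assignment that can leave a group empty.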