K-means clustering and hierarchical agglomerative clustering are two popular methods for partitioning data into groups. The k-means algorithm heavily favors spherical clusters and does not handle noise adequately. To overcome these problems, profile hidden Markov models (HMMs) have been used to build a model for each cluster. However, this mixture method still assigns the training data randomly to the k clusters, which can leave some groups empty. To solve this problem, we propose a hybrid clustering method that uses agglomerative hierarchical clustering to pre-cluster the molecular sequences into k clusters and then uses the pre-clustered data to generate the profile HMM for each group. Experiments on a molecular sequence dataset demonstrate the effectiveness and efficiency of the proposed clustering algorithm.
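The pre-clustering step can be illustrated with a minimal sketch. The abstract does not state which sequence distance or linkage criterion is used, so the edit distance and average linkage below are assumptions, and the function names `edit_distance` and `agglomerative_precluster` are hypothetical. Training a profile HMM on each resulting group is not shown. Because the procedure starts from singleton clusters and only merges them, every one of the k groups is non-empty by construction.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program.

    Assumption: the abstract does not name a distance measure.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def agglomerative_precluster(seqs, k):
    """Merge the two closest clusters until exactly k clusters remain.

    Assumption: average linkage; the abstract does not name a criterion.
    """
    n = len(seqs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of sequences")

    # Pairwise distances between sequences, computed once.
    dist = {(i, j): edit_distance(seqs[i], seqs[j])
            for i, j in combinations(range(n), 2)}

    def linkage(c1, c2):
        # Mean distance over all cross-cluster pairs.
        total = sum(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    # Start with one singleton cluster per sequence (stored as indices).
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected

    return [[seqs[i] for i in sorted(c)] for c in clusters]


if __name__ == "__main__":
    toy = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGGC", "ACGTTCGT"]
    for group in agglomerative_precluster(toy, 2):
        # Each group would then serve as training data for one profile HMM.
        print(group)
```

This naive version recomputes linkages at every merge and is only meant for small inputs; each returned group would serve as the initial training set for one cluster's profile HMM in place of a random assignment.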