Algorithms for clustering data
Algorithms for clustering data
Properties of Embedding Methods for Similarity Searching in Metric Spaces
IEEE Transactions on Pattern Analysis and Machine Intelligence
Stability-based validation of clustering solutions
Neural Computation
Resampling Method for Unsupervised Estimation of Cluster Validity
Neural Computation
A statistical model of cluster stability
Pattern Recognition
Scale-based clustering using the radial basis function network
IEEE Transactions on Neural Networks
Hi-index | 0.01 |
In the present paper, 188 prokaryote genomes are classified by separately calculating the compositional spectra for the coding and the non-coding parts of the genomes. For each subsequence, the compositional spectrum is transformed into the corresponding point in a vector space. This enables the categorization of genomes into meaningful groups by a formal method. Repeated clustering performed for the coding and the non-coding genome parts makes it possible to estimate the true number of the genome clusters. The method we propose is based on a new application of external cluster validation indexes and on the misclassified quantities obtained in the process of repeated clustering. Besides, we have constructed additional data embedding into the appropriate Euclidean space only on the basis of the distances between compositional spectra. Biological evaluation of the results obtained for the 4-letter and the 2-letter alphabets substantiates the appropriateness of the resulting cluster-based classification.