IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Finding subtypes of heterogeneous diseases is one of the biggest challenges in biology. Clustering is often used to generate hypotheses about the subtypes of a heterogeneous disease, but the clusterings produced by different algorithms usually disagree. This work introduces a simple method that provides the most consistent clusters across three different clustering algorithms for a melanoma data set and a breast cancer data set. The method is validated by showing that the Silhouette, Dunn, and Davies-Bouldin cluster validation indices are better for the proposed algorithm than for k-means and another consensus clustering algorithm. The hypotheses suggested by the consensus clusters on both data sets are corroborated by clear genetic markers and 100 percent classification accuracy. In Bittner et al.'s melanoma data set, a previously hypothesized primary cluster is recognized as the largest consensus cluster, and a new partition of this cluster into two subclusters is proposed. In van't Veer et al.'s breast cancer data set, the previously proposed "basal" and "luminal A" subtypes are clearly recognized as the two predominant clusters. Furthermore, a new hypothesis is provided about the existence of two subgroups within the "basal" subtype in this data set. The clusters of van't Veer et al.'s data set are also validated by the high classification accuracy obtained on the data set of van de Vijver et al.
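To make the validation step concrete, the following is a minimal sketch of one of the three indices named above, the Dunn index, in its classic form: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. Larger values indicate compact, well-separated clusters (as with the Silhouette index, and unlike the Davies-Bouldin index, where smaller is better). The function name `dunn_index`, the Euclidean distance, and the toy points are assumptions made for illustration; the abstract does not specify the distance measure or implementation used in the paper.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by largest cluster diameter.

    Illustrative helper (not from the paper); uses Euclidean distance.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    # Largest distance between two points of the same cluster (diameter).
    max_diameter = max(
        (dist(p, q)
         for members in clusters.values()
         for p, q in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between two points of different clusters.
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters.values(), 2)
        for p in a
        for q in b
    )
    return min_separation / max_diameter if max_diameter > 0 else float("inf")


# Hypothetical toy data: two tight, well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = dunn_index(points, [0, 0, 1, 1])  # compact, separated clusters
bad = dunn_index(points, [0, 1, 0, 1])   # clusters that straddle the gap
print(good > bad)
```

Comparing the index across labelings of the same samples, as done here for `good` and `bad`, mirrors how a consensus clustering would be compared against k-means on identical data.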