Self-organization and associative memory: 3rd edition
Self-organization and associative memory: 3rd edition
An empirical comparison of four initialization methods for the K-Means algorithm
Pattern Recognition Letters
Refining Initial Points for K-Means Clustering
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Cluster Analysis for Gene Expression Data: A Survey
IEEE Transactions on Knowledge and Data Engineering
Model-based clustering of high-dimensional data: A review
Computational Statistics & Data Analysis
Hi-index | 0.03 |
Clustering techniques play an important role in analyzing high dimensional data that is common in high-throughput screening such as microarray and mass spectrometry data. Effective use of the high dimensionality and some replications can help to increase clustering accuracy and stability. In this article a new partitioning algorithm with a robust distance measure is introduced to cluster variables in high dimensional low sample size (HDLSS) data that contain a large number of independent variables with a small number of replications per variable. The proposed clustering algorithm, PPCLUST, considers data from a mixture distribution and uses p-values from nonparametric rank tests of homogeneous distribution as a measure of similarity to separate the mixture components. PPCLUST is able to efficiently cluster a large number of variables in the presence of very few replications. Inherited from the robustness of rank procedure, the new algorithm is robust to outliers and invariant to monotone transformations of data. Numerical studies and an application to microarray gene expression data for colorectal cancer study are discussed.