Smoothing dissimilarities to cluster binary data

Authors:
David B. Hitchcock; Zhimin Chen
Affiliations:
University of South Carolina, Department of Statistics, United States; Morehouse School of Medicine, Cardiovascular Research Institute, United States
Venue:
Computational Statistics & Data Analysis
Year:
2008

Abstract

Cluster analysis attempts to group data objects into homogeneous clusters on the basis of the pairwise dissimilarities among the objects. When the data contain noise, we might consider performing a smoothing operation, either on the data themselves or on the dissimilarities, before running the clustering algorithm. Possible benefits of such pre-smoothing are discussed in the context of binary data. We suggest a method for cluster analysis of binary data based on "smoothed" dissimilarities. The smoothing method presented borrows ideas from shrinkage estimation of cell probabilities. Simulation results show that smoothing improves the accuracy of the clustering result, especially when the observed data contain substantial noise. The method is illustrated with an example involving binary test-item response data.
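The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea, under stated assumptions. For each pair of objects it forms the four cell proportions of the 2x2 agreement table, takes the simple-matching dissimilarity d (the share of variables on which the two objects disagree), and shrinks the four proportions toward the uniform target 1/4 with a James-Stein-type, data-driven weight lambda. That shrinkage rule is one published choice for multinomial cell frequencies and is an assumption here, not necessarily the one used by Hitchcock and Chen. The function names `cell_proportions`, `shrink_cells`, and `dissimilarity_matrices` are hypothetical. The smoothed dissimilarity works out to lambda * 1/2 + (1 - lambda) * d.

```python
from typing import List, Sequence, Tuple


def cell_proportions(x: Sequence[int], y: Sequence[int]) -> List[float]:
    """Proportions of the 2x2 cells (1,1), (1,0), (0,1), (0,0) for two 0/1 vectors."""
    n = len(x)
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = n - n11 - n10 - n01  # remaining variables are (0,0) matches
    return [n11 / n, n10 / n, n01 / n, n00 / n]


def shrink_cells(p_hat: Sequence[float], n: int) -> Tuple[List[float], float]:
    """Shrink cell proportions toward the uniform target with a data-driven weight.

    ASSUMPTION: lambda = (1 - sum p_k^2) / ((n - 1) * sum (t_k - p_k)^2), clipped
    to [0, 1]. This is one James-Stein-type intensity for multinomial cell
    frequencies, not claimed to be the estimator used in the paper.
    """
    k = len(p_hat)
    target = [1.0 / k] * k
    denom = (n - 1) * sum((t - p) ** 2 for t, p in zip(target, p_hat))
    if denom == 0:
        lam = 1.0  # degenerate case (p_hat equals the target, or n == 1): full shrinkage
    else:
        lam = (1.0 - sum(p * p for p in p_hat)) / denom
    lam = min(1.0, max(0.0, lam))  # keep the weight in [0, 1]
    shrunk = [lam * t + (1.0 - lam) * p for t, p in zip(target, p_hat)]
    return shrunk, lam


def dissimilarity_matrices(
    X: Sequence[Sequence[int]],
) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (raw, smoothed) simple-matching dissimilarity matrices for the rows of X."""
    m = len(X)
    n = len(X[0])
    raw = [[0.0] * m for _ in range(m)]
    smooth = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            p_hat = cell_proportions(X[i], X[j])
            # Raw dissimilarity: proportion of mismatched variables, cells (1,0) + (0,1)
            raw[i][j] = raw[j][i] = p_hat[1] + p_hat[2]
            # Smoothed dissimilarity: the same mismatch cells after shrinkage
            p_s, _ = shrink_cells(p_hat, n)
            smooth[i][j] = smooth[j][i] = p_s[1] + p_s[2]
    return raw, smooth


if __name__ == "__main__":
    X = [
        [1, 1, 1, 1, 0, 0, 0, 0],
        [1, 1, 1, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 1, 1, 1],
    ]
    raw, smooth = dissimilarity_matrices(X)
    for r, s in zip(raw, smooth):
        print([round(v, 3) for v in r], [round(v, 3) for v in s])
```

Either matrix can then be passed to any clustering routine that accepts a precomputed dissimilarity matrix. Because lambda is recomputed for each pair, the smoothing is not a uniform rescaling. A single fixed lambda would be an increasing affine map of the raw dissimilarities, so single-, complete-, or average-linkage hierarchical clustering would produce the same sequence of merges as on the raw matrix.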