A new formulation of the coVAT algorithm for visual assessment of clustering tendency in rectangular data

  • Authors:
  • Timothy C. Havens; James C. Bezdek

  • Affiliations:
  • Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824; Department of Electrical and Electronic Engineering, University of Melbourne, Parkville, Victoria 3010, Australia

  • Venue:
  • International Journal of Intelligent Systems
  • Year:
  • 2012

Abstract

Since 1998, a graphical representation used in visual clustering, called the reordered dissimilarity image or cluster heat map, has appeared in more than 4000 biological or biomedical publications. These images are typically used to visually estimate the number of clusters in a data set, which is the most important input to most clustering algorithms, including the widely used fuzzy c-means and crisp k-means. This paper presents a new formulation of a matrix reordering algorithm, coVAT, which is the only known method for providing visual clustering information on all four types of cluster structure in rectangular relational data. Finite rectangular relational data are an m × n array R of relational values between m row objects O_r and n column objects O_c. R presents four clustering problems: clusters in O_r, clusters in O_c, clusters in O_{r∪c} (the union of the row and column objects), and coclusters containing some objects from each of O_r and O_c. coVAT1 is a clustering tendency algorithm that provides visual estimates of the number of clusters to seek in each of these problems by displaying reordered dissimilarity images. We provide several examples where coVAT1 fails to do its job. These examples justify the introduction of coVAT2, a modification of coVAT1 based on a different reordering scheme. We offer several examples to illustrate that coVAT2 may detect coclusters in R when coVAT1 does not. Furthermore, coVAT2 is not limited to relational data R. The R matrix can also take the form of feature data, such as gene microarray data, where each data element is a real number: positive values indicate upregulation, and negative values indicate downregulation. We show examples on microarray data which indicate that coVAT2 shows cluster tendency in these data. © 2012 Wiley Periodicals, Inc.
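To make the idea of a reordered dissimilarity image concrete, the following is a minimal sketch of the basic VAT reordering for a square dissimilarity matrix. It is not coVAT1 or coVAT2, which this abstract does not specify. The function names `vat_order` and `reorder`, the toy data, and the tie-breaking rule are assumptions made for illustration only.

```python
def vat_order(D):
    """Return a VAT-style ordering for a square, symmetric dissimilarity matrix D.

    Illustrative sketch of basic VAT reordering only; the coVAT variants
    for rectangular data are not reproduced here.
    """
    n = len(D)
    # Start from one endpoint of the largest dissimilarity in D.
    start = max(range(n), key=lambda i: max(D[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    # best[j] is the smallest dissimilarity from j to any already-ordered object.
    best = {j: D[start][j] for j in remaining}
    while remaining:
        # Append the unordered object closest to the ordered set
        # (ties broken by index, an assumption of this sketch).
        nxt = min(remaining, key=lambda j: (best[j], j))
        order.append(nxt)
        remaining.remove(nxt)
        for j in remaining:
            best[j] = min(best[j], D[nxt][j])
    return order


def reorder(D, order):
    """Apply the same permutation to the rows and columns of D."""
    return [[D[i][j] for j in order] for i in order]


# Hypothetical toy data: two well-separated groups on a line, interleaved.
points = [0.0, 10.0, 0.5, 10.5, 1.0, 11.0]
D = [[abs(a - b) for b in points] for a in points]

order = vat_order(D)
D_star = reorder(D, order)
```

Displayed as a grayscale image, `D_star` would show two dark blocks along the diagonal, one per group, which is the visual cue used to estimate the number of clusters. In this toy case the ordering places objects 0, 2, 4 before objects 1, 3, 5.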