bigVAT: Visual assessment of cluster tendency for large data sets

  • Authors:
  • Jacalyn M. Huband; James C. Bezdek; Richard J. Hathaway

  • Affiliations:
  • Computer Science Department, University of West Florida, Pensacola, FL 32514, USA; Computer Science Department, University of West Florida, Pensacola, FL 32514, USA; Department of Mathematical Sciences, Georgia Southern University, Statesboro, GA 30460, USA

  • Venue:
  • Pattern Recognition
  • Year:
  • 2005

Abstract

Assessing clustering tendency is an important first step in cluster analysis. One tool for this purpose is the Visual Assessment of Tendency (VAT) algorithm, which produces an image matrix that can be used to assess cluster tendency visually in either relational or object data. However, VAT becomes intractable for large data sets. The revised VAT (reVAT) algorithm reduces the number of computations done by VAT and replaces the image matrix with a set of profile graphs that are used for the visual assessment step. reVAT thus overcomes the large data set problem that encumbers VAT, but it introduces a new one: the set of reVAT profile graphs becomes very difficult to interpret when the number of clusters is large or when there is significant overlap between groups of objects in the data. In this paper, we propose a new algorithm called bigVAT that (i) solves the large data problem suffered by VAT and (ii) solves the interpretation problem suffered by reVAT. bigVAT combines the quasi-ordering technique used by reVAT with an image display of the set of profile graphs, so that the clustering tendency information is presented as a VAT-like image. Several numerical examples are given to illustrate and support the new technique.
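
To make the idea of a VAT image matrix concrete, the following is a minimal sketch of a VAT-style reordering as it is commonly described in the literature, not the bigVAT algorithm proposed in this paper. The abstract does not spell out the procedure, so the function names `vat_order` and `reorder`, the tie-breaking rule, and the toy data are all assumptions made for illustration. The sketch reorders a dissimilarity matrix so that mutually close objects become adjacent; displayed as an image, the reordered matrix tends to show dark blocks along the diagonal when clusters are present.

```python
def vat_order(dist):
    """Return a VAT-style ordering of object indices.

    `dist` is a square, symmetric dissimilarity matrix given as a list
    of lists. This is an illustrative sketch, not the authors' code.
    """
    n = len(dist)
    # Start from one endpoint of the most dissimilar pair of objects.
    start = max(range(n), key=lambda i: max(dist[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    # nearest[j] is the smallest dissimilarity from j to any ordered object.
    nearest = {j: dist[start][j] for j in remaining}
    while remaining:
        # Append the unordered object closest to the ordered set
        # (ties broken by index, an arbitrary choice made here).
        nxt = min(remaining, key=lambda j: (nearest[j], j))
        order.append(nxt)
        remaining.remove(nxt)
        for j in remaining:
            nearest[j] = min(nearest[j], dist[nxt][j])
    return order


def reorder(dist, order):
    """Permute rows and columns of `dist` according to `order`."""
    return [[dist[i][j] for j in order] for i in order]


# Toy object data (hypothetical): two interleaved groups on a line.
points = [0, 10, 1, 11, 2, 12]
dist = [[abs(a - b) for b in points] for a in points]
order = vat_order(dist)
image = reorder(dist, order)
```

On this toy input the two groups end up contiguous in `order`, so `image` has two low-dissimilarity blocks on its diagonal. Each step scans all unordered objects, so the loop is quadratic in the number of objects, on top of the quadratic storage for the matrix itself, which gives a sense of why the abstract calls VAT intractable for large data sets.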