A new implementation of the co-VAT algorithm for visual assessment of clusters in rectangular relational data

  • Authors:
  • Timothy C. Havens; James C. Bezdek; James M. Keller

  • Affiliations:
  • Department of Electrical and Computer Engineering, University of Missouri, Columbia, MO; Department of Electrical and Computer Engineering, University of Missouri, Columbia, MO; Department of Electrical and Computer Engineering, University of Missouri, Columbia, MO

  • Venue:
  • ICAISC'10: Proceedings of the 10th International Conference on Artificial Intelligence and Soft Computing: Part I
  • Year:
  • 2010

Abstract

This paper presents a new implementation of the co-VAT algorithm. We assume an m × n matrix D whose elements are pairwise dissimilarities between m row objects O_r and n column objects O_c. The union of these disjoint sets is the set O of N = m + n objects. Clustering tendency assessment is the process of analyzing a data set to determine the number(s) of clusters present. In 2007, the co-Visual Assessment of Tendency (co-VAT) algorithm was proposed for rectangular data such as these. co-VAT is a visual approach that addresses four clustering tendency questions: i) How many clusters are in the row objects O_r? ii) How many clusters are in the column objects O_c? iii) How many clusters are in the union of the row and column objects O_r ∪ O_c? iv) How many (co-)clusters are there that contain at least one object of each type? co-VAT first imputes pairwise dissimilarity values among the row objects (the square relational matrix D_r) and among the column objects (the square relational matrix D_c), and then builds a larger square dissimilarity matrix D_{r∪c}. The clustering questions can then be addressed by applying the VAT algorithm to D_r, D_c, and D_{r∪c}; D is reordered by shuffling the reordering indices of D_{r∪c}. The resulting co-VAT image of D may then show a tendency for co-clusters (question iv). We first discuss a different way to construct this image, and then extend a path-based distance transform, which is used in the iVAT algorithm, to co-VAT. The new algorithm, co-iVAT, shows dramatic improvement in the ability of co-VAT to show cluster tendency in rectangular dissimilarity data.
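The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions: the abstract does not say how D_r and D_c are imputed, so the sketch uses Euclidean distances between rows (and between columns) of D, rescaled to the range of D; the function names `vat_order`, `minimax_path_distance`, `build_union_matrix`, and `covat_reorder` are hypothetical; and the path-based transform is written as a simple O(N^3) minimax-path update rather than whatever scheme the paper uses.

```python
import numpy as np


def vat_order(R):
    """VAT-style ordering of a square dissimilarity matrix (Prim-like traversal)."""
    n = R.shape[0]
    # Start from the row index of the largest dissimilarity.
    i, _ = np.unravel_index(np.argmax(R), R.shape)
    order = [int(i)]
    visited = np.zeros(n, dtype=bool)
    visited[i] = True
    # dist[j] = smallest dissimilarity from j to any already-ordered object.
    dist = R[i].astype(float).copy()
    for _ in range(n - 1):
        j = int(np.argmin(np.where(visited, np.inf, dist)))
        order.append(j)
        visited[j] = True
        dist = np.minimum(dist, R[j])
    return np.array(order)


def minimax_path_distance(R):
    """Path-based transform: smallest achievable 'largest edge' over all paths."""
    P = R.astype(float).copy()
    for k in range(P.shape[0]):
        # Allow paths through k: the cost of a path is its largest edge.
        P = np.minimum(P, np.maximum(P[:, [k]], P[[k], :]))
    return P


def build_union_matrix(D):
    """Assemble the (m+n) x (m+n) matrix [[D_r, D], [D^T, D_c]].

    ASSUMPTION: D_r and D_c are imputed as Euclidean distances between the
    rows and between the columns of D, then rescaled to the range of D.
    """
    D = np.asarray(D, dtype=float)

    def pairwise(X):
        M = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        return M * (D.max() / M.max()) if M.max() > 0 else M

    return np.block([[pairwise(D), D], [D.T, pairwise(D.T)]])


def covat_reorder(D, path_based=False):
    """Reorder D using the VAT ordering of the union matrix."""
    D = np.asarray(D, dtype=float)
    m = D.shape[0]
    U = build_union_matrix(D)
    if path_based:
        # Optional path-based transform before ordering (iVAT-style idea).
        U = minimax_path_distance(U)
    order = vat_order(U)
    # Split the union ordering back into row and column orderings.
    row_order = order[order < m]
    col_order = order[order >= m] - m
    return D[np.ix_(row_order, col_order)], row_order, col_order
```

Calling `covat_reorder(D)` on a rectangular dissimilarity matrix returns the reordered matrix together with the row and column permutations; displaying the reordered matrix as a grayscale image gives a co-VAT-style picture, and `path_based=True` applies the minimax transform first.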