Class visualization of high-dimensional data with applications

  • Authors:
  • Inderjit S. Dhillon; Dharmendra S. Modha; W. Scott Spangler

  • Affiliations:
  • Department of Computer Sciences, University of Texas, Austin, TX 78712-1188, USA; IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120-6099, USA; IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120-6099, USA

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2002

Abstract

The problem of visualizing high-dimensional data that has been categorized into various classes is considered. The goal in visualizing is to quickly absorb inter-class and intra-class relationships. Towards this end, class-preserving projections of the multidimensional data onto two-dimensional planes, which can be displayed on a computer screen, are introduced. These class-preserving projections maintain the high-dimensional class structure, and are closely related to Fisher's linear discriminants. By displaying sequences of such two-dimensional projections and by moving continuously from one projection to the next, an illusion of smooth motion through a multidimensional display can be created. Such sequences are called class tours. Furthermore, class-similarity graphs are overlaid on the two-dimensional projections to capture the distance relationships in the original high-dimensional space. The above visualization tools are illustrated on the classical Iris plant data, the ISOLET spoken letter data, and the PENDIGITS on-line handwriting data set. It is shown how the visual examination of the data can uncover latent class relationships.
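
As a rough illustration of what a class-preserving projection could look like, the sketch below assumes the two-dimensional plane is the one passing through the centroids of three chosen classes; the abstract does not spell out the construction, so this choice is an assumption. The function name `class_preserving_projection` is hypothetical and the data are synthetic, not the Iris, ISOLET, or PENDIGITS sets.

```python
import numpy as np

def class_preserving_projection(X, y, classes):
    """Project rows of X onto the plane through the centroids of three classes.

    Hypothetical helper: assumes the three centroids are not collinear.
    """
    c = [X[y == k].mean(axis=0) for k in classes]
    # Two centroid differences span the plane through all three centroids.
    D = np.stack([c[1] - c[0], c[2] - c[0]], axis=1)  # shape (d, 2)
    Q, _ = np.linalg.qr(D)                            # orthonormal basis, shape (d, 2)
    return X @ Q, Q

# Synthetic example: three Gaussian classes in 10 dimensions.
rng = np.random.default_rng(0)
d = 10
means = rng.normal(scale=5.0, size=(3, d))
X = np.vstack([m + rng.normal(size=(50, d)) for m in means])
y = np.repeat(np.arange(3), 50)

P, Q = class_preserving_projection(X, y, [0, 1, 2])  # P holds the 2-D coordinates
```

Under this assumption, the distances between the three class centroids are the same in the projection as in the original space, because every centroid difference lies in the plane spanned by the columns of `Q`. Moving gradually between planes built from different class triplets would be one way to approximate the class tours described above.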