Fast ordering of large categorical datasets for visualization

  • Authors:
  • Alina Beygelzimer;Chang-Shing Perng;Sheng Ma

  • Affiliations:
  • Department of Computer Science, University of Rochester, Rochester, NY, USA. E-mail: beygel@cs.rochester.edu;IBM T.J. Watson Research Center, Hawthorne, NY, USA. E-mail: perng@us.ibm.com;IBM T.J. Watson Research Center, Hawthorne, NY, USA. E-mail: shengma@us.ibm.com

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2002

Abstract

An important issue in visualizing categorical data is how to order categorical values: non-numeric values that have no natural ordering, which makes it difficult to map them to visual coordinates. The focus of this paper is on constructing categorical orderings efficiently without compromising their visual quality. To avoid the inherent intractability of previous discrete formulations, we consider a continuous relaxation of the problem that can be solved exactly using the spectral method. The latter is based on computing certain algebraic information about the similarity matrix of the dataset. However, even computing the similarity matrix itself is prohibitive for large datasets. To achieve greater efficiency, we propose a new multi-level scheme based on an approximate representation of the matrix. We show that it is sufficient to compute only a small portion of the matrix, of size linear rather than quadratic in the number of objects, to guarantee a small probability of approximation error. Thus an effective ordering can be constructed without actually having to compute most pairwise similarities of values. Experiments have been conducted to qualitatively verify the effectiveness of the resulting visualizations.
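
As an illustration of the general idea behind a spectral ordering, the sketch below sorts items by the entries of the Fiedler vector (the eigenvector for the second-smallest eigenvalue) of the graph Laplacian built from a similarity matrix. This is a common textbook formulation and an assumption here: the abstract does not state which matrix or eigenvector the paper uses, and the function name `spectral_order` and the toy data are hypothetical. The sketch also computes the full similarity matrix, so it does not reflect the paper's multi-level approximation scheme.

```python
import numpy as np


def spectral_order(similarity):
    """Order items by the Fiedler vector of the graph Laplacian.

    `similarity` is assumed to be a symmetric, non-negative matrix.
    This is a generic spectral ordering, not the paper's exact method.
    """
    s = np.asarray(similarity, dtype=float)
    # Unnormalized graph Laplacian: L = D - S, with D the diagonal degree matrix.
    laplacian = np.diag(s.sum(axis=1)) - s
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    _, vectors = np.linalg.eigh(laplacian)
    # Column 0 corresponds to eigenvalue 0 (the constant vector) for a connected
    # graph; column 1 is the Fiedler vector.
    fiedler = vectors[:, 1]
    return np.argsort(fiedler)


# Toy data (hypothetical): six categorical values in two interleaved groups.
labels = np.array([0, 1, 0, 1, 0, 1])
# Similarity is high within a group and low across groups.
similarity = np.where(labels[:, None] == labels[None, :], 1.0, 0.1)
np.fill_diagonal(similarity, 0.0)

order = spectral_order(similarity)
print(order)           # values from the same group end up adjacent
print(labels[order])   # group labels along the resulting ordering
```

The sign of an eigenvector is arbitrary, so the ordering may come out reversed between runs or libraries; for visualization this only flips the axis. The order of values within a tightly connected group is also not determined by this sketch.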