Fast ordering of large categorical datasets for visualization

Authors:
Alina Beygelzimer;Chang-Shing Perng;Sheng Ma
Affiliations:
Department of Computer Science, University of Rochester, Rochester, NY, USA. E-mail: beygel@cs.rochester.edu;IBM T.J. Watson Research Center, Hawthorne, NY, USA. E-mail: perng@us.ibm.com;IBM T.J. Watson Research Center, Hawthorne, NY, USA. E-mail: shengma@us.ibm.com
Venue:
Intelligent Data Analysis
Year:
2002


Visualization

Abstract

An important issue in visualizing categorical data is how to order categorical values: non-numeric values with no natural ordering, which makes them difficult to map to visual coordinates. This paper focuses on constructing categorical orderings efficiently without compromising their visual quality. To avoid the inherent intractability of previous discrete formulations, we consider a continuous relaxation of the problem that can be solved exactly using the spectral method. This method is based on computing certain algebraic information about the similarity matrix of the dataset. However, even computing the similarity matrix itself is prohibitive for large datasets. To achieve greater efficiency, we propose a new multi-level scheme based on an approximate representation of the matrix. We show that it is sufficient to compute only a small portion of the matrix, of size linear rather than quadratic in the number of objects, to guarantee a small probability of approximation error. Thus an effective ordering can be constructed without computing most of the pairwise similarities between values. Experiments have been conducted to qualitatively verify the effectiveness of the resulting visualizations.
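
To make the spectral step concrete, here is a minimal sketch of one common form of spectral ordering. The abstract does not say which algebraic quantity the paper computes, so the sketch assumes the standard choice: sort items by the eigenvector belonging to the second-smallest eigenvalue of the graph Laplacian of the similarity matrix (the Fiedler vector). The function name `spectral_order` and the toy data are hypothetical. The sketch builds the full similarity matrix and does not attempt the paper's multi-level approximation.

```python
import numpy as np


def spectral_order(similarity):
    """Return an ordering of items from a symmetric similarity matrix.

    Assumption: items are sorted by the Fiedler vector of the graph
    Laplacian L = D - S. This is a textbook relaxation, not necessarily
    the exact formulation used in the paper.
    """
    s = np.asarray(similarity, dtype=float)
    s = (s + s.T) / 2.0           # enforce symmetry
    np.fill_diagonal(s, 0.0)      # self-similarity does not affect L
    laplacian = np.diag(s.sum(axis=1)) - s
    # eigh handles symmetric matrices and returns eigenvalues in
    # ascending order, so column 1 is the Fiedler vector.
    _, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]
    return np.argsort(fiedler)


# Toy data: five categorical values with a hidden linear arrangement.
# Similarity decays with distance in that hidden arrangement.
hidden_position = np.array([3, 0, 4, 1, 2])
gap = np.abs(hidden_position[:, None] - hidden_position[None, :])
toy_similarity = np.exp(-gap.astype(float))

order = spectral_order(toy_similarity)
print(hidden_position[order])  # hidden positions in recovered order
```

The sign of an eigenvector is arbitrary, so the recovered ordering may come out reversed; for visualization purposes the two directions are equivalent. A dense eigendecomposition costs cubic time in the number of values, which is the kind of expense the approximate scheme described in the abstract is meant to avoid.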