ClusterSculptor: A Visual Analytics Tool for High-Dimensional Data

Authors:
Eun Ju Nam (Stony Brook University, ejnam@cs.sunysb.edu)
Yiping Han (Stony Brook University, yiping@cs.sunysb.edu)
Klaus Mueller (Stony Brook University, mueller@cs.sunysb.edu)
Alla Zelenyuk (Pacific Northwest National Lab, alla.zelenyuk@pnl.gov)
Dan Imre (Imre Consulting, dimre2b@charter.net)
Venue:
VAST '07 Proceedings of the 2007 IEEE Symposium on Visual Analytics Science and Technology
Year:
2007

Abstract

Cluster analysis (CA) is a powerful strategy for exploring high-dimensional data in the absence of a priori hypotheses or data classification models, and the results of CA can then be used to form such models. Although formal models and classification rules may not exist in these data exploration scenarios, domain scientists and experts generally have a vast amount of non-compiled knowledge and intuition that they can bring to bear on the effort. CA offers various popular mechanisms for generating clusters; however, the results of their unsupervised deployment rarely agree fully with this expert knowledge and intuition. To address this, our paper describes a comprehensive and intuitive framework that aids scientists in deriving classification hierarchies in CA. It uses k-means as the overall clustering engine but lets scientists tune its parameters interactively, guided by a non-distorted, compact visual presentation of the inherent characteristics of the data in high-dimensional space, such as cluster geometry, composition, and spatial relations to neighboring clusters. In essence, we provide all the tools necessary for a high-dimensional activity we call cluster sculpting, and the evolving hierarchy can be viewed in a space-efficient radial dendrogram. We demonstrate our system on the mining and classification of a collection of millions of aerosol mass spectra, but our framework readily applies to any high-dimensional CA scenario.
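The abstract names k-means as the clustering engine and describes an evolving cluster hierarchy displayed as a radial dendrogram. The following is a minimal sketch of that idea under our own assumptions, not the ClusterSculptor implementation. It uses Lloyd's k-means with deterministic farthest-point seeding (our choice), a hypothetical `ClusterNode` tree whose hypothetical `sculpt` step re-runs k-means on one chosen cluster, and a hypothetical `radial_layout` that gives each node an angular wedge proportional to its leaf count and a radius equal to its depth. The parameters a user would set interactively (which cluster to split, and `k`) are passed in programmatically here.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> List[List[Point]]:
    """Lloyd's k-means; seeding by farthest point is an assumption for determinism."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point whose nearest existing center is farthest away.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    groups: List[List[Point]] = []
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        new_centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:  # assignments are stable
            break
        centers = new_centers
    return [g for g in groups if g]  # drop any empty clusters


@dataclass
class ClusterNode:
    """Hypothetical hierarchy node: a cluster's members plus its sub-clusters."""
    points: List[Point]
    children: List["ClusterNode"] = field(default_factory=list)

    def leaves(self) -> List["ClusterNode"]:
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def sculpt(node: ClusterNode, k: int) -> None:
    """Hypothetical sculpting step: split a single cluster into k sub-clusters."""
    node.children = [ClusterNode(g) for g in kmeans(node.points, k)]


def radial_layout(node: ClusterNode, start: float = 0.0, end: float = 2 * math.pi,
                  depth: int = 0,
                  out: Optional[List[Tuple[int, float, int]]] = None
                  ) -> List[Tuple[int, float, int]]:
    """Pre-order list of (ring depth, wedge-center angle, cluster size)."""
    if out is None:
        out = []
    out.append((depth, (start + end) / 2, len(node.points)))
    total = len(node.leaves())
    angle = start
    for child in node.children:
        # Each child's wedge is proportional to how many leaves it contains.
        span = (end - start) * len(child.leaves()) / total
        radial_layout(child, angle, angle + span, depth + 1, out)
        angle += span
    return out


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0),
            (5.1, 4.9), (4.9, 5.2), (9.0, 0.0), (9.2, 0.1)]
    root = ClusterNode(data)
    sculpt(root, 2)                                         # coarse split
    sculpt(max(root.children, key=lambda c: len(c.points)), 2)  # refine one cluster
    for depth, angle, size in radial_layout(root):
        print(depth, round(angle, 3), size)
```

Splitting only the selected node, rather than re-clustering the whole data set, keeps earlier decisions in the hierarchy intact, which is one plausible reading of "sculpting." A real tool would drive `sculpt` from the visual interface and would need a far faster k-means for millions of spectra.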