VISTA: validating and refining clusters via visualization

Authors:
Keke Chen;Ling Liu
Affiliations:
College of Computing, Georgia Institute of Technology, GA;College of Computing, Georgia Institute of Technology, GA
Venue:
Information Visualization
Year:
2004



Visualization

Abstract

Clustering is an important technique for understanding large multi-dimensional datasets. Most clustering research to date has focused on developing automatic clustering algorithms and cluster validation methods. Automatic algorithms are known to work well on clusters of regular shape, for example compact spherical clusters, but may incur higher error rates on arbitrarily shaped clusters. Although some effort has been devoted to the problem of skewed datasets, work on handling irregularly shaped clusters is still in its infancy, especially with respect to the dimensionality of the datasets and the precision of the clustering results considered. Not surprisingly, statistical indices are also ineffective at validating clusters of irregular shape. In this paper, we address the problem of clustering and validating arbitrarily shaped clusters with a visual framework (VISTA). The main idea of the VISTA approach is to capitalize on the power of visualization and interactive feedback to encourage domain experts to participate in the cluster revision and cluster validation process. The VISTA system has two unique features. First, it implements a linear and reliable visualization model to interactively visualize multi-dimensional datasets in a 2D star-coordinate space. Second, it provides a rich set of user-friendly interactive rendering operations, allowing users to validate and refine the cluster structure based on their visual experience as well as their domain knowledge.
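
To make the idea of a linear mapping into a 2D star-coordinate space concrete, the following is a minimal sketch of a generic star-coordinate projection, not the VISTA model itself. It assumes each of the k dimensions is rescaled to [0, 1] and assigned an axis at angle 2*pi*i/k; the function name `star_coordinates` and the per-axis weights `alpha` are hypothetical names introduced here for illustration only.

```python
import math
from typing import List, Optional, Sequence, Tuple


def star_coordinates(
    rows: Sequence[Sequence[float]],
    alpha: Optional[Sequence[float]] = None,
) -> List[Tuple[float, float]]:
    """Project k-dimensional rows onto 2D with a linear star-coordinate mapping.

    `alpha` holds one illustrative weight per dimension (an assumption of
    this sketch); it defaults to 1.0 for every axis.
    """
    k = len(rows[0])
    weights = list(alpha) if alpha is not None else [1.0] * k
    # Per-dimension min and max, used to rescale each attribute to [0, 1].
    lows = [min(r[i] for r in rows) for i in range(k)]
    highs = [max(r[i] for r in rows) for i in range(k)]
    # Axis i is a unit vector at angle 2*pi*i/k around the origin.
    axes = [
        (math.cos(2 * math.pi * i / k), math.sin(2 * math.pi * i / k))
        for i in range(k)
    ]

    points = []
    for r in rows:
        x = y = 0.0
        for i in range(k):
            span = highs[i] - lows[i]
            v = (r[i] - lows[i]) / span if span else 0.0
            x += weights[i] * v * axes[i][0]
            y += weights[i] * v * axes[i][1]
        # Dividing by k keeps every point inside the unit disc.
        points.append((x / k, y / k))
    return points


data = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
]
default_view = star_coordinates(data)
reweighted_view = star_coordinates(data, alpha=[1.0, 0.0, 0.0, 0.0])
```

In the default view of this toy data, the first and third rows both land at the origin even though they are far apart in the original space, because the four axes cancel. Changing the axis weights separates them, which illustrates why a projection of this kind benefits from interactive adjustment by the user.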