CloudVista: visual cluster exploration for extreme scale data in the cloud

Authors:
Keke Chen;Huiqi Xu;Fengguang Tian;Shumin Guo
Affiliations:
Ohio Center of Excellence in Knowledge Enabled Computing, Department of Computer Science and Engineering, Wright State University, Dayton, OH;Ohio Center of Excellence in Knowledge Enabled Computing, Department of Computer Science and Engineering, Wright State University, Dayton, OH;Ohio Center of Excellence in Knowledge Enabled Computing, Department of Computer Science and Engineering, Wright State University, Dayton, OH;Ohio Center of Excellence in Knowledge Enabled Computing, Department of Computer Science and Engineering, Wright State University, Dayton, OH
Venue:
SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management
Year:
2011



Visualization

Abstract

Efficient, high-quality clustering of extreme scale datasets with complex clustering structures remains one of the most challenging data analysis problems. An innovative use of data clouds would provide a unique opportunity to address this challenge. In this paper, we propose the CloudVista framework to address (1) the problems caused by sampling in existing approaches and (2) the latency that cloud-side processing introduces into interactive cluster visualization. The CloudVista framework aims to explore the entire large dataset stored in the cloud with the help of the visual frame data structure and the previously developed VISTA visualization model. The latency of processing large data is addressed by the RandGen algorithm, which generates a series of related visual frames in the cloud without user intervention, and by a hierarchical exploration model supported by cloud-side subset processing. An experimental study shows that this framework is effective and efficient for visually exploring clustering structures in extreme scale datasets stored in the cloud.
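The idea of generating a series of related visual frames can be illustrated with a minimal sketch. The abstract does not define the visual frame or the RandGen procedure, so everything below is an assumption made for illustration: a frame is taken to be a grid of per-cell point counts under a linear 2D projection, and a series is produced by applying small random changes to hypothetical per-dimension weights (`alphas`). The function names `visual_frame` and `frame_series` are likewise hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter


def visual_frame(points, alphas, width=64, height=64):
    """Aggregate k-dimensional points into a width x height grid of counts.

    Assumption: each point is mapped to 2D by a linear projection with one
    weight per dimension, and values are normalized to [-1, 1] per dimension.
    """
    k = len(alphas)
    # Axis directions spread evenly around the unit circle (assumed layout).
    axes = [(math.cos(2 * math.pi * i / k), math.sin(2 * math.pi * i / k))
            for i in range(k)]
    frame = Counter()
    for p in points:
        x = sum(a * v * ax for a, v, (ax, _) in zip(alphas, p, axes)) / k
        y = sum(a * v * ay for a, v, (_, ay) in zip(alphas, p, axes)) / k
        # Map [-1, 1] to a cell index and clamp to the grid bounds.
        col = max(0, min(int((x + 1) / 2 * width), width - 1))
        row = max(0, min(int((y + 1) / 2 * height), height - 1))
        frame[(row, col)] += 1
    return frame


def frame_series(points, alphas, steps, step_size=0.05, seed=0):
    """Produce `steps` frames, perturbing the weights slightly between frames.

    Assumption: small random changes keep consecutive frames visually related.
    """
    rng = random.Random(seed)
    current = list(alphas)
    frames = []
    for _ in range(steps):
        frames.append(visual_frame(points, current))
        current = [max(-1.0, min(1.0, a + rng.uniform(-step_size, step_size)))
                   for a in current]
    return frames
```

In this sketch each frame's size depends on the grid resolution rather than on the number of records, which is why an aggregate of this kind could be computed over the full dataset without sampling. In a MapReduce setting, the per-point cell assignment would correspond to a map step and the per-cell counting to a reduce step; that mapping is also an assumption here, not a description of the authors' implementation.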