SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
CURE: an efficient clustering algorithm for large databases. SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
ACM Computing Surveys (CSUR)
Visualizing multi-dimensional clusters, trends, and outliers using star coordinates. Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
HD-Eye: Visual Mining of High-Dimensional Data. IEEE Computer Graphics and Applications
INFOVIS '97 Proceedings of the 1997 IEEE Symposium on Information Visualization (InfoVis '97)
Movement as an Aid to Understanding Graphs. IV '03 Proceedings of the Seventh International Conference on Information Visualization
VISTA: validating and refining clusters via visualization. Information Visualization
iVIBRATE: Interactive visualization-based framework for clustering large datasets. ACM Transactions on Information Systems (TOIS)
MapReduce: simplified data processing on large clusters. OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
PLANET: massively parallel learning of tree ensembles with MapReduce. Proceedings of the VLDB Endowment
Hadoop: The Definitive Guide
Data warehousing and analytics infrastructure at facebook. Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Data-Intensive Text Processing with MapReduce
Client + cloud: evaluating seamless architectures for visual data analytics in the ocean sciences. SSDBM'10 Proceedings of the 22nd international conference on Scientific and statistical database management
PEGASUS: mining peta-scale graphs. Knowledge and Information Systems - Special Issue: Best Papers of the Fifth International Conference on Advanced Data Mining and Applications (ADMA 2009)
CloudVista: interactive and economical visual cluster analysis for big data in the cloud. Proceedings of the VLDB Endowment
The problem of efficient, high-quality clustering of extreme-scale datasets with complex clustering structures remains one of the most challenging data analysis problems. An innovative use of the data cloud could provide a unique opportunity to address this challenge. In this paper, we propose the CloudVista framework to address (1) the problems caused by the use of sampling in existing approaches and (2) the latency that cloud-side processing introduces into interactive cluster visualization. The CloudVista framework aims to explore the entire large dataset stored in the cloud with the help of a data structure called the visual frame and the previously developed VISTA visualization model. The latency of processing large data is addressed by two mechanisms: the RandGen algorithm, which generates a series of related visual frames in the cloud without user intervention, and a hierarchical exploration model supported by cloud-side subset processing. The experimental study shows that this framework is effective and efficient for visually exploring the clustering structures of extreme-scale datasets stored in the cloud.
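The abstract names the visual frame, the VISTA model, RandGen, and cloud-side subset processing without giving their details. The following is a minimal in-memory sketch under stated assumptions, not the paper's implementation. It assumes a star-coordinates alpha-mapping of the kind VISTA builds on, in which each point is projected as (c/k) * sum_i alpha_i * x_i * s_i. It treats a visual frame as a 2D grid of point counts and models RandGen as small random changes to the alpha weights that yield a batch of related frames. It models subset processing as selecting the records that fall in user-chosen cells. All function names (`visual_frame`, `rand_gen`, `select_subset`, and the helpers) and parameters (`grid`, `step`, `c`, `seed`) are hypothetical. In CloudVista these steps would run as cloud-side jobs over the full dataset rather than over a Python list.

```python
import math
import random

def star_axes(k):
    # Unit vectors s_i spread evenly around the circle, one per dimension (star coordinates).
    return [(math.cos(2 * math.pi * i / k), math.sin(2 * math.pi * i / k)) for i in range(k)]

def alpha_map(x, alphas, axes, c=1.0):
    # Assumed VISTA-style alpha-mapping: (c / k) * sum_i alpha_i * x_i * s_i.
    k = len(x)
    px = c / k * sum(a * xi * sx for a, xi, (sx, _) in zip(alphas, x, axes))
    py = c / k * sum(a * xi * sy for a, xi, (_, sy) in zip(alphas, x, axes))
    return px, py

def cell_of(x, alphas, axes, grid, c=1.0):
    # With |x_i| <= 1 and |alpha_i| <= 1, each projected coordinate lies in [-c, c];
    # scale that range onto cell indices 0..grid-1 and clamp the edges.
    px, py = alpha_map(x, alphas, axes, c)
    col = max(0, min(int((px + c) / (2 * c) * grid), grid - 1))
    row = max(0, min(int((py + c) / (2 * c) * grid), grid - 1))
    return row, col

def visual_frame(points, alphas, grid=8, c=1.0):
    # Hypothetical visual frame: a grid x grid matrix counting projected points per cell.
    axes = star_axes(len(points[0]))
    frame = [[0] * grid for _ in range(grid)]
    for x in points:
        row, col = cell_of(x, alphas, axes, grid, c)
        frame[row][col] += 1
    return frame

def rand_gen(points, alphas, n_frames, step=0.1, grid=8, seed=0):
    # RandGen-style batch (assumed): each frame nudges every alpha by a random amount
    # in [-step, step], clipped to [-1, 1], so consecutive frames stay related.
    rng = random.Random(seed)
    current = list(alphas)
    frames = []
    for _ in range(n_frames):
        current = [max(-1.0, min(1.0, a + rng.uniform(-step, step))) for a in current]
        frames.append((list(current), visual_frame(points, current, grid)))
    return frames

def select_subset(points, alphas, cells, grid=8, c=1.0):
    # Drill-down (assumed): keep only the records whose projection lands in the chosen cells.
    axes = star_axes(len(points[0]))
    return [x for x in points if cell_of(x, alphas, axes, grid, c) in cells]
```

For example, `rand_gen(data, [0.5] * k, n_frames=4)` returns four (alphas, frame) pairs that a client could browse without waiting on a new cloud job for each adjustment. Passing a dense cell such as `{(3, 3)}` to `select_subset` returns the records behind that cell for a finer-grained second round of exploration. Because each frame stores only grid counts, its size depends on `grid` rather than on the number of records, which is the property that would make shipping frames from the cloud to the client cheap.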