With continued advances in communication network and sensing technology, the amount of data produced and made available through cyberspace is growing at an astounding rate. Efficient, high-quality clustering of large datasets remains one of the most important problems in large-scale data analysis. A commonly used methodology for cluster analysis on large datasets is the three-phase framework of sampling/summarization, iterative cluster analysis, and disk-labeling. Three known problems with this framework demand effective solutions. The first is how to define and validate irregularly shaped clusters effectively, especially in large datasets; automated algorithms and statistical methods are typically not effective in handling such clusters. The second is how to label the entire dataset on disk (disk-labeling) without introducing additional errors, which includes dealing with outliers, irregular clusters, and cluster boundary extension. The third is the lack of research on how to integrate the three phases effectively. In this article, we describe iVIBRATE, an interactive visualization-based three-phase framework for clustering large datasets. The two main components of iVIBRATE are its VISTA visual cluster-rendering subsystem, which brings human interaction into the large-scale iterative clustering process through interactive visualization, and its adaptive ClusterMap labeling subsystem, which offers visualization-guided disk-labeling solutions that are effective in dealing with outliers, irregular clusters, and cluster boundary extension.
Another important contribution of the iVIBRATE development is the identification of the special issues that arise in integrating the two components and the sampling approach into a coherent framework, together with solutions for improving the reliability of the framework and for minimizing the number of errors generated within the cluster analysis process. We study the effectiveness of the iVIBRATE framework through a walkthrough example on a dataset of a million records, and we experimentally evaluate the iVIBRATE approach using both real-life and synthetic datasets. Our results show that iVIBRATE can efficiently involve the user in the clustering process and generate high-quality clustering results for large datasets.
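
To make the generic three-phase framework concrete, the following is a minimal Python sketch of sampling, iterative cluster analysis on the sample, and disk-labeling. It is not the iVIBRATE method: reservoir sampling stands in for the sampling/summarization phase, plain k-means with farthest-first initialization stands in for the interactive VISTA phase, and nearest-centroid assignment stands in for ClusterMap labeling. All function names and parameters are hypothetical and chosen only for illustration.

```python
import random
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def reservoir_sample(stream: Iterable[Point], k: int, rng: random.Random) -> List[Point]:
    """Phase 1 (assumed stand-in): uniform sample of k records in one pass."""
    sample: List[Point] = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def farthest_first(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic seeding: each new seed is the point farthest from the seeds so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[Point]:
    """Phase 2 (assumed stand-in): Lloyd's k-means on the in-memory sample."""
    centroids = farthest_first(points, k)
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[c]
            for c, g in enumerate(groups)
        ]
    return centroids


def label_all(stream: Iterable[Point], centroids: Sequence[Point]) -> List[int]:
    """Phase 3 (assumed stand-in): second pass that labels every record."""
    return [nearest(p, centroids) for p in stream]


def three_phase(data: Sequence[Point], k: int, sample_size: int, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    sample = reservoir_sample(data, sample_size, rng)
    centroids = kmeans(sample, k)
    return label_all(data, centroids)
```

Calling `three_phase(data, k=2, sample_size=100)` on a list of numeric tuples returns one cluster label per record. Nearest-centroid labeling of this kind implicitly assumes roughly spherical clusters, which is exactly the limitation for irregularly shaped clusters, outliers, and boundary extension that the abstract identifies as motivation for visualization-guided labeling.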