With the rapid growth of data in many areas, clustering large datasets has become an important problem in data analysis. Because cluster analysis is a highly iterative process, cluster analysis on large datasets favors short iterations on a relatively small representative set. A two-phase framework, "sampling/summarization, then iterative cluster analysis", is therefore often applied in practice. Since the clustering result labels only the small representative set, the result must then be extended to the entire large dataset, a step that traditional clustering research has largely ignored. This extension step is often called the labeling process. Its main problems are labeling irregularly shaped clusters, distinguishing outliers, and extending cluster boundaries. We address these problems with a visualization-based approach that handles them precisely. The approach partially involves the user in defining and refining a structure called the "ClusterMap". Based on this structure, the ClusterMap algorithm scans the large dataset, adapts the boundary extension, and generates cluster labels for the entire dataset. Experimental results show that, compared with distance-comparison-based labeling algorithms, ClusterMap preserves cluster quality well at a low computational cost.
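For comparison, here is a minimal sketch of the kind of distance-comparison-based labeling that the abstract uses as its baseline. It is not the ClusterMap algorithm. The sketch assumes Euclidean distance and a representative set that the cluster-analysis phase has already labeled. It also introduces a hypothetical `max_dist` threshold for outliers: each point in the full dataset takes the label of its nearest representative, unless even that representative is too far away.

```python
import math
from typing import List, Sequence

OUTLIER = -1  # hypothetical label for points far from every representative


def label_by_nearest_representative(
    points: Sequence[Sequence[float]],
    representatives: Sequence[Sequence[float]],
    rep_labels: Sequence[int],
    max_dist: float,
) -> List[int]:
    """Assign each point the cluster label of its nearest labeled representative.

    A point whose nearest representative lies farther than max_dist
    (a hypothetical, user-chosen threshold) is labeled OUTLIER instead.
    """
    if len(representatives) != len(rep_labels):
        raise ValueError("each representative needs exactly one label")
    labels: List[int] = []
    for p in points:
        # Linear scan: one Euclidean distance comparison per representative.
        best_label, best_dist = OUTLIER, math.inf
        for r, lab in zip(representatives, rep_labels):
            d = math.dist(p, r)
            if d < best_dist:
                best_label, best_dist = lab, d
        labels.append(best_label if best_dist <= max_dist else OUTLIER)
    return labels


if __name__ == "__main__":
    reps = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]  # labeled sample
    rep_labels = [0, 0, 1]
    data = [(0.4, 0.1), (9.5, 10.2), (50.0, 50.0)]  # rest of the dataset
    print(label_by_nearest_representative(data, reps, rep_labels, max_dist=2.0))
    # prints [0, 1, -1]
```

This labeling relies only on distances to representatives and on a single fixed threshold. It therefore has no direct way to adapt a cluster's boundary or follow an irregular cluster shape, which are two of the problems the abstract lists for the labeling stage.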