Grid-based clustering is particularly well suited to massive datasets. The principle is to first summarize the dataset with a grid representation and then merge grid cells to obtain clusters. All previous methods use grids with hyper-rectangular cells. In this paper we propose a flexible grid built from arbitrarily shaped polyhedra for the data summary. For the clustering step, a graph is extracted from this representation, with edges weighted by a combination of density and spatial information. The clusters are identified as the main connected components of this graph. We present experiments indicating that our grid often leads to better results than an adaptive rectangular grid method.
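The summarize-then-merge pipeline described above can be illustrated with a minimal sketch. It is not the method of the paper: the cells here are assumed to be Voronoi regions around hand-picked representatives (one simple way to obtain polyhedral cells), the edge weight `min(count_a, count_b) / distance` is a hypothetical way to combine density and spatial information, and the names `cluster_cells`, `max_dist`, and `min_weight` are invented for illustration. Selecting only the "main" components is left out.

```python
import math
from collections import Counter


def nearest(point, reps):
    """Index of the representative closest to `point` (its Voronoi cell)."""
    return min(range(len(reps)), key=lambda i: math.dist(point, reps[i]))


def cluster_cells(points, reps, max_dist, min_weight):
    """Return one cluster label per cell; cells sharing a label are merged."""
    # Step 1: summarize the data as a point count per cell.
    counts = Counter(nearest(p, reps) for p in points)

    # Step 2: union-find structure holding the connected components.
    parent = list(range(len(reps)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Step 3: link nearby cells whose (assumed) edge weight is high enough.
    for a in range(len(reps)):
        for b in range(a + 1, len(reps)):
            d = math.dist(reps[a], reps[b])
            if d == 0 or d > max_dist:
                continue  # skip duplicate or distant representatives
            # Assumed weight: the sparser cell's count, damped by distance.
            weight = min(counts[a], counts[b]) / d
            if weight >= min_weight:
                parent[find(a)] = find(b)

    return [find(i) for i in range(len(reps))]
```

Cells that end up with the same label form one cluster. The pairwise loop over representatives is quadratic in the number of cells, which is acceptable only because the grid is meant to be far smaller than the dataset it summarizes.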