GCHL: A grid-clustering algorithm for high-dimensional very large spatial data bases

Authors:
A. H. Pilevar;M. Sukumar
Affiliations:
DOS in Computer Science, University of Mysore, Mysore, India;IT Department, S.J. College of Engineering, Mysore, India
Venue:
Pattern Recognition Letters
Year:
2005

Citing 13
Cited 6

BIRCH: an efficient data clustering method for very large databases

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
CURE: an efficient clustering algorithm for large databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Automatic subspace clustering of high dimensional data for data mining applications

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
The Grid File: An Adaptable, Symmetric Multikey File Structure

ACM Transactions on Database Systems (TODS)
The Design and Analysis of Computer Algorithms

The Design and Analysis of Computer Algorithms
Chameleon: Hierarchical Clustering Using Dynamic Modeling

Computer
A Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases

ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering
WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases

VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
STING: A Statistical Information Grid Approach to Spatial Data Mining

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
STING+: An Approach to Active Spatial Data Mining

ICDE '99 Proceedings of the 15th International Conference on Data Engineering
ROCK: A Robust Clustering Algorithm for Categorical Attributes

ICDE '99 Proceedings of the 15th International Conference on Data Engineering
A novel genetic algorithm for automatic clustering

Pattern Recognition Letters
Data Mining: Concepts and Techniques

Data Mining: Concepts and Techniques

An adaptive crossover-imaged clustering algorithm

SMO'07 Proceedings of the 7th WSEAS International Conference on Simulation, Modelling and Optimization
An axis-shifted crossover-imaged clustering algorithm

WSEAS Transactions on Systems
A deflected grid-based algorithm for clustering analysis

WSEAS Transactions on Computers
A semi-supervised clustering algorithm based on rough reduction

CCDC'09 Proceedings of the 21st annual international conference on Chinese control and decision conference
Grid-based clustering algorithm based on intersecting partition and density estimation

PAKDD'07 Proceedings of the 2007 international conference on Emerging technologies in knowledge discovery and data mining
Subspace clustering of high-dimensional data: an evolutionary approach

Applied Computational Intelligence and Soft Computing

Quantified Score

Hi-index	0.10

Abstract

Spatial clustering, which groups similar spatial objects into classes, is an important component of spatial data mining [Han and Kamber, Data Mining: Concepts and Techniques, 2000]. Because of its wide range of applications, spatial clustering has been a highly active topic in data mining research, and a number of scalable clustering methods have been developed recently. These spatial clustering methods can be classified into four categories: partitioning methods, hierarchical methods, density-based methods and grid-based methods. Clustering large data sets of high dimensionality has always been a serious challenge for clustering algorithms. Many recently developed clustering algorithms have attempted to handle either data with a very large number of records or data sets with a very high number of dimensions. The new clustering method GCHL (a Grid-Clustering algorithm for High-dimensional very Large spatial databases) combines a novel density-grid based clustering with an axis-parallel partitioning strategy to identify areas of high density in the input data space. The algorithm works as well in the feature space of any data set. The method operates on a limited memory buffer and requires at most a single scan through the data. We demonstrate the high quality of the obtained clustering solutions, their capability of discovering concave/deeper and convex/higher regions, their robustness to outliers and noise, and GCHL's excellent scalability.
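
The abstract does not give the details of GCHL, so the following is only a minimal sketch of the general idea behind grid-based density clustering, not the authors' algorithm. It assumes a uniform axis-parallel grid, a fixed density threshold, and face-adjacency for merging cells; the function name `grid_density_clusters` and the parameters `cell_size` and `min_points` are hypothetical and chosen for illustration.

```python
from collections import Counter, deque


def grid_density_clusters(points, cell_size, min_points):
    """Generic grid-density clustering sketch (an assumption, not GCHL itself).

    Returns a dict mapping each dense grid cell to a cluster label.
    """
    # Single scan over the data: assign each point to its axis-parallel
    # grid cell and keep only a per-cell counter, not the points themselves.
    counts = Counter()
    for p in points:
        cell = tuple(int(x // cell_size) for x in p)
        counts[cell] += 1

    # A cell is "dense" when it holds at least min_points points;
    # sparse cells are treated as noise or outliers.
    dense = {cell for cell, n in counts.items() if n >= min_points}

    # Merge dense cells that share a face, using a breadth-first search.
    labels = {}
    next_label = 0
    for start in sorted(dense):
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for axis in range(len(cell)):
                for step in (-1, 1):
                    neighbour = (
                        cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                    )
                    if neighbour in dense and neighbour not in labels:
                        labels[neighbour] = next_label
                        queue.append(neighbour)
        next_label += 1
    return labels


# Hypothetical toy data: two dense groups and one isolated point.
sample = [
    (0.1, 0.1), (0.2, 0.3), (0.4, 0.2),  # cell (0, 0)
    (1.1, 0.2), (1.3, 0.4),              # cell (1, 0), adjacent to (0, 0)
    (5.1, 5.2), (5.3, 5.4),              # cell (5, 5)
    (9.0, 0.0),                          # lone point, below the threshold
]
sample_labels = grid_density_clusters(sample, cell_size=1.0, min_points=2)
```

Only the cell counters are kept in memory, which is what lets a scheme of this kind work from one pass over the data. Checking face neighbours only keeps the neighbour lookup at 2d cells per dense cell in d dimensions; including diagonal neighbours would cost 3^d - 1 lookups and becomes impractical as dimensionality grows.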