GCHL: A grid-clustering algorithm for high-dimensional very large spatial data bases

  • Authors:
  • A. H. Pilevar;M. Sukumar

  • Affiliations:
  • DOS in Computer Science, University of Mysore, Mysore, India;IT Department, S.J. College of Engineering, Mysore, India

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2005

Quantified Score

Hi-index 0.10

Visualization

Abstract

Spatial clustering, which groups similar spatial objects into classes, is an important component of spatial data mining [Han and Kamber, Data Mining: Concepts and Techniques, 2000]. Due to its immense applications in various areas, spatial clustering has been highly active topic in data mining researches, with fruitful, scalable clustering methods developed recently. These spatial clustering methods can be classified into four categories: partitioning method, hierarchical method, density-based method and grid-based method. Clustering large data sets of high dimensionality has always been a serious challenge for clustering algorithms. Many recently developed clustering algorithms have attempted to address either handling data with very large number of records or data sets with very high number of dimensions. This new clustering method GCHL (a Grid-Clustering algorithm for High-dimensional very Large spatial databases) combines a novel density-grid based clustering with axis-parallel partitioning strategy to identify areas of high density in the input data space. The algorithm work as well in the feature space of any data set. The method operates on a limited memory buffer and requires at most a single scan through the data. We demonstrate the high quality of the obtained clustering solutions, capability of discovering concave/deeper and convex/higher regions, their robustness to outlier and noise, and GCHL excellent scalability.