AGRID: an efficient algorithm for clustering large high-dimensional datasets

  • Authors:
  • Zhao Yanchang; Song Junde

  • Affiliations:
  • Electronic Engineering School, Beijing University of Posts and Telecommunications, Beijing, China (both authors)

  • Venue:
  • PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining
  • Year:
  • 2003

Abstract

The GDILC clustering algorithm combines density-based clustering with a grid and is designed to discover clusters of arbitrary shape and to eliminate noise. However, it does not scale to large high-dimensional datasets. In this paper, we improve the algorithm in five important directions. With these improvements, the resulting algorithm, AGRID, is highly scalable and can process large high-dimensional datasets. It discovers clusters of various shapes and eliminates noise effectively. In addition, it is insensitive to the order of input and is non-parametric. Our experiments demonstrate the high speed and accuracy of AGRID.
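
As a rough illustration of the general idea of grid-based density clustering mentioned in the abstract, the Python sketch below partitions points into grid cells, treats cells holding enough points as dense, and merges adjacent dense cells into clusters. It is not the GDILC or AGRID procedure, whose details the abstract does not give. The function name `grid_density_cluster` and the parameters `cell_size` and `min_pts` are assumptions made for illustration; unlike AGRID, which the abstract describes as non-parametric, this sketch requires both to be supplied by the caller.

```python
from collections import defaultdict, deque


def grid_density_cluster(points, cell_size, min_pts):
    """Label each point with a cluster id, or -1 for noise.

    Hypothetical helper for illustration only; not the AGRID algorithm.
    """
    # Assign every point to the grid cell that contains it.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(int(x // cell_size) for x in p)
        cells[key].append(idx)

    # A cell counts as dense when it holds at least min_pts points.
    dense = {k for k, members in cells.items() if len(members) >= min_pts}

    cell_label = {}
    next_id = 0
    # Visiting cells in sorted order makes cluster ids independent
    # of the order in which the points were supplied.
    for start in sorted(dense):
        if start in cell_label:
            continue
        # Breadth-first search over dense cells that share a face,
        # i.e. differ by one step in exactly one dimension. This
        # checks 2*d neighbours per cell rather than 3**d - 1.
        cell_label[start] = next_id
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for d in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:d] + (cell[d] + step,) + cell[d + 1:]
                    if nb in dense and nb not in cell_label:
                        cell_label[nb] = next_id
                        queue.append(nb)
        next_id += 1

    # Points in non-dense cells keep the noise label -1.
    labels = [-1] * len(points)
    for cell, cid in cell_label.items():
        for idx in cells[cell]:
            labels[idx] = cid
    return labels
```

Restricting the search to face-adjacent cells is one possible design choice that keeps the neighbour count linear in the number of dimensions; whether AGRID defines neighbours this way is not stated in the abstract.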