AGRID: an efficient algorithm for clustering large high-dimensional datasets

Authors:
Zhao Yanchang;Song Junde
Affiliations:
Electronic Engineering School, Beijing University of Posts and Telecommunications, Beijing, China;Electronic Engineering School, Beijing University of Posts and Telecommunications, Beijing, China
Venue:
PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining
Year:
2003

Abstract

GDILC is a grid-based, density-based clustering algorithm designed to discover clusters of arbitrary shape and to eliminate noise. However, it does not scale to large high-dimensional datasets. In this paper, we improve the algorithm in five important respects. With these improvements, the resulting algorithm, AGRID, is highly scalable and can process large high-dimensional datasets. It discovers clusters of various shapes and eliminates noise effectively. In addition, it is insensitive to the order of the input and is non-parametric. Our experiments demonstrate the high speed and accuracy of AGRID.
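
To illustrate the general idea of density-based clustering on a grid, here is a minimal sketch. It is not the AGRID or GDILC procedure, whose details the abstract does not give. It only shows the generic pattern of binning points into cells, keeping the dense cells, and merging adjacent dense cells into clusters. The function name `grid_density_cluster` and the parameters `cell_size` and `min_points` are hypothetical, chosen for this illustration.

```python
from collections import defaultdict, deque
from itertools import product


def grid_density_cluster(points, cell_size, min_points):
    """Generic grid-density sketch (NOT the AGRID/GDILC algorithm).

    Returns one label per point: a cluster id (0, 1, ...) or -1 for noise.
    """
    # Assign every point to the grid cell that contains it.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(int(x // cell_size) for x in p)
        cells[key].append(idx)

    # Assumption for this sketch: a cell is "dense" when it holds
    # at least min_points points.
    dense = {key for key, members in cells.items() if len(members) >= min_points}

    dims = len(points[0]) if points else 0
    # Neighbour offsets: cells differing by at most 1 in every dimension.
    offsets = [o for o in product((-1, 0, 1), repeat=dims) if any(o)]

    labels = [-1] * len(points)  # points in non-dense cells stay noise
    seen = set()
    cluster_id = 0
    # Iterating over sorted cells makes the result independent of input order.
    for start in sorted(dense):
        if start in seen:
            continue
        # Breadth-first search over adjacent dense cells forms one cluster.
        queue = deque([start])
        seen.add(start)
        while queue:
            cell = queue.popleft()
            for idx in cells[cell]:
                labels[idx] = cluster_id
            for off in offsets:
                nb = tuple(c + d for c, d in zip(cell, off))
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        cluster_id += 1
    return labels
```

In this naive version each cell has 3^d - 1 neighbouring cells in d dimensions, so the neighbour enumeration grows exponentially with dimensionality. This is one general reason why simple grid methods struggle on high-dimensional data.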