Enhancing grid-density based clustering for high dimensional data

Authors:
Yanchang Zhao;Jie Cao;Chengqi Zhang;Shichao Zhang
Affiliations:
Centrelink, Australia;Jiangsu Provincial Key Laboratory of E-business, Nanjing University of Finance and Economics, Nanjing, 210003, P.R. China;Centre for Quantum Computation and Intelligent Systems, Faculty of Engineering and Information Technology, University of Technology, Sydney, Australia;College of CS & IT, Guangxi Normal University, Guilin, China
Venue:
Journal of Systems and Software
Year:
2011

Citing 25
Cited 0

BIRCH: an efficient data clustering method for very large databases

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
CURE: an efficient clustering algorithm for large databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Automatic subspace clustering of high dimensional data for data mining applications

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
A comparative study of clustering methods

Future Generation Computer Systems - Special double issue on data mining
OPTICS: ordering points to identify the clustering structure

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Data clustering: a review

ACM Computing Surveys (CSUR)
Data mining: concepts and techniques

Data mining: concepts and techniques
A Monte Carlo algorithm for fast projective clustering

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values

Data Mining and Knowledge Discovery
Techniques of Cluster Algorithms in Data Mining

Data Mining and Knowledge Discovery
On Clustering Validation Techniques

Journal of Intelligent Information Systems
Chameleon: Hierarchical Clustering Using Dynamic Modeling

Computer
On the Surprising Behavior of Distance Metrics in High Dimensional Spaces

ICDT '01 Proceedings of the 8th International Conference on Database Theory
WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Optimal Grid-Clustering: Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
What Is the Nearest Neighbor in High Dimensional Spaces?

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Efficient and Effective Clustering Methods for Spatial Data Mining

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
STING: A Statistical Information Grid Approach to Spatial Data Mining

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
ROCK: A Robust Clustering Algorithm for Categorical Attributes

ICDE '99 Proceedings of the 15th International Conference on Data Engineering
Cluster ensembles --- a knowledge reuse framework for combining multiple partitions

The Journal of Machine Learning Research
Model-based evaluation of clustering validation measures

Pattern Recognition
Robust projected clustering

Knowledge and Information Systems
Clustering multidimensional sequences in spatial and temporal databases

Knowledge and Information Systems
Characterization and evaluation of similarity measures for pairs of clusterings

Knowledge and Information Systems
AGRID: an efficient algorithm for clustering large high-dimensional datasets

PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining

Quantified Score

Hi-index	0.00

Visualization

Abstract

We propose an enhanced grid-density based approach for clustering high dimensional data. Our technique takes objects (or points) as atomic units in which the size requirement to cells is waived without losing clustering accuracy. For efficiency, a new partitioning is developed to make the number of cells smoothly adjustable; a concept of the ith-order neighbors is defined for avoiding considering the exponential number of neighboring cells; and a novel density compensation is proposed for improving the clustering accuracy and quality. We experimentally evaluate our approach and demonstrate that our algorithm significantly improves the clustering accuracy and quality.