BIRCH: an efficient data clustering method for very large databases
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Automatic subspace clustering of high dimensional data for data mining applications
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
OPTICS: ordering points to identify the clustering structure
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Entropy-based subspace clustering for mining numerical data
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
LOF: identifying density-based local outliers
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Influence sets based on reverse nearest neighbor queries
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Cure: an efficient clustering algorithm for large databases
Information Systems
Outlier detection for high dimensional data
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications
Data Mining and Knowledge Discovery
CLARANS: A Method for Clustering Objects for Spatial Data Mining
IEEE Transactions on Knowledge and Data Engineering
WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
STING: A Statistical Information Grid Approach to Spatial Data Mining
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
A Parameterless Method for Efficiently Discovering Clusters of Arbitrary Shape in Large Datasets
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
A Simple Yet Effective Data Clustering Algorithm
ICDM '06 Proceedings of the Sixth International Conference on Data Mining
ACM Transactions on Knowledge Discovery from Data (TKDD)
Automatic extraction of clusters from hierarchical clustering representations
PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining
DBStrata: a system for density-based clustering and outlier detection based on stratification
Proceedings of the Fourth International Conference on SImilarity Search and APplications
Ranking outliers using symmetric neighborhood relationship
PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Hi-index | 0.00 |
Clustering is a widely used unsupervised data mining technique. It allows to identify structures in collections of objects by grouping them into classes, named clusters, in such a way that similarity of objects within any cluster is maximized and similarity of objects belonging to different clusters is minimized. In density-based clustering, a cluster is defined as a connected dense component and grows in the direction driven by the density. The basic structure of density-based clustering presents some common drawbacks: (i) parameters have to be set; (ii) the behavior of the algorithm is sensitive to the density of the starting object; and (iii) adjacent clusters of different densities could not be properly identified. In this paper, we address all the above problems. Our method, based on the concept of space stratification, efficiently identifies the different densities in the dataset and, accordingly, ranks the objects of the original space. Next, it exploits such a knowledge by projecting the original data into a space with one more dimension. It performs a density based clustering taking into account the reverse-nearest-neighbor of the objects. Our method also reduces the number of input parameters by giving a guideline to set them in a suitable way. Experimental results indicate that our algorithm is able to deal with clusters of different densities and outperforms the most popular algorithms DBSCAN and OPTICS in all the standard benchmark datasets.