Enhancing density-based clustering: Parameter reduction and outlier detection

Authors:
Carmelo Cassisi;Alfredo Ferro;Rosalba Giugno;Giuseppe Pigola;Alfredo Pulvirenti
Affiliations:
University of Catania, Department of Mathematics and Computer Science, Viale A. Doria, 6 95125 Catania, Italy;University of Catania, Department of Mathematics and Computer Science, Viale A. Doria, 6 95125 Catania, Italy;University of Catania, Department of Mathematics and Computer Science, Viale A. Doria, 6 95125 Catania, Italy;University of Catania, Department of Mathematics and Computer Science, Viale A. Doria, 6 95125 Catania, Italy;University of Catania, Department of Mathematics and Computer Science, Viale A. Doria, 6 95125 Catania, Italy
Venue:
Information Systems
Year:
2013

Citing 19
Cited 0

BIRCH: an efficient data clustering method for very large databases

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Automatic subspace clustering of high dimensional data for data mining applications

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
OPTICS: ordering points to identify the clustering structure

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Entropy-based subspace clustering for mining numerical data

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
LOF: identifying density-based local outliers

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Influence sets based on reverse nearest neighbor queries

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Cure: an efficient clustering algorithm for large databases

Information Systems
Outlier detection for high dimensional data

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications

Data Mining and Knowledge Discovery
Chameleon: Hierarchical Clustering Using Dynamic Modeling

Computer
CLARANS: A Method for Clustering Objects for Spatial Data Mining

IEEE Transactions on Knowledge and Data Engineering
WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
STING: A Statistical Information Grid Approach to Spatial Data Mining

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
A Parameterless Method for Efficiently Discovering Clusters of Arbitrary Shape in Large Datasets

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
A Simple Yet Effective Data Clustering Algorithm

ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering

ACM Transactions on Knowledge Discovery from Data (TKDD)
Automatic extraction of clusters from hierarchical clustering representations

PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining
DBStrata: a system for density-based clustering and outlier detection based on stratification

Proceedings of the Fourth International Conference on SImilarity Search and APplications
Ranking outliers using symmetric neighborhood relationship

PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining

Quantified Score

Hi-index	0.00

Visualization

Abstract

Clustering is a widely used unsupervised data mining technique. It allows to identify structures in collections of objects by grouping them into classes, named clusters, in such a way that similarity of objects within any cluster is maximized and similarity of objects belonging to different clusters is minimized. In density-based clustering, a cluster is defined as a connected dense component and grows in the direction driven by the density. The basic structure of density-based clustering presents some common drawbacks: (i) parameters have to be set; (ii) the behavior of the algorithm is sensitive to the density of the starting object; and (iii) adjacent clusters of different densities could not be properly identified. In this paper, we address all the above problems. Our method, based on the concept of space stratification, efficiently identifies the different densities in the dataset and, accordingly, ranks the objects of the original space. Next, it exploits such a knowledge by projecting the original data into a space with one more dimension. It performs a density based clustering taking into account the reverse-nearest-neighbor of the objects. Our method also reduces the number of input parameters by giving a guideline to set them in a suitable way. Experimental results indicate that our algorithm is able to deal with clusters of different densities and outperforms the most popular algorithms DBSCAN and OPTICS in all the standard benchmark datasets.