APSCAN: A parameter free algorithm for clustering

Authors:
Xiaoming Chen;Wanquan Liu;Huining Qiu;Jianhuang Lai
Affiliations:
School of Information Science and Technology, Sun Yat-Sen University, Guangzhou 510275, PR China and Department of Computing, Curtin University of Technology, Bentley, WA 6102, Australia;Department of Computing, Curtin University of Technology, Bentley, WA 6102, Australia;Department of Computing, Curtin University of Technology, Bentley, WA 6102, Australia and School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou 510275, PR China;School of Information Science and Technology, Sun Yat-Sen University, Guangzhou 510275, PR China
Venue:
Pattern Recognition Letters
Year:
2011

Abstract

DBSCAN is a density-based clustering algorithm whose effectiveness on spatial datasets has been demonstrated in the existing literature. However, DBSCAN has two distinct drawbacks: (i) its clustering performance depends on two user-specified parameters, the maximum radius of a neighborhood and the minimum number of data points contained in such a neighborhood. Together these two parameters define a single density, and without sufficient prior knowledge they are difficult to determine; (ii) because the two parameters fix a single density, DBSCAN does not perform well on datasets with varying densities. Both issues cause difficulties in applications. To address them in a systematic way, we propose a novel parameter-free clustering algorithm named APSCAN. First, we use the Affinity Propagation (AP) algorithm to detect local densities in a dataset and generate a normalized density list. Second, we combine the first pair of density parameters with any other pair of density parameters in the normalized density list as input parameters for a proposed DDBSCAN (Double-Density-Based SCAN) to produce a set of clustering results. In this way, we obtain different clustering results with varying density parameters derived from the normalized density list. Third, we develop an update rule for the results obtained by running DDBSCAN with different input parameters and then synthesize these clustering results into a final result. The proposed APSCAN has two advantages: first, it does not require the two DBSCAN parameters to be predefined; second, it can not only cluster datasets with varying densities but also preserve the nonlinear data structure of such datasets.
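
The single-density limitation described in the abstract can be illustrated with a minimal sketch of textbook DBSCAN. This is not APSCAN or DDBSCAN, whose details the abstract does not give. The function names `region_query` and `dbscan`, the parameter names `eps` and `min_pts`, and the toy dataset are all assumptions made for illustration.

```python
from math import dist

NOISE = -1


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point, keep expanding
                queue.extend(j_neighbors)
    return labels


# Hypothetical toy data: a dense 3x3 grid (spacing 0.1) placed next to a
# sparse 3x3 grid (spacing 1.0). The gap between the two grids is 0.8.
dense = [(0.1 * a, 0.1 * b) for a in range(3) for b in range(3)]
sparse = [(1.0 + a, float(b)) for a in range(3) for b in range(3)]
points = dense + sparse

small = dbscan(points, eps=0.15, min_pts=3)  # radius tuned to the dense grid
large = dbscan(points, eps=1.1, min_pts=3)   # radius tuned to the sparse grid
```

On this toy data, the small radius recovers the dense grid but labels every sparse point as noise, while the large radius merges both grids into one cluster. No single (`eps`, `min_pts`) pair separates the two groups, which is the kind of varying-density case the abstract says APSCAN is designed to handle.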