A novel method to find appropriate ε for DBSCAN

Authors:
Jamshid Esmaelnejad;Jafar Habibi;Soheil Hassas Yeganeh
Affiliations:
Computer Engineering Department, Sharif University of Technology, Tehran, Iran;Computer Engineering Department, Sharif University of Technology, Tehran, Iran;Computer Engineering Department, Sharif University of Technology, Tehran, Iran
Venue:
ACIIDS'10 Proceedings of the Second international conference on Intelligent information and database systems: Part I
Year:
2010



Abstract

Clustering is one of the most useful methods of data mining, in which a set of real or abstract objects is categorized into clusters. The DBSCAN clustering method, one of the best-known density-based clustering methods, groups points in dense areas into the same cluster. In DBSCAN, a point is said to be dense if the circular area of radius ε around it contains at least MinPts points. To find such dense areas, region queries are issued. Two points are defined as density connected if the distance between them is less than ε and at least one of them is dense. Finally, the density-connected parts of the data set are extracted as clusters. The significant issue with such a method is that its parameters (ε and MinPts) are very hard for a user to guess, so it is better to remove them or to replace them with other parameters that are simpler to estimate. In this paper, we focus on the DBSCAN algorithm and replace ε with another parameter named ρ (the noise ratio of the data set). This does not reduce the number of parameters, but ρ is usually much simpler to set than ε; in some applications the user even knows the noise ratio of the data set in advance. Being a relative (not absolute) measure is another advantage of ρ over ε. We also propose a novel visualization technique that may help users set the ε value interactively. Experimental results are presented to show that our algorithm produces results very similar to those of the original DBSCAN with ε set to an appropriate value.
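
The definitions in the abstract can be made concrete with a minimal Python sketch. It shows a region query, the dense-point test, and one plausible way to turn a noise ratio ρ into an ε value. The abstract does not describe the paper's actual procedure, so `eps_from_noise_ratio` is an illustrative assumption and not the authors' algorithm: it takes each point's distance to its MinPts-th nearest neighbour and picks the ε at which roughly a fraction ρ of the points fails the dense test. All function names are hypothetical, the region query counts the point itself, and it uses "less than or equal to ε" as the boundary convention.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def is_dense(points, i, eps, min_pts):
    """A point is dense if its eps-neighbourhood holds at least min_pts points."""
    return len(region_query(points, i, eps)) >= min_pts


def eps_from_noise_ratio(points, min_pts, rho):
    """Assumed heuristic (not the paper's method): choose eps so that about
    a fraction rho of the points is not dense."""
    # Distance from each point to its min_pts-th nearest neighbour,
    # counting the point itself at distance 0.
    k_dists = sorted(
        sorted(math.dist(p, q) for q in points)[min_pts - 1] for p in points
    )
    # Keep the (1 - rho) fraction of points with the smallest k-distances dense.
    n_dense = max(1, min(len(points), round((1 - rho) * len(points))))
    return k_dists[n_dense - 1]
```

For example, calling `eps_from_noise_ratio(points, min_pts=3, rho=0.2)` on a list of 2-D tuples returns an ε that can then be passed to `is_dense`. Because ρ is a fraction of the data set rather than a distance, the same value carries over when the data are rescaled, which is the "relative measure" advantage the abstract mentions.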