An algorithm for discovering clusters of different densities or shapes in noisy data sets

Authors:
Fereshte Khani;Mohammad Javad Hosseini;Ahmad Ali Abin;Hamid Beigy
Affiliations:
Sharif University of Technology, Iran;Sharif University of Technology, Iran;Sharif University of Technology, Iran;Sharif University of Technology, Iran
Venue:
Proceedings of the 28th Annual ACM Symposium on Applied Computing
Year:
2013

Abstract

In clustering spatial data, we are given a set of points in R^n, and the objective is to find the clusters (representing spatial objects) in this set. Finding clusters of different shapes, sizes, and densities in data that contains noise and possibly outliers is a challenging task. This problem has received particular attention in the machine learning community and has many applications. We present a novel clustering technique that addresses these issues to a considerable extent. In the proposed algorithm, we let the structure of the data set itself find the clusters: points actively send feedback to, and receive feedback from, one another. The idea of the proposed method is to transform the input data set into a graph by adding edges between points that belong to the same cluster, so that connected components correspond to clusters, whereas points in different clusters are almost disconnected. At the start, our algorithm creates a preliminary graph, which it then tries to improve iteratively. To build the graph (add more edges), each point sends feedback to the points in its neighborhood. The neighborhoods and the feedback to be sent are determined by examining the feedback received. This process continues until a stable graph is obtained. The clusters are then formed by post-processing the constructed graph. Our algorithm is intuitive, easy to state and analyze, and requires little parameter tuning. Experimental results show that the proposed algorithm outperforms existing related methods in this area.
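
The abstract does not specify the feedback rule, so the following Python sketch only illustrates the general shape of the pipeline it describes: a preliminary graph, iterative edge addition driven by values exchanged between neighboring points, a stopping test for a stable graph, and post-processing of connected components into clusters. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the parameters `k`, `alpha`, and `min_size`, the mutual-nearest-neighbor starting graph, and the "local scale" feedback rule. It also assumes at least `k + 1` input points.

```python
import math


def nearest_neighbors(points, k):
    """For each point, list the indices of its k nearest other points, closest first."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        result.append(others[:k])
    return result


def feedback_cluster(points, k=3, alpha=1.5, min_size=2, max_rounds=50):
    """Toy feedback-style graph clustering; returns one label per point (-1 = noise)."""
    n = len(points)
    knn = nearest_neighbors(points, k)

    def dist(i, j):
        return math.dist(points[i], points[j])

    # Preliminary graph (assumption): link mutual nearest neighbors only.
    adj = [set() for _ in range(n)]
    for i in range(n):
        j = knn[i][0]
        if knn[j][0] == i:
            adj[i].add(j)
            adj[j].add(i)

    for _ in range(max_rounds):
        # Feedback (assumption): each point reports its local scale, i.e. the mean
        # length of its current edges, or its nearest-neighbor distance if it has none.
        scale = [
            sum(dist(i, j) for j in adj[i]) / len(adj[i]) if adj[i] else dist(i, knn[i][0])
            for i in range(n)
        ]
        added = False
        for i in range(n):
            for j in knn[i]:
                # An edge is added only if it is short relative to both endpoints' scales.
                if j not in adj[i] and dist(i, j) <= alpha * min(scale[i], scale[j]):
                    adj[i].add(j)
                    adj[j].add(i)
                    added = True
        if not added:  # no edge was added in this round: the graph is stable
            break

    # Post-processing: connected components become clusters; tiny ones are noise.
    labels = [None] * n
    next_label = 0
    for start in range(n):
        if labels[start] is not None:
            continue
        component, stack = [], [start]
        labels[start] = -1
        while stack:
            u = stack.pop()
            component.append(u)
            for v in adj[u]:
                if labels[v] is None:
                    labels[v] = -1
                    stack.append(v)
        if len(component) >= min_size:
            for u in component:
                labels[u] = next_label
            next_label += 1
    return labels


# Two squares of different density plus one far-away outlier.
points = [
    (0, 0), (1, 0), (0, 1), (1, 1),
    (10, 10), (10.5, 10), (10, 10.5), (10.5, 10.5),
    (5, 20),
]
labels = feedback_cluster(points)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Because the threshold in this sketch scales with each point's own local edge length, the sparse and the dense square are each connected internally, while the outlier never satisfies the test against its much denser nearest neighbors and is left as noise. The actual algorithm may use a quite different feedback mechanism.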