A multiple-resolution method for edge-centric data clustering

  • Authors:
  • Scott Epter; Mukkai Krishnamoorthy

  • Affiliations:
  • Department of Computer Science, Rensselaer Polytechnic Institute; Department of Computer Science, Rensselaer Polytechnic Institute

  • Venue:
  • Proceedings of the Eighth International Conference on Information and Knowledge Management
  • Year:
  • 1999

Abstract

Recent works in spatial data clustering view the input data set in terms of inter-point edge lengths rather than the points themselves. Cluster detection in such a system is a matter of finding connected paths of edges whose weight is no greater than some user input threshold or cutoff value. The SMTIN algorithm [9] is one such system that uses Delaunay triangulation to compute the set of nearest neighbor edges quickly and efficiently. Experiments demonstrate a substantial performance and accuracy improvement using SMTIN in comparison to other clustering systems.

The resolution of the clusters discovered in the SMTIN system is directly related to the choice of a cutoff threshold, which makes SMTIN perform poorly for input sets with clusters at multiple resolutions. In this work we introduce an edge-centric clustering method that detects clusters at multiple resolutions. Our algorithm detects differences in density among groups of points and uses multiple cutoff points in order to account for clusters at different resolutions. One of the main benefits of the multi-resolution approach of our system is the ability to accurately cluster points that other systems would consider to be noise. Experiments indicate a substantial improvement in the clustering quality of our system in comparison to SMTIN as well as the removal of the requirement of an input distance-threshold, achieved with comparable theoretical as well as actual runtime performance. We present promising directions for this new algorithm.
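
The single-cutoff idea described in the abstract, and the reason it struggles with clusters at different resolutions, can be illustrated with a minimal sketch. This is not the SMTIN implementation and not the authors' multi-resolution algorithm. The names `threshold_clusters`, `min_size`, and `POINTS` are hypothetical, and the sketch assumes all pairwise edges as a dependency-free stand-in for the Delaunay edges that SMTIN computes.

```python
import math
from itertools import combinations


def threshold_clusters(points, cutoff, min_size=2):
    """Group points linked by chains of edges no longer than `cutoff`.

    Returns (clusters, noise) as lists of point indices. Groups smaller
    than `min_size` are reported as noise (an assumption of this sketch).
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Assumption: all pairs stand in for the Delaunay edge set.
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= cutoff:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)

    clusters = [g for g in groups.values() if len(g) >= min_size]
    noise = [i for g in groups.values() if len(g) < min_size for i in g]
    return clusters, noise


# Made-up data: two dense clusters (spacing 1, gap 3) and one sparse
# cluster (spacing 5) far away from both.
POINTS = [
    (0, 0), (1, 0), (0, 1), (1, 1),      # dense cluster A
    (4, 0), (5, 0), (4, 1), (5, 1),      # dense cluster B
    (20, 0), (25, 0), (20, 5), (25, 5),  # sparse cluster C
]

# A small cutoff separates A and B but treats every point of C as noise.
print(threshold_clusters(POINTS, 1.5))
# ([[0, 1, 2, 3], [4, 5, 6, 7]], [8, 9, 10, 11])

# A large cutoff recovers C but merges A and B into one cluster.
print(threshold_clusters(POINTS, 6.0))
# ([[0, 1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11]], [])
```

On this toy input no single cutoff yields all three clusters, which is the situation the abstract's multiple cutoff points are meant to address.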