Clustering Uncertain Data Via K-Medoids

  • Authors:
  • Francesco Gullo;Giovanni Ponti;Andrea Tagarelli

  • Affiliations:
  • DEIS, University of Calabria, Rende, Italy I87036;DEIS, University of Calabria, Rende, Italy I87036;DEIS, University of Calabria, Rende, Italy I87036

  • Venue:
  • SUM '08 Proceedings of the 2nd international conference on Scalable Uncertainty Management
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Uncertain data are usually represented in terms of an uncertainty region over which a probability density function (pdf) is defined. In the context of uncertain data management, there has been a growing interest in clustering uncertain data. In particular, the classic K-means clustering algorithm has been recently adapted to handle uncertain data. However, the centroid-based partitional clustering approach used in the adapted K-means presents two major weaknesses that are related to: (i)an accuracy issue, since cluster centroids are computed as deterministic objects using the expected values of the pdfs of the clustered objects; and, (ii)an efficiency issue, since the expected distance between uncertain objects and cluster centroids is computationally expensive.In this paper, we address the problem of clustering uncertain data by proposing a K-medoids-based algorithm, called UK-medoids, which is designed to overcome the above issues. In particular, our UK-medoids algorithm employs distance functions properly defined for uncertain objects, and exploits a K-medoids scheme. Experiments have shown that UK-medoids outperforms existing algorithms from an accuracy viewpoint while achieving reasonably good efficiency.