Efficient Clustering of Uncertain Data

  • Authors:
  • Wang Kay Ngai; Ben Kao; Chun Kit Chui; Reynold Cheng; Michael Chau; Kevin Y. Yip

  • Affiliations:
  • The University of Hong Kong, Hong Kong; The University of Hong Kong, Hong Kong; The University of Hong Kong, Hong Kong; Hong Kong Polytechnic University, Hong Kong; The University of Hong Kong, Hong Kong; Yale University, USA

  • Venue:
  • ICDM '06 Proceedings of the Sixth International Conference on Data Mining
  • Year:
  • 2006

Abstract

We study the problem of clustering data objects whose locations are uncertain. Each data object is represented by an uncertainty region over which a probability density function (pdf) is defined. One way to cluster such uncertain objects is the UK-means algorithm, which is based on the traditional K-means algorithm. In UK-means, an object is assigned to the cluster whose representative has the smallest expected distance to the object. For an arbitrary pdf, calculating the expected distance between an object and a cluster representative requires expensive integration. We study various pruning methods that avoid such expensive expected-distance calculations.
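
The assignment step described in the abstract can be illustrated with a minimal Python sketch. Several things here are assumptions made for illustration and are not taken from the paper: the uncertainty region is an axis-aligned box, the pdf is uniform and represented by random samples, the integral is approximated by a sample average, and the distance is Euclidean. The abstract does not say which pruning methods are studied; the bounding-box rule below is only one plausible example. It relies on the fact that an expected distance always lies between the minimum and maximum distance from the region to a representative. All function names are hypothetical.

```python
import math
import random


def expected_distance(samples, rep):
    """Approximate E[dist(object, rep)] by averaging over points drawn from
    the object's pdf (assumption: sampling stands in for exact integration)."""
    return sum(math.dist(s, rep) for s in samples) / len(samples)


def min_dist(lo, hi, rep):
    """Smallest Euclidean distance from rep to the box [lo, hi]."""
    return math.sqrt(sum(max(a - p, 0.0, p - b) ** 2
                         for a, b, p in zip(lo, hi, rep)))


def max_dist(lo, hi, rep):
    """Largest Euclidean distance from rep to the box [lo, hi]."""
    return math.sqrt(sum(max(abs(p - a), abs(p - b)) ** 2
                         for a, b, p in zip(lo, hi, rep)))


def candidates(lo, hi, reps):
    """Illustrative pruning rule (an assumption, not necessarily the paper's):
    drop every representative whose minimum distance to the box exceeds the
    smallest maximum distance among all representatives."""
    best_upper = min(max_dist(lo, hi, r) for r in reps)
    return [j for j, r in enumerate(reps) if min_dist(lo, hi, r) <= best_upper]


def assign(obj, reps):
    """Return the index of the representative with the smallest expected
    distance, computing expected distances only for surviving candidates."""
    cand = candidates(obj["lo"], obj["hi"], reps)
    if len(cand) == 1:
        return cand[0]  # no expected-distance computation needed
    return min(cand, key=lambda j: expected_distance(obj["samples"], reps[j]))


def make_object(lo, hi, n, rng):
    """Hypothetical uncertain object: uniform pdf over the box [lo, hi]."""
    samples = [tuple(rng.uniform(a, b) for a, b in zip(lo, hi))
               for _ in range(n)]
    return {"lo": lo, "hi": hi, "samples": samples}


rng = random.Random(0)
objects = [
    make_object((0.0, 0.0), (1.0, 1.0), 200, rng),
    make_object((9.0, 9.0), (10.0, 10.0), 200, rng),
    make_object((4.0, 4.0), (5.0, 5.0), 200, rng),
]
reps = [(0.5, 0.5), (9.5, 9.5)]
labels = [assign(o, reps) for o in objects]
```

In this toy setup the first two objects are resolved by the bounds alone, while the third lies between the two representatives, so both survive pruning and the sampled expected distances decide the assignment.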