Exceeding expectations and clustering uncertain data

  • Authors:
  • Sudipto Guha;Kamesh Munagala

  • Affiliations:
  • University of Pennsylvania, Philadelphia, PA, USA;Duke University, Durham, NC, USA

  • Venue:
  • Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Database technology is playing an increasingly important role in understanding and solving large-scale and complex scientific and societal problems and phenomena, for instance, understanding biological networks, climate modeling, electronic markets, etc. In these settings, uncertainty or imprecise information is a pervasive issue that becomes a serious impediment to understanding and effectively utilizing such systems. Clustering is one of the key problems in this context. In this paper we focus on the problem of clustering, specifically the k-center problem. Since the problem is NP-Hard in deterministic setting, a natural avenue is to consider approximation algorithms with a bounded performance ratio. In an earlier paper Cormode and McGregor had considered certain variants of this problem, but failed to provide approximations that preserved the number of centers. In this paper we remedy the situation and provide true approximation algorithms for a wider class of these problems. However, the key aspect of this paper is to devise general techniques for optimization under uncertainty. We show that a particular formulation which uses the contribution of a random variable above its expectation is useful in this context. We believe these techniques will find wider applications in optimization under uncertainty.