Querying uncertain data with aggregate constraints

  • Authors:
  • Mohan Yang;Haixun Wang;Haiquan Chen;Wei-Shinn Ku

  • Affiliations:
  • Shanghai Jiao Tong University, Shanghai, China;Microsoft Research Asia, Beijing, China;Auburn University, Auburn, AL, USA;Auburn University, Auburn, AL, USA

  • Venue:
  • Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Data uncertainty arises in many situations. A common approach to query processing uncertain data is to sample many "possible worlds" from the uncertain data and to run queries against the possible worlds. However, sampling is not a trivial task, as a randomly sampled possible world may not satisfy known constraints imposed on the data. In this paper, we focus on an important category of constraints, the aggregate constraints. An aggregate constraint is placed on a set of records instead of on a single record, and a real-life system usually has a large number of aggregate constraints. It is a challenging task to find qualified possible worlds in this scenario, since tuple by tuple sampling is extremely inefficient because it rarely leads to a qualified possible world. In this paper, we introduce two approaches for querying uncertain data with aggregate constraints: constraint aware sampling and MCMC sampling. Our experiments show that the new approaches lead to high quality query results with reasonable cost.