Querying and cleaning uncertain data

  • Authors: Reynold Cheng
  • Affiliations: Department of Computer Science, The University of Hong Kong, Hong Kong
  • Venue: QuaCon'09 Proceedings of the 1st international conference on Quality of context
  • Year: 2009

Abstract

The management of uncertainty in large databases has recently attracted tremendous research interest. Data uncertainty is inherent in many emerging and important applications, including location-based services, wireless sensor networks, biometric and biological databases, and data stream applications. In these systems, data uncertainty must be managed carefully in order to make correct decisions and provide high-quality services to users. To enable the development of these applications, uncertain database systems have been proposed. They treat data uncertainty as a "first-class citizen": they use generic data models to capture uncertainty and provide query operators that return answers with statistical confidences. We summarize our recent work on uncertain databases. We explain how data uncertainty can be modeled, and present a classification of probabilistic queries (e.g., range queries and nearest-neighbor queries). We further study how probabilistic queries can be efficiently evaluated and indexed. We also highlight the issue of removing uncertainty under a stringent cleaning budget, with the aim of generating high-quality probabilistic answers.
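To make the idea of a query operator that returns answers with statistical confidences more concrete, here is a minimal sketch of a probabilistic range query in Python. It is not taken from the paper. It assumes a deliberately simple uncertainty model in which each one-dimensional value is known only to lie in an interval and is uniformly distributed within it. The names UncertainValue, range_probability, and probabilistic_range_query, as well as the sample sensor readings, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainValue:
    """Hypothetical 1-D uncertain attribute.

    Assumption: the true value lies in [lo, hi] and is uniformly distributed.
    """
    name: str
    lo: float
    hi: float


def range_probability(obj: UncertainValue, a: float, b: float) -> float:
    """Probability that obj's true value falls inside the query range [a, b]."""
    if obj.hi == obj.lo:
        # Degenerate interval: the value is known exactly.
        return 1.0 if a <= obj.lo <= b else 0.0
    # Length of the overlap between the uncertainty interval and the query range.
    overlap = min(obj.hi, b) - max(obj.lo, a)
    return max(0.0, overlap) / (obj.hi - obj.lo)


def probabilistic_range_query(objects, a, b, threshold=0.0):
    """Return (name, probability) pairs whose probability exceeds threshold."""
    scored = [(o.name, range_probability(o, a, b)) for o in objects]
    return [(name, p) for name, p in scored if p > threshold]


# Illustrative sensor readings (made-up values, not from the paper).
sensors = [
    UncertainValue("s1", 18.0, 22.0),
    UncertainValue("s2", 21.0, 25.0),
    UncertainValue("s3", 30.0, 34.0),
]

answers = probabilistic_range_query(sensors, 20.0, 24.0)
print(answers)
```

Each answer carries the probability that the object satisfies the query rather than a plain yes or no. Raising the threshold parameter keeps only the answers with higher confidence.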