Causality and responsibility: probabilistic queries revisited in uncertain databases

  • Authors:
  • Xiang Lian;Lei Chen

  • Affiliations:
  • The University of Texas - Pan American, Edinburg, TX;The Hong Kong University of Sci. and Tech., Kowloon, Hong Kong, China

  • Venue:
  • Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Recently, due to ubiquitous data uncertainty in many real-life applications, it has become increasingly important to study efficient and effective processing of various probabilistic queries over uncertain data, which usually retrieve uncertain objects that satisfy query predicates with high probabilities. However, one annoying, yet challenging, problem is that, some probabilistic queries are very sensitive to low-quality objects in uncertain databases, and the returned query answers might miss some important results (due to low data quality). To identify both accurate query answers and those potentially low-quality objects, in this paper, we investigate the causes of query answers/non-answers from a novel angle of causality and responsibility (CR), and propose a new interpretation of probabilistic queries. Particularly, we focus on the problem of CR-based probabilistic nearest neighbor (CR-PNN) query, and design a general framework for answering CR-based queries (including CR-PNN), which can return both query answers with high confidences and low-quality objects that may potentially affect query results (for data cleaning purposes). To efficiently process CR-PNN queries, we propose effective pruning strategies to quickly filter out false alarms, and design efficient algorithms to obtain CR-PNN answers. Extensive experiments have been conducted to verify the efficiency and effectiveness of our proposed approaches.