Ranked Query Processing in Uncertain Databases

  • Authors:
  • Xiang Lian;Lei Chen

  • Affiliations:
  • Hong Kong University of Science and Technology, Hong Kong;Hong Kong University of Science and Technology, Hong Kong

  • Venue:
  • IEEE Transactions on Knowledge and Data Engineering
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Recently, many new applications, such as sensor data monitoring and mobile device tracking, raise up the issue of uncertain data management. Compared to "certain” data, the data in the uncertain database are not exact points, which, instead, often reside within a region. In this paper, we study the ranked queries over uncertain data. In fact, ranked queries have been studied extensively in traditional database literature due to their popularity in many applications, such as decision making, recommendation raising, and data mining tasks. Many proposals have been made in order to improve the efficiency in answering ranked queries. However, the existing approaches are all based on the assumption that the underlying data are exact (or certain). Due to the intrinsic differences between uncertain and certain data, these methods are designed only for ranked queries in certain databases and cannot be applied to uncertain case directly. Motivated by this, we propose novel solutions to speed up the probabilistic ranked query (PRank) with monotonic preference functions over the uncertain database. Specifically, we introduce two effective pruning methods, spatial and probabilistic pruning, to help reduce the PRank search space. A special case of PRank with linear preference functions is also studied. Then, we seamlessly integrate these pruning heuristics into the PRank query procedure. Furthermore, we propose and tackle the PRank query processing over the join of two distinct uncertain databases. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed approaches in answering PRank queries, in terms of both wall clock time and the number of candidates to be refined.