MUD: Mapping-based query processing for high-dimensional uncertain data

  • Authors:
  • Lidan Shou;Xiaolong Zhang;Gang Chen;Yuan Gao;Ke Chen

  • Affiliations:
  • College of Computer Science and Technology, Zhejiang University, 38 Zheda Road, Hangzhou 310027, China;College of Computer Science and Technology, Zhejiang University, 38 Zheda Road, Hangzhou 310027, China;College of Computer Science and Technology, Zhejiang University, 38 Zheda Road, Hangzhou 310027, China;College of Computer Science and Technology, Zhejiang University, 38 Zheda Road, Hangzhou 310027, China;College of Computer Science and Technology, Zhejiang University, 38 Zheda Road, Hangzhou 310027, China

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2012

Quantified Score

Hi-index 0.07

Visualization

Abstract

Many real-world applications require management of uncertain data that are modeled as objects in high-dimensional space with imprecise values. In such applications, data objects are typically associated with probability density functions. A fundamental operation on such uncertain data is the probabilistic-threshold range query (PTRQ), which retrieves the objects appearing in the query region with probabilities no less than a specified value. In this paper, we propose a novel framework called MUD for efficient processing of PTRQs on high-dimensional uncertain data. We first propose a cost-effective pruning technique based on a very simple form of probabilistic pruning information (PPI), namely the probabilistic quantiles. Then we map high-dimensional uncertain objects to a single-dimensional space, where the quantiles of uncertain objects can be indexed using the existing single-dimensional indices such as the B+-tree. Each PTRQ in the high-dimensional space is transformed into multiple range queries on the single-dimensional space and evaluated there. We also discuss a method to optimize the indexing scheme for MUD. Specifically, we formulate a mathematical model for measuring the ''pruning power'' of quantiles, and propose a dynamic programming algorithm which selects the ''best'' quantiles for mapping and indexing. We perform extensive experiments on both synthetic and real data sets. Our experimental results reveal that the MUD framework is both effective and efficient for processing PTRQs on high-dimensional uncertain data, and it can significantly outperform state-of-the-art schemes.