MUD: Mapping-based query processing for high-dimensional uncertain data

Authors:
Lidan Shou; Xiaolong Zhang; Gang Chen; Yuan Gao; Ke Chen
Affiliation (all authors):
College of Computer Science and Technology, Zhejiang University, 38 Zheda Road, Hangzhou 310027, China
Venue:
Information Sciences: an International Journal
Year:
2012

Abstract

Many real-world applications require the management of uncertain data modeled as objects in high-dimensional space with imprecise values. In such applications, data objects are typically associated with probability density functions. A fundamental operation on such uncertain data is the probabilistic-threshold range query (PTRQ), which retrieves the objects that appear in the query region with probability no less than a specified threshold. In this paper, we propose a novel framework called MUD for the efficient processing of PTRQs on high-dimensional uncertain data. We first propose a cost-effective pruning technique based on a very simple form of probabilistic pruning information (PPI), namely probabilistic quantiles. We then map high-dimensional uncertain objects to a single-dimensional space, where the quantiles of uncertain objects can be indexed using existing single-dimensional indices such as the B+-tree. Each PTRQ in the high-dimensional space is transformed into multiple range queries on the single-dimensional space and evaluated there. We also discuss a method for optimizing the indexing scheme of MUD. Specifically, we formulate a mathematical model for measuring the "pruning power" of quantiles and propose a dynamic programming algorithm that selects the "best" quantiles for mapping and indexing. We perform extensive experiments on both synthetic and real data sets. The experimental results show that the MUD framework is both effective and efficient for processing PTRQs on high-dimensional uncertain data, and that it can significantly outperform state-of-the-art schemes.
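The abstract does not give MUD's exact pruning rule or key mapping, so the following Python code is only a minimal sketch of the general idea under stated assumptions. It assumes each object stores, per dimension, a few (p, x) pairs where x is the p-quantile of that dimension's marginal distribution, and that attribute values are normalized to [0, 1). The 1-D key i + x, the sorted list standing in for a B+-tree, and every function name are hypothetical illustrations, not the paper's design. The pruning bound relies only on the fact that an object's probability of lying in the query box cannot exceed the probability of its marginal lying in any single side interval of the box.

```python
from bisect import bisect_left, bisect_right

# Assumed layout (hypothetical): obj[i] is a list of (p, x) pairs, where x is
# the p-quantile of the object's marginal pdf in dimension i, i.e.
# x = inf{t : F_i(t) >= p}. Attribute values are normalized to [0, 1).


def interval_prob_upper_bound(quantiles, lo, hi):
    """Upper bound on P(lo <= X <= hi) derived only from stored quantiles."""
    upper = 1.0  # bound on P(X <= hi)
    lower = 0.0  # bound on P(X < lo)
    for p, x in quantiles:
        if x > hi:
            upper = min(upper, p)  # P(X <= hi) <= P(X < x) <= p
        if x < lo:
            lower = max(lower, p)  # P(X < lo) >= P(X <= x) >= p
    return max(0.0, upper - lower)


def can_prune(obj, box, tau):
    """P(object in box) <= P(X_i in [lo_i, hi_i]) for every dimension i,
    so a single dimension whose bound falls below tau is enough to prune."""
    return any(interval_prob_upper_bound(qs, lo, hi) < tau
               for qs, (lo, hi) in zip(obj, box))


def build_index(objects, level):
    """Map each (object, dimension i) pair to the hypothetical 1-D key
    i + Q_i(level). A sorted list stands in for a B+-tree."""
    entries = sorted((i + dict(qs)[level], oid)
                     for oid, obj in objects.items()
                     for i, qs in enumerate(obj))
    return [k for k, _ in entries], [oid for _, oid in entries]


def range_lookup(index, key_lo, key_hi, include_hi=True):
    """One 1-D range query: ids whose key lies in [key_lo, key_hi]
    (or in [key_lo, key_hi) when include_hi is False)."""
    keys, oids = index
    end = bisect_right(keys, key_hi) if include_hi else bisect_left(keys, key_hi)
    return set(oids[bisect_left(keys, key_lo):end])


def candidates(objects, idx_low, idx_high, box):
    """Turn one PTRQ into 2 * d range queries. idx_low is built on a level
    below tau (Q_i > hi allows pruning); idx_high on a level above 1 - tau
    (Q_i < lo allows pruning)."""
    result = set(objects)
    for i, (lo, hi) in enumerate(box):
        result &= range_lookup(idx_low, i, i + hi)
        result &= range_lookup(idx_high, i + lo, i + 1, include_hi=False)
    return result


tau = 0.3
objects = {
    "a": [[(0.2, 0.1), (0.8, 0.3)], [(0.2, 0.4), (0.8, 0.6)]],
    "b": [[(0.2, 0.5), (0.8, 0.7)], [(0.2, 0.4), (0.8, 0.6)]],
    "c": [[(0.2, 0.2), (0.8, 0.5)], [(0.2, 0.1), (0.8, 0.2)]],
}
box = [(0.25, 0.45), (0.35, 0.65)]  # (lo, hi) per dimension
idx_low = build_index(objects, 0.2)   # 0.2 < tau
idx_high = build_index(objects, 0.8)  # 0.8 > 1 - tau
print(sorted(candidates(objects, idx_low, idx_high, box)))  # ['a']
```

Objects returned by the range queries are only candidates: their actual appearance probability would still have to be computed from the pdf and compared with the threshold, a step omitted here. The sketch also fixes the two quantile levels by hand, whereas the abstract describes selecting them with a dynamic programming algorithm over a pruning-power model.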