Top-k query processing over uncertain data in distributed environments

Authors:
Yongjiao Sun; Ye Yuan; Guoren Wang
Affiliations:
College of Information Science & Engineering, Northeastern University, Shenyang, China (all authors)
Venue:
World Wide Web
Year:
2012


Abstract

Although top-k queries over uncertain data have been studied widely in centralized databases in recent years, processing them in distributed environments remains challenging. In distributed environments such as Peer-to-Peer (P2P) systems and sensor networks, data objects are inherently uncertain because of imprecise measurements and network delays. It is therefore necessary to study how to retrieve the top-k uncertain data objects in distributed environments efficiently and with minimal network overhead. In this paper, we propose a novel approach for processing uncertain top-k queries in large-scale P2P networks in which datasets are horizontally partitioned over peers. In our approach, each peer constructs an Uncertain Quad-Tree (UQ-Tree) index over its local uncertain data, and the P2P network constructs a global index by summarizing the local indexes. Based on the global index, we propose a spatial-pruning algorithm to reduce communication costs and a distributed-pruning algorithm to reduce computation costs. Extensive experiments verify the effectiveness and efficiency of the proposed methods in terms of communication cost and response time.
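The abstract does not describe the UQ-Tree structure or the ranking semantics, so the following is only a minimal sketch of the general idea of pruning peers with summaries published to a global index. It is not the authors' algorithm. It assumes, hypothetically, that each uncertain object is a set of discrete (score, probability) alternatives ranked by expected score, and that each peer publishes a single upper bound (its maximum expected score) in place of a real summarized UQ-Tree. The names `UncertainObject`, `Peer`, `summary` and `distributed_top_k` are illustrative only.

```python
import heapq
from dataclasses import dataclass


@dataclass
class UncertainObject:
    oid: str  # assumed unique across all peers
    # Hypothetical model: discrete alternatives (score, probability),
    # with probabilities summing to at most 1.
    alternatives: list[tuple[float, float]]

    def expected_score(self) -> float:
        return sum(s * p for s, p in self.alternatives)


class Peer:
    """A peer holding one horizontal partition of the uncertain dataset."""

    def __init__(self, name, objects):
        self.name = name
        self.objects = objects

    def summary(self) -> float:
        # Hypothetical stand-in for a summarized local index: an upper
        # bound on the expected score of any object stored at this peer.
        return max((o.expected_score() for o in self.objects), default=float("-inf"))

    def local_top_k(self, k):
        # Only the local top-k objects are shipped to the query initiator.
        return heapq.nlargest(k, self.objects, key=UncertainObject.expected_score)


def distributed_top_k(peers, k):
    """Return (top-k objects by expected score, number of peers contacted)."""
    # Visit peers in descending order of their published bound.
    ordered = sorted(peers, key=lambda p: p.summary(), reverse=True)
    heap = []  # min-heap of (expected score, oid, object), at most k entries
    contacted = 0
    for peer in ordered:
        if len(heap) == k and peer.summary() <= heap[0][0]:
            break  # no remaining peer can beat the current k-th result
        contacted += 1
        for obj in peer.local_top_k(k):
            entry = (obj.expected_score(), obj.oid, obj)
            if len(heap) < k:
                heapq.heappush(heap, entry)
            elif entry[0] > heap[0][0]:
                heapq.heapreplace(heap, entry)
    result = [obj for _, _, obj in sorted(heap, reverse=True)]
    return result, contacted


# Usage with made-up data (expected scores in comments).
peers = [
    Peer("p1", [UncertainObject("a", [(10, 0.5), (2, 0.5)]),  # 6.0
                UncertainObject("b", [(8, 0.9)])]),            # 7.2
    Peer("p2", [UncertainObject("c", [(9, 1.0)]),              # 9.0
                UncertainObject("d", [(1, 1.0)])]),            # 1.0
    Peer("p3", [UncertainObject("e", [(4, 0.5)])]),            # 2.0
]
top, contacted = distributed_top_k(peers, k=2)
print([o.oid for o in top], contacted)  # ['c', 'b'] 2
```

Because peers are visited in descending order of their bound, the initiator can stop as soon as the current k-th expected score reaches the next bound, so peer `p3` is never contacted in this example. Expected score is only one of several top-k semantics for uncertain data; others, such as probabilistic threshold top-k, would need different per-peer bounds.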