Ranking distributed probabilistic data

Authors: Feifei Li; Ke Yi; Jeffrey Jestes
Affiliations: Florida State University, Tallahassee, FL, USA; Hong Kong University of Science and Technology, Hong Kong, China; Florida State University, Tallahassee, FL, USA
Venue: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Year: 2009


Abstract

Ranking queries are essential tools for processing large amounts of probabilistic data, which encode exponentially many possible deterministic instances. In many applications where uncertainty and fuzzy information arise, data are collected from multiple sources in distributed, networked locations, e.g., distributed sensor fields with imprecise measurements, or multiple scientific institutes with inconsistencies in their scientific data. Because of the network delay and the economic cost of communicating large amounts of data over a network, a fundamental problem in these scenarios is to retrieve the global top-k tuples from all distributed sites with minimum communication cost. Using the well-founded notion of the expected rank of each tuple across all possible worlds as the basis of ranking, this work designs both communication- and computation-efficient algorithms for retrieving the top-k tuples with the smallest expected ranks from distributed sites. Extensive experiments using both synthetic and real data sets confirm the efficiency and superiority of our algorithms over the straightforward approach of forwarding all data to the server.
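
To make the notion of expected rank concrete, the following is a minimal sketch of the straightforward baseline mentioned above, in which every site forwards all of its data to the server. It is not one of the paper's communication-efficient algorithms. It assumes an attribute-level uncertainty model that the abstract does not spell out: each tuple carries a discrete distribution over scores, tuples are independent, and a tuple's rank in a possible world is the number of other tuples with a strictly higher score. Under these assumptions, linearity of expectation gives the expected rank of a tuple as the sum, over all other tuples, of the probability that the other tuple scores higher. The function names and the sample data are hypothetical.

```python
# Assumed model (not stated in the abstract): each tuple has a discrete
# score distribution given as a list of (score, probability) pairs,
# tuples are independent, and rank = number of strictly higher scores.

def merge_sites(sites):
    """Baseline: forward every tuple from every site to the server.

    Assumes tuple identifiers are unique across sites.
    """
    merged = {}
    for site in sites:
        merged.update(site)
    return merged


def expected_ranks(tuples):
    """Expected rank of each tuple: sum over other tuples of Pr[other > this]."""
    ranks = {}
    for name_i, pdf_i in tuples.items():
        total = 0.0
        for name_j, pdf_j in tuples.items():
            if name_j == name_i:
                continue
            # Probability that tuple j's score strictly exceeds tuple i's score.
            total += sum(
                p_i * p_j
                for score_i, p_i in pdf_i
                for score_j, p_j in pdf_j
                if score_j > score_i
            )
        ranks[name_i] = total
    return ranks


def top_k(tuples, k):
    """Return the k tuple identifiers with the smallest expected ranks."""
    ranks = expected_ranks(tuples)
    return sorted(ranks, key=ranks.get)[:k]


# Hypothetical data held at two sites.
site_a = {"t1": [(100, 0.8), (50, 0.2)], "t2": [(80, 1.0)]}
site_b = {"t3": [(60, 0.4), (90, 0.6)]}

all_tuples = merge_sites([site_a, site_b])
ranks = expected_ranks(all_tuples)
best_two = top_k(all_tuples, 2)
```

In this sketch the server receives every score distribution from every site, which is exactly the communication cost the paper aims to avoid. The pairwise computation is quadratic in the number of tuples and is meant only to illustrate the ranking semantics.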