Probabilistic top-k and ranking-aggregate queries

Authors:
Mohamed A. Soliman;Ihab F. Ilyas;Kevin Chen-Chuan Chang
Affiliations:
University of Waterloo, Ontario, Canada;University of Waterloo, Ontario, Canada;University of Illinois at Urbana-Champaign, Urbana, IL
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2008



Abstract

Ranking and aggregation queries are widely used in data exploration, data analysis, and decision-making scenarios. While most currently proposed ranking and aggregation techniques focus on deterministic data, several emerging applications involve data that is unclean or uncertain. Ranking and aggregating uncertain (probabilistic) data raises new challenges in query semantics and processing, making conventional methods inapplicable. Furthermore, uncertainty imposes probability as a new ranking dimension that does not exist in traditional settings. In this article we introduce new probabilistic formulations for top-k and ranking-aggregate queries in probabilistic databases. Our formulations are based on a marriage of traditional top-k semantics with possible worlds semantics. In light of these formulations, we construct a generic processing framework that supports both query types and leverages existing query processing and indexing capabilities in current RDBMSs. The framework encapsulates a state space model and efficient search algorithms to compute query answers. Our proposed techniques minimize the number of accessed tuples and the size of the materialized search space needed to compute query answers. Our experimental study shows the efficiency of our techniques under different data distributions, with orders-of-magnitude improvement over naïve methods.
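
To make the combination of top-k semantics with possible worlds semantics concrete, the following is a minimal brute-force sketch, not the framework described in the abstract. It assumes tuples exist independently of one another, enumerates every possible world, and sums the probability of the worlds that produce each top-k vector. The function names, the tuple layout (id, score, probability), the sample data, and the choice to skip worlds with fewer than k tuples are all illustrative assumptions. The enumeration is exponential in the number of tuples, which is the kind of naïve approach the abstract contrasts against.

```python
from itertools import product


def possible_worlds(tuples):
    """Yield (world, probability) pairs for independent uncertain tuples.

    `tuples` is a list of (tuple_id, score, existence_probability).
    Assumption: tuples exist independently of one another.
    """
    for mask in product([True, False], repeat=len(tuples)):
        prob = 1.0
        world = []
        for present, (tid, score, p) in zip(mask, tuples):
            if present:
                prob *= p
                world.append((tid, score))
            else:
                prob *= 1.0 - p
        yield world, prob


def topk_vector_probabilities(tuples, k):
    """Map each top-k vector to the total probability of the worlds
    in which it is the top-k answer (highest score first).

    Assumption: worlds with fewer than k tuples contribute no answer.
    """
    answers = {}
    for world, prob in possible_worlds(tuples):
        if len(world) < k:
            continue
        ranked = sorted(world, key=lambda t: t[1], reverse=True)
        vector = tuple(tid for tid, _ in ranked[:k])
        answers[vector] = answers.get(vector, 0.0) + prob
    return answers


# Hypothetical data: (id, score, probability that the tuple exists).
readings = [("t1", 90, 0.4), ("t2", 80, 0.9), ("t3", 70, 0.6)]

answers = topk_vector_probabilities(readings, k=2)
best = max(answers, key=answers.get)
print(best, round(answers[best], 3))
```

With these made-up numbers, the vector (t1, t2) accumulates probability 0.36, narrowly ahead of (t2, t3) at 0.324, even though t1 on its own is less likely to exist than either t2 or t3. This illustrates how probability acts as a ranking dimension alongside score.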