Probabilistic ranking over relations

Authors:
Lijun Chang;Jeffrey Xu Yu;Lu Qin;Xuemin Lin
Affiliations:
The Chinese University of Hong Kong, Hong Kong, China;The Chinese University of Hong Kong, Hong Kong, China;The Chinese University of Hong Kong, Hong Kong, China;The University of New South Wales, Sydney, Australia
Venue:
Proceedings of the 13th International Conference on Extending Database Technology
Year:
2010

Citing 33
Cited 0

Combining fuzzy information from multiple systems

Journal of Computer and System Sciences
Supporting Incremental Join Queries on Ranked Inputs

Proceedings of the 27th International Conference on Very Large Data Bases
Optimal aggregation algorithms for middleware

Journal of Computer and System Sciences - Special issu on PODS 2001
Towards Efficient Multi-Feature Queries in Heterogeneous Environments

ITCC '01 Proceedings of the International Conference on Information Technology: Coding and Computing
Evaluating top-k queries over web-accessible databases

ACM Transactions on Database Systems (TODS)
Supporting top-k join queries in relational databases

The VLDB Journal — The International Journal on Very Large Data Bases
C-store: a column-oriented DBMS

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Working Models for Uncertain Data

ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
ULDBs: databases with uncertainty and lineage

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Trio: a system for data, uncertainty, and lineage

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
OLAP over uncertain and imprecise data

The VLDB Journal — The International Journal on Very Large Data Bases
Efficient join processing over uncertain data

CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
From complete to incomplete information and back

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Efficient top-k aggregation of ranked inputs

ACM Transactions on Database Systems (TODS)
Efficient query evaluation on probabilistic databases

The VLDB Journal — The International Journal on Very Large Data Bases
Self-organizing strategies for a column-store database

EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
Ranking queries on uncertain data: a probabilistic threshold approach

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Column-stores vs. row-stores: how different are they really?

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Evaluating rank joins with optimal cost

Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Probabilistic top-k and ranking-aggregate queries

ACM Transactions on Database Systems (TODS)
A survey of top-k query processing techniques in relational database systems

ACM Computing Surveys (CSUR)
Efficient search for the top-k probable nearest neighbors in uncertain databases

Proceedings of the VLDB Endowment
Efficient Processing of Top-k Queries in Uncertain Databases with x-Relations

IEEE Transactions on Knowledge and Data Engineering
Fast and Simple Relational Processing of Uncertain Data

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Best-Effort Top-k Query Processing Under Budgetary Constraints

ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Semantics of Ranking Queries for Probabilistic Data and Expected Ranks

ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Confidence-Aware Join Algorithms

ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
A common database approach for OLTP and OLAP using an in-memory column database

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Dictionary-based order-preserving string compression for main memory column stores

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Self-organizing tuple reconstruction in column-stores

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
An architecture for recycling intermediates in a column-store

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Ranking distributed probabilistic data

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
A unified approach to ranking in probabilistic databases

Proceedings of the VLDB Endowment

Quantified Score

Hi-index	0.00

Visualization

Abstract

Probabilistic top-k ranking queries have been extensively studied due to the fact that data obtained can be uncertain in many real applications. A probabilistic top-k ranking query ranks objects by the interplay of score and probability, with an implicit assumption that both scores based on which objects are ranked and probabilities of the existence of the objects are stored in the same relation. We observe that in general scores and probabilities are highly possible to be stored in different relations, for example, in column-oriented DBMSs and in data warehouses. In this paper we study probabilistic top-k ranking queries when scores and probabilities are stored in different relations. We focus on reducing the join cost in probabilistic top-k ranking. We investigate two probabilistic score functions, discuss the upper/lower bounds in random access and sequential access, and provide insights on the advantages and disadvantages of random/sequential access in terms of upper/lower bounds. We also propose random, sequential, and hybrid algorithms to conduct probabilistic top-k ranking. We conducted extensive performance studies using real and synthetic datasets, and report our findings in this paper.