Indexing correlated probabilistic databases

Authors:
Bhargav Kanagal;Amol Deshpande
Affiliations:
University of Maryland, College Park, MD, USA;University of Maryland, College Park, MD, USA
Venue:
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Year:
2009

Citing 22
Cited 12

Incomplete Information in Relational Databases

Journal of the ACM (JACM)
ProbView: a flexible probabilistic database system

ACM Transactions on Database Systems (TODS)
Probabilistic Networks and Expert Systems

Probabilistic Networks and Expert Systems
Indexing multi-dimensional uncertain data with arbitrary probability density functions

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Clean Answers over Dirty Databases: A Probabilistic Approach

ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Creating probabilistic databases from information extraction models

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Optimizing mpf queries: decision support and probabilistic inference

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
From complete to incomplete information and back

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Management of probabilistic data: foundations and challenges

Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Efficient query evaluation on probabilistic databases

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Efficient indexing methods for probabilistic threshold queries over uncertain data

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Data integration with uncertainty

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Event queries on correlated probabilistic streams

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Conditioning probabilistic databases

Proceedings of the VLDB Endowment
BayesStore: managing large, uncertain data repositories with probabilistic graphical models

Proceedings of the VLDB Endowment
Exploiting shared correlations in probabilistic databases

Proceedings of the VLDB Endowment
Fast and Simple Relational Processing of Uncertain Data

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Online Filtering, Smoothing and Probabilistic Modeling of Streaming data

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Access Methods for Markovian Streams

ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Efficient Query Evaluation over Temporally Correlated Probabilistic Streams

ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Focusing generalizations of belief propagation on targeted queries

AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
AND/OR cutset conditioning

IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence

PrDB: Managing Large-Scale Correlated Probabilistic Databases (Abstract)

SUM '09 Proceedings of the 3rd International Conference on Scalable Uncertainty Management
Increasing representational power and scaling reasoning in probabilistic databases

Proceedings of the 13th International Conference on Database Theory
Transducing Markov sequences

Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Threshold query optimization for uncertain data

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Lineage processing over correlated probabilistic databases

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
A generic framework for handling uncertain data with local correlations

Proceedings of the VLDB Endowment
Identifying interesting instances for probabilistic skylines

DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part II
Similarity search and mining in uncertain databases

Proceedings of the VLDB Endowment
A unified approach to ranking in probabilistic databases

The VLDB Journal — The International Journal on Very Large Data Bases
Probabilistic management of OCR data using an RDBMS

Proceedings of the VLDB Endowment
Local structure and determinism in probabilistic databases

SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Approximation trade-offs in a Markovian stream warehouse: An empirical study

Information Systems

Abstract

With large amounts of correlated probabilistic data being generated in a wide range of application domains, including sensor networks, information extraction, and event detection, effectively managing and querying this data has become an important research direction. While there is an extensive body of literature on querying independent probabilistic data, supporting efficient queries over large-scale, correlated databases remains a challenge. In this paper, we develop efficient data structures and indexes for supporting inference and decision support queries over such databases. Our proposed hierarchical data structure is suitable for both in-memory and disk-resident databases. We represent the correlations in the probabilistic database using a junction tree over the tuple-existence or attribute-value random variables, and use tree partitioning techniques to build an index structure over it. We show how to efficiently answer inference and aggregation queries using such an index, resulting in orders of magnitude performance benefits in most cases. In addition, we develop novel algorithms for efficiently keeping the index structure up to date as changes (inserts, updates) are made to the probabilistic database. We present a comprehensive experimental study illustrating the benefits of our approach to query processing in probabilistic databases.
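
To give a rough sense of why partitioning a junction tree can speed up inference, the sketch below uses the simplest possible case: a Markov chain, whose junction tree is itself a chain of cliques. It is an illustration under stated assumptions, not the data structure from the paper. The class `ChainIndex`, its `block_size` parameter, and the per-block "shortcut" matrices are hypothetical names introduced here. Each shortcut stores the precomputed product of the transition matrices inside one partition, so a query spanning whole partitions can skip over them instead of multiplying every transition.

```python
from functools import reduce


def matmul(a, b):
    """Multiply two small dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def identity(n):
    """Return the n-by-n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def naive_conditional(transitions, i, j):
    """P(X_j | X_i) computed by multiplying every transition from i to j."""
    return reduce(matmul, transitions[i:j], identity(len(transitions[0])))


class ChainIndex:
    """Hypothetical block index over a Markov chain X_0 -> X_1 -> ... -> X_n.

    transitions[t][a][b] is assumed to hold P(X_{t+1} = b | X_t = a).
    """

    def __init__(self, transitions, block_size):
        self.transitions = transitions
        self.block_size = block_size
        self.k = len(transitions[0])
        # One shortcut per partition: the product of the transitions inside it.
        self.shortcuts = []
        for start in range(0, len(transitions), block_size):
            block = transitions[start:start + block_size]
            self.shortcuts.append(reduce(matmul, block, identity(self.k)))

    def conditional(self, i, j):
        """Return P(X_j | X_i) for i <= j as a k-by-k matrix."""
        result = identity(self.k)
        t = i
        while t < j:
            aligned = t % self.block_size == 0
            if aligned and t + self.block_size <= j:
                # The whole partition lies inside the query range: use its shortcut.
                result = matmul(result, self.shortcuts[t // self.block_size])
                t += self.block_size
            else:
                # Partial overlap with a partition: fall back to single steps.
                result = matmul(result, self.transitions[t])
                t += 1
        return result


# Example: ten identical binary transitions, partitions of four steps each.
transitions = [[[0.9, 0.1], [0.2, 0.8]] for _ in range(10)]
index = ChainIndex(transitions, block_size=4)
p = index.conditional(1, 10)
```

In this toy setting the indexed query touches one shortcut plus a handful of single transitions rather than every transition in the range. A hierarchy of such partitions, built over a general junction tree rather than a chain, is the kind of structure the abstract describes; the details of that construction are not given here and are not reproduced by this sketch.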