Maximally joining probabilistic data

Authors:
Benny Kimelfeld;Yehoshua Sagiv
Affiliations:
The Hebrew University of Jerusalem;The Hebrew University of Jerusalem
Venue:
Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Year:
2007


Abstract

Conceptually, the common approach to manipulating probabilistic data is to evaluate relational queries and then calculate the probability of each tuple in the result. This approach ignores the possibility that the probabilities of complete answers are too low and, hence, partial answers (with sufficiently high probabilities) become important. Therefore, we consider the semantics in which answers are maximal (i.e., have the smallest degree of incompleteness), subject to the constraint that the probability is still above a given threshold. We investigate the complexity of joining relations under the above semantics. In contrast to the deterministic case, this approach gives rise to two different enumeration problems. The first is finding all maximal sets of tuples that are join consistent, connected and have a joint probability above the threshold. The second is computing all maximal tuples that are answers of partial joins and have a probability above the threshold. Both problems are tractable under data complexity. We also consider query-and-data complexity, which rules out as efficient the following naive algorithm: compute all partial answers and then choose the maximal ones among those with probabilities above the threshold. We give efficient algorithms for several important special cases. We also show that, in general, the first problem is NP-hard whereas the second is #P-hard.
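To make the naive algorithm concrete, the following Python sketch applies it to the first enumeration problem (maximal tuple sets). It is an illustration only, not the authors' method, and it rests on assumptions the abstract does not state: tuples are taken to be probabilistically independent, so the joint probability of a set is the product of its tuple probabilities; a set holds at most one tuple per relation; and two tuples count as linked when they share an attribute. All names (`PTuple`, `make`, `naive_maximal_sets`) are hypothetical.

```python
from itertools import combinations
from math import prod
from typing import NamedTuple


class PTuple(NamedTuple):
    relation: str   # name of the relation the tuple belongs to
    values: tuple   # sorted (attribute, value) pairs, kept hashable
    prob: float     # probability of the tuple (independence is ASSUMED)


def make(relation, prob, **values):
    return PTuple(relation, tuple(sorted(values.items())), prob)


def consistent(s, t):
    # Join consistent: the two tuples agree on every shared attribute.
    a, b = dict(s.values), dict(t.values)
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def linked(s, t):
    # ASSUMPTION: tuples are adjacent when they share at least one attribute.
    return bool(dict(s.values).keys() & dict(t.values).keys())


def is_connected(tuples):
    # Depth-first search over the "linked" graph of the tuple set.
    tuples = list(tuples)
    seen, stack = {tuples[0]}, [tuples[0]]
    while stack:
        cur = stack.pop()
        for t in tuples:
            if t not in seen and linked(cur, t):
                seen.add(t)
                stack.append(t)
    return len(seen) == len(tuples)


def qualifies(tuples, threshold):
    relations = [t.relation for t in tuples]
    return (len(set(relations)) == len(relations)   # one tuple per relation
            and all(consistent(s, t) for s, t in combinations(tuples, 2))
            and is_connected(tuples)
            and prod(t.prob for t in tuples) > threshold)


def naive_maximal_sets(db, threshold):
    # Step 1: enumerate every candidate set (exponential in len(db)).
    good = [frozenset(c)
            for r in range(1, len(db) + 1)
            for c in combinations(db, r)
            if qualifies(c, threshold)]
    # Step 2: keep only sets not strictly contained in another qualifying set.
    return [g for g in good if not any(g < h for h in good)]


r1 = make("R", 0.9, a=1, b=2)
s1 = make("S", 0.8, b=2, c=3)
t1 = make("T", 0.5, c=3, d=4)
s2 = make("S", 0.9, b=5, c=6)
db = [r1, s1, t1, s2]

for answer in naive_maximal_sets(db, 0.6):
    print(sorted(t.relation for t in answer))
```

With a threshold of 0.6, the complete join of `r1`, `s1` and `t1` has probability 0.36 and is rejected, so the partial set containing only `r1` and `s1` (probability 0.72) becomes a maximal answer; lowering the threshold to 0.3 admits the complete join instead. The enumeration in step 1 visits every subset of the database, which is exactly the cost that the query-and-data complexity analysis rules out as efficient.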