The dichotomy of probabilistic inference for unions of conjunctive queries

  • Authors:
  • Nilesh Dalvi;Dan Suciu

  • Affiliations:
  • Facebook, Menlo Park, CA;University of Washington, Seattle, WA

  • Venue:
  • Journal of the ACM (JACM)
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

We study the complexity of computing a query on a probabilistic database. We consider unions of conjunctive queries, UCQ, which are equivalent to positive, existential First Order Logic sentences, and also to nonrecursive datalog programs. The tuples in the database are independent random events. We prove the following dichotomy theorem. For every UCQ query, either its probability can be computed in polynomial time in the size of the database, or is #P-hard. Our result also has applications to the problem of computing the probability of positive, Boolean expressions, and establishes a dichotomy for such classes based on their structure. For the tractable case, we give a very simple algorithm that alternates between two steps: applying the inclusion/exclusion formula, and removing one existential variable. A key and novel feature of this algorithm is that it avoids computing terms that cancel out in the inclusion/exclusion formula, in other words it only computes those terms whose Mobius function in an appropriate lattice is nonzero. We show that this simple feature is a key ingredient needed to ensure completeness. For the hardness proof, we give a reduction from the counting problem for positive, partitioned 2CNF, which is known to be #P-complete. The hardness proof is nontrivial, and combines techniques from logic, classical algebra, and analysis.