Approximate Probabilistic Query Answering over Inconsistent Databases

  • Authors:
  • Sergio Greco; Cristian Molinaro

  • Affiliations:
  • DEIS, Univ. della Calabria, Rende, Italy 87036; DEIS, Univ. della Calabria, Rende, Italy 87036

  • Venue:
  • ER '08 Proceedings of the 27th International Conference on Conceptual Modeling
  • Year:
  • 2008

Abstract

The problem of managing and querying inconsistent databases has been extensively investigated in the last few years. Most of the approaches proposed so far rely on the notions of repair (a minimal set of delete/insert operations that makes the database consistent) and consistent query answer (the answer to a query is obtained by considering the set of "repaired" databases). Since consistent query answering is hard in the general case, most of the proposed techniques have exponential complexity, although the problem becomes polynomial for special classes of constraints and queries. A second problem with most of the proposed approaches is that repairs do not take update operations into account (they consider delete and insert operations only).

This paper presents a general framework in which constraints consist of functional dependencies and queries may be expressed in positive relational algebra. The framework allows us to compute certain query answers (i.e. tuples derivable from all or from none of the repaired databases) and uncertain query answers (i.e. tuples derivable from a proper nonempty subset of the repaired databases). Each tuple in the answer is associated with a probability, which depends on the number of repaired databases from which the tuple can be derived. In the proposed framework, databases are repaired by means of update operations, and the repaired databases are stored in a single "condensed" database, so that all the repaired databases can be derived by "expanding" it. A condensed database can be rewritten into a probabilistic database in which each tuple is associated with an event (i.e. a boolean formula) and, thus, with a probability value. The probabilistic query answer can then be computed by querying the resulting probabilistic database. As querying probabilistic databases is #P-complete, approximate probabilistic answers that are computable in polynomial time are considered.
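The idea of attaching to each answer tuple a probability that depends on the number of repairs deriving it can be illustrated with a small brute-force sketch. Everything here is an assumption made for illustration and is not taken from the paper: the relation `employee`, the helper names `repairs` and `answer_probabilities`, the reading of an update-based repair as "each key keeps exactly one of its original values", and the choice to weight all repairs equally. The sketch enumerates repairs explicitly, so it is exponential in the number of conflicting keys; it is not the condensed-database technique or the polynomial-time approximation mentioned in the abstract.

```python
from fractions import Fraction
from itertools import product


def repairs(rows):
    """Enumerate repairs of a relation R(key, value) under the FD key -> value.

    Assumption: a repair updates conflicting values so that each key is
    left with exactly one of the values it originally had.
    """
    values_by_key = {}
    for key, value in rows:
        candidates = values_by_key.setdefault(key, [])
        if value not in candidates:
            candidates.append(value)
    keys = list(values_by_key)
    # One repair per combination of a chosen value for every key.
    for choice in product(*(values_by_key[k] for k in keys)):
        yield set(zip(keys, choice))


def answer_probabilities(rows, query):
    """Map each answer tuple to the fraction of repairs that derive it.

    Assumption: all repairs are equally likely. `query` must return a set.
    """
    counts = {}
    total = 0
    for repaired in repairs(rows):
        total += 1
        for answer in query(repaired):
            counts[answer] = counts.get(answer, 0) + 1
    return {answer: Fraction(n, total) for answer, n in counts.items()}


# Hypothetical relation Employee(name, dept) violating the FD name -> dept.
employee = [("ann", "sales"), ("ann", "hr"), ("bob", "it")]


def departments(db):
    # Positive relational algebra query: projection onto dept.
    return {dept for _, dept in db}


probs = answer_probabilities(employee, departments)
for dept, p in sorted(probs.items()):
    kind = "certain" if p == 1 else "uncertain"
    print(dept, p, kind)
```

Under these assumptions, `it` is derived from both repairs and is a certain answer, while `sales` and `hr` are each derived from one of the two repairs and are uncertain answers with probability 1/2.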