Efficient core computation in data exchange

  • Authors: Georg Gottlob; Alan Nash
  • Affiliations: University of Oxford, Oxford, United Kingdom; IBM Almaden Research Center, San Jose, California
  • Venue: Journal of the ACM (JACM)
  • Year: 2008

Abstract

Data exchange deals with inserting data from one database into another database having a different schema. Fagin et al. [2005] have shown that among the universal solutions of a solvable data exchange problem, there exists—up to isomorphism—a unique most compact one, “the core”, and have convincingly argued that this core should be the database to be materialized. They stated as an important open problem whether the core can be computed in polynomial time in the general setting where the mapping between the source and target schemas is given by source-to-target constraints that are arbitrary tuple generating dependencies (tgds) and target constraints consisting of equality generating dependencies (egds) and a weakly acyclic set of tgds. In this article, we solve this problem by developing new methods for efficiently computing the core of a universal solution. This positive result shows that data exchange based on cores is feasible and applicable in a very general setting. In addition to our main result, we use the method of hypertree decompositions to derive new algorithms and upper bounds for query containment checking and computing cores of arbitrary database instances. We also show that computing the core of a data exchange problem is fixed-parameter intractable with respect to a number of relevant parameters, and that computing cores is NP-complete if the rule bodies of target tgds are augmented by a special predicate that distinguishes a null value from a constant data value.
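
To make the notion of a core concrete, the following is a minimal brute-force sketch in Python. It is not the polynomial-time method developed in the article: it enumerates every mapping of nulls to values and therefore takes exponential time in the number of nulls. The representation (facts as `(relation, tuple)` pairs), the convention that labeled nulls are strings starting with `_`, and the names `is_null`, `apply_map`, `shrink`, and `core` are all assumptions made for illustration.

```python
from itertools import product


def is_null(value):
    # Assumed convention: labeled nulls are strings that start with "_";
    # every other value is treated as a constant.
    return isinstance(value, str) and value.startswith("_")


def apply_map(h, facts):
    # Apply the value mapping h to every fact; values not in h stay fixed,
    # so constants are always mapped to themselves.
    return frozenset(
        (rel, tuple(h.get(v, v) for v in args)) for rel, args in facts
    )


def shrink(facts):
    """Return a proper sub-instance that is a homomorphic image of `facts`,
    or None if no such sub-instance exists (i.e. `facts` is already a core)."""
    values = {v for _, args in facts for v in args}
    nulls = sorted(v for v in values if is_null(v))
    domain = sorted(values, key=str)
    # Brute force: try every assignment of nulls to values of the instance.
    for image in product(domain, repeat=len(nulls)):
        h = dict(zip(nulls, image))
        mapped = apply_map(h, facts)
        if mapped < facts:  # proper subset: h maps the instance into itself
            return mapped
    return None


def core(facts):
    """Shrink the instance repeatedly until no proper endomorphic image remains."""
    facts = frozenset(facts)
    while True:
        smaller = shrink(facts)
        if smaller is None:
            return facts
        facts = smaller


if __name__ == "__main__":
    instance = {
        ("E", ("a", "_N1")),
        ("E", ("a", "b")),
        ("E", ("_N2", "b")),
    }
    print(sorted(core(instance)))
```

In the sample instance, both facts containing nulls are subsumed by `E(a, b)`, so the sketch reduces the instance to that single fact. An instance such as `{E(a, _N1), E(_N1, c)}` is left unchanged, because no assignment of `_N1` maps it into a proper subset of itself.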