Solutions and query rewriting in data exchange

  • Authors:
  • Marcelo Arenas; Pablo Barceló; Ronald Fagin; Leonid Libkin

  • Affiliations:
  • Department of Computer Science, Pontificia Universidad Católica, Av. Vicuña Mackenna 4860, Santiago, Chile; Department of Computer Science, Universidad de Chile, Avda. Blanco Encalada 2120, Santiago, Chile; IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120-6099, United States; Laboratory for Foundations of Computer Science, University of Edinburgh, Informatics Forum, Crichton Street, Edinburgh, EH8 9AB, UK

  • Venue:
  • Information and Computation
  • Year:
  • 2013

Abstract

Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema. Given a source instance, there may be many solutions, that is, target instances that satisfy the constraints of the data exchange problem. Previous work has identified two classes of desirable solutions: canonical universal solutions and their cores. Query answering in data exchange amounts to rewriting a query over the target schema into another query that, when evaluated over a materialized target instance, gives a result that is semantically consistent with the source (specifically, the "certain answers"). Two basic questions then arise: (1) how do these solutions compare in terms of query rewriting? and (2) how can we determine whether a query is rewritable over a particular solution? Our goal is to answer these questions. Our first main result is that, in terms of rewritability by relational algebra queries, the core is strictly less expressive than the canonical universal solution, which in turn is strictly less expressive than the source. To develop techniques for proving queries non-rewritable, we establish structural properties of solutions; these properties are derived from the technical machinery developed in the rewritability proofs. Our second result is that both the canonical universal solution and the core preserve the local structure of the data, and that every target query rewritable over either of these solutions cannot distinguish tuples whose neighborhoods in the source are similar. This gives us a first simple tool for checking whether a query is non-rewritable over the canonical universal solution or over the core. We also show that these tools generalize to arbitrary transformations that preserve the local structure of the data, and we investigate an alternative semantics of query answering in data exchange.
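To make the notions above concrete, here is a minimal Python sketch. The simplifying assumptions are ours, not the paper's. The mapping consists only of source-to-target tgds (no target constraints). Instances are small enough for brute-force homomorphism search. The query is a conjunctive query.

The sketch builds three things:
- The canonical universal solution, obtained by firing every tgd trigger on the source once and inventing a fresh labeled null for each existential variable.
- The core, obtained by repeatedly mapping the instance into a proper sub-instance while fixing constants.
- The certain answers, computed as the null-free tuples of naive evaluation over a universal solution. This holds for conjunctive queries only. It is a standard fact from earlier data exchange work and is not the relational-algebra rewriting studied in the paper.

The names Null, matches, find_hom, shrink_once, canonical_universal_solution, core, certain_answers and the relations R and E are hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Null:
    """A labeled null: a placeholder for an unknown value invented by the chase."""
    ident: int

    def __repr__(self) -> str:
        return f"_N{self.ident}"


def is_null(value) -> bool:
    return isinstance(value, Null)


def matches(atoms, instance, binding=None):
    """Yield every extension of `binding` mapping each atom (rel, args) onto a fact of `instance`."""
    binding = dict(binding or {})
    if not atoms:
        yield binding
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for fact in instance.get(rel, set()):
        extended, ok = dict(binding), True
        for arg, value in zip(args, fact):
            # An argument that is already bound must agree with the value in this fact.
            if extended.setdefault(arg, value) != value:
                ok = False
                break
        if ok:
            yield from matches(rest, instance, extended)


def canonical_universal_solution(source, tgds):
    """Fire each source-to-target tgd once per body match, using fresh nulls for existentials."""
    fresh = count(1)
    target = {}
    for body, head in tgds:
        body_vars = {v for _, args in body for v in args}
        for binding in matches(body, source):
            for _, args in head:
                for v in args:
                    if v not in body_vars and v not in binding:
                        binding[v] = Null(next(fresh))  # existential variable
            for rel, args in head:
                target.setdefault(rel, set()).add(tuple(binding[v] for v in args))
    return target


def find_hom(instance, target):
    """Return a homomorphism from `instance` to `target` that fixes constants, or None."""
    atoms = [(rel, fact) for rel, facts in instance.items() for fact in facts]
    constants = {v for _, fact in atoms for v in fact if not is_null(v)}
    return next(matches(atoms, target, {c: c for c in constants}), None)


def shrink_once(instance):
    """Return the image of `instance` in a proper sub-instance, or None if `instance` is a core."""
    for rel, facts in instance.items():
        for fact in facts:
            if not any(is_null(v) for v in fact):
                continue  # null-free facts are fixed by every constant-preserving map
            smaller = {r: (fs - {fact} if r == rel else fs) for r, fs in instance.items()}
            h = find_hom(instance, smaller)
            if h is not None:
                return {r: {tuple(h[v] for v in t) for t in fs} for r, fs in instance.items()}
    return None


def core(instance):
    """Shrink until no fact can be dropped; the result has no map into a proper sub-instance."""
    current = instance
    while (nxt := shrink_once(current)) is not None:
        current = nxt
    return current


def certain_answers(head, body, solution):
    """For a conjunctive query over a universal solution: the null-free naive answers."""
    answers = set()
    for binding in matches(body, solution):
        row = tuple(binding[v] for v in head)
        if not any(is_null(v) for v in row):
            answers.add(row)
    return answers


if __name__ == "__main__":
    # Hypothetical mapping: R(x, y) -> E(x, y)  and  R(x, y) -> exists z. E(x, z)
    source = {"R": {("a", "b"), ("c", "d")}}
    tgds = [
        ([("R", ("x", "y"))], [("E", ("x", "y"))]),
        ([("R", ("x", "y"))], [("E", ("x", "z"))]),
    ]
    canonical = canonical_universal_solution(source, tgds)
    print("canonical:", sorted(canonical["E"], key=repr))
    print("core:     ", sorted(core(canonical)["E"], key=repr))
    # Q(x, y) :- E(x, y)
    print("certain:  ", certain_answers(("x", "y"), [("E", ("x", "y"))], canonical))
```

In this toy setting the canonical universal solution contains four E-facts, two of them with nulls. The core keeps only E(a, b) and E(c, d). Both give the same certain answers for Q(x, y) :- E(x, y). This only illustrates that the two materialized instances differ, which is why rewritability over each can be compared. It is not an instance of the paper's separation results.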