On the data mapping problem

  • Authors: George H. L. Fletcher
  • Affiliations: Indiana University
  • Venue: On the data mapping problem
  • Year: 2007

Abstract

The emerging networked world promises new possibilities for information sharing and collaboration between autonomous data sources. Facilitating technologies, however, have not successfully addressed the most difficult forms of data heterogeneity which arise in these collaborations, such as differences in the structuring of data and semantic pluralism in the interpretation of data. At the heart of overcoming data heterogeneity is the data mapping problem: automating the discovery of effective mappings between autonomous structured data sources. The data mapping problem is one of the longest standing issues in data management. Fully automating the discovery of mappings is generally recognized as an "AI-complete" problem in the sense that it is as hard as the hardest problems of Artificial Intelligence. Consequently, data mapping solutions have typically focused on discovering restricted types of mappings. More robust solutions must also facilitate discovery of the richer structural and semantic transformations which inevitably arise in coordinating heterogeneous information systems. In this dissertation, we make the following contributions towards a better understanding of the data mapping problem.

(1) We give a novel formal statement of the general data mapping problem and of the important special case of mapping between relational data sources.
(2) We propose a generic architecture for data mapping systems and describe an instantiation of this framework in the Tupelo system. Treating mapping discovery as example-driven search in a space of transformations, Tupelo generates queries encompassing the full range of structural and semantic heterogeneities for relational databases.
(3) We present theoretical results on several fundamental questions regarding example-driven mapping discovery in systems such as Tupelo.
(4) We present a new declarative formalism for expressing dynamic relational transformations as a tool for further investigations into the relational data-metadata mapping space.
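
To make the idea of example-driven search in a space of transformations concrete, the following is a minimal sketch and not Tupelo's actual algorithm or operator set. It assumes a toy space with only two operators (renaming and dropping an attribute) and uses plain breadth-first search, chosen here for brevity. All names (`relation`, `rename`, `drop`, `discover_mapping`) are hypothetical and introduced only for illustration.

```python
from collections import deque

def relation(rows):
    """Build a relation: a frozenset of rows, each a frozenset of (attribute, value) pairs."""
    return frozenset(frozenset(r.items()) for r in rows)

def attributes(rel):
    """Return the set of attribute names that occur in the relation."""
    return {a for row in rel for a, _ in row}

def rename(rel, old, new):
    """Toy operator (assumption): rename attribute `old` to `new` in every row."""
    return frozenset(
        frozenset((new if a == old else a, v) for a, v in row) for row in rel
    )

def drop(rel, attr):
    """Toy operator (assumption): project away attribute `attr` (set semantics)."""
    return frozenset(
        frozenset((a, v) for a, v in row if a != attr) for row in rel
    )

def successors(rel, target_attrs):
    """Yield (operation, resulting relation) pairs reachable in one step."""
    attrs = attributes(rel)
    for old in sorted(attrs - target_attrs):
        # Only rename to attribute names the target uses and the relation lacks.
        for new in sorted(target_attrs - attrs):
            yield ("rename", old, new), rename(rel, old, new)
        yield ("drop", old), drop(rel, old)

def discover_mapping(source, target, max_depth=4):
    """Breadth-first search for operator sequences turning `source` into `target`.

    Returns a list of operations, or None if no mapping is found within max_depth.
    """
    target_attrs = attributes(target)
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        rel, path = queue.popleft()
        if rel == target:
            return path
        if len(path) >= max_depth:
            continue
        for op, nxt in successors(rel, target_attrs):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [op]))
    return None

# One example pair: a source instance and the desired target instance.
source = relation([{"name": "Ada", "dept": "CS", "tmp": 1}])
target = relation([{"employee": "Ada", "dept": "CS"}])
mapping = discover_mapping(source, target)
```

In this sketch the example pair plays the role of the user-supplied instances: the search stops at the first operator sequence that reproduces the target exactly. A realistic system would need a much richer operator space (including transformations between data and metadata) and informed search rather than exhaustive enumeration.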