Sample-driven schema mapping

  • Authors: Li Qian; Michael J. Cafarella; H. V. Jagadish
  • Affiliations: University of Michigan, Ann Arbor, USA (all authors)
  • Venue: SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
  • Year: 2012

Abstract

End-users increasingly find the need to perform light-weight, customized schema mapping. State-of-the-art tools provide powerful functions to generate schema mappings, but they usually require an in-depth understanding of the semantics of multiple schemas and their correspondences. They are thus unsuitable for technically unsophisticated users, or for settings in which a large number of mappings must be performed. We propose a system for sample-driven schema mapping. It automatically constructs schema mappings, in real time, from user-input sample target instances. Because the user does not have to provide any explicit attribute-level match information, she is isolated from the possibly complex structure and semantics of both the source schemas and the mappings. In addition, the user never has to master any operations specific to schema mappings: she simply types data values into a spreadsheet-style interface. As a result, the user can construct mappings with a much lower cognitive burden. In this paper we present Mweaver, a prototype sample-driven schema mapping system. It employs novel algorithms that enable the system to obtain the desired mapping results while meeting interactive response-time requirements. We show the results of a user study that compares Mweaver with two state-of-the-art mapping tools across several mapping tasks, both real and synthetic. The results suggest that Mweaver enables users to perform practical mapping tasks in about one-fifth of the time needed by the state-of-the-art tools.
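
To make the idea of inferring a mapping from typed sample values concrete, here is a minimal brute-force sketch in Python. It is not Mweaver's algorithm, which the abstract does not describe in detail, and it makes no attempt at interactive performance. The function names (`join`, `candidate_mappings`, `infer_mappings`), the toy tables, and the assumption that column names are unique across the joined tables are all hypothetical choices made for illustration.

```python
from itertools import product


def join(left, right, left_key, right_key):
    """Naive nested-loop equi-join (assumes column names are unique across tables)."""
    return [{**l, **r} for l in left for r in right if l[left_key] == r[right_key]]


def candidate_mappings(rows, sample):
    """Return every tuple of source columns that reproduces `sample` in some row."""
    found = set()
    for row in rows:
        # For each sample value, list the columns of this row that hold it.
        per_value = [[col for col, val in row.items() if val == s] for s in sample]
        # One column choice per target position; empty lists yield no tuples.
        found.update(product(*per_value))
    return found


def infer_mappings(rows, samples):
    """Keep only the column tuples consistent with every sample row typed so far."""
    result = None
    for sample in samples:
        cands = candidate_mappings(rows, sample)
        result = cands if result is None else result & cands
    return result or set()


# Toy source data (entirely made up for this example).
movies = [
    {"title": "Film A", "director": "Ann", "producer": "Ann", "studio_id": 1},
    {"title": "Film B", "director": "Bob", "producer": "Cy", "studio_id": 2},
]
studios = [
    {"id": 1, "studio": "North"},
    {"id": 2, "studio": "South"},
]
rows = join(movies, studios, "studio_id", "id")

# The first sample row is ambiguous: "Ann" is both director and producer.
first = infer_mappings(rows, [("Film A", "Ann", "North")])

# A second sample row removes the ambiguity.
both = infer_mappings(
    rows, [("Film A", "Ann", "North"), ("Film B", "Bob", "South")]
)
```

In this sketch the user supplies only data values, never attribute names: `first` contains two candidate mappings, and adding the second sample row narrows `both` to the single mapping `("title", "director", "studio")`. A real system would also have to discover the join paths itself rather than being handed a pre-joined table, which is omitted here.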