Incrementally improving dataspaces based on user feedback

  • Authors:
  • Khalid Belhajjame; Norman W. Paton; Suzanne M. Embury; Alvaro A. A. Fernandes; Cornelia Hedeler

  • Affiliations:
  • School of Computer Science, University of Manchester, Oxford Road, Manchester, United Kingdom (all authors)

  • Venue:
  • Information Systems
  • Year:
  • 2013

Abstract

One aspect of the vision of dataspaces has been articulated as providing various benefits of classical data integration at reduced up-front cost. In this paper, we present techniques that support the specification of schema mappings through interaction with end users in a pay-as-you-go fashion. In particular, we show how schema mappings obtained automatically using existing matching and mapping generation techniques can be annotated, using end-user feedback on query results, with metrics that estimate their fitness to user requirements. Using these feedback-based annotations, and given user requirements expressed in terms of precision and recall, we present a method for selecting the set of mappings whose results meet the stated requirements; in doing so, we cast mapping selection as an optimization problem. Feedback may also reveal that the quality of the available schema mappings is poor. We show how mapping annotations can support the derivation of better-quality mappings from existing ones through refinement, using an evolutionary algorithm to explore efficiently and effectively the large space of mappings that refinement can produce. User feedback can also be used to annotate the results of queries that users pose against an integration schema, and we show how precision and recall estimates can be computed for such queries. We also investigate the problem of propagating feedback on the results of (integration) queries down to the mappings used to populate the base relations of the integration schema.
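The annotate-then-select idea can be illustrated with a minimal sketch. It assumes that feedback takes the form of result tuples the user has marked as expected (True) or unexpected (False), that precision is estimated over annotated tuples only, and that recall is estimated relative to all tuples marked as expected. The names `estimate` and `select_mappings` and the example data are hypothetical. The objective shown (maximize estimated recall subject to a minimum estimated precision) is one plausible reading of "requirements in terms of precision and recall". Exhaustive subset enumeration stands in for the authors' optimization formulation and is not their method.

```python
from itertools import combinations

# Hypothetical feedback model: tuple -> True (user expected it) / False (did not).
Feedback = dict[tuple, bool]


def estimate(results: set[tuple], feedback: Feedback) -> tuple[float, float]:
    """Estimate (precision, recall) of a result set from annotated tuples only."""
    annotated = [t for t in results if t in feedback]
    true_pos = sum(1 for t in annotated if feedback[t])
    all_pos = sum(1 for v in feedback.values() if v)
    precision = true_pos / len(annotated) if annotated else 0.0
    recall = true_pos / all_pos if all_pos else 0.0
    return precision, recall


def select_mappings(mapping_results: dict[str, set[tuple]], feedback: Feedback,
                    min_precision: float) -> tuple[frozenset[str], float]:
    """Exhaustively pick the mapping subset whose combined (union) result has the
    highest estimated recall while meeting the precision floor.
    Returns (frozenset(), -1.0) if no subset qualifies."""
    best, best_recall = frozenset(), -1.0
    names = sorted(mapping_results)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            union = set().union(*(mapping_results[m] for m in subset))
            p, r = estimate(union, feedback)
            if p >= min_precision and r > best_recall:
                best, best_recall = frozenset(subset), r
    return best, best_recall


# Hypothetical example: ("e", 5) is returned by m3 but carries no feedback.
feedback = {("a", 1): True, ("b", 2): True, ("c", 3): False, ("d", 4): True}
mapping_results = {
    "m1": {("a", 1), ("c", 3)},
    "m2": {("b", 2)},
    "m3": {("d", 4), ("e", 5)},
}
chosen, recall = select_mappings(mapping_results, feedback, min_precision=0.8)
print(sorted(chosen), round(recall, 3))  # ['m2', 'm3'] 0.667
```

Adding `m1` to the chosen pair would raise estimated recall to 1.0 but lower estimated precision to 0.75, below the 0.8 floor, which is exactly the kind of trade-off the selection step weighs. Enumeration grows as 2^n in the number of candidate mappings, so the sketch only conveys the objective and constraint, not a scalable solver.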