Tuning schema matching software using synthetic scenarios

  • Authors:
  • Mayssam Sayyadian;Yoonkyong Lee;AnHai Doan;Arnon S. Rosenthal

  • Affiliations:
  • University of Illinois;University of Illinois;University of Illinois;The MITRE Corporation

  • Venue:
  • VLDB '05 Proceedings of the 31st international conference on Very large data bases
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Most recent schema matching systems assemble multiple components, each employing a particular matching technique. The domain user must then tune the system: select the right component to be executed and correctly adjust their numerous "knobs" (e.g., thresholds, formula coefficients). Tuning is skill- and time-intensive, but (as we show) without it the matching accuracy is significantly inferior.We describe eTuner, an approach to automatically tune schema matching systems. Given a schema S, we match S against synthetic schemas, for which the ground truth mapping is known, and find a tuning that demonstrably improves the performance of matching S against real schemas. To efficiently search the huge space of tuning configurations, eTuner works sequentially, starting with tuning the lowest level components. To increase the applicability of eTuner, we develop methods to tune a broad range of matching components. While the tuning process is completely automatic, eTuner can also exploit user assistance (whenever available) to further improve the tuning quality. We employed eTuner to tune four recently developed matching systems on several real-world domains. eTuner produced tuned matching systems that achieve higher accuracy than using the systems with currently possible tuning methods, at virtually no cost to the domain user.