Instance discovery and schema matching with applications to biological deep web data integration

  • Authors:
  • Tantan Liu;Fan Wang;Gagan Agrawal

  • Affiliations:
  • Department of Computer Science and Engineering, Ohio State University, Columbus, OH;Department of Computer Science and Engineering, Ohio State University, Columbus, OH;Department of Computer Science and Engineering, Ohio State University, Columbus, OH

  • Venue:
  • DILS'10 Proceedings of the 7th international conference on Data integration in the life sciences
  • Year:
  • 2010

Abstract

This paper presents data mining-based techniques for enabling data integration across deep web data sources. We target query processing across inter-dependent data sources. Thus, besides input-input and output-output matching of attributes, we also need to consider input-output matching. We develop data mining techniques for discovering the instances for querying deep web data sources from the information provided by the query interfaces themselves, as well as from the obtained output pages of the related data sources, by query probing using dynamically identified input instances. Then, using a hierarchical representation of schemas and by applying clustering techniques, we are able to generate schema matches. We show the effectiveness of our technique while integrating 24 query interfaces.
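As a rough illustration of instance-based matching by clustering, the sketch below groups attributes whose discovered instance sets overlap. It is not the authors' algorithm: the Jaccard similarity measure, the single-linkage grouping, the threshold value, the attribute labels, and the toy data are all assumptions made for this example, and the hierarchical schema representation mentioned in the abstract is not modeled. Because input and output attributes are placed in the same pool, a single pass can surface input-input, output-output, and input-output matches.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two instance sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_attributes(instances, threshold=0.5):
    """Group attributes whose instance sets overlap by at least `threshold`.

    `instances` maps a hypothetical attribute label such as
    "sourceA.in.gene" to the set of instance values discovered for it.
    Linking is single-linkage: attributes end up in one cluster when a
    chain of sufficiently similar pairs connects them.
    """
    parent = {name: name for name in instances}

    def find(name):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(instances, 2):
        if jaccard(instances[a], instances[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in instances:
        clusters.setdefault(find(name), set()).add(name)
    # Sort for a deterministic result.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Toy, made-up data: inputs ("in") and outputs ("out") of two sources.
example = {
    "sourceA.in.gene": {"BRCA1", "TP53", "EGFR"},
    "sourceB.out.gene_symbol": {"BRCA1", "TP53", "EGFR", "MYC"},
    "sourceA.out.chromosome": {"17", "7"},
    "sourceB.in.chr": {"17", "7", "X"},
}
matches = match_attributes(example, threshold=0.6)
```

In this toy run, the input attribute of one source is paired with the output attribute of the other, which is the input-output case that matters when data sources are inter-dependent.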