Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order Terms

  • Authors:
  • Simon Price; Peter Flach

  • Affiliations:
  • Department of Computer Science, University of Bristol, Bristol, United Kingdom BS8 1UB (both authors)

  • Venue:
  • ILP '08 Proceedings of the 18th international conference on Inductive Logic Programming
  • Year:
  • 2008

Abstract

Integrating heterogeneous data from sources as diverse as web pages, digital libraries, knowledge bases, the Semantic Web and databases is an open problem. The ultimate aim of our work is to be able to query such heterogeneous data sources as if their data were conveniently held in a single relational database. Pursuant to this aim, we propose a generalisation of joins from the relational database model to enable joins on arbitrarily complex structured data in a higher-order representation. By incorporating kernels and distances for structured data, we further extend this model to support approximate joins of heterogeneous data. We demonstrate the flexibility of our approach in the publications domain by evaluating example approximate queries on the CORA data sets, joining on types ranging from sets of co-authors through to entire publications.
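The idea of an approximate join on a structured type can be illustrated with a minimal sketch. It is not the authors' implementation: the Jaccard distance, the function names `jaccard_distance` and `approximate_join`, the threshold parameter and the sample records are all assumptions chosen for illustration, and the data is invented rather than taken from CORA. Two records are joined when the distance between their sets of co-authors does not exceed a threshold; a threshold of 0 reduces to an exact join on equal sets.

```python
from typing import Any, Callable, Dict, FrozenSet, List, Tuple

Record = Dict[str, Any]


def jaccard_distance(x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Jaccard distance between two sets: 1 - |x & y| / |x | y|."""
    union = x | y
    if not union:
        # Two empty sets are treated as identical.
        return 0.0
    return 1.0 - len(x & y) / len(union)


def approximate_join(
    left: List[Record],
    right: List[Record],
    key: Callable[[Record], Any],
    distance: Callable[[Any, Any], float],
    threshold: float,
) -> List[Tuple[Record, Record]]:
    """Nested-loop join that pairs records whose keys are within `threshold`.

    The key may be any structured value (here a set of co-author names),
    as long as `distance` is defined on it.
    """
    return [
        (l, r)
        for l in left
        for r in right
        if distance(key(l), key(r)) <= threshold
    ]


# Hypothetical records from two sources (illustrative only).
source_a = [
    {"title": "Paper A", "authors": frozenset({"s. price", "p. flach"})},
]
source_b = [
    {"id": 1, "authors": frozenset({"s. price", "p. flach", "a. other"})},
    {"id": 2, "authors": frozenset({"j. doe"})},
]

# Join on the co-author set, tolerating partial overlap.
pairs = approximate_join(
    source_a, source_b, lambda rec: rec["authors"], jaccard_distance, 0.5
)

# With threshold 0.0 the same call behaves as an exact join on equal sets.
exact_pairs = approximate_join(
    source_a, source_b, lambda rec: rec["authors"], jaccard_distance, 0.0
)
```

In this sketch the distance function is a parameter, so the same join could be applied to other types (titles, whole publication records) by supplying a suitable distance or a kernel-derived distance for that type.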