Instance-based schema matching for web databases by domain-specific query probing

  • Authors:
  • Jiying Wang; Ji-Rong Wen; Fred Lochovsky; Wei-Ying Ma

  • Affiliations:
  • Computer Science Department, Hong Kong Univ. of Science and Technology, Hong Kong; Information Management & System Group, Microsoft Research Asia, Beijing, China; Computer Science Department, Hong Kong Univ. of Science and Technology, Hong Kong; Information Management & System Group, Microsoft Research Asia, Beijing, China

  • Venue:
  • VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
  • Year:
  • 2004

Abstract

A Web database that dynamically provides information in response to user queries presents two distinct schemas to users: the interface schema (the schema users can query) and the result schema (the schema users can browse). Each partially reflects the actual schema of the Web database. Most previous work has studied only schema matching across the query interfaces of Web databases. In this paper, we propose a novel schema model that distinguishes the interface schema from the result schema of a Web database in a specific domain. In this model, we address two significant Web database schema-matching problems: intra-site and inter-site matching. The first problem is crucial in automatically extracting data from Web databases, while the second plays a significant role in meta-retrieving and integrating data from different Web databases. We also investigate a unified solution to the two problems based on query probing and instance-based schema matching techniques. Using the model, a cross-validation technique is also proposed to improve the accuracy of the schema matching. Our experiments on real Web databases demonstrate that the two problems can be solved simultaneously with high precision and recall.
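
To make the idea of query probing with instance-based matching more concrete, here is a minimal sketch of the intra-site case only. It is an illustration under stated assumptions, not the authors' algorithm: the toy `BOOKS` table, the `probe` function, the field names, and the "most re-occurrences wins" rule are all hypothetical. The sketch submits sample instance values through each interface field, then counts in which result column each submitted value reappears.

```python
from collections import Counter

# Hypothetical in-memory stand-in for a Web database (toy data, not from the paper).
BOOKS = [
    {"name": "Database Systems", "writer": "Ullman", "press": "Prentice Hall"},
    {"name": "Data Mining", "writer": "Han", "press": "Morgan Kaufmann"},
    {"name": "Compilers", "writer": "Ullman", "press": "Addison Wesley"},
]

# Interface field -> backend attribute it searches; hidden from the matcher.
_INTERFACE_TO_BACKEND = {"title": "name", "author": "writer"}


def probe(interface_field, value):
    """Simulate submitting `value` in one interface field; return result rows."""
    attr = _INTERFACE_TO_BACKEND[interface_field]
    return [row for row in BOOKS if value.lower() in row[attr].lower()]


def match_intra_site(interface_fields, probe_values):
    """Count how often each probe value reappears in each result column."""
    counts = {field: Counter() for field in interface_fields}
    for field in interface_fields:
        for value in probe_values:
            for row in probe(field, value):
                for column, cell in row.items():
                    if value.lower() in cell.lower():
                        counts[field][column] += 1
    # Assumed decision rule: pick the result column with the most re-occurrences.
    return {
        field: counter.most_common(1)[0][0]
        for field, counter in counts.items()
        if counter
    }


matches = match_intra_site(["title", "author"], ["Ullman", "Data", "Compilers"])
print(matches)
```

On this toy data the matcher pairs the `title` interface field with the `name` result column and `author` with `writer`, without ever reading the hidden mapping directly. A real system would replace `probe` with form submission and result-page extraction, which this sketch does not attempt.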