Domain-oriented Deep Web Data Sources' Discovery and Identification

  • Authors:
  • Yingjun Li;Tiezheng Nie;Derong Shen;Ge Yu

  • Venue:
  • APWEB '10 Proceedings of the 2010 12th International Asia-Pacific Web Conference
  • Year:
  • 2010

Abstract

Because the Deep Web contains a tremendous number of well-structured data sources, integrating them has become a focus of current research. Accurately discovering and identifying the Deep Web data sources related to a specific domain is a key issue. We propose a Domain-Oriented Deep Web data source Discovery method (DO-DWD) and a novel Domain Identification strategy for Deep Web data sources (DIDW). In the discovery stage, we use machine learning algorithms and heuristic rules to find the query interfaces of data sources. In the identification stage, we identify the Deep Web data sources associated with the domain by calculating the relevance between a query interface and the domain based on semantic similarity. Finally, extensive experiments on a real data set show that DO-DWD and DIDW achieve high correctness and accuracy.
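
The identification stage described above can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or how relevance is aggregated, so everything here is an assumption made for illustration: token-level Jaccard overlap stands in for semantic similarity, relevance is taken as the average best match between each form label and a domain concept list, and the names `interface_relevance`, `is_domain_source`, `BOOK_DOMAIN`, and the 0.5 threshold are hypothetical rather than part of DIDW.

```python
import re


def tokenize(label):
    """Lowercase a form label and split it into a set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", label.lower()))


def label_similarity(label, concept):
    """Jaccard overlap of token sets (assumed stand-in for semantic similarity)."""
    a, b = tokenize(label), tokenize(concept)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def interface_relevance(labels, domain_concepts):
    """Average, over interface labels, of the best similarity to any domain concept."""
    if not labels or not domain_concepts:
        return 0.0
    best = [max(label_similarity(l, c) for c in domain_concepts) for l in labels]
    return sum(best) / len(best)


def is_domain_source(labels, domain_concepts, threshold=0.5):
    """Treat an interface as in-domain when its relevance reaches the threshold.

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    return interface_relevance(labels, domain_concepts) >= threshold


# Hypothetical domain vocabulary and query-interface labels.
BOOK_DOMAIN = ["title", "author", "isbn", "publisher", "keyword"]
book_form = ["Book Title", "Author", "ISBN"]
job_form = ["Job Category", "Location", "Salary"]

print(is_domain_source(book_form, BOOK_DOMAIN))
print(is_domain_source(job_form, BOOK_DOMAIN))
```

A real system would likely replace the token overlap with a lexical-resource or embedding-based similarity and tune the threshold on labeled interfaces; the sketch only shows the overall shape of scoring a query interface against a domain.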