Discovering Interesting Relationships among Deep Web Databases: A Source-Biased Approach

  • Authors:
  • James Caverlee;Ling Liu;Daniel Rocco

  • Affiliations:
  • Georgia Institute of Technology, College of Computing, Atlanta, USA 30332;Georgia Institute of Technology, College of Computing, Atlanta, USA 30332;Georgia Institute of Technology, College of Computing, Atlanta, USA 30332

  • Venue:
  • World Wide Web
  • Year:
  • 2006

Abstract

The number of deep web databases has grown phenomenally over the last decade, spurring interest in the automated discovery of interesting relationships among them. Unlike the "surface" web of static pages, deep web databases provide data through a web-based query interface and account for a huge portion of all web content. This paper presents a novel source-biased approach to efficiently discover interesting relationships among web-enabled databases on the deep web. Our approach supports a relationship-centric view over a collection of deep web databases through source-biased database analysis and exploration. The approach has three unique features. First, we develop source-biased probing techniques, which allow us to determine in very few interactions whether a target database is relevant to the source database by probing the target with very precise probes. Second, we introduce source-biased relevance metrics to evaluate the relevance of the deep web databases discovered, to identify interesting types of source-biased relationships for a collection of deep web databases, and to rank them accordingly. The source-biased relationships discovered not only provide value-added metadata for each deep web database but can also directly support personalized relationship-centric queries. Third, we develop a performance optimization that uses source-biased probing with focal terms to further improve the effectiveness of the basic source-biased model. We design a prototype system for crawling, probing, and supporting relationship-centric queries over deep web databases using the source-biased approach. Our experiments evaluate the effectiveness of the proposed source-biased analysis and discovery model, showing that the source-biased approach outperforms query-biased probing and unbiased probing.
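
The probing idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`summarize`, `source_biased_probe`, `cosine_relevance`, `make_search`), the choice of the source's most frequent terms as probes, and the use of cosine similarity as the relevance score are all assumptions made for illustration. The abstract does not define the paper's own relevance metrics or the focal-term optimization, so neither is reproduced here. The target database is simulated by an in-memory keyword search standing in for a web query interface.

```python
import math
from collections import Counter


def summarize(docs):
    """Build a bag-of-words summary (term -> count) from a list of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def source_biased_probe(source_summary, search_target, num_probes=3, docs_per_probe=2):
    """Probe a target with terms drawn from the source summary.

    Assumption: the probes are simply the source's most frequent terms.
    `search_target` is a hypothetical callable (term -> list of documents).
    """
    probes = [term for term, _ in source_summary.most_common(num_probes)]
    sampled = []
    for term in probes:
        for doc in search_target(term)[:docs_per_probe]:
            if doc not in sampled:  # avoid counting the same document twice
                sampled.append(doc)
    return summarize(sampled)


def cosine_relevance(source_summary, target_summary):
    """Cosine similarity between two summaries (a stand-in relevance score)."""
    shared = set(source_summary) & set(target_summary)
    dot = sum(source_summary[t] * target_summary[t] for t in shared)
    norm_s = math.sqrt(sum(v * v for v in source_summary.values()))
    norm_t = math.sqrt(sum(v * v for v in target_summary.values()))
    if norm_s == 0 or norm_t == 0:
        return 0.0  # an empty summary means no evidence of relevance
    return dot / (norm_s * norm_t)


def make_search(docs):
    """Simulate a deep web query interface with exact keyword matching."""
    def search(term):
        return [d for d in docs if term in d.lower().split()]
    return search
```

Under these assumptions, a target whose documents match the source's probe terms yields a non-empty summary and a positive score, while a target that returns nothing for every probe scores 0.0. This mirrors the abstract's point that a few precise, source-derived probes can be enough to judge relevance.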