Self-supervised web search for any-k complete tuples

  • Authors:
  • Alexander Löser;Christoph Nagel;Stephan Pieper;Christoph Boden

  • Affiliations:
  • University of Technology, Berlin, Berlin;University of Technology, Berlin, Berlin;University of Technology, Berlin, Berlin;University of Technology, Berlin, Berlin

  • Venue:
  • Proceedings of the 2nd International Workshop on Business intelligencE and the WEB
  • Year:
  • 2011

Abstract

A common task of Web users is querying structured information from Web pages. In this paper we propose a novel query processor that systematically discovers any-k relations from Web search results using conjunctive queries. The term 'any-k' denotes that the system does not rank the retrieved tuples. To realize this scenario, the query processor translates a structured query into keyword queries that are submitted to a search engine, forwards the search results to relation extractors, and then combines the extracted relations into result tuples. Unfortunately, relation extractors may fail to return a relation for a result tuple. We propose an information theory-based approach for retrieving missing attribute values of partially retrieved relations. Moreover, user-defined data sources may return fewer than k complete result tuples. To solve this problem, we extend the Eddy query processing mechanism [14] for our 'querying the Web' scenario with a continuous, adaptive routing model. At any point during query execution, the model determines the most promising incomplete row to process next in order to return any-k complete result tuples. We report a thorough experimental evaluation over multiple relation extractors. Our experiments demonstrate that our query processor returns complete result tuples while processing only very few Web pages.
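
The any-k completion loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `any_k`, `missing`, `lookup`, and `budget` are hypothetical, and the routing rule used here (process the row with the fewest missing attributes first) is a simple stand-in for the paper's information theory-based, Eddy-style routing model. The `lookup` callback stands in for the whole chain of keyword query, search engine, and relation extractor.

```python
from typing import Callable, Dict, List, Optional

# A row maps attribute names to values; None marks a missing value.
Row = Dict[str, Optional[str]]


def missing(row: Row) -> List[str]:
    """Return the attributes of a row that have no value yet."""
    return [attr for attr, value in row.items() if value is None]


def any_k(
    rows: List[Row],
    k: int,
    lookup: Callable[[Row, str], Optional[str]],
    budget: int = 100,
) -> List[Row]:
    """Return up to k complete rows, in no particular ranking.

    `lookup` is a hypothetical stand-in for "issue a keyword query and
    run a relation extractor"; it returns a value or None on failure.
    `budget` caps the number of lookups (an assumption of this sketch).
    """
    # Work on copies so the caller's rows are left untouched.
    complete = [dict(r) for r in rows if not missing(r)]
    pending = [dict(r) for r in rows if missing(r)]
    calls = 0

    while len(complete) < k and pending and calls < budget:
        # Assumed heuristic: the row closest to completion is the
        # most promising one to route next.
        pending.sort(key=lambda r: len(missing(r)))
        row = pending.pop(0)
        attr = missing(row)[0]

        value = lookup(row, attr)
        calls += 1
        if value is None:
            continue  # extractor failed; give up on this row

        row[attr] = value
        if missing(row):
            pending.append(row)   # still incomplete, route again later
        else:
            complete.append(row)  # counts toward the k results

    return complete[:k]
```

The loop stops as soon as k rows are complete, which mirrors the any-k idea of returning enough complete tuples without processing every candidate row.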