Siphon++: a hidden-webcrawler for keyword-based interfaces

  • Authors:
  • Karane Vieira; Luciano Barbosa; Juliana Freire; Altigran Silva

  • Affiliations:
  • UFAM, Manaus, Brazil; University of Utah, Salt Lake City, USA; University of Utah, Salt Lake City, USA; UFAM, Manaus, Brazil

  • Venue:
  • Proceedings of the 17th ACM conference on Information and knowledge management
  • Year:
  • 2008

Abstract

The hidden Web consists of data that is generally hidden behind form interfaces and, as such, is out of reach for traditional search engines. With the goal of leveraging the high-quality information in this largely unexplored portion of the Web, we propose a new strategy for automatically retrieving data hidden behind keyword-based form interfaces. Unlike previous approaches to this problem, our strategy adapts query generation and selection by detecting features of the index. We describe an extensive experimental evaluation which shows that our strategy is able to derive appropriate queries to obtain high coverage while, at the same time, avoiding the retrieval of redundant data, and that it obtains higher coverage and is more efficient than approaches that use a fixed strategy for query generation.
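
To make the setting concrete, the following is a minimal, generic sketch of crawling through a keyword interface: issue a query, harvest candidate terms from the returned documents, and pick the next query from those terms. It is not the Siphon++ strategy, whose details the abstract does not give. The names `KeywordInterface` and `crawl`, the toy in-memory collection, and the "most frequent unseen term" selection rule are all assumptions made for illustration.

```python
from collections import Counter


class KeywordInterface:
    """Hypothetical stand-in for a keyword search form over a hidden collection."""

    def __init__(self, docs):
        self.docs = docs  # maps document id -> document text

    def search(self, keyword):
        # Return ids of documents that contain the keyword as a whole word.
        return [doc_id for doc_id, text in sorted(self.docs.items())
                if keyword in text.lower().split()]


def crawl(interface, seed, max_queries=10):
    """Iteratively query the interface, choosing each next keyword from results so far.

    Assumed selection rule (illustrative only): the not-yet-issued term with the
    highest document frequency among retrieved documents, ties broken alphabetically.
    """
    retrieved = {}  # doc id -> text of every document seen so far
    issued = []     # queries in the order they were sent
    query = seed
    while query is not None and len(issued) < max_queries:
        issued.append(query)
        for doc_id in interface.search(query):
            retrieved[doc_id] = interface.docs[doc_id]

        # Document frequency of each term in the retrieved sample.
        df = Counter()
        for text in retrieved.values():
            df.update(set(text.lower().split()))

        candidates = [term for term in df if term not in issued]
        query = min(candidates, key=lambda t: (-df[t], t)) if candidates else None
    return set(retrieved), issued


if __name__ == "__main__":
    toy = KeywordInterface({
        1: "apple banana",
        2: "banana cherry",
        3: "cherry date",
        4: "date elderberry",
    })
    found, queries = crawl(toy, seed="apple")
    print(sorted(found), queries)
```

A fixed rule like this one is the kind of baseline the abstract contrasts against. An adaptive crawler would instead adjust how candidate queries are generated and ranked based on what it learns about the index behind the form.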