Siphon++: a hidden-webcrawler for keyword-based interfaces

Authors:
Karane Vieira; Luciano Barbosa; Juliana Freire; Altigran Silva
Affiliations:
UFAM, Manaus, Brazil; University of Utah, Salt Lake City, USA; University of Utah, Salt Lake City, USA; UFAM, Manaus, Brazil
Venue:
Proceedings of the 17th ACM conference on Information and knowledge management
Year:
2008

Abstract

The hidden Web consists of data that is generally hidden behind form interfaces and, as such, is out of reach for traditional search engines. With the goal of leveraging the high-quality information in this largely unexplored portion of the Web, we propose a new strategy for automatically retrieving data hidden behind keyword-based form interfaces. Unlike previous approaches to this problem, our strategy adapts query generation and selection by detecting features of the index. We describe an extensive experimental evaluation which shows that our strategy is able to derive appropriate queries to obtain high coverage while, at the same time, avoiding the retrieval of redundant data, and that it obtains higher coverage and is more efficient than approaches that use a fixed strategy for query generation.
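
To make the problem setting concrete, the following is a minimal sketch of a generic keyword-query crawler of the fixed-strategy kind the abstract contrasts against. It is not the Siphon++ method: the `search` function, the toy in-memory index, and the rule "issue the most frequent unseen term next" are all assumptions made for illustration. The sketch tracks the two quantities the abstract mentions, the set of distinct documents retrieved (coverage) and the number of results already seen (redundancy).

```python
from collections import Counter


def search(index, keyword, limit=10):
    """Hypothetical keyword interface: ids of documents containing the keyword."""
    hits = [doc_id for doc_id, text in sorted(index.items())
            if keyword in text.split()]
    return hits[:limit]  # real interfaces usually cap the result list


def crawl(index, seed, max_queries=20, limit=10):
    """Greedy fixed-strategy crawl (illustrative assumption, not Siphon++)."""
    retrieved = set()        # distinct documents seen so far (coverage)
    issued = []              # queries already sent, in order
    term_counts = Counter()  # document frequency of terms in retrieved docs
    redundant = 0            # results that were already retrieved
    query = seed
    while query is not None and len(issued) < max_queries:
        issued.append(query)
        for doc_id in search(index, query, limit):
            if doc_id in retrieved:
                redundant += 1
                continue
            retrieved.add(doc_id)
            term_counts.update(set(index[doc_id].split()))
        # Fixed rule: next query is the most frequent term not yet issued;
        # ties are broken by the lexicographically largest term.
        candidates = [(count, term) for term, count in term_counts.items()
                      if term not in issued]
        query = max(candidates)[1] if candidates else None
    return retrieved, issued, redundant


if __name__ == "__main__":
    toy_index = {
        1: "hidden web crawler",
        2: "web forms keyword",
        3: "keyword query selection",
        4: "query coverage",
    }
    docs, queries, wasted = crawl(toy_index, seed="web")
    print(len(docs), "documents,", len(queries), "queries,", wasted, "redundant results")
```

On a toy index like this one, a fixed rule keeps issuing queries that return only documents it has already seen, which is the kind of redundant retrieval an adaptive query selection strategy aims to avoid.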