SmartCrawl: a new strategy for the exploration of the hidden web

  • Authors: Augusto de Carvalho Fontes; Fábio Soares Silva
  • Affiliations: Universidade Tiradentes; Universidade Tiradentes
  • Venue: Proceedings of the 6th annual ACM international workshop on Web information and data management
  • Year: 2004

Abstract

Because of the way current search engines work, a large amount of the information available on the World Wide Web remains outside their catalogues. Their crawlers discover pages by following hyperlinks and a few other kinds of references, and they ignore HTML forms. In this paper, we propose a search engine prototype that retrieves the information behind HTML forms by automatically generating queries for them. We describe its architecture and some implementation details, and report an experiment showing that this information is in fact not indexed by current search engines.
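
The idea of generating queries for HTML forms can be illustrated with a minimal sketch. It is not the SmartCrawl implementation, whose details the abstract does not give. It only shows one plausible approach: parse a page, collect the named fields of each form, and build one GET request URL per field and candidate keyword. The names `FormExtractor` and `generate_queries`, the keyword list, and the `example.org` URL are all hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormExtractor(HTMLParser):
    """Collects each <form> with its action, method, and named fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # form being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            }
        elif self._current is not None and tag in ("input", "select", "textarea"):
            name = attrs.get("name")
            kind = (attrs.get("type") or "text").lower()
            # Skip controls that carry no query value.
            if name and kind not in ("submit", "button", "reset"):
                self._current["fields"].append(name)

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def generate_queries(page_url, html, keywords):
    """Return one GET URL per (form, field, keyword) combination.

    Assumption: only GET forms are handled, and the form action is
    assumed to carry no query string of its own.
    """
    parser = FormExtractor()
    parser.feed(html)
    urls = []
    for form in parser.forms:
        if form["method"] != "get":
            continue
        target = urljoin(page_url, form["action"])
        for field in form["fields"]:
            for keyword in keywords:
                urls.append(target + "?" + urlencode({field: keyword}))
    return urls
```

For a page at `http://example.org/catalog/index.html` containing a GET form with action `/search` and a single text field named `q`, calling `generate_queries` with the keywords `["hidden web", "crawler"]` yields two URLs that a crawler could then fetch and index. A real system would also need a policy for choosing keywords, handling POST forms, and limiting the load placed on the target site.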