GoGetIt!: a tool for generating structure-driven web crawlers

  • Authors: Márcio L. A. Vidal; Altigran S. da Silva; Edleno S. de Moura; João M. B. Cavalcanti

  • Affiliations: Universidade Federal do Amazonas, Amazonas, Brazil (all four authors)

  • Venue: Proceedings of the 15th International Conference on World Wide Web
  • Year: 2006

Abstract

We present GoGetIt!, a tool for generating structure-driven crawlers that requires minimal effort from users. The tool takes as input a sample page and an entry point to a Web site, and generates a structure-driven crawler based on navigation patterns, that is, sequences of patterns for the links a crawler has to follow to reach the pages that are structurally similar to the sample page. In our experiments, structure-driven crawlers generated by GoGetIt! were able to collect all pages that match the given samples, including pages added to the site after the crawlers were generated.
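
To make the idea of a navigation pattern concrete, the following is a minimal sketch and not the tool's actual implementation. The abstract does not say how link patterns are represented, so this sketch assumes each step is a regular expression over URLs. The site map `SITE`, the function `crawl`, and the pattern `NAVIGATION_PATTERN` are all hypothetical names introduced for illustration, and the site is held in memory so the example runs without network access.

```python
import re
from typing import Callable, Dict, List

# Hypothetical in-memory site: each URL maps to the links found on that page.
SITE: Dict[str, List[str]] = {
    "/": ["/about", "/cat/books", "/cat/music"],
    "/about": [],
    "/cat/books": ["/item/1", "/item/2", "/help"],
    "/cat/music": ["/item/3", "/"],
    "/item/1": [],
    "/item/2": [],
    "/item/3": [],
    "/help": [],
}


def crawl(
    entry: str,
    pattern: List[str],
    get_links: Callable[[str], List[str]],
) -> List[str]:
    """Follow a navigation pattern level by level from the entry point.

    Assumption: each element of `pattern` is a regular expression that a
    link must fully match to be followed at that level.
    """
    frontier = [entry]
    for step in pattern:
        rx = re.compile(step)
        seen = set()
        next_frontier = []
        for url in frontier:
            for link in get_links(url):
                # Follow only links matching this level's pattern, once each.
                if rx.fullmatch(link) and link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    # Pages reached after the last step are the target pages.
    return frontier


# Hypothetical pattern: entry point -> category pages -> item pages.
NAVIGATION_PATTERN = [r"/cat/\w+", r"/item/\d+"]

targets = crawl("/", NAVIGATION_PATTERN, lambda url: SITE.get(url, []))
print(targets)
```

Because the crawler follows patterns rather than a fixed list of URLs, rerunning `crawl` after a new item link is added to a category page would also reach the new page, which is one plausible reading of how generated crawlers can collect pages added after their generation.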