Combining text and link analysis for focused crawling: An application for vertical search engines

  • Authors:
  • G. Almpanidis; C. Kotropoulos; I. Pitas

  • Affiliations:
  • Department of Informatics, Aristotle University of Thessaloniki, Box 451, Thessaloniki GR-54124, Greece (all authors)

  • Venue:
  • Information Systems
  • Year:
  • 2007

Abstract

The number of vertical search engines and portals has increased rapidly in recent years, making the importance of a topic-driven (focused) crawler self-evident. In this paper, we develop a latent semantic indexing classifier that combines link analysis with text content to retrieve and index domain-specific web documents. Our implementation presents a different approach to focused crawling and aims to overcome the limitations imposed by the need to provide initial training data, while maintaining a high recall/precision ratio. We compare its efficiency with that of other well-known web information retrieval techniques.
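
For readers unfamiliar with focused crawling, the general idea can be sketched as a best-first traversal in which unvisited links are prioritized by how on-topic the linking page appears. The sketch below is illustrative only and is not the authors' latent semantic indexing classifier: it assumes plain bag-of-words cosine similarity for the text score and a simple "child inherits the parent's relevance" rule as the link heuristic. The toy graph `TOY_WEB`, the function names, and the `budget` parameter are all hypothetical.

```python
import heapq
import math
from collections import Counter

# Hypothetical toy web: url -> (page text, outgoing links). Not data from the paper.
TOY_WEB = {
    "seed": ("search engine crawler index", ["a", "b"]),
    "a": ("focused crawler topic search engine", ["c"]),
    "b": ("cooking recipes pasta sauce", ["d"]),
    "c": ("vertical search engine index crawler", []),
    "d": ("pasta dough recipes", []),
}


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def focused_crawl(web, seed, topic, budget):
    """Best-first crawl: fetch the frontier URL whose linking page scored highest."""
    topic_vec = Counter(topic.split())
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    visited, order = set(), []
    while frontier and len(order) < budget:
        _, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        text, links = web[url]
        # Text score: similarity between the fetched page and the topic description.
        score = cosine(Counter(text.split()), topic_vec)
        order.append((url, round(score, 3)))
        for link in links:
            if link not in visited:
                # Assumed link heuristic: a child inherits its parent's relevance.
                heapq.heappush(frontier, (-score, link))
    return order


result = focused_crawl(TOY_WEB, "seed", "search engine crawler", budget=4)
for url, score in result:
    print(url, score)
```

With a budget of four fetches, the off-topic page `b` is still visited because it inherits the seed's high score, but its own child `d` is pushed with zero priority and never fetched. This wasted fetch illustrates why a crawler benefits from combining page text with link information when ranking the frontier, which is the motivation stated in the abstract.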