Focused crawling using latent semantic indexing – an application for vertical search engines

  • Authors:
  • George Almpanidis;Constantine Kotropoulos;Ioannis Pitas

  • Affiliations:
  • Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece;Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece;Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece

  • Venue:
  • ECDL'05 Proceedings of the 9th European conference on Research and Advanced Technology for Digital Libraries
  • Year:
  • 2005

Abstract

Vertical search engines and web portals are gaining ground over general-purpose search engines due to their limited size and their high precision for the domain they cover. The number of vertical portals has increased rapidly in recent years, making the importance of a topic-driven (focused) crawler evident. In this paper, we develop a latent semantic indexing classifier that combines link analysis with text content in order to retrieve and index domain-specific web documents. We compare its efficiency with other well-known web information retrieval techniques. Our implementation presents a different approach to focused crawling and aims to overcome the size limitations of the initial training data while maintaining a high recall/precision ratio.
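
The abstract does not describe the classifier in detail, so the following is only a minimal sketch of generic latent semantic indexing used to rank candidate pages by topical similarity, not the authors' implementation. It assumes NumPy is available, covers the text-content side only (the link-analysis component is omitted), and uses a toy corpus. The function names (`build_term_doc_matrix`, `lsi_fit`, `fold_in`, `cosine`), the example URLs, and the choice of k = 2 are all hypothetical.

```python
import numpy as np


def tokenize(text):
    """Very crude tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_term_doc_matrix(docs):
    """Return (index, A) where A[i, j] counts term i in document j."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in tokenize(d):
            A[index[w], j] += 1.0
    return index, A


def lsi_fit(A, k):
    """Truncated SVD: keep the k largest singular values and left vectors."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k]


def fold_in(text, index, Uk, sk):
    """Project a new text into the k-dimensional latent space (q^T U_k S_k^-1)."""
    v = np.zeros(len(index))
    for w in tokenize(text):
        if w in index:  # terms unseen in training are ignored
            v[index[w]] += 1.0
    return (v @ Uk) / sk


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Toy training data (assumed): three on-topic and two off-topic documents.
on_topic = [
    "digital library metadata search",
    "library catalog metadata index",
    "search index digital archive",
]
off_topic = [
    "football match score goal",
    "goal keeper football league",
]

index, A = build_term_doc_matrix(on_topic + off_topic)
Uk, sk = lsi_fit(A, k=2)

# Represent the target topic as the centroid of the on-topic documents.
centroid = np.mean([fold_in(d, index, Uk, sk) for d in on_topic], axis=0)

# Hypothetical crawl frontier: URL -> text seen so far for that page.
candidates = {
    "http://example.org/a": "metadata archive catalog",
    "http://example.org/b": "league match score",
}
scores = {
    url: cosine(fold_in(text, index, Uk, sk), centroid)
    for url, text in candidates.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In a focused crawler, a score of this kind could serve as the priority of a URL in the crawl frontier, so that pages closer to the topic centroid are fetched first. How such a score would be combined with link analysis is left open here.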