Exploiting Tags and Social Profiles to Improve Focused Crawling

  • Authors:
  • Zhiyong Zhang;Olfa Nasraoui;Roelof Van Zwol

  • Affiliations:
  • -;-;-

  • Venue:
  • WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Recent years have transformed the Web from a Web of content to a Web of applications and social content. Thus, it has become crucial to be able to tap on this social aspect of the Web whenever possible, in addition to its content, particularly for focused crawling. In this paper, we present a novel profile-based focused crawling system for dealing with the increasingly popular social media-sharing web sites without assuming any privileged access to the internal private databases of such websites, nor any requirement for the existence of APIs for the extraction of social data. Our experiments prove the robustness of our profile-based focused crawler, as well as a significant improvement in harvest ratio, compared to breadth-first and OPIC crawlers, when crawling the flickr web site for two different topics.