Exploiting Tags and Social Profiles to Improve Focused Crawling

Authors:
Zhiyong Zhang; Olfa Nasraoui; Roelof Van Zwol
Venue:
WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Year:
2009


Abstract

Recent years have seen the Web transform from a Web of content into a Web of applications and social content. It has therefore become crucial to tap into this social aspect of the Web whenever possible, in addition to its content, particularly for focused crawling. In this paper, we present a novel profile-based focused crawling system for the increasingly popular social media-sharing websites. The system assumes neither privileged access to the internal private databases of such websites nor the existence of APIs for extracting social data. Our experiments demonstrate the robustness of our profile-based focused crawler, as well as a significant improvement in harvest ratio compared to breadth-first and OPIC crawlers, when crawling the Flickr website for two different topics.
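
The harvest ratio named in the abstract is conventionally the fraction of crawled pages that are relevant to the target topic. The abstract does not say how relevance was judged, so the following is only a minimal sketch: it assumes each fetched page has already been labeled relevant or not, and the function names and the two example crawl logs are hypothetical illustrations, not data from the paper.

```python
from typing import Iterable, List


def harvest_ratio(relevance_flags: Iterable[bool]) -> float:
    """Fraction of crawled pages judged relevant to the topic."""
    flags = list(relevance_flags)
    if not flags:
        # No pages crawled yet: define the ratio as 0.0 to avoid dividing by zero.
        return 0.0
    return sum(1 for flag in flags if flag) / len(flags)


def running_harvest_ratio(relevance_flags: Iterable[bool]) -> List[float]:
    """Harvest ratio after each fetched page, in crawl order."""
    ratios = []
    relevant = 0
    for fetched, flag in enumerate(relevance_flags, start=1):
        if flag:
            relevant += 1
        ratios.append(relevant / fetched)
    return ratios


# Hypothetical relevance labels for two short crawls (assumed, not measured).
focused_log = [True, True, False, True]
breadth_first_log = [True, False, False, False]

print(harvest_ratio(focused_log))
print(harvest_ratio(breadth_first_log))
print(running_harvest_ratio(focused_log))
```

Plotting the running ratio against the number of pages fetched is a common way to compare crawlers over the course of a crawl, since a single end-of-crawl number hides how quickly each strategy drifts off topic.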