Profile-Based Focused Crawler for Social Media-Sharing Websites
ICTAI '08 Proceedings of the 2008 20th IEEE International Conference on Tools with Artificial Intelligence - Volume 01
In recent years, the Web has shifted from a Web of content to a Web of applications and social content. It has therefore become crucial to tap into this social aspect of the Web, in addition to its content, whenever possible, particularly for focused crawling. In this paper, we present a novel profile-based focused crawling system for the increasingly popular social media-sharing websites. The system assumes neither privileged access to the internal private databases of such websites nor the existence of APIs for extracting social data. Our experiments on the Flickr website, covering two different topics, demonstrate the robustness of our profile-based focused crawler and a significant improvement in harvest ratio over breadth-first and OPIC crawlers.
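For readers unfamiliar with the metric, harvest ratio is commonly defined in the focused crawling literature as the fraction of crawled pages that are relevant to the target topic. The sketch below illustrates that common definition only; the abstract does not state the exact formula or relevance judgment used in the paper, and the function names and the boolean relevance flags are assumptions made for illustration.

```python
from typing import Iterable, List


def harvest_ratio(relevance_flags: Iterable[bool]) -> float:
    """Fraction of crawled pages judged relevant (0.0 for an empty crawl).

    Each flag is a hypothetical per-page relevance judgment, listed in
    crawl order; how relevance is decided is outside this sketch.
    """
    flags = list(relevance_flags)
    if not flags:
        return 0.0
    return sum(flags) / len(flags)


def running_harvest_ratio(relevance_flags: Iterable[bool]) -> List[float]:
    """Harvest ratio after each crawled page, useful for plotting a crawl."""
    ratios = []
    relevant = 0
    for crawled, flag in enumerate(relevance_flags, start=1):
        relevant += int(flag)
        ratios.append(relevant / crawled)
    return ratios


# Made-up judgments for four crawled pages, purely illustrative.
example_crawl = [True, False, True, True]
overall = harvest_ratio(example_crawl)
per_step = running_harvest_ratio(example_crawl)
```

Under this definition, comparing two crawlers amounts to computing the ratio for each crawler's page sequence on the same topic and the same page budget.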