Where to crawl next for focused crawlers

Authors:
Yuki Uemura; Tsuyoshi Itokawa; Teruaki Kitasuka; Masayoshi Aritsugi
Affiliations:
Computer Science and Electrical Engineering, Graduate School of Science and Technology, Kumamoto University, Kumamoto, Japan (all authors)
Venue:
KES'10: Proceedings of the 14th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, Part IV
Year:
2010

Abstract

Because the WWW provides a vast amount of data, retrieving interesting and useful information from it effectively and efficiently supports people's innovative and creative activities. In this paper, we propose a focused crawler for individual activities. We develop an algorithm that decides where a focused crawler should crawl next by integrating the concept of PageRank into the decision. We empirically evaluate our proposal in terms of precision and target recall. Some of the results show that our system achieves good target recall regardless of the topic on which the crawler focuses.
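
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the method evaluated in the paper. It assumes the crawler keeps a link graph of the pages fetched so far, computes plain PageRank over that graph, and crawls the highest-ranked uncrawled URL next. The function names (pagerank, next_url, precision, target_recall), the damping factor of 0.85, and the set-based metric definitions are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             iters: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over the link graph seen so far.

    graph maps a crawled URL to the URLs it links to. URLs that appear
    only as link targets (the uncrawled frontier) are treated as dangling
    nodes whose rank is spread evenly over all nodes.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            out = graph.get(u, [])
            if out:
                share = damping * rank[u] / len(out)
                for v in out:
                    new[v] += share
            else:
                share = damping * rank[u] / n
                for v in nodes:
                    new[v] += share
        rank = new
    return rank


def next_url(graph: Dict[str, List[str]], crawled: Set[str]) -> Optional[str]:
    """Pick the uncrawled URL with the highest PageRank (assumed policy)."""
    rank = pagerank(graph)
    frontier = [u for u in rank if u not in crawled]
    if not frontier:
        return None
    # Ties are broken by URL so the choice is deterministic.
    return max(frontier, key=lambda u: (rank[u], u))


def precision(crawled: Set[str], relevant: Set[str]) -> float:
    """Fraction of crawled pages that are relevant (assumed definition)."""
    return len(crawled & relevant) / len(crawled) if crawled else 0.0


def target_recall(crawled: Set[str], targets: Set[str]) -> float:
    """Fraction of a known target set that was crawled (assumed definition)."""
    return len(crawled & targets) / len(targets) if targets else 0.0


# Hypothetical toy graph: "a" is linked from two crawled pages, "b" from one.
toy_graph = {"seed": ["a", "b"], "x": ["a"]}
print(next_url(toy_graph, crawled={"seed", "x"}))  # a
```

A focused crawler would normally combine such a link-based score with a topical relevance estimate. That part is left out here because the abstract gives no detail on how the two are integrated.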