Focused crawling: a new approach to topic-specific Web resource discovery
WWW '99 Proceedings of the eighth international conference on World Wide Web
Adaptive on-line page importance computation
WWW '03 Proceedings of the 12th international conference on World Wide Web
Learning to crawl: Comparing classification schemes
ACM Transactions on Information Systems (TOIS)
Link Contexts in Classifier-Guided Topical Crawlers
IEEE Transactions on Knowledge and Data Engineering
Kairos: proactive harvesting of research paper metadata from scientific conference web sites
ICADL'10 Proceedings of the role of digital libraries in a time of global change, and 12th international conference on Asia-Pacific digital libraries
Hi-index | 0.00 |
Focused crawling is a critical technique for topical resource discovery on the Web. We propose a new frontier prioritizing algorithm, namely, the OTIE (On-line Topical Importance Estimation) algorithm, which efficiently and effectively combines link-based and content-based analysis to evaluate the priority of an uncrawled URL in the frontier. We then demonstrate OTIE's advantages over traditional prioritizing algorithms by real crawling experiments.