Focused crawling is increasingly seen as a way to address the scalability limitations of general-purpose search engines: the crawler traverses the Web and gathers only pages that are relevant to a specific topic. A key issue in the design of focused crawlers is how to predict, for a given topic, the relevance of the unvisited pages that candidate URLs in the crawling frontier point to. In this paper, we propose a novel approach to this problem based on multiple relevance prediction strategies. To support cross-language crawling, we first introduce a hierarchical taxonomy that describes topics in both English and Chinese. We then present a formal description of the relevance prediction process and discuss four strategies that evaluate relevance more accurately by using, respectively, the page contents, anchor texts, URL addresses and link types of Web pages. Among these, we propose a particular strategy that uses Chinese URL addresses to estimate the relevance of cross-language Web pages. Finally, we combine these strategies with Shark-Search, a classic focused crawling algorithm, to obtain a new focused crawling algorithm, FCMRPS (Focused Crawling based on Multiple Relevance Prediction Strategies). Experiments show that FCMRPS is more effective than the traditional Breadth-First, Best-First and Shark-Search algorithms in terms of precision and sum of information.
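To make the idea of combining several relevance predictions concrete, here is a minimal Python sketch of a frontier ordered by a weighted sum of four scores (parent page content, anchor text, URL and link type). It is an illustration under stated assumptions, not the FCMRPS algorithm itself: the `Candidate` fields, the term-overlap similarity, the link-type labels and their scores, and the weights are all hypothetical, and the score inheritance and decay used by Shark-Search are left out.

```python
import heapq
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    """An unvisited URL plus the evidence available before fetching it."""
    url: str
    anchor_text: str
    parent_text: str   # text of the page on which the link was found
    link_type: str     # hypothetical labels: "downward", "crosswise", "outward"


def term_overlap(text, topic_terms):
    """Fraction of topic terms found in the text (toy stand-in for a real similarity)."""
    if not topic_terms:
        return 0.0
    words = set(text.lower().split())
    return sum(1 for term in topic_terms if term in words) / len(topic_terms)


def url_score(url, topic_terms):
    """Fraction of topic terms that appear as tokens of the URL."""
    if not topic_terms:
        return 0.0
    tokens = set(re.split(r"[^a-z0-9]+", url.lower()))
    return sum(1 for term in topic_terms if term in tokens) / len(topic_terms)


# Assumed values, chosen only for illustration.
LINK_TYPE_SCORE = {"downward": 1.0, "crosswise": 0.5, "outward": 0.2}
WEIGHTS = (0.4, 0.3, 0.2, 0.1)  # content, anchor, URL, link type


def predict_relevance(candidate, topic_terms):
    """Weighted sum of the four per-strategy scores."""
    scores = (
        term_overlap(candidate.parent_text, topic_terms),
        term_overlap(candidate.anchor_text, topic_terms),
        url_score(candidate.url, topic_terms),
        LINK_TYPE_SCORE.get(candidate.link_type, 0.0),
    )
    return sum(w * s for w, s in zip(WEIGHTS, scores))


class Frontier:
    """Priority queue of candidates; the highest predicted relevance is popped first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker so Candidate objects are never compared

    def push(self, candidate, score):
        if candidate.url in self._seen:
            return
        self._seen.add(candidate.url)
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-score, self._counter, candidate))
        self._counter += 1

    def pop(self):
        neg_score, _, candidate = heapq.heappop(self._heap)
        return candidate, -neg_score

    def __len__(self):
        return len(self._heap)
```

A crawler built on this sketch would pop the best candidate, fetch it, extract its outgoing links, score each with `predict_relevance`, and push them back onto the frontier. A Breadth-First crawler corresponds to replacing the priority queue with a FIFO queue, which is the kind of baseline the abstract compares against.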