Multiple-goal search algorithms and their application to web crawling

Authors:
Dmitry Davidov;Shaul Markovitch
Affiliations:
Computer Science Department, Technion, Haifa 32000, Israel;Computer Science Department, Technion, Haifa 32000, Israel
Venue:
Eighteenth national conference on Artificial intelligence
Year:
2002

Citing 11
Cited 1

The anatomy of a large-scale hypertextual Web search engine

WWW7 Proceedings of the seventh international conference on World Wide Web 7
Focused crawling: a new approach to topic-specific Web resource discovery

WWW '99 Proceedings of the eighth international conference on World Wide Web
Learning page-independent heuristics for extracting data from Web pages

WWW '99 Proceedings of the eighth international conference on World Wide Web
Authoritative sources in a hyperlinked environment

Journal of the ACM (JACM)
The Web as a graph

PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
WebBase: a repository of Web pages

Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications netowrking
Breadth-first crawling yields high-quality pages

Proceedings of the 10th international conference on World Wide Web
Finding authorities and hubs from link structures on the World Wide Web

Proceedings of the 10th international conference on World Wide Web
Evaluating topic-driven web crawlers

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
The Evolution of the Web and Implications for an Incremental Crawler

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases

Web page clustering using heuristic search in the web graph

IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence

Quantified Score

Hi-index	0.00

Visualization

Abstract

The work described in this paper presents a new framework for heuristic search where the task is to collect as many goals as possible within the allocated resources. We show the inadequacy of traditional distance heuristics for this type of tasks and present alternative types of heuristics that are more appropriate for multiple-goal search. In particular we introduce the yield heuristic that estimates the cost and the benefit of exploring a subtree below a search node. We present a learning algorithm for inferring the yield based on search experience. We apply our adaptive and non-adaptive multiple-goal search algorithms to the web crawling problem and show their efficiency.