The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7
Focused crawling: a new approach to topic-specific Web resource discovery
WWW '99 Proceedings of the eighth international conference on World Wide Web
Learning page-independent heuristics for extracting data from Web pages
WWW '99 Proceedings of the eighth international conference on World Wide Web
Authoritative sources in a hyperlinked environment
Journal of the ACM (JACM)
PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
WebBase: a repository of Web pages
Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications netowrking
Breadth-first crawling yields high-quality pages
Proceedings of the 10th international conference on World Wide Web
Finding authorities and hubs from link structures on the World Wide Web
Proceedings of the 10th international conference on World Wide Web
Evaluating topic-driven web crawlers
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
The Evolution of the Web and Implications for an Incremental Crawler
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Web page clustering using heuristic search in the web graph
IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Hi-index | 0.00 |
The work described in this paper presents a new framework for heuristic search where the task is to collect as many goals as possible within the allocated resources. We show the inadequacy of traditional distance heuristics for this type of tasks and present alternative types of heuristics that are more appropriate for multiple-goal search. In particular we introduce the yield heuristic that estimates the cost and the benefit of exploring a subtree below a search node. We present a learning algorithm for inferring the yield based on search experience. We apply our adaptive and non-adaptive multiple-goal search algorithms to the web crawling problem and show their efficiency.