Focused web crawler with revisit policy
Proceedings of the International Conference & Workshop on Emerging Trends in Technology
Freshness tuning in focused crawler
Proceedings of the International Conference & Workshop on Emerging Trends in Technology
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2
Keeping search engine indices current through exhaustive crawling is rapidly becoming impossible as the web grows. Focused crawlers aim to search only the subset of the web related to a specific topic, and so offer a potential solution. Focused crawling has problems of its own, however: the major one is how to retrieve the maximal set of relevant, high-quality pages. To address this problem, we design a focused crawler, which we call HAWK, that uses not only the content of a web page to improve page relevance but also the link structure to improve coverage of a specific topic.
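The general idea of combining page content with link structure can be illustrated with a minimal best-first crawl. This is a sketch under stated assumptions, not HAWK's actual algorithm, which the abstract does not specify. The term-overlap relevance measure, the in-link count used as a link signal, the `alpha` weighting, and the in-memory `pages` and `links` dictionaries (stand-ins for fetching and parsing) are all hypothetical choices made for illustration.

```python
import heapq


def content_score(text, topic_terms):
    """Fraction of topic terms that occur in the page text (toy relevance measure)."""
    words = set(text.lower().split())
    return sum(1 for t in topic_terms if t in words) / len(topic_terms)


def focused_crawl(pages, links, seeds, topic_terms,
                  threshold=0.5, alpha=0.5, limit=10):
    """Best-first crawl over an in-memory web (assumed, not HAWK's method).

    pages: url -> page text; links: url -> list of outgoing urls.
    Returns the relevant urls in the order they were visited.
    """
    # heapq is a min-heap, so priorities are negated to pop the best first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    inlinks = {}  # url -> number of relevant pages seen linking to it
    visited, relevant = set(), []

    while frontier and len(visited) < limit:
        _, url = heapq.heappop(frontier)
        if url in visited or url not in pages:
            continue
        visited.add(url)

        score = content_score(pages[url], topic_terms)
        if score < threshold:
            continue  # off-topic page: record the visit but do not expand it
        relevant.append(url)

        for target in links.get(url, []):
            if target in visited:
                continue
            inlinks[target] = inlinks.get(target, 0) + 1
            # Link signal grows toward 1 as more relevant pages point here.
            link_score = 1 - 1 / (1 + inlinks[target])
            priority = alpha * score + (1 - alpha) * link_score
            heapq.heappush(frontier, (-priority, target))

    return relevant
```

In this sketch the content score decides whether a page is kept and expanded, while the link signal only reorders the frontier, so pages referenced by several on-topic pages are visited earlier. A real crawler would also need fetching, URL normalization, politeness delays, and robots.txt handling, none of which are shown here.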