The anatomy of a large-scale hypertextual Web search engine
WWW7: Proceedings of the seventh international conference on World Wide Web 7
Efficient crawling through URL ordering
WWW7: Proceedings of the seventh international conference on World Wide Web 7
Authoritative sources in a hyperlinked environment
Journal of the ACM (JACM)
Breadth-first crawling yields high-quality pages
Proceedings of the 10th international conference on World Wide Web
Proceedings of the 11th international conference on World Wide Web
Introduction to Reinforcement Learning
MIT Press
Adaptive on-line page importance computation
WWW '03: Proceedings of the 12th international conference on World Wide Web
Crawling the web: discovery and maintenance of large-scale web data
Ph.D. thesis, Stanford University
Average-clicks: a new measure of distance on the World Wide Web
Journal of Intelligent Information Systems - Special issue on web intelligence
Scheduling Algorithms for Web Crawling
LA-WEBMEDIA '04: Proceedings of the WebMedia & LA-Web 2004 Joint Conference (10th Brazilian Symposium on Multimedia and the Web / 2nd Latin American Web Congress)
WWW '05: Proceedings of the 14th international conference on World Wide Web
The discoverability of the web
Proceedings of the 16th international conference on World Wide Web
Combating web spam with TrustRank
VLDB '04: Proceedings of the Thirtieth international conference on Very Large Data Bases, Volume 30
RankMass crawler: a crawler with high personalized PageRank coverage guarantee
VLDB '07: Proceedings of the 33rd international conference on Very Large Data Bases
A three-year study on the freshness of web search engine databases
Journal of Information Science
FICA: A novel intelligent crawling algorithm based on reinforcement learning
Web Intelligence and Agent Systems
The crawler is a main component of a search engine: it is responsible for discovering and downloading web pages. Since no search engine can cover the whole web, a crawler must focus on the most valuable pages. Several crawling algorithms, such as PageRank, OPIC, and FICA, have been proposed, but they suffer from low throughput. To overcome this problem, we propose a new, easy-to-implement crawling algorithm called FICA+. In FICA+, the importance of a page is determined by the logarithmic distance and the weight of its incoming links. We evaluate FICA+ on the web graph of the University of California, Berkeley. Experimental results show that our algorithm outperforms other crawling algorithms in discovering highly important pages.
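The abstract only names the ranking rule, so the following is a minimal, hypothetical Python sketch of a FICA-style crawl frontier. It assumes, following the logarithmic-distance idea of the original FICA paper, that the cost of traversing a link grows with the logarithm of the source page's out-degree, and it models FICA+'s incoming-link weights as a simple in-link discount. The graph representation, the discount formula, and all identifiers are illustrative assumptions, not the authors' implementation.

```python
import heapq
import math
from collections import defaultdict

def fica_plus_sketch(graph, seeds):
    """Return a crawl order by increasing logarithmic distance from the seeds.

    `graph` maps a URL to the list of URLs it links to. The edge cost
    log10(out_degree + 1) follows FICA's logarithmic distance; dividing
    by an in-link term is a hypothetical stand-in for FICA+'s
    incoming-link weights.
    """
    # In-link counts are computed offline here (as on a test web graph);
    # a real crawler would update them as links are discovered.
    in_links = defaultdict(int)
    for src, outs in graph.items():
        for dst in outs:
            in_links[dst] += 1

    dist = {s: 0.0 for s in seeds}
    heap = [(0.0, s) for s in seeds]
    heapq.heapify(heap)
    order, visited = [], set()

    while heap:
        d, page = heapq.heappop(heap)
        if page in visited:
            continue
        visited.add(page)
        order.append(page)  # "crawl" the page next in priority order
        outs = graph.get(page, [])
        if not outs:
            continue
        # Links on hub pages are cheap to follow; links on sparse
        # pages carry more information, hence a larger step cost.
        step = math.log10(len(outs) + 1)
        for nxt in outs:
            # Hypothetical in-link bonus: well-linked pages look closer.
            nd = d + step / (1.0 + math.log10(1 + in_links[nxt]))
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return order

if __name__ == "__main__":
    toy = {"a": ["b", "c"], "b": ["c", "d"], "c": ["d"], "d": []}
    print(fica_plus_sketch(toy, ["a"]))  # ['a', 'c', 'b', 'd']
```

Because pages are popped in order of increasing cumulative distance, the Dijkstra-style priority queue doubles as the crawl frontier: the crawler always fetches the currently most important undiscovered page first.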