FICA: A Fast Intelligent Crawling Algorithm

Authors:
Ali Mohammad Zareh Bidoki;Nasser Yazdani;Pedram Ghodsnia
Venue:
WI '07 Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence
Year:
2007

Cited by:
FICA: A novel intelligent crawling algorithm based on reinforcement learning (Web Intelligence and Agent Systems)
Focused web crawler with revisit policy (Proceedings of the International Conference & Workshop on Emerging Trends in Technology)
Freshness tuning in focused crawler (Proceedings of the International Conference & Workshop on Emerging Trends in Technology)
Web page importance ranking (Advances in Data Analysis and Classification)

Abstract

Because the web is large, growing, and highly dynamic, efficiently crawling and ranking it to retrieve the most important pages remains a challenging problem. Several algorithms, such as PageRank [13] and OPIC [1], have been proposed, but they have high time complexity. In this paper, we propose FICA, an intelligent crawling algorithm based on reinforcement learning that models a real surfing user. The crawling priority of a page is based on a concept we call logarithmic distance. FICA is easy to implement, and its time complexity is O(E*logV), where V and E are the number of nodes and edges in the web graph, respectively. A comparison with other proposed algorithms shows that FICA outperforms them in discovering highly important pages. Furthermore, FICA computes the importance (ranking) of each page during the crawling process, so it can also be used as a ranking method for computing page importance. Our experiments use the UK web graph.
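
The abstract does not define logarithmic distance or the reinforcement learning update, so the following is only a minimal sketch of a distance-prioritized crawl, not the authors' algorithm. It assumes that following a link from a page costs the base-10 logarithm of that page's out-degree, and that pages are fetched in order of smallest accumulated cost from a seed page. The function name crawl_order, the dictionary-based graph, and the toy page names are all hypothetical.

```python
import heapq
import math


def crawl_order(graph, seed):
    """Visit pages in order of increasing accumulated link cost from `seed`.

    `graph` maps a page to the list of pages it links to.
    ASSUMPTION: the cost of following any link out of a page is
    log10(out-degree of that page); the abstract does not give the formula.
    """
    dist = {seed: 0.0}      # best known cost for each discovered page
    heap = [(0.0, seed)]    # priority queue of (cost, page)
    visited = set()
    order = []

    while heap:
        d, page = heapq.heappop(heap)
        if page in visited:
            continue        # stale queue entry, a cheaper one was already used
        visited.add(page)
        order.append(page)

        links = graph.get(page, [])
        if not links:
            continue        # no out-links, nothing to enqueue
        step = math.log10(len(links))   # assumed per-link cost
        for target in links:
            new_d = d + step
            if target not in visited and new_d < dist.get(target, math.inf):
                dist[target] = new_d
                heapq.heappush(heap, (new_d, target))

    return order, dist


# Toy graph with hypothetical page names.
toy_graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e", "f"],
    "d": [],
    "e": [],
    "f": [],
}
order, dist = crawl_order(toy_graph, "a")
```

Each link is pushed onto the binary heap at most once per improvement, so this sketch does work proportional to the number of edges times a logarithmic factor, which is in line with the O(E*logV) bound quoted in the abstract. The accumulated cost stored in dist could also serve as a rough importance score (lower meaning more important), in the spirit of the abstract's remark that ranking falls out of the crawl.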