An incremental approach to link evaluation in topic-driven web resource discovery

  • Authors:
  • Huaxiang Zhang;Shangteng Huang

  • Affiliations:
  • Information and Management School, Shandong Normal Univ., Jinan, Shandong, China;Department of Computer Science and Technology, Shanghai Jiaotong Univ., Shanghai, China

  • Venue:
  • AAIM'05: Proceedings of the First International Conference on Algorithmic Applications in Management
  • Year:
  • 2005

Abstract

The key issue in topic-driven Web resource discovery is how to increase the harvest rate; to do so, a crawler should learn from the online information it gathers during the crawl, such as Web pages and the hyperlink structure. We address this problem by endowing a crawler with an incremental learning ability, and propose an online incremental learning algorithm (IncL). IncL effectively exploits the multi-feature characteristics of Web pages to improve the accuracy and reliability of link evaluation. A hyperlink's score, which is used to rank Web pages, takes into account not only the link's positive source pages but also its negative source pages. Many current crawling approaches ignore the effect of negative pages on page ranking. Experiments show that IncL achieves a high harvest rate.
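
The abstract does not give the IncL scoring formula, so the following Python sketch is only an illustration of the general idea of scoring a link from both its positive and its negative source pages, updated incrementally as pages are crawled. The class name `LinkScorer`, the `negative_weight` parameter, and the normalized-difference formula are all assumptions made for this example and are not taken from the paper.

```python
class LinkScorer:
    """Toy link scorer. The formula below is an assumed stand-in, not IncL."""

    def __init__(self, negative_weight=1.0):
        # Hypothetical knob: how strongly off-topic source pages
        # pull a link's score down.
        self.negative_weight = negative_weight
        self.positive = {}  # url -> number of on-topic pages linking to it
        self.negative = {}  # url -> number of off-topic pages linking to it

    def observe_page(self, is_relevant, outlinks):
        """Incrementally update the counts from one newly crawled page."""
        counts = self.positive if is_relevant else self.negative
        for url in set(outlinks):  # count each source page once per link
            counts[url] = counts.get(url, 0) + 1

    def score(self, url):
        """Normalized difference between positive and weighted negative evidence."""
        pos = self.positive.get(url, 0)
        neg = self.negative_weight * self.negative.get(url, 0)
        total = pos + neg
        return 0.0 if total == 0 else (pos - neg) / total

    def ranked(self):
        """Return all known links, highest score first (ties broken by URL)."""
        urls = set(self.positive) | set(self.negative)
        return sorted(urls, key=lambda u: (-self.score(u), u))


scorer = LinkScorer()
scorer.observe_page(True, ["a", "b"])   # on-topic page linking to a and b
scorer.observe_page(False, ["b", "c"])  # off-topic page linking to b and c
scorer.observe_page(True, ["a"])        # second on-topic page linking to a
print(scorer.ranked())
```

In this sketch a link seen only on on-topic pages is ranked above a link with mixed evidence, which in turn is ranked above a link seen only on off-topic pages. A scorer that ignored negative source pages would not separate the last two cases as clearly.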