SCTWC: An online semi-supervised clustering approach to topical web crawlers

  • Authors: Huaxiang Zhang; Jing Lu

  • Affiliations: Dept. of Computer Science, Shandong Normal University, No. 88 Wenhuadong Road, Jinan 250014, Shandong, China; Dept. of Computer Science, Shandong University of Finance, Jinan 250014, Shandong, China

  • Venue: Applied Soft Computing
  • Year: 2010

Abstract

Focused web crawlers collect topic-related web pages from the Internet. Drawing on Q-learning and semi-supervised learning, this study proposes an online semi-supervised clustering approach for topical web crawlers (SCTWC). At each step, SCTWC selects the most topic-related URL to crawl according to the scores of the URLs in the unvisited list. The scores are calculated from the fuzzy class memberships and the Q values of the unlabelled URLs. Experimental results show that SCTWC improves crawling performance.
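
The selection step described in the abstract can be illustrated with a small sketch of a score-ordered unvisited list. The abstract does not state how the fuzzy class membership and the Q value are combined, so the weighted sum in `combined_score`, the `weight` parameter, and the `Frontier` class are hypothetical choices made for illustration only and should not be read as the paper's actual scoring rule.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class FrontierEntry:
    # heapq is a min-heap, so the score is stored negated to pop the highest first.
    neg_score: float
    url: str = field(compare=False)


def combined_score(membership: float, q_value: float, weight: float = 0.5) -> float:
    """Hypothetical rule: a weighted mix of membership and Q value (an assumption)."""
    return weight * membership + (1.0 - weight) * q_value


class Frontier:
    """Unvisited URL list ordered by score (illustrative, not the authors' code)."""

    def __init__(self) -> None:
        self._heap: list[FrontierEntry] = []

    def push(self, url: str, membership: float, q_value: float) -> None:
        score = combined_score(membership, q_value)
        heapq.heappush(self._heap, FrontierEntry(-score, url))

    def pop_best(self) -> str:
        # Return the URL with the highest combined score.
        return heapq.heappop(self._heap).url

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    frontier = Frontier()
    # Membership and Q values below are made-up numbers for demonstration.
    frontier.push("http://example.org/a", membership=0.9, q_value=0.2)
    frontier.push("http://example.org/b", membership=0.6, q_value=0.8)
    frontier.push("http://example.org/c", membership=0.1, q_value=0.1)
    while frontier:
        print(frontier.pop_best())
```

In a real crawler the membership values would come from the online clustering step and the Q values from the learned link-value estimates; here both are supplied directly as placeholder inputs.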