Characterization of evaluation metrics in topical web crawling based on genetic algorithm

  • Authors:
  • Tao Peng; Wanli Zuo; Yilin Liu

  • Affiliations:
  • College of Computer Science and Technology, Jilin University, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Changchun, China (all three authors)

  • Venue:
  • ICNC'05: Proceedings of the First International Conference on Advances in Natural Computation, Volume Part II
  • Year:
  • 2005

Abstract

Topical crawlers are becoming important tools for supporting applications such as specialized Web portals, online searching, and competitive intelligence. A topic-driven crawler must choose the best URLs to pursue during a crawl, yet it is difficult to judge which of the downloaded URLs are the best. This paper presents several important metrics and an evaluation function that ranks URLs by the relevance of their pages. We also describe an approach, based on a genetic algorithm (GA), for evaluating this function: the GA's evolutionary process discovers the best combination of the metrics' weights. To keep the results from being skewed by any single topic, we present a method that extracts a characterization of the metric combinations by mining frequent patterns. Feature extraction adopts the FP-tree structure and the FP-growth method, which mines frequent patterns from the FP-tree without candidate generation. Experiments show promising performance, especially for a popular topic.
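The idea of letting a GA discover the best combination of metric weights can be illustrated with a small sketch. It assumes things the abstract does not specify: the evaluation function is taken to be a linear weighted sum of per-URL metric values, fitness is taken to be precision at k on a labeled training set, and the metric names, GA operators (tournament selection, blend crossover, Gaussian mutation, elitism), and synthetic data are all hypothetical choices for illustration, not the authors' method.

```python
import random
from typing import List, Sequence, Tuple

# HYPOTHETICAL metric names; the paper's actual metrics are not listed here.
METRICS = ["anchor_text_sim", "parent_page_sim", "url_token_sim"]
Example = Tuple[List[float], int]  # (metric values in [0, 1], relevant 0/1)

def score(weights: Sequence[float], metrics: Sequence[float]) -> float:
    """Assumed linear evaluation function: weighted sum of metric values."""
    return sum(w * m for w, m in zip(weights, metrics))

def precision_at_k(weights: Sequence[float], data: List[Example], k: int) -> float:
    """Assumed fitness: fraction of relevant pages among the top-k ranked URLs."""
    ranked = sorted(data, key=lambda item: score(weights, item[0]), reverse=True)
    return sum(label for _, label in ranked[:k]) / k

def normalize(weights: Sequence[float]) -> List[float]:
    """Clip negative weights to 0 and rescale so the weights sum to 1."""
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    if total == 0:
        return [1.0 / len(clipped)] * len(clipped)
    return [w / total for w in clipped]

def evolve_weights(data: List[Example], n_metrics: int, k: int, pop_size: int = 30,
                   generations: int = 40, mutation_sd: float = 0.1,
                   elite: int = 2, seed: int = 0) -> Tuple[List[float], float]:
    """Search for a weight vector that maximizes precision@k with a simple GA."""
    rng = random.Random(seed)

    def fitness(ind: Sequence[float]) -> float:
        return precision_at_k(ind, data, k)

    # Seed the population with the uniform weighting plus random individuals.
    population = [[1.0 / n_metrics] * n_metrics]
    population += [normalize([rng.random() for _ in range(n_metrics)])
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_gen = [ind[:] for ind in population[:elite]]  # elitism keeps the best
        while len(next_gen) < pop_size:
            # Tournament selection: best of 3 random individuals, twice.
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            # Blend crossover, then Gaussian mutation, then renormalization.
            alpha = rng.random()
            child = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]
            child = [w + rng.gauss(0.0, mutation_sd) for w in child]
            next_gen.append(normalize(child))
        population = next_gen
    best = max(population, key=fitness)
    return best, fitness(best)

def make_toy_data(n: int = 40, seed: int = 1) -> List[Example]:
    """Synthetic data in which only the first metric predicts relevance."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = 1 if i % 4 == 0 else 0  # 25% of URLs are relevant
        signal = 0.6 + 0.4 * rng.random() if label else 0.5 * rng.random()
        data.append(([signal, rng.random(), rng.random()], label))
    return data

if __name__ == "__main__":
    data = make_toy_data()
    best_w, best_fit = evolve_weights(data, n_metrics=len(METRICS), k=10)
    print(dict(zip(METRICS, (round(w, 2) for w in best_w))), best_fit)
```

Because the two best individuals are copied unchanged into each generation and the uniform weighting is in the initial population, the returned fitness can never be worse than that of equal weights. Running such a search once per topic would yield one weight vector per topic; the abstract's frequent-pattern step would then look for combinations shared across topics, which this sketch does not attempt.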