An Improved Shark-Search Algorithm Based on Multi-information

  • Authors:
  • Zhumin Chen;Jun Ma;Jingsheng Lei;Bo Yuan;Li Lian

  • Affiliations:
  • Shandong University, Jinan, 250061, China;Shandong University, Jinan, 250061, China;Hainan University, Haikou, 570228, China;University of Southern California, Los Angeles, CA 90088, USA;Shandong University, Jinan, 250061, China

  • Venue:
  • FSKD '07 Proceedings of the Fourth International Conference on Fuzzy Systems and Knowledge Discovery - Volume 04
  • Year:
  • 2007

Abstract

With the enormous growth of the World Wide Web, the limitations of existing general-purpose search engines have become increasingly apparent. Focused crawling is increasingly seen as a potential solution. The key problem in focused crawling is how to accurately predict the relevance, to a given topic, of unvisited Web pages pointed to by known URLs. A formalized description of the prediction process is introduced. Then, four policies are proposed to predict the relevance of unvisited pages to a topic. Further, combinations of these policies are used to improve Shark-Search, a classic focused crawling algorithm based mainly on the textual information of Web pages. A large number of experiments were carried out to identify the best combination and to verify that the improved Shark-Search is more effective than the original.
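
To make the prediction step concrete, the following is a minimal sketch of how the original Shark-Search scores an unvisited link from textual evidence alone. It is not the improved algorithm of this paper: the abstract does not name the four policies, so none of them appear here. The function names, the plain term-frequency cosine similarity, and the default values of `delta`, `beta`, and `gamma` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def cosine_sim(a: str, b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two strings."""
    ta = Counter(re.findall(r"[a-z0-9]+", a.lower()))
    tb = Counter(re.findall(r"[a-z0-9]+", b.lower()))
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm = (math.sqrt(sum(v * v for v in ta.values()))
            * math.sqrt(sum(v * v for v in tb.values())))
    return dot / norm if norm else 0.0


def potential_score(topic: str, parent_text: str, parent_inherited: float,
                    anchor_text: str, anchor_context: str,
                    delta: float = 0.5, beta: float = 0.8,
                    gamma: float = 0.5) -> float:
    """Estimate the relevance of an unvisited page from its parent and link text.

    delta, beta and gamma are illustrative defaults (assumptions), not values
    taken from the paper.
    """
    # Inherited score: a relevant parent passes on a decayed share of its own
    # relevance; otherwise the parent's inherited score decays one more step.
    parent_rel = cosine_sim(topic, parent_text)
    inherited = delta * parent_rel if parent_rel > 0 else delta * parent_inherited

    # Neighborhood score: anchor text first, then the text surrounding it.
    anchor = cosine_sim(topic, anchor_text)
    context = 1.0 if anchor > 0 else cosine_sim(topic, anchor_context)
    neighborhood = beta * anchor + (1 - beta) * context

    return gamma * inherited + (1 - gamma) * neighborhood


if __name__ == "__main__":
    topic = "focused crawling"
    print(potential_score(topic, "a page about focused crawling", 0.0,
                          "focused crawling tutorial", "see this"))
    print(potential_score(topic, "a page about cooking", 0.0,
                          "pasta recipes", "dinner ideas"))
```

A crawler would keep unvisited URLs in a priority queue ordered by this score and fetch the highest-scoring one next. An improved variant along the lines of the abstract would combine further sources of evidence with this purely textual estimate.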