Eliminate redundancy in parallel search: a multi-agent coordination approach

  • Authors:
  • Jiewen Luo; Zhongzhi Shi

  • Affiliations:
  • Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China and Graduate University of Chinese Academy of Sciences, Beijing, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

  • Venue:
  • PRICAI'06: Proceedings of the 9th Pacific Rim International Conference on Artificial Intelligence
  • Year:
  • 2006

Abstract

Web spiders are widely used to gather information for search engines. As the Web grows, parallelizing the spider's crawling process becomes a natural choice. However, parallel execution often causes the same web pages to be collected more than once, and these redundant pages occupy a large amount of storage space. Solving this problem is a significant issue in the design of next-generation web spiders. In this paper, we apply methods from multi-agent coordination to design a parallel spider model and implement it on the multi-agent platform MAGE. Under the control of a central facilitator agent, the spiders coordinate with one another to avoid redundant pages during the web page search process. Experimental results demonstrate that the approach effectively improves collection efficiency and eliminates redundant pages at a small efficiency cost.
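
The coordination idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model and does not use the MAGE API. It assumes a central coordinator that spiders consult before storing a page, so that each URL is stored at most once. The names `Facilitator`, `claim`, `spider`, `crawl`, and the in-memory link graph `TOY_WEB` are all hypothetical, and threads stand in for agents.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Facilitator:
    """Hypothetical central coordinator that tracks which URLs are claimed."""

    def __init__(self):
        self._claimed = set()
        self._lock = threading.Lock()

    def claim(self, url):
        """Return True only for the first caller that asks for this URL."""
        with self._lock:
            if url in self._claimed:
                return False
            self._claimed.add(url)
            return True


# Toy link graph standing in for the Web (assumption for illustration only).
TOY_WEB = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a", "d"],
    "d": [],
}


def spider(seed, facilitator, stored):
    """Crawl from a seed, storing a page only if the facilitator grants it."""
    frontier = [seed]
    while frontier:
        url = frontier.pop()
        if not facilitator.claim(url):
            continue  # another spider already owns this page
        stored.append(url)  # "download" the page
        frontier.extend(TOY_WEB.get(url, []))


def crawl(seeds):
    """Run one spider per seed in parallel and return the stored pages."""
    facilitator = Facilitator()
    stored = []
    with ThreadPoolExecutor(max_workers=len(seeds)) as pool:
        for seed in seeds:
            pool.submit(spider, seed, facilitator, stored)
    return stored


if __name__ == "__main__":
    pages = crawl(["a", "b", "c"])
    print(sorted(pages))
```

In this sketch the per-URL lock is the only coordination cost. The paper's facilitator runs as an agent on MAGE and may exchange messages quite differently.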