Topical web crawling using weighted anchor text and web page change detection techniques

  • Authors: Divakar Yadav; A. K. Sharma; J. P. Gupta
  • Affiliations: JIIT University Noida, India; YMCA Faridabad, India; JIIT University Noida, India
  • Venue: WSEAS Transactions on Information Science and Applications
  • Year: 2009

Abstract

In this paper, we discuss the focused web crawler, the relevance of anchor text, and a method of web page change detection for search engines. We propose a technique called weighted anchor text, which uses the link structure to form a weighted directed graph of anchor texts. A weight is assigned to every incoming link of a node in the directed graph. These weights are then used to decide the relevance of web pages, which are indexed in decreasing order of the weights assigned to them. We applied our algorithm to various websites and observed the results. We deduce that the algorithm can be very useful when combined with other existing algorithms. Web usage has increased exponentially in the past few years. The resulting enormous collection of web pages changes rapidly, and the degree of change varies from site to site. We discuss the relevance of change detection and then explore the related work in the area. Based on this understanding, we propose a new algorithm to map changes in a web page. After verifying results on various web pages, we observe the relative merits of the proposed algorithm.
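
To make the weighted anchor text idea concrete, the sketch below builds a small directed graph from (source, target, anchor text) links, gives each incoming link a weight, and orders pages by decreasing total weight. The abstract does not state how the weights are computed, so the scoring rule used here (the fraction of anchor words that match a set of topic terms) is purely an illustrative assumption, as are the names `rank_pages`, `links`, and `topic_terms`. It should not be read as the authors' algorithm.

```python
from collections import defaultdict


def rank_pages(links, topic_terms):
    """Rank target pages by the summed weight of their incoming links.

    links: iterable of (source_url, target_url, anchor_text) tuples,
           i.e. the edges of a directed graph labelled with anchor text.
    topic_terms: words describing the crawl topic (hypothetical input).
    """
    topic = {term.lower() for term in topic_terms}
    weights = defaultdict(float)

    for _source, target, anchor in links:
        words = anchor.lower().split()
        if not words:
            continue  # a link without anchor text contributes nothing
        # ASSUMPTION: the edge weight is the share of anchor words that are
        # topic terms. The paper's actual weighting scheme is not given in
        # the abstract.
        edge_weight = sum(word in topic for word in words) / len(words)
        weights[target] += edge_weight

    # Decreasing order of weight; ties are broken by URL for determinism.
    return sorted(weights.items(), key=lambda item: (-item[1], item[0]))


# Example with made-up URLs and anchor texts.
links = [
    ("a.example", "b.example", "web crawler tutorial"),
    ("c.example", "b.example", "click here"),
    ("a.example", "d.example", "focused web crawler"),
    ("b.example", "d.example", "crawler"),
]
ranked = rank_pages(links, ["web", "crawler"])
for url, weight in ranked:
    print(f"{url}\t{weight:.3f}")
```

In this toy input, d.example receives two topical anchors and is ranked ahead of b.example, whose second incoming link ("click here") adds no weight. A crawler following this scheme would index d.example first.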