Clustering Based URL Normalization Technique for Web Mining

  • Authors:
  • Naresh Kumar Nagwani

  • Affiliations:
  • -

  • Venue:
  • ACE '10 Proceedings of the 2010 International Conference on Advances in Computer Engineering
  • Year:
  • 2010

Abstract

URL (Uniform Resource Locator) normalization is an important activity in web mining. An effective URL normalization technique makes web data retrieval smoother and reduces the amount of computation in web mining activities. This paper proposes a web mining technique for URL normalization. The proposed technique is based on the content, structure, and semantic similarity, as well as the web page redirection and forwarding similarity, of a given set of URLs. Web page redirection and forward graphs can be used to measure the similarities between URLs and to form URL clusters, which can in turn be used for URL normalization. A data structure is also suggested for storing the forward and redirect URL information.
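
The abstract does not give the algorithm or the data structure in detail, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes the redirect and forward relationships between URLs are already known, stores them in a hypothetical `RedirectGraph` class, treats each connected component of that graph as a URL cluster, and picks one representative per cluster as the normalized URL. The class name, the function names, the example URLs, and the rule for choosing the representative (prefer a URL with no outgoing redirect, then the shortest) are all assumptions made for illustration. Content, structure, and semantic similarity are not modeled here.

```python
from collections import defaultdict


class RedirectGraph:
    """Hypothetical store for redirect/forward edges between URLs."""

    def __init__(self):
        self.targets = defaultdict(set)  # url -> URLs it redirects or forwards to
        self.sources = defaultdict(set)  # url -> URLs that redirect or forward to it

    def add_redirect(self, src, dst):
        """Record that src redirects (or forwards) to dst."""
        self.targets[src].add(dst)
        self.sources[dst].add(src)

    def urls(self):
        """Return every URL that appears on either end of an edge."""
        return set(self.targets) | set(self.sources)


def cluster_urls(graph):
    """Group URLs into connected components, ignoring edge direction."""
    seen = set()
    clusters = []
    for start in sorted(graph.urls()):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            url = stack.pop()
            if url in component:
                continue
            component.add(url)
            # .get() avoids creating empty entries in the defaultdicts
            stack.extend(graph.targets.get(url, ()))
            stack.extend(graph.sources.get(url, ()))
        seen |= component
        clusters.append(component)
    return clusters


def normalize(graph):
    """Map every URL to one representative URL of its cluster."""
    mapping = {}
    for cluster in cluster_urls(graph):
        # Assumed rule: prefer URLs that do not redirect anywhere,
        # then the shortest, then the lexicographically smallest.
        finals = [u for u in cluster if not graph.targets.get(u)]
        canonical = min(finals or cluster, key=lambda u: (len(u), u))
        for u in cluster:
            mapping[u] = canonical
    return mapping


# Example with made-up URLs
graph = RedirectGraph()
graph.add_redirect("http://example.com", "http://www.example.com/")
graph.add_redirect("http://example.com/index.html", "http://www.example.com/")
graph.add_redirect("http://example.org/a", "http://example.org/b")
mapping = normalize(graph)
```

In this sketch the two `example.com` variants fall into one cluster because they redirect to the same page, while the `example.org` pair forms a second cluster. A fuller treatment would weight edges by the similarity measures the abstract lists rather than merging on any redirect.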