Near-replicas of web pages detection efficient algorithm based on single MD5 fingerprint

  • Authors:
  • Wang Da-Zhen;Chen Yu-Hui

  • Affiliations:
  • Department of Computer Science, Hubei University of Technology, Wuhan, P.R.C. and Department of Information Management, Wuhan University, Wuhan, P.R.C.;Department of Computer Science, Hubei University of Technology, Wuhan, P.R.C.

  • Venue:
  • ICAI'07 Proceedings of the 8th Conference on 8th WSEAS International Conference on Automation and Information - Volume 8
  • Year:
  • 2007

Abstract

We consider how to efficiently compute the overlap between all pairs of web documents. This information can be used to improve web crawlers, web archives, and the presentation of search results, among other applications. Our experiments show how common replication is on the web and indicate that our algorithm performs better than the others we compared it with.
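
The abstract does not describe the algorithm itself, so the following is only a minimal sketch of the general idea named in the title: reduce each page to a single MD5 fingerprint and treat pages that share a fingerprint as replicas. The normalization step (lowercasing, keeping word tokens only) and the function names `fingerprint` and `replica_pairs` are assumptions made for illustration, not the authors' method.

```python
import hashlib
import re
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def fingerprint(text: str) -> str:
    """Return one MD5 hex digest for the normalized text of a page."""
    # Assumed normalization: lowercase, keep word tokens, join with single spaces.
    tokens = re.findall(r"\w+", text.lower())
    return hashlib.md5(" ".join(tokens).encode("utf-8")).hexdigest()


def replica_pairs(pages: Dict[str, str]) -> List[Tuple[str, str]]:
    """Group pages by fingerprint and list every pair that shares one."""
    buckets = defaultdict(list)
    for url, text in pages.items():
        buckets[fingerprint(text)].append(url)
    pairs = []
    for urls in buckets.values():
        # Only pages inside the same bucket are paired; no cross-bucket comparison.
        pairs.extend(combinations(sorted(urls), 2))
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical sample pages keyed by made-up identifiers.
    sample = {
        "a": "Hello,  World!",
        "b": "hello world",
        "c": "Goodbye world",
    }
    print(replica_pairs(sample))
```

Each page is hashed once and pairs are formed only within a bucket, which avoids comparing every page against every other page. A single hash of the whole text only matches pages that are identical after normalization; how the paper chooses the text to hash so that near-replicas also collide is not stated in the abstract.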