Undue influence: eliminating the impact of link plagiarism on web search rankings

  • Authors: Baoning Wu; Brian D. Davison
  • Affiliations: Lehigh University, Bethlehem, PA (both authors)
  • Venue: Proceedings of the 2006 ACM Symposium on Applied Computing
  • Year: 2006

Abstract

Link farm spam and replicated pages can substantially degrade the results of link-based ranking algorithms such as HITS. To identify and neutralize them, we look for pages that copy a sufficient amount of material from one another. In particular, we focus on "complete hyperlinks", which pair a link target with the anchor text used to point to it. We build and analyze the bipartite graph of documents and their complete hyperlinks to find pages that share both anchor text and link targets. This process identifies link farms and replicated pages, so that the influence of problematic links can be reduced in a weighted adjacency matrix. Experiments and user evaluations show significant improvement in the quality of results produced by HITS-like methods.
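
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give a sharing threshold or a weighting scheme, so `min_shared`, `penalty`, the function names, and the example pages below are all hypothetical choices made for illustration. The sketch treats a complete hyperlink as an (anchor text, target) pair, indexes which pages contain each pair (the bipartite graph), and lowers the weight of links copied between pages that share enough of them.

```python
from collections import defaultdict
from itertools import combinations


def shared_links(pages):
    """For each pair of pages, collect the complete hyperlinks they have in common.

    `pages` maps a page URL to a list of (anchor_text, target_url) tuples.
    """
    # Bipartite graph: complete hyperlink -> set of pages that contain it.
    index = defaultdict(set)
    for page, links in pages.items():
        for anchor, target in links:
            index[(anchor, target)].add(page)

    shared = defaultdict(set)
    for link, sharers in index.items():
        for a, b in combinations(sorted(sharers), 2):
            shared[(a, b)].add(link)
    return shared


def weighted_adjacency(pages, min_shared=2, penalty=0.0):
    """Return {(source, target): weight}, down-weighting copied links.

    `min_shared` and `penalty` are assumed parameters, not values from the paper.
    """
    problematic = set()  # (source, target) edges judged to be copied
    for (a, b), links in shared_links(pages).items():
        if len(links) >= min_shared:
            for _anchor, target in links:
                problematic.add((a, target))
                problematic.add((b, target))

    weights = {}
    for page, links in pages.items():
        for _anchor, target in links:
            edge = (page, target)
            weights[edge] = penalty if edge in problematic else 1.0
    return weights


# Hypothetical example data: the first two pages share two complete hyperlinks.
example_pages = {
    "a.example/1": [("cheap pills", "spam.example"), ("casino", "bet.example"),
                    ("home", "a.example")],
    "b.example/1": [("cheap pills", "spam.example"), ("casino", "bet.example")],
    "c.example/1": [("cheap pills", "spam.example"), ("news", "news.example")],
}
weights = weighted_adjacency(example_pages)
```

In this example the links from `a.example/1` and `b.example/1` to `spam.example` and `bet.example` receive the penalty weight, while the single overlapping link on `c.example/1` keeps full weight because it falls below the assumed threshold. The resulting weights could then stand in for the 0/1 entries of the adjacency matrix used by a HITS-like iteration.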