A Parameterized Approach to Spam-Resilient Link Analysis of the Web

  • Authors:
  • James Caverlee; Steve Webb; Ling Liu; William B. Rouse

  • Affiliations:
  • Texas A&M University, College Station; Purewire, Atlanta; Georgia Institute of Technology, Atlanta; Georgia Institute of Technology, Atlanta

  • Venue:
  • IEEE Transactions on Parallel and Distributed Systems
  • Year:
  • 2009

Abstract

Link-based analysis of the Web provides the basis for many important applications—like Web search, Web-based data mining, and Web page categorization—that bring order to the massive amount of distributed Web content. Due to the overwhelming reliance on these important applications, there is a rise in efforts to manipulate (or spam) the link structure of the Web. In this manuscript, we present a parameterized framework for link analysis of the Web that promotes spam resilience through a source-centric view of the Web. We provide a rigorous study of the set of critical parameters that can impact source-centric link analysis and propose the novel notion of influence throttling for countering the influence of link-based manipulation. Through formal analysis and a large-scale experimental study, we show how different parameter settings may impact the time complexity, stability, and spam resilience of Web link analysis. Concretely, we find that the source-centric model supports more effective and robust rankings in comparison with existing Web algorithms such as PageRank.
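To make the idea of source-centric ranking with influence throttling more concrete, the following is a minimal sketch, not the authors' published formulation. The abstract does not give definitions, so several choices here are assumptions: pages are grouped into sources through a caller-supplied `source_of` mapping, edges between sources are weighted by counting page-level links, and throttling is modeled as a minimum self-loop weight `kappa` applied uniformly to every source. The function name `source_rank` and all parameter names are hypothetical.

```python
from collections import defaultdict


def source_rank(page_links, source_of, kappa=0.0, damping=0.85, iters=100):
    """PageRank-style power iteration over a source-level graph.

    page_links: iterable of (from_page, to_page) pairs.
    source_of:  dict mapping each page to its source (e.g. a host name).
    kappa:      assumed throttling factor in [0, 1]; every source keeps at
                least this much of its transition weight on itself.
    """
    sources = sorted(set(source_of.values()))

    # Assumed edge weighting: count page-level links between sources.
    counts = {s: defaultdict(float) for s in sources}
    for u, v in page_links:
        counts[source_of[u]][source_of[v]] += 1.0

    trans = {}
    for s in sources:
        total = sum(counts[s].values())
        # Row-normalize; a source with no out-links keeps only a self-loop.
        row = {t: w / total for t, w in counts[s].items()} if total else {s: 1.0}

        # Throttling (assumed form): raise the self-loop to kappa and
        # rescale the remaining edges so the row still sums to 1.
        self_w = row.get(s, 0.0)
        if self_w < kappa:
            rest = 1.0 - self_w
            row = {t: w * (1.0 - kappa) / rest for t, w in row.items() if t != s}
            row[s] = kappa
        trans[s] = row

    # Standard damped power iteration with a uniform teleportation vector.
    n = len(sources)
    score = {s: 1.0 / n for s in sources}
    for _ in range(iters):
        nxt = {s: (1.0 - damping) / n for s in sources}
        for s, row in trans.items():
            for t, p in row.items():
                nxt[t] += damping * score[s] * p
        score = nxt
    return score
```

Under these assumptions, raising `kappa` limits how much score any source can pass to others, so a source that exists only to link to a target contributes less to that target's rank. Setting `kappa=0.0` reduces the sketch to plain PageRank on the source graph.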