Ranking billions of web pages using diodes

  • Authors:
  • Rohit Kaul; Yeogirl Yun; Seong-Gon Kim

  • Affiliations:
  • Become, Inc., Mountain View, CA; Wisenut, Inc., Seoul, Korea; Mississippi State University, Mississippi State, MS

  • Venue:
  • Communications of the ACM - A Blind Person's Interaction with Technology
  • Year:
  • 2009

Abstract

Introduction

Because of the Web's rapid growth and lack of central organization, Internet search engines play a vital role in helping users retrieve relevant information from the tens of billions of documents available. With millions of dollars of potential revenue at stake, commercial Web sites compete fiercely for prominent placement on the first page returned by a search engine. As a result, search engine optimizers (SEOs) have developed various search engine spamming (or spamdexing) techniques to artificially inflate the rankings of Web pages.

Link-based ranking algorithms, such as Google's PageRank, have been largely effective against most conventional spamming techniques. However, PageRank has three fundamental flaws that, when exploited aggressively, prove to be its Achilles' heel: first, PageRank gives a minimum guaranteed score to every page on the Web; second, it rewards all incoming links as valid endorsements; and third, it imposes no penalty for linking to low-quality pages.

SEOs can exploit these shortcomings to the extreme by employing an Artificial Web, a collection of an extremely large number of computer-generated Web pages containing many links to only a few target pages. Each page of the Artificial Web collects the minimum PageRank and feeds it back to the target pages. Although the individual endorsements are small, the flaws of PageRank allow an Artificial Web to accumulate sizable PageRank values for the target pages, as the toy sketch below illustrates. SEOs can even download a substantial portion of the real Web and modify only the destinations of its hyperlinks, thereby circumventing detection algorithms based on page quality or size. Because an Artificial Web can be comparable in size to the real Web, SEOs can seriously compromise the objectivity of the results PageRank provides.

Although statistical measures can be employed to identify attributes specific to an Artificial Web and filter such pages out of search results, it is far more desirable to develop a new ranking model that is free of such exploits to begin with.
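To make the mechanism concrete, here is a minimal toy PageRank sketch in Python, not the paper's model: the graph, the page names, the farm size, and the damping factor (d = 0.85, the value commonly used in the PageRank literature) are all hypothetical, chosen only to show how the three flaws compound.

    def pagerank(links, d=0.85, iters=50):
        """Toy PageRank. links: dict mapping each page to the pages it links to."""
        pages = list(links)
        n = len(pages)
        pr = {p: 1.0 / n for p in pages}
        for _ in range(iters):
            # Flaw 1: every page starts each round with a guaranteed (1 - d) / n.
            new = {p: (1 - d) / n for p in pages}
            for p, outs in links.items():
                if outs:
                    share = pr[p] / len(outs)
                    for q in outs:
                        # Flaw 2: every incoming link counts as a valid endorsement.
                        new[q] += d * share
                else:
                    # Dangling page: spread its rank uniformly over all pages.
                    for q in pages:
                        new[q] += d * pr[p] / n
            pr = new
        return pr

    # Hypothetical Artificial Web: 1,000 generated pages, each linking to one target.
    # Flaw 3: the farm pages pay no penalty for making these low-quality links.
    links = {f"farm{i}": ["target"] for i in range(1000)}
    links["target"] = []
    links["honest"] = []  # an ordinary page with no farm behind it

    scores = pagerank(links)
    print(f"target: {scores['target']:.6f}  honest: {scores['honest']:.6f}")
    # Each farm page holds only the minimum score, yet the target's rank ends up
    # orders of magnitude above the honest page's.

In this toy model the target's steady-state score works out to roughly (1 + 1000d) times the honest page's, about 850x with d = 0.85, even though no human ever endorsed it; the minimum guaranteed score, the unconditional reward for inlinks, and the absence of any cost for making them do all the work.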