Prefetching in Content Distribution Networks via Web Communities Identification and Outsourcing

  • Authors:
  • Antonis Sidiropoulos;George Pallis;Dimitrios Katsaros;Konstantinos Stamos;Athena Vakali;Yannis Manolopoulos

  • Affiliations:
  • Informatics Department, Aristotle University, Thessaloniki, Greece 54124;Informatics Department, Aristotle University, Thessaloniki, Greece 54124;Informatics Department, Aristotle University, Thessaloniki, Greece 54124 and Computer & Communication Engineering Department, University of Thessaly, Volos, Greece 38221;Informatics Department, Aristotle University, Thessaloniki, Greece 54124;Informatics Department, Aristotle University, Thessaloniki, Greece 54124;Informatics Department, Aristotle University, Thessaloniki, Greece 54124

  • Venue:
  • World Wide Web
  • Year:
  • 2008

Abstract

Content distribution networks (CDNs) improve scalability and reliability by replicating content to the "edge" of the Internet. Apart from the pure networking issues relevant to establishing the CDN infrastructure, some crucial data management issues must be resolved to exploit the full potential of CDNs to reduce "last mile" latencies. A very important issue is the selection of the content to be prefetched to the CDN servers. All the approaches developed so far assume the existence of adequate content popularity statistics to drive the prefetch decisions. Such information, though, is not always available, or it is extremely volatile, rendering such methods problematic. To address this issue, we develop self-adaptive techniques for selecting the outsourced content in a CDN infrastructure, which require no a priori knowledge of request statistics. We identify clusters of "correlated" Web pages in a site, called Web site communities, and make these communities the basic outsourcing unit. Through a detailed simulation environment, using both real and synthetic data, we show that the proposed techniques are very robust and effective in reducing the user-perceived latency, performing very close to an infeasible, off-line policy that has full knowledge of the content popularity.
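
The idea of using communities as the outsourcing unit can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify how "correlated" pages are detected, so the sketch assumes, purely for illustration, that a community is a connected component of the site's hyperlink graph after weak links are dropped. The function names, the `min_weight` threshold, and the first-fit selection rule are all hypothetical. Note that no request statistics are used anywhere, only site structure and object sizes.

```python
from collections import defaultdict


def find_communities(links, min_weight=1):
    """Group pages into communities (illustrative assumption, not the paper's method).

    links maps (source_page, target_page) -> link weight. A community is a
    connected component of the undirected graph formed by the links whose
    weight is at least min_weight.
    """
    adjacency = defaultdict(set)
    pages = set()
    for (src, dst), weight in links.items():
        pages.update((src, dst))
        if weight >= min_weight:
            adjacency[src].add(dst)
            adjacency[dst].add(src)

    seen = set()
    communities = []
    for page in sorted(pages):  # sorted for deterministic output
        if page in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack = [page]
        seen.add(page)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        communities.append(sorted(component))
    return communities


def select_for_outsourcing(communities, sizes, capacity):
    """Replicate whole communities, first-fit, within a server's capacity.

    Hypothetical policy: a community is either outsourced completely or not
    at all, which is what makes it the basic outsourcing unit.
    """
    chosen = []
    used = 0
    for community in communities:
        total = sum(sizes[page] for page in community)
        if used + total <= capacity:
            chosen.append(community)
            used += total
    return chosen


# Toy site: the c-d link has weight 0, so it does not join the two groups.
links = {("a", "b"): 3, ("b", "c"): 2, ("d", "e"): 1, ("c", "d"): 0}
sizes = {"a": 10, "b": 20, "c": 30, "d": 5, "e": 5}
communities = find_communities(links)
outsourced = select_for_outsourcing(communities, sizes, capacity=15)
```

In this toy run the site splits into two communities, and only the smaller one fits within the assumed capacity of 15 size units, so it alone is replicated.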