Algorithms and performance of load-balancing with multiple hash functions in massive content distribution

  • Authors:
  • Ye Xia;Shigang Chen;Chunglae Cho;Vivekanand Korgaonkar

  • Affiliation (all authors):
  • Computer and Information Science and Engineering Department, University of Florida, P.O. Box 116120, Gainesville, FL 32611-6120, United States

  • Venue:
  • Computer Networks: The International Journal of Computer and Telecommunications Networking
  • Year:
  • 2009

Abstract

We consider a two-tier content distribution system for distributing massive content, consisting of an infrastructure content distribution network (CDN) and a large number of ordinary clients. The nodes of the infrastructure network form a structured, distributed-hash-table-based (DHT) peer-to-peer (P2P) network. Each file is first placed in the CDN and possibly replicated among the infrastructure nodes depending on its popularity. In such a system, it is particularly pressing to have proper load-balancing mechanisms to relieve server or network overload. The subject of this paper is popularity-based file replication within the CDN using multiple hash functions. Our strategy is to set aside a large number of hash functions in advance. When the demand for a file exceeds the overall capacity of its current servers, a previously unused hash function is used to obtain a new node ID where the file will be replicated. The central problems are how to choose an unused hash function when replicating a file and how to choose a used hash function when requesting the file. Our solution to the file replication problem is to choose the unused hash function with the smallest index, and our solution to the file request problem is to choose a used hash function uniformly at random. Our main contribution is a set of distributed, robust algorithms that implement these solutions, together with an evaluation of their performance. In particular, we analyze a random binary search algorithm for file request and a random gap removal algorithm for failure recovery.
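The core scheme described in the abstract can be illustrated with a minimal sketch. All names here (`ReplicaDirectory`, `node_id`, the pool size of 64, and the use of SHA-256 to derive the hash family) are illustrative assumptions, not the paper's actual implementation; in particular, the `find_used_count` helper is a simplified deterministic binary search standing in for the paper's randomized variant.

```python
import hashlib
import random

NUM_HASHES = 64  # pool of hash functions set aside in advance (assumed size)

def node_id(filename: str, i: int) -> int:
    """The i-th hash function of the family: maps a file name to a node ID
    (here, a 32-bit integer derived from SHA-256 with the index as a salt)."""
    digest = hashlib.sha256(f"{i}:{filename}".encode()).digest()
    return int.from_bytes(digest[:4], "big")

class ReplicaDirectory:
    """Toy in-memory stand-in for the DHT state: tracks, per file, how many
    hash functions are currently in use (indices 0 .. k-1)."""

    def __init__(self):
        self.used = {}  # filename -> number k of hash functions in use

    def place(self, filename: str) -> int:
        """Initial placement of a file uses hash function 0."""
        self.used[filename] = 1
        return node_id(filename, 0)

    def replicate(self, filename: str) -> int:
        """On overload, activate the unused hash function with the
        smallest index (the paper's replication rule)."""
        k = self.used[filename]
        self.used[filename] = k + 1
        return node_id(filename, k)

    def request(self, filename: str) -> int:
        """A client picks one of the used hash functions uniformly at
        random (the paper's request rule), balancing load across replicas."""
        k = self.used[filename]
        return node_id(filename, random.randrange(k))

def find_used_count(has_replica, max_k=NUM_HASHES) -> int:
    """Binary search for the number of used hash functions k, assuming
    indices 0 .. k-1 all currently hold replicas. `has_replica(i)` is a
    probe asking whether hash index i is in use."""
    lo, hi = 0, max_k  # invariant: lo <= k <= hi
    while lo < hi:
        mid = (lo + hi) // 2
        if has_replica(mid):
            lo = mid + 1
        else:
            hi = mid
    return lo
```

A client that does not know k can first run `find_used_count` against the DHT and then pick a replica uniformly at random among the k discovered indices; the failure-recovery ("gap removal") machinery of the paper is omitted here.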