Supporting Load Balancing and Efficient Reorganization During System Scaling

Authors:
Feng Zhu;Xiaowei Sun;Betty Salzberg;Svein-Olaf Hvasshovd
Affiliations:
Northeastern University, Boston, MA;Northeastern University, Boston, MA;Northeastern University, Boston, MA;Norwegian Institute of Science and Technology, Trondheim, Norway
Venue:
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Year:
2005

Abstract

Reorganization is repeatedly needed to maintain load balance as distributed storage systems scale up and down. To support load balancing and efficient reorganization during system scaling, we propose a new hashing method called Prime Based Hashing (PBH) that can be used for data allocation in large distributed systems. PBH distributes objects among storage units based on the residues (congruences) of hash-transformed key values modulo prime numbers. PBH provides nearly perfect load balancing: it distributes objects evenly and rebalances to preserve the even distribution as the system scales. At the same time, it facilitates cost-effective reorganization by minimizing data migration during system scaling. Locating an object in PBH is fast, using low-complexity computations that require knowledge only of the total number of storage units. We also propose a local data clustering method to couple with PBH to make reorganization more efficient. Objects are clustered according to the order of migration so that only the part of the data that needs to be migrated is scanned. In addition, we show that by storing a small amount of pre-computed information, the ordering of objects for clustering can be made very efficient. We demonstrate the effectiveness of our algorithms through analysis and experiments.
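
To make the migration cost concrete, the following minimal Python sketch measures how many objects change storage unit when a system grows from 10 to 11 units under plain modulo placement. This is not PBH, whose algorithm is not given in the abstract; it only illustrates the baseline problem that the abstract says PBH is designed to reduce. The function names (`hashed_key`, `naive_unit`, `moved_fraction`, `ideal_fraction`), the choice of SHA-256, and the key format are all assumptions made for illustration.

```python
import hashlib


def hashed_key(key: str) -> int:
    """Hash-transform a key into a 64-bit integer (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def naive_unit(key: str, num_units: int) -> int:
    """Baseline placement: residue of the hashed key modulo the unit count.

    This is ordinary modulo hashing, NOT the PBH scheme from the paper.
    """
    return hashed_key(key) % num_units


def moved_fraction(keys, old_units: int, new_units: int) -> float:
    """Fraction of keys whose storage unit changes when the system is resized."""
    moved = sum(
        1 for k in keys if naive_unit(k, old_units) != naive_unit(k, new_units)
    )
    return moved / len(keys)


def ideal_fraction(old_units: int, new_units: int) -> float:
    """Lower bound on migration when growing while staying evenly balanced:
    the added units must end up holding their even share of the objects."""
    return (new_units - old_units) / new_units


# Hypothetical workload: 20,000 synthetic object keys.
keys = [f"object-{i}" for i in range(20000)]

if __name__ == "__main__":
    print("naive modulo, 10 -> 11 units:", moved_fraction(keys, 10, 11))
    print("balanced lower bound:        ", ideal_fraction(10, 11))
```

Under plain modulo placement, roughly 10 out of every 11 objects move when one unit is added, whereas an evenly balanced layout needs to move only about 1 in 11. That gap is the reorganization cost that a scheme minimizing data migration aims to close.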