Enterprise and cloud data centers comprise tens of thousands of servers that provide petabytes of storage to a large number of users and applications. At this scale, storage systems face two key challenges: (a) hot-spots caused by the dynamic popularity of stored objects, and (b) high reconfiguration costs for data migration, driven by bandwidth oversubscription in the data center network. Existing storage solutions are ill-suited to these challenges because of the sheer number of servers and data objects involved. This paper describes the design, implementation, and evaluation of Ursa, a system that scales to a large number of storage nodes and objects and aims to minimize latency and bandwidth costs during system reconfiguration. Toward this goal, Ursa formulates an optimization problem that selects a subset of objects to move off hot-spot servers and performs topology-aware migration to minimize reconfiguration costs. Because exact optimization is computationally expensive, we devise scalable approximation techniques for node selection and efficient divide-and-conquer computation. Our evaluation shows that Ursa achieves cost-effective load balancing, scales to large systems, and computes placement decisions quickly, e.g., in about two minutes for 10K nodes and 10M objects.
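The abstract describes, but does not spell out, the hot-spot offloading optimization. As a rough illustration of the general idea (selecting objects to move off an overloaded node while weighting cross-rack transfers more heavily), the sketch below uses a simple greedy heuristic. All names (Obj, Node, migration_cost, the rack-based cost weights) and the heuristic itself are assumptions made for exposition; they are not taken from Ursa's actual formulation or approximation algorithms.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Obj:
    obj_id: str
    load: float      # assumed load metric, e.g., requests/sec served by this object
    size_gb: float   # data that must be moved if the object is migrated

@dataclass
class Node:
    node_id: str
    rack: str
    load: float                          # current aggregate load on the node
    capacity: float                      # load threshold above which the node is a hot-spot
    objects: List[Obj] = field(default_factory=list)

def migration_cost(src: Node, dst: Node, obj: Obj) -> float:
    """Topology-aware cost: cross-rack moves traverse oversubscribed links,
    so they are weighted more heavily than intra-rack moves (weights assumed)."""
    weight = 1.0 if src.rack == dst.rack else 4.0
    return weight * obj.size_gb

def offload_hotspot(src: Node, candidates: List[Node]) -> List[Tuple[str, str, str]]:
    """Greedily pick objects to move off a hot-spot node until its load drops
    below capacity, preferring moves that relieve the most load per GB shifted."""
    plan = []
    for obj in sorted(src.objects, key=lambda o: o.load / o.size_gb, reverse=True):
        if src.load <= src.capacity:
            break
        feasible = [n for n in candidates
                    if n is not src and n.load + obj.load <= n.capacity]
        if not feasible:
            continue
        # Choose the cheapest destination under the topology-aware cost model.
        dst = min(feasible, key=lambda n: migration_cost(src, n, obj))
        plan.append((obj.obj_id, src.node_id, dst.node_id))
        src.load -= obj.load
        dst.load += obj.load
    return plan

if __name__ == "__main__":
    a = Node("n1", "rack1", load=120.0, capacity=100.0,
             objects=[Obj("o1", 30.0, 2.0), Obj("o2", 10.0, 8.0)])
    b = Node("n2", "rack1", load=40.0, capacity=100.0)
    c = Node("n3", "rack2", load=20.0, capacity=100.0)
    print(offload_hotspot(a, [b, c]))   # prefers the intra-rack destination n2

A per-node greedy pass like this is only a stand-in for the paper's scalable approximation and divide-and-conquer techniques, which are what allow placement decisions for 10K nodes and 10M objects to be computed in minutes.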