Enterprise and cloud data centers comprise tens of thousands of servers that provide petabytes of storage to a large number of users and applications. At this scale, storage systems face two key challenges: (a) hot-spots caused by the dynamic popularity of stored objects, and (b) high reconfiguration costs for data migration, driven by bandwidth oversubscription in the data center network. Existing storage solutions are ill-suited to these challenges because of the sheer number of servers and data objects involved. This paper describes the design, implementation, and evaluation of Ursa, a system that scales to a large number of storage nodes and objects and aims to minimize latency and bandwidth costs during system reconfiguration. Toward this goal, Ursa formulates an optimization problem that selects a subset of objects to move off hot-spot servers and performs topology-aware migration to minimize reconfiguration costs. Because exact optimization is computationally expensive, we devise scalable approximation techniques for node selection and efficient divide-and-conquer computation. Our evaluation shows that Ursa achieves cost-effective load balancing, scales to large systems, and computes placement decisions quickly, e.g., in about two minutes for 10K nodes and 10M objects.
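The abstract describes, but does not spell out, the hot-spot offloading optimization. As a rough illustration of the general idea (selecting objects to move off an overloaded node while weighting cross-rack transfers more heavily), the sketch below uses a simple greedy heuristic. All names (Obj, Node, migration_cost, the rack-based cost weights) and the heuristic itself are assumptions made for exposition; they are not taken from Ursa's actual formulation or approximation algorithms.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Obj:
    obj_id: str
    load: float      # assumed load metric, e.g., requests/sec served by this object
    size_gb: float   # data that must be moved if the object is migrated

@dataclass
class Node:
    node_id: str
    rack: str
    load: float                          # current aggregate load on the node
    capacity: float                      # load threshold above which the node is a hot-spot
    objects: List[Obj] = field(default_factory=list)

def migration_cost(src: Node, dst: Node, obj: Obj) -> float:
    """Topology-aware cost: cross-rack moves traverse oversubscribed links,
    so they are weighted more heavily than intra-rack moves (weights assumed)."""
    weight = 1.0 if src.rack == dst.rack else 4.0
    return weight * obj.size_gb

def offload_hotspot(src: Node, candidates: List[Node]) -> List[Tuple[str, str, str]]:
    """Greedily pick objects to move off a hot-spot node until its load drops
    below capacity, preferring moves that relieve the most load per GB shifted."""
    plan = []
    for obj in sorted(src.objects, key=lambda o: o.load / o.size_gb, reverse=True):
        if src.load <= src.capacity:
            break
        feasible = [n for n in candidates
                    if n is not src and n.load + obj.load <= n.capacity]
        if not feasible:
            continue
        # Choose the cheapest destination under the topology-aware cost model.
        dst = min(feasible, key=lambda n: migration_cost(src, n, obj))
        plan.append((obj.obj_id, src.node_id, dst.node_id))
        src.load -= obj.load
        dst.load += obj.load
    return plan

if __name__ == "__main__":
    a = Node("n1", "rack1", load=120.0, capacity=100.0,
             objects=[Obj("o1", 30.0, 2.0), Obj("o2", 10.0, 8.0)])
    b = Node("n2", "rack1", load=40.0, capacity=100.0)
    c = Node("n3", "rack2", load=20.0, capacity=100.0)
    print(offload_hotspot(a, [b, c]))   # prefers the intra-rack destination n2

A per-node greedy pass like this is only a stand-in for the paper's scalable approximation and divide-and-conquer techniques, which are what allow placement decisions for 10K nodes and 10M objects to be computed in minutes.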