A load-aware data placement policy on cluster file system

Authors:
Yu Wang;Jing Xing;Jin Xiong;Dan Meng
Affiliations:
National Research Center for Intelligent Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences and Graduate University of Chinese Academy of Sciences;National Research Center for Intelligent Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences;National Research Center for Intelligent Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences;National Research Center for Intelligent Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences
Venue:
NPC'11 Proceedings of the 8th IFIP international conference on Network and parallel computing
Year:
2011

Abstract

In a large-scale cluster system running many applications, uneven I/O load across the cluster and disk saturation on a subset of storage servers form a severe bottleneck that degrades system I/O performance. As a result, system response time increases and throughput drops drastically. In this paper, we present a load-aware data placement policy that distributes data across storage servers according to the load of each server and automatically migrates data from heavily loaded servers to lightly loaded ones. The policy is adaptive and self-managing: it operates without any prior knowledge of application access workload characteristics or of the capabilities of the storage servers, and it can make efficient use of the aggregate disk bandwidth of all storage servers. Performance evaluation shows that our policy improves aggregate I/O bandwidth by 10%-20% compared with a random data placement policy, especially under mixed workloads.
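
The two mechanisms the abstract names, placement by current server load and migration from heavily loaded to lightly loaded servers, can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not define the load metric, the migration trigger, or which data is moved, so the `Server` class, the per-file load numbers, and the `threshold` parameter below are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical storage server; load is the sum of per-file load values."""
    name: str
    files: dict = field(default_factory=dict)  # file name -> assumed load contribution

    @property
    def load(self) -> float:
        return sum(self.files.values())


def place(servers, file_name, file_load):
    """Place a new file on the currently least-loaded server."""
    target = min(servers, key=lambda s: s.load)
    target.files[file_name] = file_load
    return target


def rebalance(servers, threshold=1.25):
    """Migrate files off servers whose load exceeds threshold * mean load.

    The threshold is an assumed tuning knob, not a value from the paper.
    Returns a list of (file, source, destination) migrations.
    """
    moves = []
    mean = sum(s.load for s in servers) / len(servers)
    for src in sorted(servers, key=lambda s: s.load, reverse=True):
        while src.files and src.load > threshold * mean:
            dst = min(servers, key=lambda s: s.load)
            # Move the file with the smallest load contribution first.
            name, load = min(src.files.items(), key=lambda kv: kv[1])
            # Stop if the move would not narrow the gap between the two servers.
            if dst is src or dst.load + load >= src.load:
                break
            del src.files[name]
            dst.files[name] = load
            moves.append((name, src.name, dst.name))
    return moves
```

In this sketch, a caller would invoke `place` for each newly created file and run `rebalance` periodically. A random placement baseline, such as the one the evaluation compares against, would replace the `min` in `place` with a random choice of server.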