ACIC: automatic cloud I/O configurator for HPC applications

Authors:
Mingliang Liu;Ye Jin;Jidong Zhai;Yan Zhai;Qianqian Shi;Xiaosong Ma;Wenguang Chen
Affiliations:
Tsinghua University and Tsinghua University in Shenzhen;North Carolina State University;Tsinghua University;University of Wisconsin-Madison;North Carolina State University and Oak Ridge National Laboratory;North Carolina State University and Oak Ridge National Laboratory;Tsinghua University and Tsinghua University in Shenzhen
Venue:
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Year:
2013

Abstract

The cloud has become a promising alternative to traditional HPC centers and in-house clusters. This new environment accentuates the I/O bottleneck, since cloud platforms typically pair top-of-the-line compute instances with sub-par communication and I/O facilities. Changing the cloud I/O system configuration has been observed to cause significant variation in the performance and cost efficiency of I/O-intensive HPC applications. However, configuring storage systems manually is tedious and error-prone, even for experts. This paper proposes ACIC, which takes a given application running on a given cloud platform and automatically searches for optimized I/O system configurations. ACIC uses machine learning models to perform black-box performance/cost predictions. To tackle the high-dimensional parameter exploration space unique to cloud platforms, we enable affordable, reusable, and incremental training guided by Plackett-Burman matrices. Results with four representative applications indicate that ACIC consistently identifies near-optimal configurations among a large group of candidate settings.
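
As a rough illustration of the screening idea named in the abstract, the sketch below builds the standard 8-run Plackett-Burman design for up to 7 two-level factors and ranks factors by the magnitude of their estimated main effects. It is not ACIC's implementation: the factor names and the response values are hypothetical placeholders, and the abstract does not say how ACIC combines the design with its learning models.

```python
# Minimal Plackett-Burman screening sketch (pure Python, no dependencies).
# Assumption: factor names and responses below are made up for illustration;
# they are not taken from the ACIC paper.

def plackett_burman_8():
    """Return the 8-run, 7-factor Plackett-Burman design as rows of +1/-1."""
    generator = [1, 1, 1, -1, 1, -1, -1]  # standard generator row for N = 8
    n = len(generator)
    # Seven rows are cyclic shifts of the generator; the last row is all -1.
    rows = [generator[-s:] + generator[:-s] for s in range(n)]
    rows.append([-1] * n)
    return rows

def main_effects(design, responses):
    """Estimate each factor's main effect: mean(high runs) - mean(low runs)."""
    half = len(design) / 2  # each column has equally many +1 and -1 entries
    effects = []
    for j in range(len(design[0])):
        signed_sum = sum(row[j] * y for row, y in zip(design, responses))
        effects.append(signed_sum / half)
    return effects

def rank_factors(names, effects):
    """Sort factors by absolute main effect, largest first."""
    return sorted(zip(names, effects), key=lambda p: abs(p[1]), reverse=True)

if __name__ == "__main__":
    # Hypothetical I/O configuration knobs, each with a "low" and "high" level.
    names = ["file_system", "num_io_servers", "stripe_size", "device_type",
             "placement", "instance_type", "cache_mode"]
    design = plackett_burman_8()
    # Synthetic measurements: factor 0 dominates, factor 3 matters slightly.
    responses = [10 + 5 * row[0] + 1 * row[3] for row in design]
    for name, effect in rank_factors(names, main_effects(design, responses)):
        print(f"{name:15s} {effect:+.2f}")
```

With only 8 runs instead of the 128 needed for a full two-level factorial over 7 factors, a design like this can indicate which parameters deserve denser sampling. Note that it estimates main effects only and does not separate them from interactions.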