Many scientific applications can be efficiently expressed with the parallel scripting (many-task computing, MTC) paradigm. These applications are typically composed of several stages of computation, with tasks in different stages coupled through a shared file system abstraction. However, these applications often perform poorly on large-scale computers for two reasons: their file system I/O is frequent and voluminous, and optimizations suited to parallel scripting applications are absent. In this paper, we assess how well existing large-scale computers can run parallel scripting applications. We first define the MTC envelope and then evaluate it by benchmarking a suite of shared file system performance metrics. We also seek to determine the origin of the performance bottleneck. To do so, we profile the I/O behavior of parallel scripting applications and map their I/O operations onto the MTC envelope. We present an example shared file system envelope and a method to predict I/O performance given an application's level of I/O concurrency and amount of I/O. This work is instrumental in two ways. It guides the development of parallel scripting applications so that they make efficient use of existing large-scale computers. It also helps evaluate performance improvements in the hardware/software stack that would better support parallel scripting applications.
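The abstract does not spell out how the prediction method works. The following is a minimal sketch under several assumptions:

- One envelope metric is stored as measured (concurrency, aggregate rate) points.
- The rate between measured points can be linearly interpolated.
- The application's I/O amount is expressed in the same unit as the rate, for example bytes for bandwidth or operations for a metadata rate such as file creates.

The function names and sample numbers are hypothetical illustrations. They are not taken from the paper.

```python
from bisect import bisect_left


def interpolate_rate(envelope, concurrency):
    """Linearly interpolate an envelope metric at a given I/O concurrency.

    envelope: list of (concurrency, aggregate_rate) pairs sorted by concurrency
    (hypothetical format). Queries outside the measured range are clamped to the
    nearest measured point rather than extrapolated.
    """
    xs = [c for c, _ in envelope]
    ys = [r for _, r in envelope]
    if concurrency <= xs[0]:
        return ys[0]
    if concurrency >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, concurrency)
    if xs[i] == concurrency:  # exact measured point
        return ys[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (concurrency - x0) / (x1 - x0)


def predict_io_time(envelope, concurrency, amount):
    """Predicted time in seconds to perform `amount` units of I/O at `concurrency`."""
    return amount / interpolate_rate(envelope, concurrency)


# Hypothetical bandwidth envelope in bytes/s (illustrative values, not measurements).
bandwidth_envelope = [(1, 1.0e8), (64, 1.6e9), (256, 2.0e9)]
print(predict_io_time(bandwidth_envelope, 160, 3.6e12))
```

Clamping instead of extrapolating is a conservative choice in this sketch. Shared file system throughput often flattens or degrades beyond the largest benchmarked concurrency, so extending a measured trend past that point could overstate performance.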