We present an auto-tuning system for optimizing the I/O performance of HDF5 applications and demonstrate its value across platforms, applications, and scales. The system uses a genetic algorithm to search a large space of tunable parameters and to identify effective settings at all layers of the parallel I/O stack. The auto-tuning system applies the selected settings transparently by dynamically intercepting HDF5 calls. To validate the system, we applied it to three I/O benchmarks (VPIC, VORPAL, and GCRM) that replicate the I/O activity of their respective applications. We tested weak-scaling configurations of 128, 2048, and 4096 CPU cores, generating 30 GB to 1 TB of data, on diverse HPC platforms (Cray XE6, IBM BG/P, and a Dell cluster). In all cases, the auto-tuning framework identified parameter settings that substantially improved write performance over the default system settings, with write speedups ranging from 2x to 100x across the test configurations.
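The abstract does not specify how the genetic algorithm encodes the parameter space or which operators it uses. The sketch below shows one common way such a search can work, under loudly stated assumptions. Each candidate is a dictionary of discrete settings. The parameter names and value lists (`stripe_count`, `stripe_size_mb`, `cb_nodes`, `alignment_mb`) are hypothetical, loosely modeled on Lustre, MPI-IO, and HDF5 knobs. `synthetic_bandwidth` is a made-up stand-in for actually timing an I/O benchmark run. The operators (tournament selection, uniform crossover, per-gene mutation, and elitism) are generic GA choices and are not necessarily the ones the authors used.

```python
import random

# Hypothetical search space: names and candidate values are illustrative assumptions,
# loosely inspired by Lustre striping, MPI-IO collective buffering, and HDF5 alignment.
SPACE = {
    "stripe_count": [4, 8, 16, 32, 64, 128],
    "stripe_size_mb": [1, 2, 4, 8, 16, 32, 64, 128],
    "cb_nodes": [1, 2, 4, 8, 16, 32],
    "alignment_mb": [1, 2, 4, 8, 16],
}
KEYS = list(SPACE)


def synthetic_bandwidth(cfg):
    """Stand-in fitness (higher is better); a real tuner would time an actual write."""
    # Made-up model with a single optimum: stripe_count=64, stripe_size=8, cb_nodes=16,
    # and alignment equal to the stripe size. The maximum achievable score is 0.
    score = 0.0
    score -= abs(cfg["stripe_count"] - 64) / 64
    score -= abs(cfg["stripe_size_mb"] - 8) / 8
    score -= abs(cfg["cb_nodes"] - 16) / 16
    score -= abs(cfg["alignment_mb"] - cfg["stripe_size_mb"]) / 16
    return score


def random_config(rng):
    """Draw one candidate uniformly from the discrete search space."""
    return {k: rng.choice(SPACE[k]) for k in KEYS}


def crossover(a, b, rng):
    """Uniform crossover: each parameter is inherited from either parent."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in KEYS}


def mutate(cfg, rng, rate=0.2):
    """Resample each parameter from its value list with probability `rate`."""
    return {k: (rng.choice(SPACE[k]) if rng.random() < rate else v) for k, v in cfg.items()}


def tournament(pop, fitness, rng, k=3):
    """Return the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fitness[i])]


def genetic_search(evaluate, pop_size=20, generations=30, elite=2, seed=0):
    """Evolve configurations and return (best_config, best_fitness)."""
    rng = random.Random(seed)
    pop = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        fitness = [evaluate(c) for c in pop]
        ranked = sorted(range(pop_size), key=lambda i: fitness[i], reverse=True)
        # Elitism: carry the best individuals over unchanged, so the best score never drops.
        next_pop = [pop[i] for i in ranked[:elite]]
        while len(next_pop) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            next_pop.append(mutate(crossover(p1, p2, rng), rng))
        pop = next_pop
    fitness = [evaluate(c) for c in pop]
    best = max(range(pop_size), key=lambda i: fitness[i])
    return pop[best], fitness[best]


if __name__ == "__main__":
    cfg, fit = genetic_search(synthetic_bandwidth)
    print(cfg, fit)
```

Elitism is a deliberate design choice here. Because each real fitness evaluation would mean running a benchmark at scale, discarding a good configuration would be costly, and carrying the top individuals forward guarantees that the best score found never regresses between generations.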