Optimization of sparse matrix-vector multiplication on emerging multicore platforms
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Understanding and improving computational science storage access through continuous characterization
MSST '11 Proceedings of the 2011 IEEE 27th Symposium on Mass Storage Systems and Technologies
Optimizing fastquery performance on lustre file system
Proceedings of the 25th International Conference on Scientific and Statistical Database Management
Taming parallel I/O complexity with auto-tuning
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.00 |
The modern parallel I/O stack consists of several software layers with complex inter-dependencies and performance characteristics. While each layer exposes tunable parameters, it is often unclear to users how different parameter settings interact with each other and affect overall I/O performance. As a result, users often resort to default system settings, which typically obtain poor I/O bandwidth. In this research, we develop a benchmark guided auto-tuning framework for tuning the HDF5, MPI-IO, and Lustre layers on production supercomputing facilities. Our framework consists of three main components. H5Tuner uses a control file to adjust I/O parameters without modifying or recompiling the application. H5PerfCapture records performance metrics for HDF5 and MPI-IO. H5Evolve uses a genetic algorithm to explore the parameter space to determine well-performing configurations. We demonstrate I/O performance results for three HDF5 application-based benchmarks on a Sun HPC system. All the benchmarks running on 512 MPI processes perform 3X to 5.5X faster with the auto-tuned I/O parameters compared to a configuration with default system parameters.