Automated Tuning of Parallel I/O Systems: An Approach to Portable I/O Performance for Scientific Applications

Authors: Ying Chen
Affiliations: IBM Almaden Research Center, San Jose, CA
Venue: IEEE Transactions on Software Engineering - Special issue on architecture-independent languages and software tools for parallel processing
Year: 2000

Abstract

Parallel I/O systems typically consist of individual processors, communication networks, and a large number of disks. Managing and utilizing these resources to meet performance, portability, and usability goals of high-performance scientific applications has become a significant challenge. For scientists, the problem is exacerbated by the need to retune the I/O portion of their code for each supercomputer platform where they obtain access. We believe that a parallel I/O system that automatically selects efficient I/O plans for user applications is a solution to this problem. In this paper, we present such an approach for scientific applications performing collective I/O requests on multidimensional arrays. Under our approach, an optimization engine in a parallel I/O system selects high-quality I/O plans without human intervention, based on a description of the application I/O requests and the system configuration. To validate our hypothesis, we have built an optimizer that uses rule-based and randomized search-based algorithms to tune parameter settings in Panda, a parallel I/O library for multidimensional arrays. Our performance results obtained from an IBM SP using an out-of-core matrix multiplication application show that the Panda optimizer is able to select high-quality I/O plans and deliver high performance under a variety of system configurations with a small total optimization overhead.
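
To make the idea of randomized search over I/O plans concrete, the following is a minimal sketch, not the Panda optimizer itself. It uses simulated annealing as one possible randomized search; the abstract does not say which randomized algorithm the optimizer uses. The parameter names (`io_nodes`, `chunk_kb`, `disk_layout`), the value ranges, and the cost model with its constants are all hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical tunable parameters; these are NOT Panda's actual settings.
SPACE = {
    "io_nodes": [1, 2, 4, 8],
    "chunk_kb": [64, 256, 1024, 4096],
    "disk_layout": ["row_major", "chunked"],
}


def estimated_cost(plan, array_mb=512):
    """Toy cost model in seconds (every constant here is an assumption)."""
    requests = array_mb * 1024 / plan["chunk_kb"]
    transfer = array_mb / (10.0 * plan["io_nodes"])   # assume 10 MB/s per I/O node
    overhead = 0.002 * requests / plan["io_nodes"]    # assume 2 ms per request
    reorg = 5.0 if plan["disk_layout"] == "row_major" else 0.0  # assumed reorganization penalty
    contention = 0.3 * plan["io_nodes"]               # assumed network contention
    return transfer + overhead + reorg + contention


def neighbor(plan, rng):
    """Return a copy of the plan with one randomly chosen parameter re-drawn."""
    new = dict(plan)
    key = rng.choice(sorted(SPACE))
    new[key] = rng.choice(SPACE[key])
    return new


def anneal(rng, steps=500, temp=10.0, cooling=0.99):
    """Simulated annealing over SPACE; returns the best plan seen and its cost."""
    current = {k: rng.choice(v) for k, v in sorted(SPACE.items())}
    current_cost = estimated_cost(current)
    best, best_cost = current, current_cost
    for _ in range(steps):
        cand = neighbor(current, rng)
        cand_cost = estimated_cost(cand)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse plans with a probability
        # that shrinks as the temperature falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = cand, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp *= cooling
    return best, best_cost
```

Calling `anneal(random.Random(0))` returns a plan dictionary and its estimated cost. In a real optimizer the toy cost model would be replaced by a performance model or by measurements of the target system, and a rule-based stage could prune the search space before the randomized search runs.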