Parallel I/O systems typically consist of individual processors, communication networks, and a large number of disks. Managing and utilizing these resources to meet performance, portability, and usability goals of high-performance scientific applications has become a significant challenge. For scientists, the problem is exacerbated by the need to retune the I/O portion of their code for each supercomputer platform where they obtain access. We believe that a parallel I/O system that automatically selects efficient I/O plans for user applications is a solution to this problem. In this paper, we present such an approach for scientific applications performing collective I/O requests on multidimensional arrays. Under our approach, an optimization engine in a parallel I/O system selects high-quality I/O plans without human intervention, based on a description of the application I/O requests and the system configuration. To validate our hypothesis, we have built an optimizer that uses rule-based and randomized search-based algorithms to tune parameter settings in Panda, a parallel I/O library for multidimensional arrays. Our performance results obtained from an IBM SP using an out-of-core matrix multiplication application show that the Panda optimizer is able to select high-quality I/O plans and deliver high performance under a variety of system configurations with a small total optimization overhead.
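The idea of a randomized search over I/O plan parameters can be illustrated with a minimal sketch. The abstract does not say which randomized algorithms the optimizer uses, so simulated annealing is swapped in here as one common choice. The parameter names (`io_nodes`, `chunk_kb`, `disk_layout`), their values, and the `toy_cost` model are all hypothetical stand-ins; they are not Panda's actual tunables or performance model.

```python
import math
import random

# Hypothetical search space: names and values are illustrative assumptions,
# not Panda's real parameters.
SPACE = {
    "io_nodes": [1, 2, 4, 8],
    "chunk_kb": [64, 256, 1024, 4096],
    "disk_layout": ["row_major", "col_major", "blocked"],
}

def toy_cost(plan):
    """Made-up cost model: a pretend time estimate for one collective I/O request."""
    transfer = 512.0 / plan["io_nodes"]
    overhead = 2048.0 / plan["chunk_kb"] + plan["chunk_kb"] / 512.0
    reorg = {"row_major": 20.0, "col_major": 35.0, "blocked": 5.0}[plan["disk_layout"]]
    return transfer + overhead + reorg

def neighbor(plan, rng):
    """Return a copy of the plan with one parameter changed to a different legal value."""
    key = rng.choice(sorted(SPACE))
    new = dict(plan)
    new[key] = rng.choice([v for v in SPACE[key] if v != plan[key]])
    return new

def anneal(cost, steps=2000, t0=50.0, alpha=0.995, seed=0):
    """Simulated annealing: always accept improvements, sometimes accept worse plans."""
    rng = random.Random(seed)
    current = {k: rng.choice(v) for k, v in SPACE.items()}
    cur_cost = cost(current)
    best, best_cost = current, cur_cost
    t = t0
    for _ in range(steps):
        cand = neighbor(current, rng)
        c = cost(cand)
        # Worse candidates are accepted with probability exp(-(c - cur_cost) / t).
        if c <= cur_cost or rng.random() < math.exp((cur_cost - c) / t):
            current, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = cand, c
        t *= alpha  # geometric cooling schedule
    return best, best_cost

if __name__ == "__main__":
    plan, estimate = anneal(toy_cost)
    print(plan, estimate)
```

In a real optimizer the cost function would come from a performance model or from measurements of the target platform, and a rule-based stage could prune the search space or fix some parameters before the search starts.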