A technique for reducing synchronization overhead in large scale multiprocessors
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Performance prediction tools for Cedar: a multiprocessor supercomputer
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Practical Parallel Band Triangular System Solvers
ACM Transactions on Mathematical Software (TOMS)
A Survey of Parallel Machine Organization and Programming
ACM Computing Surveys (CSUR)
Structure of Computers and Computations
Automatic program restructuring for high-speed computation
CONPAR '81 Proceedings of the Conference on Analysing Problem Classes and Programming for Parallel Computing
Optimizing supercompilers for supercomputers
Compile-time scheduling and optimization for asynchronous machines (multiprocessor, compiler, parallel processing)
Compiler optimizations and architecture design issues for multiprocessors (parallel)
A VLSI-Based I/O Formatting Device
IEEE Transactions on Computers
The NYU Ultracomputer: Designing an MIMD Shared Memory Parallel Computer
IEEE Transactions on Computers
IEEE Transactions on Computers
High-Speed Multiprocessors and Compilation Techniques
IEEE Transactions on Computers
Parallelism and Representation Problems in Distributed Systems
IEEE Transactions on Computers
Computer
Computer
STARAN parallel processor system hardware
AFIPS '74 Proceedings of the May 6-10, 1974, national computer conference and exposition
An overview of the Texas reconfigurable array computer
AFIPS '80 Proceedings of the May 19-22, 1980, national computer conference
A study of I/O behavior of perfect benchmarks on a multiprocessor
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Previous models of program speedup on parallel architectures tend to ignore I/O activity and other important issues. In this paper we derive analytic speedup models that include I/O activity. We show that ignoring I/O yields conservative speedup results. We explore the effectiveness of using hardware format conversion units in multiprocessors [33]. We prove that hardware parallel format conversion loses its edge over software parallel format conversion as the ratio of the number of processors to I/O bandwidth increases. For a given number of processors, program speedup is more sensitive to the available I/O bandwidth than to the format conversion speed. Ninety-one Fortran programs are used in various experiments to verify our models and conclusions. Most of the programs are I/O bound. Our empirical results show that including I/O activity improves the speedup factor for 78 percent of the programs, and that 18 percent of the programs are sped up only because of faster I/O. For a serial machine, using the hardware format conversion units designed in [13] reduces program execution time by an average factor of three. The software format conversion speed used is obtained from direct measurements on an IBM 4341 running CMS and a CDC Cyber 175 running NOS. For multiprocessor systems, a factor-of-eight increase in the processor-to-I/O-bandwidth ratio reduces the effectiveness of hardware format conversion to an average factor of 1.36.