Accurate and efficient simulation of large parallel applications can be facilitated by combining direct execution with parallel discrete-event simulation. This paper describes MPI-SIM, a direct-execution-driven parallel simulator designed to predict the performance of existing MPI and MPI-IO applications. MPI-SIM can predict the performance of these programs as a function of architectural characteristics, including the number of processors, message communication latencies, caching algorithms, and alternative implementations of collective I/O operations. Results are presented that show the use of MPI-SIM in a scalability study of real-world applications. The benchmarks chosen for the study include Sweep3D, one of the ASCI benchmarks, and BTIO, an I/O-intensive benchmark from the NAS Parallel Benchmark suite. MPI-SIM is shown to accurately and efficiently predict the performance of Sweep3D running on an Origin 2000. It is also used to demonstrate the impact of the number of I/O nodes on BTIO's performance.
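To make the idea of predicting performance "as a function of architectural characteristics" concrete, the following is a minimal sketch of a virtual-clock model for a pipelined (wavefront-style) message-passing program. It is an illustration only and not MPI-SIM's design or API: the names `Network`, `transfer_time`, and `simulate_pipeline` are hypothetical, the fixed latency-plus-bandwidth cost model is an assumption, and `compute_time` stands in for the time a real direct-execution simulator would obtain by actually running the application's local code.

```python
from dataclasses import dataclass


@dataclass
class Network:
    """Assumed cost model: fixed per-message latency plus size / bandwidth."""
    latency: float    # seconds per message
    bandwidth: float  # bytes per second

    def transfer_time(self, nbytes: int) -> float:
        return self.latency + nbytes / self.bandwidth


def simulate_pipeline(num_procs: int, compute_time: float,
                      nbytes: int, net: Network) -> float:
    """Predict when the last rank of a pipeline finishes.

    Rank i blocks until the message from rank i-1 arrives, computes for
    compute_time seconds of virtual time, then sends nbytes to rank i+1.
    """
    clock = [0.0] * num_procs  # one virtual clock per simulated rank
    arrival = 0.0              # rank 0 has its input at time zero
    for rank in range(num_procs):
        # A blocking receive advances the local clock to the arrival time.
        start = max(clock[rank], arrival)
        # Local computation advances only this rank's clock.
        clock[rank] = start + compute_time
        # The outgoing message reaches the next rank after the network delay.
        arrival = clock[rank] + net.transfer_time(nbytes)
    return clock[-1]


if __name__ == "__main__":
    # Hypothetical parameters for a small scalability sweep.
    net = Network(latency=0.5, bandwidth=1000.0)
    for procs in (1, 2, 4, 8):
        predicted = simulate_pipeline(procs, compute_time=1.0, nbytes=500, net=net)
        print(procs, predicted)
```

Varying `num_procs` or the `Network` fields in this sketch mimics, in a very reduced form, the kind of scalability and latency study the abstract describes; a real simulator would also model contention, collective operations, and I/O.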