In this paper, we propose and evaluate practical, automatic techniques that exploit compiler analysis to facilitate simulation of very large message-passing systems. We use compiler techniques and a compiler-synthesized static task graph model to identify the subset of the computations whose values have no significant effect on the performance of the program, and to generate symbolic estimates of the execution times of these computations. For programs with regular computation and communication patterns, this information allows us to avoid executing or simulating large portions of the computational code during the simulation. It also allows us to avoid performing some of the message data transfers, while still simulating the message performance in detail. We have used these techniques to integrate the MPI-Sim parallel simulator at UCLA with the Rice dHPF compiler infrastructure. We evaluate the accuracy and benefits of these techniques for three standard message-passing benchmarks on a wide range of problem and system sizes. The optimized simulator has errors of less than 16% compared with direct program measurement in all the cases we studied, and typically much smaller errors. Furthermore, it requires factors of 5 to 2000 less memory and up to a factor of 10 less time to execute than the original simulator. These dramatic savings allow us to simulate regular message-passing programs on systems and problem sizes 10 to 100 times larger than is possible with the original simulator, or other current state-of-the-art simulators.
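The core idea can be illustrated with a toy sketch: computation whose values do not affect performance is replaced by a symbolic time estimate, and messages are charged their cost without moving any data. This is not the MPI-Sim or dHPF interface. The names `ComputeTask`, `CommTask`, and `simulate`, the cost formulas, and the simple latency/bandwidth message model are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Params = Dict[str, int]


@dataclass
class ComputeTask:
    """Hypothetical task-graph node: computation whose results do not
    influence control flow or communication, so it need not be executed."""
    name: str
    # Symbolic cost: maps program parameters (e.g. N, P) to estimated seconds.
    cost: Callable[[Params], float]


@dataclass
class CommTask:
    """Hypothetical task-graph node: a message whose size is known
    symbolically, so no buffer has to be allocated or copied."""
    name: str
    # Message size in bytes as a function of the program parameters.
    size: Callable[[Params], int]


def simulate(tasks: List[Union[ComputeTask, CommTask]],
             params: Params,
             latency: float = 1e-5,
             bandwidth: float = 1e8) -> float:
    """Return the predicted time in seconds for one process.

    The latency/bandwidth model is an assumed stand-in for a detailed
    network model; only the virtual clock is advanced.
    """
    clock = 0.0
    for task in tasks:
        if isinstance(task, ComputeTask):
            # Advance the clock by the estimate instead of running the code.
            clock += task.cost(params)
        else:
            # Charge the message cost without transferring any data.
            clock += latency + task.size(params) / bandwidth
    return clock


# Assumed example: an N x N stencil sweep split over P processes,
# followed by a halo exchange of N 8-byte values.
tasks = [
    ComputeTask("stencil", lambda p: 2e-9 * p["N"] ** 2 / p["P"]),
    CommTask("halo", lambda p: 8 * p["N"]),
]
predicted = simulate(tasks, {"N": 1000, "P": 4})
```

Because the simulator evaluates only the cost expressions, its memory use no longer depends on the size of the application's arrays. This is the kind of saving the abstract attributes to skipping computation and message data transfers.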