Instruction-level simulation is necessary to evaluate new architectures. However, single-node simulation cannot predict the behavior of a parallel application on a supercomputer. We present a scalable simulator that couples a cycle-accurate node simulator with a supercomputer network model. Our simulator executes individual instances of IBM's Mambo PowerPC simulator on hundreds of cores. We integrate a NIC emulator into Mambo and model the network instead of fully simulating it. This decouples the individual node simulators and makes our design scalable. Our simulator runs unmodified parallel message-passing applications on hundreds of nodes. We can change network parameters and detailed node parameters, inject network traffic directly into caches, and apply different policies to decide when such injection is beneficial. This paper describes our simulator in detail, evaluates it, and demonstrates its scalability. We show its suitability for architecture research by evaluating the impact of cache injection on parallel application performance.
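
As a rough illustration of what modeling the network (rather than fully simulating it) and applying an injection policy could look like, the following minimal sketch computes a message's delivery time from a simple latency-plus-bandwidth model and decides whether to inject the payload into the cache. The names `NetworkModel` and `should_inject`, the parameters, and the policy itself are hypothetical assumptions for illustration; the abstract does not specify the model or the policies the simulator actually uses.

```python
from dataclasses import dataclass


@dataclass
class NetworkModel:
    """Hypothetical analytic network model (an assumption, not the paper's)."""
    latency_ns: float              # assumed fixed per-message latency
    bandwidth_bytes_per_ns: float  # assumed link bandwidth

    def delivery_time(self, send_time_ns: float, size_bytes: int) -> float:
        # Arrival time = send time + fixed latency + serialization time.
        return send_time_ns + self.latency_ns + size_bytes / self.bandwidth_bytes_per_ns


def should_inject(size_bytes: int, receive_posted: bool,
                  max_inject_bytes: int = 4096) -> bool:
    """Hypothetical policy: inject into the cache only when the receiver is
    already waiting for the data and the payload is small enough that it is
    unlikely to evict useful cache lines."""
    return receive_posted and size_bytes <= max_inject_bytes


model = NetworkModel(latency_ns=1000.0, bandwidth_bytes_per_ns=2.0)
arrival = model.delivery_time(send_time_ns=500.0, size_bytes=4096)
inject = should_inject(size_bytes=4096, receive_posted=True)
```

Because each node simulator would only need the computed arrival time of a message, a model of this kind keeps the node simulators decoupled from one another, which is the property the abstract credits for scalability.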