A case study of trace-driven simulation for analyzing interconnection networks: cc-NUMAs with ILP processors

Authors:
V. Puente;J. M. Prellezo;C. Izu;J. A. Gregorio;R. Beivide
Affiliations:
University of Cantabria, Spain;University of Cantabria, Spain;University of Adelaide, Australia;University of Cantabria, Spain;University of Cantabria, Spain
Venue:
EURO-PDP'00 Proceedings of the 8th Euromicro conference on Parallel and distributed processing
Year:
2000


Abstract

Evaluating network performance under real application loads requires detailed simulations that are intensive in both time and resources. Moreover, the use of ILP processors in cc-NUMA architectures introduces non-deterministic memory accesses; the resulting parallel system must be modeled by a detailed execution-driven simulation, which further increases the evaluation cost. This work introduces a simulation methodology, based on network traces, that estimates the impact of a given network on the execution time of parallel applications. The methodology allows the network design space to be explored with accuracy close to that of execution-driven simulation but with much shorter simulation times. The network trace, extracted from an execution-driven simulation, is processed to replace the temporal dependencies produced by the simulated network with an estimate of the message dependencies caused by both the application and the cache-coherence protocol in use. The methodology was tested on two direct networks, with 16 and 64 nodes respectively, running the FFT and Radix applications of the SPLASH-2 suite. The trace-driven simulation is 3 to 4 times faster than the execution-driven one, with an average error of 4% in total execution time.
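
The idea of replaying a trace by message dependencies rather than by recorded timestamps can be illustrated with a minimal sketch. This is not the authors' tool: the record layout (`Message`, `compute_gap`, `deps`), the `replay` function, and the constant-latency network models are all hypothetical, and contention inside the network is ignored. Each message is injected only once the messages it depends on have been delivered, so its injection time shifts when the network under test is faster or slower than the one used to capture the trace.

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    """One trace record (hypothetical layout, not the paper's trace format)."""
    msg_id: int
    src: int
    dst: int
    compute_gap: int  # cycles of local work between the last dependency and injection
    deps: list = field(default_factory=list)  # ids of messages that must arrive first

def replay(trace, latency):
    """Replay a dependency-annotated trace on a network model.

    `trace` is assumed to be ordered so that dependencies precede dependents.
    `latency(src, dst)` is an assumed, contention-free network model.
    Returns (delivery time per message id, total execution time).
    """
    delivered = {}
    for m in trace:
        # A message becomes ready when all messages it depends on have arrived.
        ready = max((delivered[d] for d in m.deps), default=0)
        inject = ready + m.compute_gap
        delivered[m.msg_id] = inject + latency(m.src, m.dst)
    return delivered, max(delivered.values(), default=0)

# Toy trace: a request, the reply it triggers, and a follow-up request.
trace = [
    Message(0, src=0, dst=1, compute_gap=10),
    Message(1, src=1, dst=0, compute_gap=5, deps=[0]),
    Message(2, src=0, dst=2, compute_gap=0, deps=[1]),
]

_, fast_total = replay(trace, lambda s, d: 20)  # hypothetical 20-cycle network
_, slow_total = replay(trace, lambda s, d: 40)  # hypothetical 40-cycle network
print(fast_total, slow_total)  # prints "75 135"
```

Replaying the same three messages at their originally recorded timestamps would hide this effect, because the injection times would stay fixed regardless of the network being evaluated.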