FastFwd: an efficient hardware acceleration technique for trace-driven network-on-chip simulation

Authors:
Gummidipudi Krishnaiah; B.V.N. Silpa; Preeti Ranjan Panda; Anshul Kumar
Affiliations:
IIT Delhi, New Delhi, India (all four authors)
Venue:
CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Year:
2010

Abstract

We present an efficient emulation-based technique to accelerate architecture exploration of networks-on-chip (NoCs). The large NoC design space, combined with growing NoC complexity, leads to low simulation speeds on host machines and has motivated the use of hardware accelerators to speed up simulation. For example, simulating applications with real-life problem sizes could take weeks on a host machine. FPGA acceleration is a promising strategy for speeding up NoC simulation by several orders of magnitude. However, NoC exploration requires simulating a few billion network transactions of the application, which could still take tens of minutes even on an FPGA-based emulator. As architectures and applications grow more complex, reducing emulation time is a key concern. We propose FastFwd, a technique that minimizes emulation time by efficiently identifying and eliminating redundant cycles during trace-based NoC simulation. We have studied the implications of the additional FPGA hardware required to implement our technique. A naïve implementation could lead to poor scalability and increase the required DRAM bandwidth, both of which ultimately degrade emulation speed. We propose a hierarchical controller architecture to resolve the scalability issue, and a compressed trace representation to mitigate the increased DRAM bandwidth requirement. Our experiments with several benchmarks show that FPGA emulation with our technique reduces the average emulation time by a factor of 2 compared to conventional emulation.
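The general idea of eliminating redundant cycles in a trace-driven simulation can be illustrated in software. The sketch below is not the paper's FPGA design (it models neither the hierarchical controller nor the compressed traces); it only shows why cycles can be skipped. It assumes a hypothetical trace format of (injection_cycle, src_node, dst_node) tuples and a deliberately simplified contention-free mesh in which a packet needs one cycle per hop plus one cycle to be delivered. When no packet is in flight, no state can change before the next trace injection, so the simulator jumps straight to that cycle. The names `simulate`, `hops`, and `SimResult` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical trace record: (injection_cycle, src_node, dst_node).
TraceEntry = Tuple[int, int, int]


@dataclass
class SimResult:
    end_cycle: int         # simulated time when the trace has fully drained
    simulated_cycles: int  # loop iterations the simulator actually executed
    delivered: int         # number of packets delivered


def hops(src: int, dst: int, width: int) -> int:
    """Manhattan distance between two nodes of a width x width mesh."""
    sx, sy = src % width, src // width
    dx, dy = dst % width, dst // width
    return abs(sx - dx) + abs(sy - dy)


def simulate(trace: List[TraceEntry], width: int, fast_forward: bool) -> SimResult:
    """Toy contention-free model (an assumption): a packet takes hops + 1 cycles."""
    trace = sorted(trace)
    in_flight: List[int] = []  # remaining cycles for each packet in the network
    cycle = 0
    stepped = 0
    delivered = 0
    i = 0  # index of the next trace entry to inject
    while i < len(trace) or in_flight:
        # Fast-forward: with an empty network, the cycles before the next
        # injection are redundant, so jump directly to the next timestamp.
        if fast_forward and not in_flight and trace[i][0] > cycle:
            cycle = trace[i][0]
        # Inject every packet whose timestamp has been reached.
        while i < len(trace) and trace[i][0] <= cycle:
            _, src, dst = trace[i]
            in_flight.append(hops(src, dst, width) + 1)
            i += 1
        # Advance all in-flight packets by one cycle and retire finished ones.
        in_flight = [r - 1 for r in in_flight]
        delivered += sum(1 for r in in_flight if r == 0)
        in_flight = [r for r in in_flight if r > 0]
        cycle += 1
        stepped += 1
    return SimResult(cycle, stepped, delivered)
```

For a sparse trace such as `[(0, 0, 1), (100, 0, 3), (1000, 3, 0)]` on a 2 x 2 mesh, both modes reach the same simulated end time and deliver the same packets, but the fast-forward mode executes only the handful of cycles in which packets are actually moving. In a real NoC with contention, the condition for skipping must also account for buffered flits and router state, which this toy model ignores.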