The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Orion: a power-performance simulator for interconnection networks
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Modeling virtual channel flow control in hypercubes
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
An Analytical Performance Model for the Spidergon NoC
AINA '07 Proceedings of the 21st International Conference on Advanced Networking and Applications
Fast, Accurate and Detailed NoC Simulations
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
Analytical router modeling for networks-on-chip performance analysis
Proceedings of the conference on Design, automation and test in Europe
Flattened Butterfly Topology for On-Chip Networks
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
NOCS '08 Proceedings of the Second ACM/IEEE International Symposium on Networks-on-Chip
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
A case for bufferless routing in on-chip networks
Proceedings of the 36th annual international symposium on Computer architecture
High-speed network modeling for full system simulation
IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
An analytical method for evaluating network-on-chip performance
Proceedings of the Conference on Design, Automation and Test in Europe
CONNECT: re-examining conventional wisdom for designing nocs in the context of FPGAs
Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
Proceedings of the International Conference on Computer-Aided Design
Hi-index | 0.00 |
FIST (Fast Interconnect Simulation Techniques) is a fast and simple packet latency estimator to replace time-consuming detailed Network-on-Chip (NoC) models in full-system performance simulators. FIST combines ideas from analytical network modeling and execution-driven simulation models. The main idea is to abstractly model each router as a load-delay curve and sum load-dependent delay at each visited router to obtain a packet's latency by tracking each router's load at runtime. The resulting latency estimator can accurately capture subtle load-dependent behaviors of a NoC but is much simpler than a full-blown execution-driven model. We study two variations of FIST in the context of a software-based, cycle-level simulation of a tiled chip-multiprocessor (CMP). We evaluate FIST's accuracy and performance relative to the CMP simulator's original execution-driven 2D-mesh NoC model. A static FIST approach (trained offline using uniform random synthetic traffic) achieves less than 6% average error in packet latency and up to 43x average speedup for a 16x16 mesh. A dynamic FIST approach that adds periodic online training reduces the average packet latency error to less than 2% and still maintains an average speedup of up to 18x for a 16x16 mesh. Moreover, an FPGA-based realization of FIST can simulate 2D-mesh networks up to 24x24 nodes, at 3 to 4 orders of magnitude speedup over software-based simulators.