Simulation of new multi- and many-core systems is becoming an increasingly large bottleneck in the design process. This paper presents ACME, a design automation tool flow for emulating newly proposed large multi-core interconnection networks in hardware on FPGAs. The goal is to avoid the slowdown of single-threaded, event-driven simulation. The tool is aimed at computer and network architects who know digital design but may not be comfortable with hardware description languages and synthesis flows. ACME provides graphical design entry in which hardware components can be mixed with software algorithms written in C. Each component has a user-defined latency and throughput expressed in system cycles. From this description, ACME automatically generates a cycle-accurate hardware emulator as a Xilinx Platform Studio project. The project integrates synthesized hardware blocks with embedded soft-core processors that execute the C code. Our results show that for 16-core and 64-core cycle-accurate packet-switching networks, FPGA-based emulation is faster than Simics-based software simulation by 2.5x and 14.6x, respectively.
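To illustrate what a per-component latency and throughput "in terms of system cycles" can mean in a cycle-accurate model, here is a minimal Python sketch. It rests on two assumptions:

- Latency is the number of cycles from accepting an input to producing its result.
- Throughput is an initiation interval: the block accepts one new input every `interval` cycles, and 1 means fully pipelined.

The names `TimedBlock` and `run` are hypothetical and do not describe ACME's actual interface or generated hardware.

```python
from collections import deque


class TimedBlock:
    """Hypothetical cycle-level model of one component (not ACME's API).

    A result becomes available `latency` cycles after its input is accepted,
    and a new input is accepted at most once every `interval` cycles.
    """

    def __init__(self, latency, interval):
        if latency < 1 or interval < 1:
            raise ValueError("latency and interval must be at least 1 cycle")
        self.latency = latency
        self.interval = interval
        self._in_flight = deque()  # (ready_cycle, item), in acceptance order
        self._next_accept = 0      # earliest cycle a new input may be accepted

    def offer(self, cycle, item):
        """Try to accept `item` at `cycle`; return False if the block is busy."""
        if cycle < self._next_accept:
            return False
        self._in_flight.append((cycle + self.latency, item))
        self._next_accept = cycle + self.interval
        return True

    def drain(self, cycle):
        """Return every item whose latency has elapsed by `cycle`.

        Latency is fixed, so ready cycles grow in acceptance order and only
        the front of the queue needs to be checked.
        """
        done = []
        while self._in_flight and self._in_flight[0][0] <= cycle:
            done.append(self._in_flight.popleft()[1])
        return done


def run(block, packets, cycles):
    """Advance a global cycle counter, offering packets in order.

    Returns a list of (completion_cycle, packet) pairs.
    """
    pending = deque(packets)
    log = []
    for cycle in range(cycles):
        # Collect results that finish this cycle, then try to issue new input.
        for item in block.drain(cycle):
            log.append((cycle, item))
        if pending and block.offer(cycle, pending[0]):
            pending.popleft()
    return log


if __name__ == "__main__":
    # Latency 3, one new input every 2 cycles.
    print(run(TimedBlock(latency=3, interval=2), ["p0", "p1", "p2"], 10))
```

With `latency=3, interval=2`, the three packets are accepted at cycles 0, 2 and 4 and complete at cycles 3, 5 and 7. Setting `interval=1` models a fully pipelined block, which completes them at cycles 3, 4 and 5. Advancing every component against one shared cycle counter like this is what keeps a model cycle-accurate. Such a loop, running serially in software, is the kind of simulation the FPGA emulator is meant to outpace.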