Shade: a fast instruction-set simulator for execution profiling
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Embra: fast and flexible machine simulation
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Fast out-of-order processor simulation using memoization
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
A retargetable, ultra-fast instruction set simulator
DATE '99 Proceedings of the conference on Design, automation and test in Europe
Dynamo: a transparent dynamic optimization system
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Full-system timing-first simulation
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A universal technique for fast and flexible instruction-set architecture simulation
Proceedings of the 39th annual Design Automation Conference
Complete Computer System Simulation: The SimOS Approach
IEEE Parallel & Distributed Technology: Systems & Technology
Asim: A Performance Model Framework
Computer
Instruction set compiled simulation: a technique for fast and flexible instruction set simulation
Proceedings of the 40th annual Design Automation Conference
Strata: A Software Dynamic Translation Infrastructure
Strata: A Software Dynamic Translation Infrastructure
Accelerating Architectural Simulation by Parallel Execution of Trace Samples
Accelerating Architectural Simulation by Parallel Execution of Trace Samples
Automatic Synthesis of High-Speed Processor Simulators
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
DynamoSim: a trace-based dynamically compiled instruction set simulator
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
An Efficient, Practical Parallelization Methodology for Multicore Architecture Simulation
IEEE Computer Architecture Letters
SuperPin: Parallelizing Dynamic Instrumentation for Real-Time Performance
Proceedings of the International Symposium on Code Generation and Optimization
ISORC '07 Proceedings of the 10th IEEE International Symposium on Object and Component-Oriented Real-Time Distributed Computing
HySim: a fast simulation framework for embedded software development
CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
A fast and generic hybrid simulation approach using C virtual machine
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
FPGA-Accelerated Simulation Technologies (FAST): Fast, Full-System, Cycle-Accurate Simulators
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Optimizing the java piped i/o stream library for performance
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Scratchpad memory management in a multitasking environment
EMSOFT '08 Proceedings of the 8th ACM international conference on Embedded software
Dynamic and adaptive SPM management for a multi-task environment
Journal of Systems Architecture: the EUROMICRO Journal
Device-architecture co-optimization of STT-RAM based memory for low power embedded systems
Proceedings of the International Conference on Computer-Aided Design
MCEmu: A Framework for Software Development and Performance Analysis of Multicore Systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
A real-time, energy-efficient system software suite for heterogeneous multicore platforms
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Boosting instruction set simulator performance with parallel block optimisation and replacement
ACSC '12 Proceedings of the Thirty-fifth Australasian Computer Science Conference - Volume 122
Proceedings of the 50th Annual Design Automation Conference
A cycle-approximate, mixed-ISA simulator for the KAHRISMA architecture
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Performance and power profiling for emulated Android systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Innovations in Systems and Software Engineering
Hi-index | 0.00 |
There have been strong demands for a fast and cycle-accurate virtual platforms in the embedded systems area where developers can do meaningful software development including performance debugging in the context of the entire platform. In this paper, we describe the design and implementation of a fast and cycle-accurate architecture simulator called FaCSim as a first step towards such a virtual platform. FacSim accurately models the ARM9E-S processor core and ARM926EJ-S processor's memory subsystem. It accurately simulates exceptions and interrupts to enable whole-system simulation including the OS. Since it is implemented in a modular manner in C++, it can be easily extended with other system components by subclassing or adding new classes. FaCSim is based on an interpretive simulation technique to provide flexibility, yet achieving high speed. It enables fast cycle-accurate architecture simulation by means of three mechanisms. First, it computes elapsed cycles in each pipeline stage as a chunk and incrementally adds it up to advance the core clock instead of performing cycle-by-cycle simulation. Second, it uses a basic-block cache that caches decoded instructions at the basic-block level. Finally, it is parallelized to exploit multicore systems that are available everywhere these days. Using 21 applications from the EEMBC benchmark suite, FaCSim's accuracy is validated against the ARM926EJ-S development board from ARM, and is accurate in a ±7% error margin. Due to basic-block level caching and parallelization, FaCSim is, on average, more than three times faster than ARMulator and more than six times faster than SimpleScalar.