Accurate Low-Cost Methods for Performance Evaluation of Cache Memory Systems
IEEE Transactions on Computers
Synthesis: an efficient implementation of fundamental operating system services
Synthesis: an efficient implementation of fundamental operating system services
ATOM: a system for building customized program analysis tools
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Shade: a fast instruction-set simulator for execution profiling
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Optimally profiling and tracing programs
ACM Transactions on Programming Languages and Systems (TOPLAS)
EEL: machine-independent executable editing
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Embra: fast and flexible machine simulation
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
DAC '96 Proceedings of the 33rd annual Design Automation Conference
Using the SimOS machine simulator to study complex computer systems
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Efficient instruction cache simulation and execution profiling with a threaded-code interpreter
Proceedings of the 29th conference on Winter simulation
DIGITAL FX!32: combining emulation and binary translation
Digital Technical Journal
Fast, effective code generation in a just-in-time Java compiler
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Direct execution models of processor behavior and performance
WSC '87 Proceedings of the 19th conference on Winter simulation
Workload characterization of emerging computer applications
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Asim: A Performance Model Framework
Computer
Reducing State Loss For Effective Trace Sampling of Superscalar Processors
ICCD '96 Proceedings of the 1996 International Conference on Computer Design, VLSI in Computers and Processors
The Augmint multiprocessor simulation toolkit for Intel x86 architectures
ICCD '96 Proceedings of the 1996 International Conference on Computer Design, VLSI in Computers and Processors
MINT: A Front End for Efficient Simulation of Shared-Memory Multiprocessors
MASCOTS '94 Proceedings of the Second International Workshop on Modeling, Analysis, and Simulation On Computer and Telecommunication Systems
An Direct-Execution Framework for Fast and Accurate Simulation of Superscalar Processors
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
LaTTe: A Java VM Just-in-Time Compiler with Fast and Efficient Register Allocation
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
The Alpha 21264 Microprocessor Architecture
ICCD '98 Proceedings of the International Conference on Computer Design
SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling
Proceedings of the 30th annual international symposium on Computer architecture
Minimal Subset Evaluation: Rapid Warm-Up for Simulated Hardware State
ICCD '01 Proceedings of the International Conference on Computer Design: VLSI in Computers & Processors
Tango introduction and tutorial
Tango introduction and tutorial
Picking Statistically Valid and Early Simulation Points
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Constructing portable compiled instruction-set simulators: an ADL-driven approach
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Statistical sampling of microarchitecture simulation
ACM Transactions on Modeling and Computer Simulation (TOMACS)
CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
High-level power analysis for multi-core chips
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Design and Implementation of aWorkload Specific Simulator
ANSS '06 Proceedings of the 39th annual Symposium on Simulation
A fast and generic hybrid simulation approach using C virtual machine
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
FaCSim: a fast and cycle-accurate architecture simulator for embedded systems
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
Hybrid-compiled simulation: An efficient technique for instruction-set architecture simulation
ACM Transactions on Embedded Computing Systems (TECS)
Proceedings of the Conference on Design, Automation and Test in Europe
A high-parallelism distributed scheduling mechanism for multi-core instruction-set simulation
Proceedings of the 48th Design Automation Conference
The Java Virtual Machine in retargetable, high-performance instruction set simulation
Proceedings of the 9th International Conference on Principles and Practice of Programming in Java
An Extended SystemC Framework for Efficient HW/SW Co-Simulation
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
A distributed timing synchronization technique for parallel multi-core instruction-set simulation
ACM Transactions on Embedded Computing Systems (TECS) - Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems
Hi-index | 0.00 |
Microprocessor simulators are very popular in research and teaching environments. For example, functional simulators are often used to perform architectural studies, to fast-forward over uninteresting code, to generate program traces, and to warm up tables before switching to a more detailed but slower simulator. Unfortunately, most portable functional simulators are on the order of 100 times slower than native execution. This paper describes a set of novel techniques and optimizations to synthesize portable functional simulators that are only 6.6times slower on average (16 times in the worst case) than native execution and 19 times faster than SimpleScalar's sim-fast on the SPECcpu2000 programs. When simulating a memory hierarchy, the synthesized code is 2.6 times faster than the equivalent ATOM code. Our fully automated synthesis approach works without access to source/assembly code or debug information. It generates C code, integrates optional user-provided code, performs unwanted-code removal, preserves basic blocks, generates low-overhead profiles, employs a simple heuristic to determine potential jump targets, only compiles important instructions, and utilizes mixed-mode execution, i.e., it interleaves compiled and interpreted simulation to maximize performance.