The SPARC architecture manual (version 9)
The SPARC architecture manual (version 9)
Shade: a fast instruction-set simulator for execution profiling
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Talisman: fast and accurate multicomputer simulation
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Trace-driven memory simulation: a survey
ACM Computing Surveys (CSUR)
Accurate indirect branch prediction
Proceedings of the 25th annual international symposium on Computer architecture
The YAGS branch prediction scheme
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Fast out-of-order processor simulation using memoization
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
The use of multithreading for exception handling
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
Measuring Experimental Error in Microprocessor Simulation
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Complete Computer System Simulation: The SimOS Approach
IEEE Parallel & Distributed Technology: Systems & Technology
Asim: A Performance Model Framework
Computer
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
A Design for Efficient Simulation of a Multiprocessor
MASCOTS '93 Proceedings of the International Workshop on Modeling, Analysis, and Simulation On Computer and Telecommunication Systems
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
The Effects of Mispredicted-Path Execution on Branch Prediction Structures
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Design and evaluation of a multiscalar processor
Design and evaluation of a multiscalar processor
Variability in Architectural Simulations of Multi-Threaded Workloads
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Token coherence: decoupling performance and correctness
Proceedings of the 30th annual international symposium on Computer architecture
Proceedings of the 30th annual international symposium on Computer architecture
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Adaptive Cache Compression for High-Performance Processors
Proceedings of the 31st annual international symposium on Computer architecture
Managing Wire Delay in Large Chip-Multiprocessor Caches
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
ACM SIGMETRICS Performance Evaluation Review - Special issue on tools for computer architecture research
A serializability violation detector for shared-memory server programs
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Dynamic Verification of Sequential Consistency
Proceedings of the 32nd annual international symposium on Computer Architecture
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Simulation of Computer Architectures: Simulators, Benchmarks, Methodologies, and Recommendations
IEEE Transactions on Computers
Spin Detection Hardware for Improved Management of Multithreaded Systems
IEEE Transactions on Parallel and Distributed Systems
Automatic logging of operating system effects to guide application-level architecture simulation
SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems
The design and utility of the ML-RSIM system simulator
Journal of Systems Architecture: the EUROMICRO Journal
Coherence Ordering for Ring-based Chip Multiprocessors
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Unichos: a full system simulator for thin client platform
Proceedings of the 2007 ACM symposium on Applied computing
The impact of wrong-path memory references in cache-coherent multiprocessor systems
Journal of Parallel and Distributed Computing
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
FaCSim: a fast and cycle-accurate architecture simulator for embedded systems
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
SP-NUCA: a cost effective dynamic non-uniform cache architecture
ACM SIGARCH Computer Architecture News
Protocol offload analysis by simulation
Journal of Systems Architecture: the EUROMICRO Journal
COTSon: infrastructure for full system simulation
ACM SIGOPS Operating Systems Review
ACM SIGARCH Computer Architecture News
A virtual platform for network experimentation
Proceedings of the 1st ACM workshop on Virtualized infrastructure systems and architectures
Full-system simulation of distributed memory multicomputers
Cluster Computing
mSWAT: low-cost hardware fault detection and diagnosis for multicore systems
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Network interfaces for programmable NICs and multicore platforms
Computer Networks: The International Journal of Computer and Telecommunications Networking
Performance of large low-associativity caches
ACM SIGMETRICS Performance Evaluation Review
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
An efficient verification method for microprocessors based on the virtual machine
ICESS'04 Proceedings of the First international conference on Embedded Software and Systems
Transformer: a functional-driven cycle-accurate multicore simulator
Proceedings of the 49th Annual Design Automation Conference
Simulating the future kilo-x86-64 core processors and their infrastructure
Proceedings of the 45th Annual Simulation Symposium
ZSim: fast and accurate microarchitectural simulation of thousand-core systems
Proceedings of the 40th Annual International Symposium on Computer Architecture
Hi-index | 0.00 |
Computer system designers often evaluate future design alternatives with detailed simulators that strive for functional fidelity (to execute relevant workloads) and performance fidelity (to rank design alternatives). Trends toward multi-threaded architectures, more complex micro-architectures, and richer workloads, make authoring detailed simulators increasingly difficult. To manage simulator complexity, this paper advocates decoupled simulator organizations that separate functional and performance concerns. Furthermore, we define an approach, called timing-first simulation, that uses an augmented timing simulator to execute instructions important to performance in conjunction with a functional simulator to insure correctness. This design simplifies software development, leverages existing simulators, and can model micro-architecture timing in detail.We describe the timing-first organization and our experiences implementing TFsim, a full-system multiprocessor performance simulator. TFsim models a pipelined, out-of-order micro-architecture in detail, was developed in less than one person-year, and performs competitively with previously-published simulators. TFsim's timing simulator implements dynamically common instructions (99.99% of them), while avoiding the vast and exacting implementation efforts necessary to run unmodified commercial operating systems and workloads. Virtutech Simics, a full-system functional simulator, checks and corrects the timing simulator's execution, contributing 18-36% to the overall run-time. TFsim's mostly correct functional implementation introduces a worst-case performance error of 4.8% for our commercial workloads. Some additional simulator performance is gained by verifying functional correctness less often, at the cost of some additional performance error.