Using SimPoint for accurate and efficient simulation

Authors:
Erez Perelman;Greg Hamerly;Michael Van Biesbrouck;Timothy Sherwood;Brad Calder
Affiliations:
University of California, San Diego;University of California, San Diego;University of California, San Diego;University of California, San Diego;University of California, San Diego
Venue:
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Year:
2003

Abstract

Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of a single industry-standard benchmark at this level of detail takes on the order of months. The problem is compounded because a proper architectural evaluation requires multiple benchmarks to be evaluated across many separate runs. To address this issue, we recently created a tool called SimPoint that automatically finds a small set of Simulation Points to represent the complete execution of a program for efficient and accurate simulation. In this paper we describe how to use the SimPoint tool and introduce an improved SimPoint algorithm designed to significantly reduce the simulation time required when the simulation environment relies upon fast-forwarding.
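The abstract does not spell out how Simulation Points are chosen. As a rough illustration only, the sketch below follows the general approach from the SimPoint literature: split execution into fixed-length intervals, summarize each interval as a basic block vector (execution counts per basic block), cluster the vectors with k-means, and keep the interval nearest each centroid, weighted by its cluster's share of execution. All function names here are hypothetical, the toy k-means is not the tool's implementation, and steps the real tool performs (such as dimensionality reduction and automatic selection of the number of clusters) are omitted.

```python
import random


def normalize(vec):
    """Scale a basic block vector so its entries sum to 1."""
    total = sum(vec)
    return [x / total for x in vec] if total else list(vec)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, cluster label per point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids, labels


def pick_simulation_points(interval_vectors, k, seed=0):
    """Return sorted (interval_index, weight) pairs, one per non-empty cluster."""
    points = [normalize(v) for v in interval_vectors]
    centroids, labels = kmeans(points, k, seed=seed)
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        # The representative is the interval closest to its cluster centroid.
        rep = min(members, key=lambda i: sq_dist(points[i], centroids[c]))
        chosen.append((rep, len(members) / len(points)))
    return sorted(chosen)


def weighted_estimate(chosen, metric_by_interval):
    """Combine a per-interval metric (e.g. CPI) using the cluster weights."""
    return sum(weight * metric_by_interval[i] for i, weight in chosen)


# Toy input: six intervals over two basic blocks, with two obvious phases.
toy_vectors = [[100, 0], [90, 10], [95, 5], [92, 8], [0, 100], [10, 90]]
toy_points = pick_simulation_points(toy_vectors, k=2)
```

Under these assumptions, only the intervals listed in `toy_points` would need detailed simulation; `weighted_estimate` then extrapolates a whole-program figure from their measured metrics.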