Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of a single industry-standard benchmark at this level of detail takes on the order of months to complete. The problem is compounded because a proper architectural evaluation requires multiple benchmarks, each evaluated across many separate runs. To address this issue, we recently created a tool called SimPoint that automatically finds a small set of Simulation Points that represent the complete execution of a program, allowing efficient and accurate simulation. In this paper, we describe how to use the SimPoint tool and introduce an improved SimPoint algorithm designed to significantly reduce the simulation time required when the simulation environment relies on fast-forwarding.
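To make the idea of a small, representative set of Simulation Points concrete, the following is a minimal sketch and not the SimPoint tool itself. It assumes execution has already been split into fixed-length intervals, each summarized by a vector of basic-block execution counts, and it clusters those vectors with a plain k-means. The function names (`pick_simulation_points`, `weighted_estimate`), the farthest-point initialization, and the fixed choice of `k` are illustrative assumptions. The sketch does not attempt the improved algorithm for fast-forwarding environments mentioned above.

```python
from math import dist


def normalize(vec):
    """Scale a count vector so its entries sum to 1 (intervals become comparable)."""
    total = sum(vec)
    return [x / total for x in vec] if total else list(vec)


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization (an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


def pick_simulation_points(interval_vectors, k):
    """Return [(interval_index, weight)]: one representative interval per cluster."""
    points = [normalize(v) for v in interval_vectors]
    labels, centroids = kmeans(points, k)
    chosen = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if not members:
            continue
        # Representative = the interval closest to the cluster centroid.
        rep = min(members, key=lambda i: dist(points[i], centroids[j]))
        chosen.append((rep, len(members) / len(points)))
    return sorted(chosen)


def weighted_estimate(sim_points, per_interval_metric):
    """Estimate a whole-program metric from the chosen intervals only."""
    return sum(weight * per_interval_metric[i] for i, weight in sim_points)


# Hypothetical data: six intervals over three basic blocks, two visible phases.
vectors = [[10, 0, 0], [9, 1, 0], [10, 0, 0], [0, 5, 5], [0, 6, 4], [0, 5, 5]]
sim_points = pick_simulation_points(vectors, k=2)
cpi = [1.0, 1.1, 1.0, 2.0, 2.2, 2.0]  # made-up per-interval CPI values
print(sim_points, weighted_estimate(sim_points, cpi))
```

Only the chosen intervals would need detailed simulation; each result is weighted by the fraction of intervals its cluster covers. In a fast-forwarding simulator, the cost of reaching each chosen interval also matters, which is the concern the improved algorithm targets.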