Accurate Low-Cost Methods for Performance Evaluation of Cache Memory Systems
IEEE Transactions on Computers
Efficient (stack) algorithms for analysis of write-back and sector memories
ACM Transactions on Computer Systems (TOCS)
Evaluating Associativity in CPU Caches
IEEE Transactions on Computers
Page placement algorithms for large real-indexed caches
ACM Transactions on Computer Systems (TOCS)
Shade: a fast instruction-set simulator for execution profiling
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Multi-configuration simulation algorithms for the evaluation of computer architecture designs
Multi-configuration simulation algorithms for the evaluation of computer architecture designs
Internal organization of the Alpha 21164, a 300-MHz 64-bit quad-issue CMOS RISC microprocessor
Digital Technical Journal - Special 10th anniversary issue
Embra: fast and flexible machine simulation
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Using the SimOS machine simulator to study complex computer systems
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Trace-driven memory simulation: a survey
ACM Computing Surveys (CSUR)
Exploiting hardware performance counters with flow and context sensitive profiling
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Continuous profiling: where have all the cycles gone?
ACM Transactions on Computer Systems (TOCS)
ProfileMe: hardware support for instruction-level profiling on out-of-order processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
A 160-MHz, 32-b, 0.5-W CMOS RISC microprocessor
Digital Technical Journal
Combining Trace Sampling with Single Pass Methods for Efficient Cache Simulation
IEEE Transactions on Computers
Adapting cache line size to application behavior
ICS '99 Proceedings of the 13th international conference on Supercomputing
Selective cache ways: on-demand cache resource allocation
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Analytical Modeling of Set-Associative Cache Behavior
IEEE Transactions on Computers
Cache miss equations: a compiler framework for analyzing and tuning memory behavior
ACM Transactions on Programming Languages and Systems (TOPLAS)
Parallel trace-driven cache simulation by time partitioning
WSC' 90 Proceedings of the 22nd conference on Winter simulation
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Reconfigurable caches and their application to media processing
Proceedings of the 27th annual international symposium on Computer architecture
Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
A low power unified cache architecture providing power and performance flexibility (poster session)
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
Eager writeback - a technique for improving bandwidth utilization
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Reducing leakage in a high-performance deep-submicron instruction cache
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special issue on low power electronics and design
Exact analysis of the cache behavior of nested loops
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Cache decay: exploiting generational behavior to reduce cache leakage power
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Drowsy caches: simple techniques for reducing leakage power
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Managing multi-configuration hardware via dynamic working set analysis
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Memory Design and Exploration for Low Power, Embedded Systems
Journal of VLSI Signal Processing Systems - Special issue on signal processing systems design and implementation
Parallel simulation of chip-multiprocessor architectures
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Leakage Energy Management in Cache Hierarchies
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Integrating Adaptive On-Chip Storage Structures for Reduced Dynamic Power
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Compiler-directed instruction cache leakage optimization
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Managing static leakage energy in microprocessor functional units
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Predicting whole-program locality through reuse distance analysis
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Thermal Management System for High Performance PowerPCTM Microprocessors
COMPCON '97 Proceedings of the 42nd IEEE International Computer Conference
A resource allocation model for QoS management
RTSS '97 Proceedings of the 18th IEEE Real-Time Systems Symposium
SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling
Proceedings of the 30th annual international symposium on Computer architecture
Positional adaptation of processors: application to energy reduction
Proceedings of the 30th annual international symposium on Computer architecture
Proceedings of the 30th annual international symposium on Computer architecture
Adaptive mode control: A static-power-efficient cache design
ACM Transactions on Embedded Computing Systems (TECS)
Effectiveness and scaling trends of leakage control techniques for sub-130nm CMOS technologies
Proceedings of the 2003 international symposium on Low power electronics and design
Exploiting program hotspots and code sequentiality for instruction cache leakage management
Proceedings of the 2003 international symposium on Low power electronics and design
The accuracy of trace-driven simulations of multiprocessors
The accuracy of trace-driven simulations of multiprocessors
Miss Rate Prediction across All Program Inputs
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Characterizing and Predicting Program Behavior and its Variability
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Static energy reduction techniques for microprocessor caches
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on the 2001 international conference on computer design (ICCD)
Comparing Program Phase Detection Techniques
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Dynamic Partitioning of Shared Cache Memory
The Journal of Supercomputing
Automatic Tuning of Two-Level Caches to Embedded Applications
Proceedings of the conference on Design, automation and test in Europe - Volume 1
State-Preserving vs. Non-State-Preserving Leakage Control in Caches
Proceedings of the conference on Design, automation and test in Europe - Volume 1
A fast and accurate framework to analyze and optimize cache memory behavior
ACM Transactions on Programming Languages and Systems (TOPLAS)
A self-tuning cache architecture for embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
CQoS: a framework for enabling QoS in shared caches of CMP platforms
Proceedings of the 18th annual international conference on Supercomputing
IBM Journal of Research and Development
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Cache optimization for embedded processor cores: An analytical approach
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Transition Phase Classification and Prediction
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Comprehensive multiprocessor cache miss rate generation using multivariate models
ACM Transactions on Computer Systems (TOCS)
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Reuse-distance-based miss-rate prediction on a per instruction basis
MSP '04 Proceedings of the 2004 workshop on Memory system performance
Organizing the Last Line of Defense before Hitting the Memory Wall for CMPs
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Accurate and Complexity-Effective Spatial Pattern Prediction
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
A non-uniform cache architecture for low power system design
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Fast and fair: data-stream quality of service
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Exploring the limits of leakage power reduction in caches
ACM Transactions on Architecture and Code Optimization (TACO)
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Power reduction techniques for microprocessor systems
ACM Computing Surveys (CSUR)
Profile Directed Instruction Cache Tuning for Embedded Systems
ISVLSI '06 Proceedings of the IEEE Computer Society Annual Symposium on Emerging VLSI Technologies and Architectures
Finding optimal L1 cache configuration for embedded systems
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Selecting Software Phase Markers with Code Structure Analysis
Proceedings of the International Symposium on Code Generation and Optimization
Evaluation of the field-programmable cache: performance and energy consumption
Proceedings of the 3rd conference on Computing frontiers
Cooperative Caching for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Configurable cache subsetting for fast cache tuning
Proceedings of the 43rd annual Design Automation Conference
Communist, utilitarian, and capitalist cache policies on CMPs: caches as a shared resource
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
A co-phase matrix to guide simultaneous multithreading simulation
ISPASS '04 Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software
Structures for phase classification
ISPASS '04 Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software
The M5 Simulator: Modeling Networked Systems
IEEE Micro
Phase guided sampling for efficient parallel application simulation
CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
Accurate memory data flow modeling in statistical simulation
Proceedings of the 20th annual international conference on Supercomputing
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
ASR: Adaptive Selective Replication for CMP Caches
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
QEMU, a fast and portable dynamic translator
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
Adaptive insertion policies for high performance caching
Proceedings of the 34th annual international symposium on Computer architecture
A one-shot configurable-cache tuner for improved energy and performance
Proceedings of the conference on Design, automation and test in Europe
Instruction trace compression for rapid instruction cache simulation
Proceedings of the conference on Design, automation and test in Europe
A self-tuning configurable cache
Proceedings of the 44th annual Design Automation Conference
An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Adaptive set pinning: managing shared caches in chip multiprocessors
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
A table-based method for single-pass cache optimization
Proceedings of the 18th ACM Great Lakes symposium on VLSI
Phase-based cache reconfiguration for a highly-configurable two-level cache hierarchy
Proceedings of the 18th ACM Great Lakes symposium on VLSI
Discovering and Exploiting Program Phases
IEEE Micro
Dynamic tuning of configurable architectures: the AWW online algorithm
CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
Adaptive insertion policies for managing shared caches
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Computer Architecture Techniques for Power-Efficiency
Computer Architecture Techniques for Power-Efficiency
An Adaptive Synchronization Technique for Parallel Simulation of Networked Clusters
ISPASS '08 Proceedings of the ISPASS 2008 - IEEE International Symposium on Performance Analysis of Systems and software
PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches
Proceedings of the 36th annual international symposium on Computer architecture
Reactive NUCA: near-optimal block placement and replication in distributed caches
Proceedings of the 36th annual international symposium on Computer architecture
Improving SMT performance: an application of genetic algorithms to configure resizable caches
Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers
SlackSim: a platform for parallel simulations of CMPs on CMPs
ACM SIGARCH Computer Architecture News
GCSim: A GPU-Based Trace-Driven Simulator for Multi-level Cache
APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
Enabling software management for multicore caches with a lightweight hardware support
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Evaluation techniques for storage hierarchies
IBM Systems Journal
Modeling and Stack Simulation of CMP Cache Capacity and Accessibility
IEEE Transactions on Parallel and Distributed Systems
Software—Practice & Experience
Computer Architecture Performance Evaluation Methods
Computer Architecture Performance Evaluation Methods
Detecting phases in parallel applications on shared memory architectures
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
All-window profiling and composable models of cache sharing
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Fast modeling of shared caches in multicore systems
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
T-SPaCS: a two-level single-pass cache simulation methodology
Proceedings of the 16th Asia and South Pacific Design Automation Conference
Hotspot: acompact thermal modeling methodology for early-stage VLSI design
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A forward body-biased low-leakage SRAM cache: device, circuit and architecture considerations
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Fast configurable-cache tuning with a unified second-level cache
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
CloudCache: Expanding and shrinking private caches
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
MorphCache: A Reconfigurable Adaptive Multi-level Cache hierarchy
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Trace-driven simulation of multithreaded applications
ISPASS '11 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software
In-N-Out: Reproducing Out-of-Order Superscalar Processor Behavior from Reduced In-Order Traces
MASCOTS '11 Proceedings of the 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems
CPACT - The conditional parameter adjustment cache tuner for dual-core architectures
ICCD '11 Proceedings of the 2011 IEEE 29th International Conference on Computer Design
Phase-Based miss rate prediction across program inputs
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
IEEE Spectrum
A NUCA Substrate for Flexible CMP Cache Sharing
IEEE Transactions on Parallel and Distributed Systems
Multi-Core Cache Hierarchies
Hi-index | 0.00 |
Low power and/or energy consumption is a requirement not only in embedded systems that run on batteries or have limited cooling capabilities, but also in desktop and mainframes where chips require costly cooling techniques. Since the cache subsystem is typically the most power/energy-consuming subsystem, caches are good candidates for power/energy optimizations, and therefore, cache tuning techniques are widely researched. This survey focuses on state-of-the-art offline static and online dynamic cache tuning techniques and summarizes the techniques' attributes, major challenges, and potential research trends to inspire novel ideas and future research avenues.