Self-Optimizing Memory Controllers: A Reinforcement Learning Approach

Authors:
Engin Ipek;Onur Mutlu;José F. Martínez;Rich Caruana
Venue:
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Year:
2008

Abstract

Efficiently utilizing off-chip DRAM bandwidth is a critical issue in designing cost-effective, high-performance chip multiprocessors (CMPs). Conventional memory controllers deliver relatively low performance in part because they often employ fixed, rigid access scheduling policies designed for average-case application behavior. As a result, they cannot learn and optimize the long-term performance impact of their scheduling decisions, and cannot adapt their scheduling policies to dynamic workload behavior. We propose a new, self-optimizing memory controller design that operates using the principles of reinforcement learning (RL) to overcome these limitations. Our RL-based memory controller observes the system state and estimates the long-term performance impact of each action it can take. In this way, the controller learns to optimize its scheduling policy on the fly to maximize long-term performance. Our results show that an RL-based memory controller improves the performance of a set of parallel applications run on a 4-core CMP by 19% on average (up to 33%), and it improves DRAM bandwidth utilization by 22% compared to a state-of-the-art controller.
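
The observe, estimate, act loop described in the abstract can be illustrated with a small sketch. The abstract does not say which RL algorithm, state encoding, or reward the controller uses, so everything below is an assumption made for illustration: a tabular SARSA update, a toy one-bank state `(row_open, hits_left)`, four hypothetical commands (`activate`, `read`, `precharge`, `nop`), and a reward of 1 for each command that transfers data. It is not the authors' design.

```python
import random
from collections import defaultdict


class RLScheduler:
    """Toy tabular SARSA agent (illustrative assumption, not the paper's design)."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.05, seed=0):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)      # Q[(state, action)] = long-term value estimate
        self.rng = random.Random(seed)   # private RNG so runs are reproducible

    def choose(self, state, legal):
        # Explore occasionally; otherwise issue the command with the highest estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(legal)
        return max(legal, key=lambda a: self.q[(state, a)])

    def update(self, s, a, reward, s2, a2):
        # SARSA: move Q(s, a) toward reward + gamma * Q(s', a').
        target = reward + self.gamma * self.q[(s2, a2)]
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])


def legal_actions(state):
    # Hypothetical one-bank model: a read needs an open row with pending row hits.
    row_open, hits_left = state
    if not row_open:
        return ["activate", "nop"]
    return (["read"] if hits_left > 0 else []) + ["precharge", "nop"]


def step(state, action):
    # Returns (next_state, reward); only data-transferring commands earn reward.
    row_open, hits_left = state
    if action == "activate":
        return (True, 2), 0.0            # assume two row hits are pending after opening
    if action == "read":
        return (True, hits_left - 1), 1.0
    if action == "precharge":
        return (False, 0), 0.0
    return state, 0.0                    # nop leaves the bank unchanged


def train(agent, steps=5000):
    state = (False, 0)
    action = agent.choose(state, legal_actions(state))
    total = 0.0
    for _ in range(steps):
        next_state, reward = step(state, action)
        next_action = agent.choose(next_state, legal_actions(next_state))
        agent.update(state, action, reward, next_state, next_action)
        state, action = next_state, next_action
        total += reward
    return total
```

Calling `train(RLScheduler())` runs the loop on the toy bank and returns the number of data transfers completed. The point of the sketch is only the shape of the loop: the scheduler's value estimates, rather than a fixed priority rule, decide which legal command is issued each cycle.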