Memory scheduling for modern microprocessors

Authors:
Ibrahim Hur; Calvin Lin
Affiliations:
The University of Texas at Austin and IBM Corporation, Austin, TX; The University of Texas at Austin, Austin, TX
Venue:
ACM Transactions on Computer Systems (TOCS)
Year:
2007

Abstract

As memory performance has become more important to overall system performance, the need to schedule memory operations carefully has grown. This article describes the adaptive history-based (AHB) scheduler, which uses the history of recently scheduled operations to provide three conceptual benefits: (1) it allows the scheduler to reason more accurately about the delays associated with its scheduling decisions, (2) it provides a mechanism for combining multiple constraints, which is important for increasingly complex DRAM structures, and (3) it allows the scheduler to select operations so that they match the program's mixture of Reads and Writes, thereby avoiding certain bottlenecks within the memory controller. We have previously evaluated this scheduler in the context of the IBM Power5. Compared with the state of the art, it improves performance by 15.6%, 9.9%, and 7.6% for the Stream, NAS, and commercial benchmarks, respectively. This article expands our understanding of the AHB scheduler in a variety of ways. Looking backward, we describe the scheduler in the context of prior work that focused exclusively on avoiding bank conflicts, and we show that the AHB scheduler is superior for the IBM Power5, which we argue will be representative of future microprocessor memory controllers. Looking forward, we evaluate the scheduler in the context of future systems by varying a number of microarchitectural features and hardware parameters. For example, we show that the benefit of the scheduler increases as we move to multithreaded environments.
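
To make benefit (3) concrete, the following is a minimal Python sketch of the general idea of letting a short command history steer the choice between Reads and Writes. It is not the authors' implementation and does not model the Power5 controller, DRAM timing, or bank conflicts. The names `pick_next`, `schedule`, `target_read_fraction`, and `history_len` are hypothetical and introduced only for illustration, and the greedy "closest mix" rule is an assumption standing in for whatever selection logic the real scheduler uses.

```python
from collections import deque


def pick_next(history, pending, target_read_fraction):
    """Choose 'R' or 'W' so the recent command mix tracks the target.

    history: deque of recently issued commands ('R' or 'W'), bounded by maxlen.
    pending: dict mapping 'R'/'W' to the number of queued commands of that kind.
    Returns None when nothing is pending.
    """
    candidates = [op for op in ("R", "W") if pending.get(op, 0) > 0]
    if not candidates:
        return None

    def mix_error(op):
        # Simulate issuing `op`, including eviction of the oldest entry.
        window = deque(history, maxlen=history.maxlen)
        window.append(op)
        read_fraction = window.count("R") / len(window)
        return abs(read_fraction - target_read_fraction)

    # On a tie, min() keeps the first candidate, so Reads win ties.
    return min(candidates, key=mix_error)


def schedule(pending, target_read_fraction, history_len=4):
    """Drain the pending commands, returning the order in which they issue."""
    history = deque(maxlen=history_len)
    remaining = dict(pending)  # do not mutate the caller's dict
    order = []
    while True:
        op = pick_next(history, remaining, target_read_fraction)
        if op is None:
            return order
        remaining[op] -= 1
        history.append(op)
        order.append(op)


if __name__ == "__main__":
    # Hypothetical workload: twice as many Reads as Writes.
    print("".join(schedule({"R": 4, "W": 2}, target_read_fraction=2 / 3,
                           history_len=3)))
```

In this sketch the Writes are interleaved among the Reads rather than issued as a burst at the end, which is the kind of behavior the abstract associates with matching the program's Read/Write mixture.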