Improving memory scheduling via processor-side load criticality information

Authors:
Saugata Ghose;Hyodong Lee;José F. Martínez
Affiliations:
Cornell University, Ithaca, NY;Cornell University, Ithaca, NY;Cornell University, Ithaca, NY
Venue:
Proceedings of the 40th Annual International Symposium on Computer Architecture
Year:
2013

Citing 24
Cited 0

The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Improving data cache performance by pre-executing instructions under a cache miss

ICS '97 Proceedings of the 11th international conference on Supercomputing
Load latency tolerance in dynamically scheduled processors

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Memory access scheduling

Proceedings of the 27th annual international symposium on Computer architecture
Symbiotic jobscheduling for a simultaneous multithreaded processor

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Focusing processor policies via critical-path prediction

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Locality vs. criticality

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
SPEC CPU2000: Measuring CPU Performance in the New Millennium

Computer
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
The Non-Critical Buffer: Using Load Latency Tolerance to Improve Data Cache Efficiency

ICCD '99 Proceedings of the 1999 IEEE International Conference on Computer Design
Dynamic Prediction of Critical Path Instructions

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Adaptive History-Based Memory Schedulers

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Checkpointed Early Load Retirement

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
A Criticality Analysis of Clustering in Superscalar Processors

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Quantitative performance analysis of the SPEC OMPM2001 benchmarks

Scientific Programming - OpenMP
Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
A compiler-directed data prefetching scheme for chip multiprocessors

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Minimalist open-page: a DRAM page-mode scheduling policy for the many-core era

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Parallel application memory scheduling

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
MORSE: Multi-objective reconfigurable self-optimizing memory scheduler

HPCA '12 Proceedings of the 2012 IEEE 18th International Symposium on High-Performance Computer Architecture
RAIDR: Retention-Aware Intelligent DRAM Refresh

Proceedings of the 39th Annual International Symposium on Computer Architecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

We hypothesize that performing processor-side analysis of load instructions, and providing this pre-digested information to memory schedulers judiciously, can increase the sophistication of memory decisions while maintaining a lean memory controller that can take scheduling actions quickly. This is increasingly important as DRAM frequencies continue to increase relative to processor speed. In this paper we propose one such mechanism, pairing up a processor-side load criticality predictor with a lean memory controller that prioritizes load requests based on ranking information supplied from the processor side. Using a sophisticated multi-core simulator that includes a detailed quad-channel DDR3 DRAM model, we demonstrate that this mechanism can improve performance significantly on a CMP, with minimal overhead and virtually no changes to the processor itself. We show that our design compares favorably to several state-of-the-art schedulers.