Data prefetching, advanced cache replacement policies, and memory access scheduling are all incorporated in modern processors. Typically, each technique independently records recently accessed locations and controls the memory subsystem based on its own prediction of future memory accesses. Unfortunately, such separate optimizations often increase implementation cost, degrade system performance, and limit the scalability of the processor chip. In this paper, we propose the Unified Memory Optimizing (UMO) architecture to resolve these problems. The UMO architecture is a control architecture for the memory subsystem that takes a unified approach to data prefetching, cache management, and memory access scheduling. On this architecture, we build a Map-based Unified Memory Subsystem Controller (MUMSC), which is composed of DRAM-Aware prefetching, Prefetch-Aware Cache Line Promotion, and lightweight memory controllers. MUMSC is implemented as a per-core resource that predicts future memory accesses from the per-core memory access history, realizing a scalable, high-performance memory subsystem at a reasonable hardware cost. We evaluate MUMSC using a multi-core simulator with multi-programmed SPEC CPU2006 workloads. The simulation results show that MUMSC improves system throughput by 11.5% over a combination of state-of-the-art enhancement techniques, without increasing the hardware cost or the design complexity of the shared resources.
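To illustrate the map-based prediction idea that MUMSC builds on, the following is a minimal conceptual sketch (not the paper's actual design): a per-core table keeps a per-page access map of which cache lines have been touched, and a simple stride-pattern check over that map produces prefetch candidates. The page geometry, the stride set, and all names here are illustrative assumptions.

```python
# Conceptual sketch of map-based prefetch prediction. Assumed geometry:
# 4 KiB pages of 64 B cache lines; the stride set and class interface
# are hypothetical, not taken from the paper.

LINE_SIZE = 64
LINES_PER_PAGE = 64  # 4 KiB page / 64 B lines


class AccessMapPrefetcher:
    def __init__(self):
        # page number -> set of line offsets accessed within that page
        self.maps = {}

    def access(self, addr):
        """Record an access and return predicted prefetch addresses."""
        line = addr // LINE_SIZE
        page, offset = divmod(line, LINES_PER_PAGE)
        amap = self.maps.setdefault(page, set())
        amap.add(offset)

        predictions = []
        # If the two preceding lines at a candidate stride were both
        # accessed, predict the next line at that same stride.
        for stride in (1, 2, -1):
            if {offset - stride, offset - 2 * stride} <= amap:
                target = offset + stride
                if 0 <= target < LINES_PER_PAGE and target not in amap:
                    predictions.append(
                        (page * LINES_PER_PAGE + target) * LINE_SIZE)
        return predictions
```

For example, after accesses to addresses 0 and 64, an access to 128 completes a unit-stride pattern within the page, so the sketch predicts address 192. A real controller would additionally filter these candidates against DRAM row-buffer state and cache contents, which is where the unified (DRAM-aware, prefetch-aware) control of the paper comes in.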