Moguls: a model to explore the memory hierarchy for bandwidth improvements

Authors:
Guangyu Sun;Christopher J. Hughes;Changkyu Kim;Jishen Zhao;Cong Xu;Yuan Xie;Yen-Kuang Chen
Affiliations:
Pennsylvania State University, State College, PA., USA;Intel Labs, Santa Clara, CA., USA;Intel Labs, Santa Clara, CA., USA;Pennsylvania State University, State College, PA., USA;Pennsylvania State University, State College, PA., USA;Pennsylvania State University, State College, PA., USA;Intel Labs, Santa Clara, CA., USA
Venue:
Proceedings of the 38th annual international symposium on Computer architecture
Year:
2011

Abstract

In recent years, the growing number of processor cores and the limited growth in main memory bandwidth have led to the bandwidth wall, where memory bandwidth becomes a performance bottleneck. This is especially true for emerging latency-insensitive, bandwidth-sensitive applications. Designing a platform's memory hierarchy to maximize bandwidth within a fixed power budget has become a key challenge. To help architects quickly explore the design space of memory hierarchies, we propose an analytical performance model called Moguls. The Moguls model estimates the performance of an application on a system, using the bandwidth demand of the application for a range of cache capacities and the bandwidth provided by the system with those capacities. We show how to extend this model with appropriate approximations to optimize a cache hierarchy under a power constraint. The results show how many levels of cache should be designed, and what the capacity, bandwidth, and technology of each level should be. In addition, we study memory hierarchy design with hybrid memory technologies and show the benefits of using multiple technologies in future computing systems.
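
The comparison of demanded and provided bandwidth described in the abstract can be illustrated with a rough sketch. This is not the formulation from the paper: the function names, the square-root demand curve, the rule that each level must serve the traffic filtered by the level in front of it, and all numbers are assumptions chosen only to make the idea concrete.

```python
import math
from typing import Callable, List, Tuple


def sqrt_rule_demand(raw_gbs: float, ref_capacity_kb: float) -> Callable[[float], float]:
    """Build a hypothetical demand curve (assumption, not from the paper).

    Traffic that escapes a cache of the given capacity shrinks with the
    square root of capacity, starting from raw_gbs at ref_capacity_kb.
    """
    def demand(capacity_kb: float) -> float:
        if capacity_kb <= ref_capacity_kb:
            return raw_gbs
        return raw_gbs * math.sqrt(ref_capacity_kb / capacity_kb)
    return demand


def relative_performance(
    levels: List[Tuple[float, float]],
    demand: Callable[[float], float],
) -> float:
    """Estimate performance relative to a bandwidth-unconstrained system.

    levels: (capacity_kb, provided_gbs) pairs ordered from the level nearest
    the cores to main memory. Each level must serve the traffic that the
    previous level could not filter (simplifying assumption).
    Returns a value in (0, 1]; 1.0 means no level is a bottleneck.
    """
    capacity_in_front_kb = 0.0  # nothing filters traffic ahead of the first level
    worst_ratio = 1.0
    for capacity_kb, provided_gbs in levels:
        needed_gbs = demand(capacity_in_front_kb)
        worst_ratio = min(worst_ratio, provided_gbs / needed_gbs)
        capacity_in_front_kb = capacity_kb
    return worst_ratio


# Illustrative numbers only: a 64 KB first-level cache, a 1 MB second-level
# cache, and main memory, with made-up bandwidths in GB/s.
demand = sqrt_rule_demand(raw_gbs=400.0, ref_capacity_kb=16.0)
hierarchy = [(64.0, 400.0), (1024.0, 200.0), (math.inf, 25.0)]
print(relative_performance(hierarchy, demand))
```

In this toy setup main memory is the limiting level, so raising only its bandwidth improves the estimate, while adding bandwidth to the caches would not. A search over capacities and bandwidths under a power budget, as the abstract describes, would sit on top of an evaluator like this one.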