A memory bandwidth effective cache store miss policy

Authors:
Hou Rui;Fuxin Zhang;Weiwu Hu
Affiliations:
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Venue:
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Year:
2005

Citing 12
Cited 0

Cache write policies and performance

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
A performance study of software and hardware data prefetching schemes

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Hitting the memory wall: implications of the obvious

ACM SIGARCH Computer Architecture News
Simultaneous multithreading: maximizing on-chip parallelism

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Memory bandwidth limitations of future microprocessors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Computer architecture (2nd ed.): a quantitative approach

Computer architecture (2nd ed.): a quantitative approach
SPEC CPU2000: Measuring CPU Performance in the New Millennium

Computer
The MIPS R10000 Superscalar Microprocessor

IEEE Micro
The Alpha 21264 Microprocessor

IEEE Micro
Exploring the Design Space of Future CMPs

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Billion-Transistor Architectures: There and Back Again

Computer
Microarchitecture of the Godson-2 processor

Journal of Computer Science and Technology

Quantified Score

Hi-index	0.00

Visualization

Abstract

Memory bandwidth becomes more and more important in the forthcoming 10 billion transistors chip times. This paper discusses and implements a memory bandwidth effective cache store miss policy. Although the write-allocate policy is adopted, we find it is possible not to load the full cache block from lower memory hierarchy when cache store miss occurs, if the cache block is fully modified before any load instruction accesses the un-modified data of the same cache block. This cache store miss policy will partly reduce the pressure on memory bandwidth, and improve the cache hit rate. We provides a hardware mechanism, Store Merge Buffer, to implement the policy in Goodson-2 processor. Our experiments demonstrate the encouraging results: Memory bandwidth improved by almost 50% (tested by stream benchmark), and IPC on SPEC CPU2K improved by 9.4% on average.