Dynamic partition of memory reference instructions – a register guided approach

Authors:
Yixin Shi;Gyungho Lee
Affiliations:
ECE Department, University of Illinois at Chicago;ECE Department, University of Illinois at Chicago
Venue:
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Year:
2005

Citing 14
Cited 1

High-bandwidth data memory systems for superscalar processors

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Increasing cache port efficiency for dynamic superscalar microprocessors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Eliminating cache conflict misses through XOR-based placement functions

ICS '97 Proceedings of the 11th international conference on Supercomputing
On high-bandwidth data cache design for multi-issue processors

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Speculation techniques for improving load related instruction scheduling

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Access region locality for high-bandwidth processor memory system design

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Clock rate versus IPC: the end of the road for conventional microarchitectures

Proceedings of the 27th annual international symposium on Computer architecture
Billion-Transistor Architectures

Computer
Superspeculative Microarchitecture for Beyond AD 2000

Computer
Partitioned first-level cache design for clustered microarchitectures

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Guided region prefetching: a cooperative hardware/software approach

Proceedings of the 30th annual international symposium on Computer architecture
Parallel Cachelets

ICCD '01 Proceedings of the International Conference on Computer Design: VLSI in Computers & Processors
Access Region Cache: A Multi-Porting Solution for Future Wide-Issue Processors

ICCD '01 Proceedings of the International Conference on Computer Design: VLSI in Computers & Processors
A high-bandwidth memory pipeline for wide issue processors

A high-bandwidth memory pipeline for wide issue processors

Access region cache with register guided memory reference partitioning

Journal of Systems Architecture: the EUROMICRO Journal

Quantified Score

Hi-index	0.00

Visualization

Abstract

A high bandwidth L-1 data cache is essential for achieving high performance in wide-issue processors. Previous studies have shown that using multiple small single-ported caches instead of a monolithic large multi-ported one for L-1 data cache can be a scalable and inexpensive way to provide higher bandwidth. Many schemes have been proposed on how to direct the memory references to these multiple caches in order to achieve a close match to the performance of an ideal multi-ported cache. However, most previous designs seldom take dynamic data access patterns into consideration and thus suffer from access conflicts within one cache and unbalanced loads between the caches. We observe that if one can group data references defined in a program into several regions (access regions) to allow parallel accesses, then providing separate small caches (access region cache) for these regions may prove to have better performance than previous multi-cache schemes. The register-guided memory reference partition approach proposed in this paper effectively identifies these semantic regions and organizes them in multiple caches in an adaptive way to maximize concurrent accesses without incurring too much overhead. In our design, the base register number, not its content, in the memory reference instruction is used as a basic guide for instruction steering. A reassignment mechanism is applied to capture the pattern when program is moving across its access regions. In addition, a distribution mechanism is introduced to further reduce residual conflicts, which adaptively enables access regions to extend or shrink among the physical caches. Our simulations of SPEC CPU2000 benchmarks have shown that the register-guided approach can reduce the conflicts effectively, distribute memory reference instructions properly, and yield considerable performance improvement in terms of IPC.