Sunder: a programmable hardware prefetch architecture for numerical loops

  • Authors:
  • Tzi-cker Chiueh

  • Affiliations:
  • State University of New York at Stony Brook, Stony Brook, NY

  • Venue:
  • Proceedings of the 1994 ACM/IEEE conference on Supercomputing
  • Year:
  • 1994

Abstract

Beyond data caching, data prefetching is by far the most effective way to address the memory access bottleneck associated with high-performance processors. This is particularly true for scientific programs whose working sets cannot easily fit into the on-chip data cache. This paper proposes a new data prefetching architecture called Sunder, which combines the flexibility and accuracy of software prefetching with the transparency and low overhead of hardware prefetching. The heart of the design is a dedicated Prefetch Engine that is programmable at run time by software. An important design decision is to keep the Prefetch Engine completely isolated from the normal instruction execution pipeline, except for a loop counter that keeps the two synchronized at the boundaries of loop iterations. A detailed simulation study of the Sunder architecture shows that it achieves an average relative performance advantage over cache-only architectures ranging from 28% to 46%, with smaller cache block sizes leading to greater performance improvement.
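The idea of a run-time programmable prefetch engine that is coupled to the pipeline only through a loop counter can be illustrated with a toy model. The sketch below is not the Sunder design: the abstract does not describe the programming interface, so the names `StreamDescriptor`, `ToyPrefetchEngine`, `program`, `on_loop_counter`, and the `lookahead` parameter are all hypothetical, and the assumption that each array is accessed with a constant stride per iteration is ours.

```python
from dataclasses import dataclass, field


@dataclass
class StreamDescriptor:
    """Hypothetical access pattern: address = base + stride * iteration."""
    base: int
    stride: int


@dataclass
class ToyPrefetchEngine:
    """Toy model only; fields and methods are assumptions, not the paper's design."""
    block_size: int
    lookahead: int = 1                         # iterations to run ahead of the loop
    streams: list = field(default_factory=list)
    buffer: set = field(default_factory=set)   # block numbers already fetched

    def program(self, streams):
        """Software programs the engine once, before the loop starts."""
        self.streams = list(streams)
        self.buffer.clear()

    def on_loop_counter(self, iteration):
        """Called at each iteration boundary; the only coupling to the loop."""
        target = iteration + self.lookahead
        for s in self.streams:
            self.buffer.add((s.base + s.stride * target) // self.block_size)

    def hit(self, address):
        """True if the block holding this address was prefetched earlier."""
        return address // self.block_size in self.buffer


def run_loop(engine, streams, iterations):
    """Count loop accesses that were covered by an earlier prefetch."""
    engine.program(streams)
    hits = 0
    for i in range(iterations):
        for s in streams:
            if engine.hit(s.base + s.stride * i):
                hits += 1
        engine.on_loop_counter(i)  # synchronize at the iteration boundary
    return hits
```

In this model, a loop over two arrays with a one-iteration lookahead misses only on the first iteration, because every later access was prefetched at the preceding boundary. The model ignores timing, buffer capacity, and memory latency, all of which a real evaluation such as the paper's simulation study would have to account for.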