The interaction of software prefetching with ILP processors in shared-memory systems

  • Authors:
  • Parthasarathy Ranganathan; Vijay S. Pai; Hazim Abdel-Shafi; Sarita V. Adve

  • Affiliations:
  • Department of Electrical and Computer Engineering, Rice University, Houston, Texas (all authors)

  • Venue:
  • Proceedings of the 24th Annual International Symposium on Computer Architecture
  • Year:
  • 1997

Abstract

Current microprocessors aggressively exploit instruction-level parallelism (ILP) through techniques such as multiple issue, dynamic scheduling, and non-blocking reads. Recent work has shown that memory latency remains a significant performance bottleneck for shared-memory multiprocessor systems built of such processors.

This paper provides the first study of the effectiveness of software-controlled non-binding prefetching in shared-memory multiprocessors built of state-of-the-art ILP-based processors. We find that software prefetching results in significant reductions in execution time (12% to 31%) for three out of five applications on an ILP system. However, compared to a previous-generation system, software prefetching is significantly less effective in reducing the memory stall component of execution time on an ILP system. Consequently, even after adding software prefetching, memory stall time accounts for over 30% of the total execution time in four out of five applications on our ILP system.

This paper also investigates the interaction of software prefetching with memory consistency models on ILP-based multiprocessors. In particular, we seek to determine whether software prefetching can equalize the performance of sequential consistency (SC) and release consistency (RC). We find that even with software prefetching, for three out of five applications, RC provides a significant reduction in execution time (15% to 40%) compared to SC.
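
For readers unfamiliar with the technique, the following is a minimal sketch of what software-controlled non-binding prefetching can look like in source code. It is an illustration only and is not taken from the paper: the function name `sum_with_prefetch` and the constant `PREFETCH_DISTANCE` are hypothetical, the distance value is an untuned assumption, and the sketch relies on the GCC/Clang builtin `__builtin_prefetch` rather than on whatever prefetch mechanism the authors used. The prefetch is non-binding in the sense that it only hints that a line should be brought into the cache; the later ordinary load still reads the value through the coherent memory system.

```c
#include <stddef.h>

/* Hypothetical prefetch distance in array elements (an assumption;
 * a real value would be tuned to the memory latency and loop body). */
#define PREFETCH_DISTANCE 16

/* Sums an array while issuing a prefetch hint for data that will be
 * needed PREFETCH_DISTANCE iterations later. */
double sum_with_prefetch(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Hint only: second argument 0 = prefetch for read,
             * third argument 3 = high temporal locality.
             * The hint does not change program semantics. */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);
        }
        sum += a[i]; /* the ordinary load still performs the actual read */
    }
    return sum;
}
```

Because the hint has no effect on the computed result, the function returns the same sum with or without the prefetch call; only the timing of cache misses may differ.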