The interaction of software prefetching with ILP processors in shared-memory systems

Authors:
Parthasarathy Ranganathan;Vijay S. Pai;Hazim Abdel-Shafi;Sarita V. Adve
Affiliations:
Department of Electrical and Computer Engineering, Rice University, Houston, Texas;Department of Electrical and Computer Engineering, Rice University, Houston, Texas;Department of Electrical and Computer Engineering, Rice University, Houston, Texas;Department of Electrical and Computer Engineering, Rice University, Houston, Texas
Venue:
Proceedings of the 24th annual international symposium on Computer architecture
Year:
1997

Abstract

Current microprocessors aggressively exploit instruction-level parallelism (ILP) through techniques such as multiple issue, dynamic scheduling, and non-blocking reads. Recent work has shown that memory latency remains a significant performance bottleneck for shared-memory multiprocessor systems built of such processors.

This paper provides the first study of the effectiveness of software-controlled non-binding prefetching in shared-memory multiprocessors built of state-of-the-art ILP-based processors. We find that software prefetching results in significant reductions in execution time (12% to 31%) for three out of five applications on an ILP system. However, compared to a previous-generation system, software prefetching is significantly less effective in reducing the memory stall component of execution time on an ILP system. Consequently, even after adding software prefetching, memory stall time accounts for over 30% of the total execution time in four out of five applications on our ILP system.

This paper also investigates the interaction of software prefetching with memory consistency models on ILP-based multiprocessors. In particular, we seek to determine whether software prefetching can equalize the performance of sequential consistency (SC) and release consistency (RC). We find that even with software prefetching, for three out of five applications, RC provides a significant reduction in execution time (15% to 40%) compared to SC.
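
For readers unfamiliar with the technique, the following is a minimal sketch of what a software-controlled non-binding prefetch can look like in C. It is an illustration only and is not taken from the paper. It assumes a GCC- or Clang-compatible compiler that provides the `__builtin_prefetch` builtin; the function name `sum_with_prefetch` and the constant `PREFETCH_DISTANCE` are hypothetical, and the distance of 16 elements is an arbitrary placeholder rather than a tuned value.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead of the current
 * index the prefetch hint is issued. The value 16 is a placeholder. */
#define PREFETCH_DISTANCE 16

/* Sum an array while issuing a prefetch hint a fixed distance ahead.
 * The prefetch is non-binding: it only asks the memory system to bring
 * the line closer to the processor; the value is still read by the
 * ordinary load of a[i] later, so results are unaffected by the hint. */
double sum_with_prefetch(const double *a, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Second argument 0 = prefetch for read,
             * third argument 3 = high temporal locality. */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += a[i];
    }
    return total;
}
```

Because the hint does not change program semantics, the function returns the same sum with or without the prefetch; only the timing of cache misses may differ, and whether it helps depends on the machine and the access pattern.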