Tolerating data access latency with register preloading

Authors:
William Y. Chen;Scott A. Mahlke;Wen-mei W. Hwu;Tokuzo Kiyohara;Pohua P. Chang
Venue:
ICS '92 Proceedings of the 6th international conference on Supercomputing
Year:
1992

Citing 15

A VLIW architecture for a trace scheduling compiler
ASPLOS II Proceedings of the second international conference on Architectural support for programming languages and operating systems

MIPS RISC architecture

CRegs: a new kind of memory for referencing arrays and pointers
Proceedings of the 1988 ACM/IEEE conference on Supercomputing

Run-time disambiguation: coping with statically unpredictable dependencies
IEEE Transactions on Computers

Instruction scheduling for the IBM RISC System/6000 processor
IBM Journal of Research and Development

Analysis of pointers and structures
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation

Architectural support for register allocation in the presence of aliasing
Proceedings of the 1990 ACM/IEEE conference on Supercomputing

Practical dependence testing
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation

IMPACT: an architectural framework for multiple-instruction-issue processors
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture

Comparing static and dynamic code scheduling for multiple-instruction-issue processors
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture

Data dependence analysis on multi-dimensional array references
ICS '89 Proceedings of the 3rd international conference on Supercomputing

The CRAY-1 computer system
Communications of the ACM - Special issue on computer architecture

Dependence Analysis for Supercomputing

Dependence analysis for subscripted variables and its application to program transformations

Optimizing supercompilers for supercomputers

Cited 6

Data relocation and prefetching for programs with large data sets
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture

Dynamic memory disambiguation using the memory conflict buffer
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems

Run-time adaptive cache hierarchy management via reference analysis
Proceedings of the 24th annual international symposium on Computer architecture

Run-Time Cache Bypassing
IEEE Transactions on Computers

Dynamic Access Ordering for Streamed Computations
IEEE Transactions on Computers

A General Compiler Framework for Speculative Optimizations Using Data Speculative Code Motion
Proceedings of the international symposium on Code generation and optimization

Abstract

By exploiting fine-grain parallelism, superscalar processors can potentially increase the performance of future supercomputers. However, supercomputers typically have a long access delay to their first-level memory, which can severely restrict the performance of superscalar processors. Compilers attempt to move load instructions far enough ahead to hide this latency. Conventional movement of load instructions, however, is limited by data dependence analysis. This paper introduces a simple hardware scheme, referred to as preload register update, to allow the compiler to move load instructions even in the presence of inconclusive data dependence analysis results. Preload register update keeps the load destination registers coherent when load instructions are moved past store instructions that reference the same location. With this addition, superscalar processors can more effectively tolerate longer data access latencies.
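
The coherence idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the class `PreloadMachine`, its methods, and the policy that the first use of a register ends the tracking are all hypothetical names and choices made for illustration. The model remembers which address each preloaded register came from, and a later store to that address also updates the register.

```python
class PreloadMachine:
    """Toy model (hypothetical): registers loaded early stay coherent with later stores."""

    def __init__(self, memory):
        self.memory = dict(memory)  # address -> value
        self.regs = {}              # register name -> value
        self.watch = {}             # register name -> address it was preloaded from

    def preload(self, reg, addr):
        # Load early and remember the source address so later stores can be checked.
        self.regs[reg] = self.memory[addr]
        self.watch[reg] = addr

    def store(self, addr, value):
        self.memory[addr] = value
        # Coherence step: forward the stored value to every register watching addr.
        for reg, watched in self.watch.items():
            if watched == addr:
                self.regs[reg] = value

    def use(self, reg):
        # Assumed policy for this sketch: the first use ends the tracking.
        self.watch.pop(reg, None)
        return self.regs[reg]


# The load of 0x100 has been hoisted above a store that may alias it.
m = PreloadMachine({0x100: 1, 0x200: 2})
m.preload("r1", 0x100)
m.store(0x100, 42)   # the store does alias the preloaded address
print(m.use("r1"))   # prints 42, the value a load placed after the store would see
```

In this model the compiler does not need to prove that the store and the load are independent: if they turn out to reference the same location, the register is corrected when the store executes.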