A novel renaming mechanism that boosts software prefetching

Authors:
Daniel Ortega;Mateo Valero;Eduard Ayguadé
Affiliations:
Departamento de Arquitectura de Computadores, Univ ersidad Politécnica de Catalu~ na -- Barcelona, Spain;Departamento de Arquitectura de Computadores, Univ ersidad Politécnica de Catalu~ na -- Barcelona, Spain;Departamento de Arquitectura de Computadores, Univ ersidad Politécnica de Catalu~ na -- Barcelona, Spain
Venue:
ICS '01 Proceedings of the 15th international conference on Supercomputing
Year:
2001

Citing 19
Cited 4

Software prefetching

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
An architecture for software-controlled data prefetching

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Data prefetching in multiprocessor vector cache memories

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Comparative evaluation of latency reducing and tolerating techniques

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
An effective on-chip preloading scheme to reduce data access penalty

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Facilitating superscalar processing via a combined static/dynamic register renaming scheme

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Compiler-based prefetching for recursive data structures

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Compiler-directed data prefetching in multiprocessors with memory hierarchies

ICS '90 Proceedings of the 4th international conference on Supercomputing
Register renaming and dynamic speculation: an alternative approach

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
A victim cache for vector registers

ICS '97 Proceedings of the 11th international conference on Supercomputing
Prefetching using Markov predictors

Proceedings of the 24th annual international symposium on Computer architecture
Improving the accuracy and performance of memory communication through renaming

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Streamlining inter-operation memory communication via data dependence prediction

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Dependence based prefetching for linked data structures

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Classifying load and store instructions for memory renaming

ICS '99 Proceedings of the 13th international conference on Supercomputing
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Cache Memories

ACM Computing Surveys (CSUR)
Lockup-free instruction fetch/prefetch cache organization

ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Virtual-Physical Registers

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture

Cost-Effective Compiler Directed Memory Prefetching and Bypassing

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Stream Programming on General-Purpose Processors

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Dynamic memory instruction bypassing

International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)
Kilo-instruction processors, runahead and prefetching

Proceedings of the 3rd conference on Computing frontiers

Quantified Score

Hi-index	0.00

Visualization

Abstract

The detection and correct handling of data and control dependencies constitutes one of the biggest issues to expose ILP in current architectures. The ever increasing memory latencies and working space of programmes are making prefetching techniques crucial for the attainment of sustained high performance. Software prefetching allows the compiler to use information discovered at compile-time to effectively bring needed data before it is used, thus hiding all or part of the latency from main memory.On the other hand, renaming is a technique that allows the hardware to break register naming dependencies, thus exposing more parallelism to the hardware. In this paper we will present a new compiler-directed renaming mechanism focused on prefetch instructions. The compiler informs the hardware on the association of prefetch and load instructions, thus making it possible for the hardware to convert non-binding prefetches in to binding prefetches, without any of the compile-time limitations this other kind of prefetching may have.The mechanism can be implemented at a very low costin terms of area and we believe it will not impact cycle time. The research presented in this paper is at a first stage; nevertheless, our results for a set of numerical application show a speedup of 5% to 22%, and in any case no performance degradation was observed.