MIPS RISC architectures
Two-level adaptive training branch prediction
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Reducing memory traffic with CRegs
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Optimization of instruction fetch mechanisms for high issue rates
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Zero-cycle loads: microarchitecture support for reducing load latency
Proceedings of the 28th annual international symposium on Microarchitecture
Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The performance potential of data dependence speculation & collapsing
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Dynamic speculation and synchronization of data dependences
Proceedings of the 24th annual international symposium on Computer architecture
ACM Computing Surveys (CSUR)
The potential of data value speculation to boost ILP
ICS '98 Proceedings of the 12th international conference on Supercomputing
Speculative multithreaded processors
ICS '98 Proceedings of the 12th international conference on Supercomputing
Memory dependence prediction using store sets
Proceedings of the 25th annual international symposium on Computer architecture
Improving trace cache effectiveness with branch promotion and trace packing
Proceedings of the 25th annual international symposium on Computer architecture
Predictive techniques for aggressive load speculation
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Compiler-directed early load-address generation
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Dependence based prefetching for linked data structures
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Speculation techniques for improving load related instruction scheduling
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Decoupling local variable accesses in a wide-issue superscalar processor
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Storageless value prediction using prior register values
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Classifying load and store instructions for memory renaming
ICS '99 Proceedings of the 13th international conference on Supercomputing
Access region locality for high-bandwidth processor memory system design
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Read-after-read memory dependence prediction
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Extending Value Reuse to Basic Blocks with Compiler Support
IEEE Transactions on Computers
Understanding the backward slices of performance degrading instructions
Proceedings of the 27th annual international symposium on Computer architecture
Speculative Memory Cloaking and Bypassing
International Journal of Parallel Programming - Special issue on the 30th annual ACM/IEEE international symposium on microarchitecture, part II
A novel renaming mechanism that boosts software prefetching
ICS '01 Proceedings of the 15th international conference on Supercomputing
A time-stamping algorithm for efficient performance estimation of superscalar processors
Proceedings of the 2001 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A High-Bandwidth Memory Pipeline for Wide Issue Processors
IEEE Transactions on Computers
Reducing Memory Latency via Read-after-Read Memory Dependence Prediction
IEEE Transactions on Computers
The predictability of load address
ACM SIGARCH Computer Architecture News
Control-Flow Speculation through Value Prediction
IEEE Transactions on Computers
Using Dataflow Based Contextfor Accurate Branch Prediction
HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
Decoupling Recovery Mechanism for Data Speculation from Dynamic Instruction Scheduling Structure
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
A Prolog Tailoring Technique on an Epilog Tailored Procedure
PSI '02 Revised Papers from the 4th International Andrei Ershov Memorial Conference on Perspectives of System Informatics: Akademgorodok, Novosibirsk, Russia
Quantifying behavioral differences between multimedia and general-purpose workloads
Journal of Systems Architecture: the EUROMICRO Journal
Three extensions to register integration
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Dynamic memory instruction bypassing
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Address-free memory access based on program syntax correlation of loads and stores
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on the 2001 international conference on computer design (ICCD)
Efficient spill code for SDRAM
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Load elimination for low-power embedded processors
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
Incremental Commit Groups for Non-Atomic Trace Processing
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Scalable Store-Load Forwarding via Store Queue Index Prediction
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Address-Indexed Memory Disambiguation and Store-to-Load Forwarding
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Dynamic memory instruction bypassing
International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)
NoSQ: Store-Load Communication without a Store Queue
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
A comparison of two policies for issuing instructions speculatively
Journal of Systems Architecture: the EUROMICRO Journal
Visual simulator for ILP dynamic OOO processor
WCAE '04 Proceedings of the 2004 workshop on Computer architecture education: held in conjunction with the 31st International Symposium on Computer Architecture
Reducing misspeculation penalty in trace-level speculative multithreaded architectures
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Do trace cache, value prediction and prefetching improve SMT throughput?
ARCS'06 Proceedings of the 19th international conference on Architecture of Computing Systems
Hi-index | 0.01 |
As processors continue to exploit more instruction-level parallelism, a greater demand is placed on reducing the effects of memory access latency. In this paper, we introduce a novel modification of the processor pipeline called memory renaming. Memory renaming applies register access techniques to load instructions, reducing the effect of delays caused by the need to calculate effective addresses for the load and all preceding stores before the data can be fetched. Memory renaming allows the processor to speculatively fetch values when the producer of the data can be reliably determined without the need for an effective address. This work extends previous studies of data value and dependence speculation. When memory renaming is added to the processor pipeline, renaming can be applied to 30% to 50% of all memory references, translating to an overall improvement in execution time of up to 41%. Furthermore, this improvement is seen across all memory segments-including the heap segment, which has often been difficult to manage efficiently.