Presents a load target prediction scheme that mitigates the impact of load latency in modern microprocessors. The scheme uses a cache-like buffer to provide the base address, offset, and operand size at the instruction fetch stage of the pipeline, so that the load target address can be computed earlier, at the decode stage. With the dynamic use of a load stride, the scheme achieves a prediction rate 15% higher than a previously proposed approach. With a 128-entry direct-mapped load-prediction buffer, two adders, and two forwarding paths, the scheme delivers an average performance improvement of 10% to 32% on a 4-fetch processor as the data cache latency increases from 2 cycles to 4 cycles. A bit-array design that supports multicast writes and eliminates the associative logic commonly used in base register caching is developed for the prediction scheme.
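The abstract gives no pseudocode, so the C sketch below illustrates one plausible reading of the scheme: a 128-entry direct-mapped buffer indexed by the load's PC that caches the load's base register number, offset, operand size, and a dynamically learned stride, so a target address can be formed at decode. The field layout, the function names (lpb_lookup, lpb_predict, lpb_update), and the choice to cache the base register's identifier rather than its value are illustrative assumptions, not the paper's implementation.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define LPB_ENTRIES 128          /* 128-entry direct-mapped buffer (size from the abstract) */

/* One entry of the hypothetical load-prediction buffer. */
typedef struct {
    bool     valid;
    uint32_t tag;        /* upper PC bits, to detect aliasing */
    uint8_t  base_reg;   /* architectural base register of the load (assumed field) */
    int32_t  offset;     /* immediate offset of the load */
    uint8_t  size;       /* operand size in bytes */
    int32_t  stride;     /* dynamically learned stride between successive targets */
    uint32_t last_addr;  /* last target address produced by this load */
} lpb_entry_t;

static lpb_entry_t lpb[LPB_ENTRIES];

static inline uint32_t lpb_index(uint32_t pc) { return (pc >> 2) & (LPB_ENTRIES - 1); }
static inline uint32_t lpb_tag(uint32_t pc)   { return pc >> 9; }

/* Fetch-stage lookup: a hit returns the cached entry so the target can be formed early. */
static lpb_entry_t *lpb_lookup(uint32_t pc)
{
    lpb_entry_t *e = &lpb[lpb_index(pc)];
    return (e->valid && e->tag == lpb_tag(pc)) ? e : NULL;
}

/* Decode-stage prediction: base register value plus cached offset, or a stride-based
 * guess from the last observed target when the base value is not yet available. */
static uint32_t lpb_predict(const lpb_entry_t *e, uint32_t base_value, bool use_stride)
{
    if (use_stride)
        return e->last_addr + (uint32_t)e->stride;
    return base_value + (uint32_t)e->offset;
}

/* Update once the real address has been computed in the execute stage. */
static void lpb_update(uint32_t pc, uint8_t base_reg, int32_t offset,
                       uint8_t size, uint32_t actual_addr)
{
    lpb_entry_t *e = &lpb[lpb_index(pc)];
    if (e->valid && e->tag == lpb_tag(pc)) {
        e->stride = (int32_t)(actual_addr - e->last_addr);  /* learn the stride dynamically */
    } else {
        e->valid  = true;
        e->tag    = lpb_tag(pc);
        e->stride = 0;
    }
    e->base_reg  = base_reg;
    e->offset    = offset;
    e->size      = size;
    e->last_addr = actual_addr;
}

int main(void)
{
    /* Toy example: one static load at PC 0x1000 walking an array with stride 8. */
    const uint32_t pc = 0x1000, base = 0x8000;
    for (int i = 0; i < 4; i++) {
        uint32_t actual = base + 16 + 8u * (uint32_t)i;
        lpb_entry_t *e = lpb_lookup(pc);
        if (e) {
            uint32_t pred = lpb_predict(e, base, /*use_stride=*/true);
            printf("iter %d: predicted %#x, actual %#x (%s)\n",
                   i, (unsigned)pred, (unsigned)actual, pred == actual ? "correct" : "wrong");
        } else {
            printf("iter %d: no prediction yet\n", i);
        }
        lpb_update(pc, /*base_reg=*/5, /*offset=*/16, /*size=*/4, actual);
    }
    return 0;
}
```

In this sketch the first iteration allocates the entry, the second learns the stride, and subsequent iterations predict the target correctly before the execute stage; a real implementation would also need the two forwarding paths and the recovery logic on a misprediction, which the abstract mentions but this sketch omits.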