Next cache line and set prediction

Authors:
Brad Calder; Dirk Grunwald
Affiliations:
Department of Computer Science, Campus Box 430, University of Colorado, Boulder, CO (both authors)
Venue:
ISCA '95: Proceedings of the 22nd Annual International Symposium on Computer Architecture
Year:
1995

Abstract

Accurate instruction fetch and branch prediction are increasingly important on today's wide-issue architectures. Fetch prediction is the process of determining the next instruction to request from the memory subsystem. Branch prediction is the process of predicting the likely outcome of branch instructions. Several researchers have proposed very effective fetch and branch prediction mechanisms, including branch target buffers (BTBs) that store the target addresses of taken branches. An alternative approach fetches the instruction following a branch by using an index into the cache instead of a branch target address. We call such an index a next cache line and set (NLS) predictor. An NLS predictor is a pointer into the instruction cache that indicates the target instruction of a branch.

In this paper we examine the use of NLS predictors for efficient and accurate fetch and branch prediction. Previous studies associated each NLS predictor with a cache line and provided only one-bit conditional branch predictors. Our study examines the use of NLS predictors together with highly accurate two-level correlated conditional branch prediction architectures. We examine the performance of decoupling the NLS predictors from the cache line and storing them in a separate tagless memory buffer. Our results show that the decoupled architecture performs better than associating the NLS predictors with the cache line, that the NLS architecture benefits from reduced cache miss rates, and that it is particularly effective for programs containing many branches. We also provide an in-depth comparison between the NLS and BTB architectures, showing that the NLS architecture is a competitive alternative to the BTB design.
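To make the difference from a BTB concrete, here is a minimal Python sketch of a decoupled, tagless NLS table. It assumes a hypothetical instruction-cache geometry (128 sets, 2 ways, 32-byte lines, 4-byte instructions) and uses a gshare-style global-history predictor with 2-bit counters as a stand-in for the two-level correlated predictors the paper studies. The names `DecoupledNLS`, `NLSEntry`, `nls_pointer_bits`, and `btb_entry_bits` are illustrative only and are not taken from the paper. Each table entry stores a cache coordinate (set, way, instruction offset) rather than a target address, and is indexed by low PC bits alone.

```python
from dataclasses import dataclass

# Hypothetical geometry chosen for illustration only (not from the paper).
ADDR_BITS = 32      # byte address width
INSTR_BYTES = 4     # fixed-length instructions
LINE_BYTES = 32     # 8 instructions per cache line
NUM_SETS = 128
WAYS = 2
NLS_ENTRIES = 1024  # size of the decoupled, tagless NLS table
HIST_BITS = 10      # global history length for the gshare-style PHT


@dataclass
class NLSEntry:
    valid: bool = False
    is_cond: bool = False  # conditional branches consult the PHT; others are always taken
    set_index: int = 0     # cache set holding the predicted target line
    way: int = 0           # way within that set
    offset: int = 0        # instruction slot within the line


class DecoupledNLS:
    """Sketch: tagless NLS table plus a gshare-style conditional predictor (assumed)."""

    def __init__(self):
        self.nls = [NLSEntry() for _ in range(NLS_ENTRIES)]
        self.pht = [1] * (1 << HIST_BITS)  # 2-bit saturating counters, weakly not-taken
        self.ghist = 0                     # global branch history register

    def _nls_idx(self, pc):
        # No tag check: branches that alias simply share (and overwrite) an entry.
        return (pc // INSTR_BYTES) % NLS_ENTRIES

    def _pht_idx(self, pc):
        return ((pc // INSTR_BYTES) ^ self.ghist) & ((1 << HIST_BITS) - 1)

    def predict(self, pc):
        """Return ('seq', next_pc) or ('cache', set, way, offset)."""
        e = self.nls[self._nls_idx(pc)]
        if not e.valid or (e.is_cond and self.pht[self._pht_idx(pc)] < 2):
            return ("seq", pc + INSTR_BYTES)
        # Fetch reads this cache slot directly; no target address is formed.
        return ("cache", e.set_index, e.way, e.offset)

    def update(self, pc, is_cond, taken, target_set=0, target_way=0, target_offset=0):
        """Train after the branch resolves; target_* locate the taken target in the cache."""
        if is_cond:
            i = self._pht_idx(pc)
            self.pht[i] = min(3, self.pht[i] + 1) if taken else max(0, self.pht[i] - 1)
            self.ghist = ((self.ghist << 1) | int(taken)) & ((1 << HIST_BITS) - 1)
        if taken:
            self.nls[self._nls_idx(pc)] = NLSEntry(
                True, is_cond, target_set, target_way, target_offset)


def ilog2(n):
    return n.bit_length() - 1  # exact for powers of two


def nls_pointer_bits():
    # set index + way + instruction offset within the line
    return ilog2(NUM_SETS) + ilog2(WAYS) + ilog2(LINE_BYTES // INSTR_BYTES)


def btb_entry_bits(btb_entries=1024):
    # Direct-mapped BTB (assumed): tag plus full word-aligned target address.
    word_bits = ADDR_BITS - ilog2(INSTR_BYTES)
    return (word_bits - ilog2(btb_entries)) + word_bits
```

Because the table is tagless, two branches whose PCs map to the same entry overwrite each other's pointer, and a pointer goes stale if its target line is evicted; a real fetch unit would have to detect such mispredictions later, which this sketch does not model. Under the assumed geometry the pointer itself needs 11 bits, while an entry in a direct-mapped 1024-entry BTB needs a 20-bit tag plus a 30-bit target, or 50 bits, which illustrates why storing cache coordinates can be cheaper than storing addresses.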