Exceeding the dataflow limit via value prediction

Authors:
Mikko H. Lipasti;John Paul Shen
Affiliations:
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh PA;Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh PA
Venue:
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Year:
1996

Abstract

For decades, the serialization constraints induced by true data dependences have been regarded as an absolute limit, the dataflow limit, on the parallel execution of serial programs. This paper proposes a new technique, value prediction, that exceeds this limit by allowing data-dependent instructions to issue and execute in parallel without violating program semantics. The technique builds on the concept of value locality, which describes the likelihood that a previously seen value recurs within a storage location inside a computer system. Value prediction consists of predicting entire 32- and 64-bit register values based on previously seen values. We find that the register values written by machine instructions are frequently predictable. Furthermore, we show that simple microarchitectural enhancements that enable value prediction in a modern microprocessor implementation based on the PowerPC 620 can effectively exploit value locality to collapse true dependences, reduce average result latency, and provide performance gains of 4.5%-23% (depending on machine model) by exceeding the dataflow limit.
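
To make the idea of predicting register values from previously seen values concrete, the following is a minimal sketch of a last-value predictor in Python. It is an illustration only and not the hardware design evaluated in the paper: the class name `LastValuePredictor`, the function `measure_value_locality`, the direct-mapped table, its size of 1024 entries, and the 4-byte instruction alignment are all assumptions made for this example.

```python
class LastValuePredictor:
    """Hypothetical direct-mapped table mapping an instruction address
    to the last value that instruction wrote to its destination register."""

    def __init__(self, entries=1024):
        # Table size is an arbitrary assumption for this sketch.
        self.entries = entries
        self.table = [None] * entries

    def _index(self, pc):
        # Assumes 4-byte instructions: drop the two low bits, then wrap.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        # Returns the previously seen value, or None if nothing was recorded.
        return self.table[self._index(pc)]

    def update(self, pc, value):
        # Record the value actually produced, for use by the next prediction.
        self.table[self._index(pc)] = value


def measure_value_locality(trace, entries=1024):
    """Fraction of (pc, value) records whose value equals the value
    last recorded in the table entry for that pc."""
    predictor = LastValuePredictor(entries)
    correct = 0
    for pc, value in trace:
        if predictor.predict(pc) == value:
            correct += 1
        predictor.update(pc, value)
    return correct / len(trace) if trace else 0.0
```

On a toy trace such as `[(0x100, 7), (0x104, 1), (0x100, 7), (0x104, 2), (0x100, 7), (0x104, 3)]`, the instruction at `0x100` always writes 7 and is predicted correctly on its second and third executions, while the instruction at `0x104` writes a different value each time and is never predicted correctly, so 2 of the 6 records match. In a real processor a correct prediction would let dependent instructions proceed before the producer finishes, and a wrong one would require recovery; neither is modeled here.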