Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Limits of instruction-level parallelism
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Two-level adaptive training branch prediction
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Data access microarchitectures for superscalar processors with compiler-assisted data prefetching
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Limits of control flow on parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
On the limits of program parallelism and its smoothability
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Speculative disambiguation: a compilation technique for dynamic memory disambiguation
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
A performance study of software and hardware data prefetching schemes
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The multiscalar architecture
Dynamic memory disambiguation using the memory conflict buffer
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Performance evaluation of the PowerPC 620 microarchitecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Streamlining data cache access with fast address calculation
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Zero-cycle loads: microarchitecture support for reducing load latency
Proceedings of the 28th annual international symposium on Microarchitecture
SPAID: software prefetching in pointer- and call-intensive environments
Proceedings of the 28th annual international symposium on Microarchitecture
Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The PowerPC 620 microprocessor: a high performance superscalar RISC microprocessor
COMPCON '95 Proceedings of the 40th IEEE Computer Society International Conference
An architectural alternative to optimizing compilers
ASPLOS I Proceedings of the first international symposium on Architectural support for programming languages and operating systems
A study of branch prediction strategies
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
A computer architecture for the dynamic optimization of high-level language programs
A computer architecture for the dynamic optimization of high-level language programs
Caching Function Results: Faster Arithmetic by Avoiding Unnecessary Computation
Caching Function Results: Faster Arithmetic by Avoiding Unnecessary Computation
Low power data processing by elimination of redundant computations
ISLPED '97 Proceedings of the 1997 international symposium on Low power electronics and design
Speculative execution via address prediction and data prefetching
ICS '97 Proceedings of the 11th international conference on Supercomputing
Dynamic speculation and synchronization of data dependences
Proceedings of the 24th annual international symposium on Computer architecture
Proceedings of the 24th annual international symposium on Computer architecture
The predictability of data values
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Can program profiling support value prediction?
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Highly accurate data value prediction using hybrid predictors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The potential of data value speculation to boost ILP
ICS '98 Proceedings of the 12th international conference on Supercomputing
Load execution latency reduction
ICS '98 Proceedings of the 12th international conference on Supercomputing
The effect of instruction fetch bandwidth on value prediction
Proceedings of the 25th annual international symposium on Computer architecture
Modeling program predictability
Proceedings of the 25th annual international symposium on Computer architecture
Memory dependence prediction using store sets
Proceedings of the 25th annual international symposium on Computer architecture
Using value prediction to increase the power of speculative execution hardware
ACM Transactions on Computer Systems (TOCS)
Predictive techniques for aggressive load speculation
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Compiler-directed early load-address generation
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Understanding the differences between value prediction and instruction reuse
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
An empirical analysis of instruction repetition
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Value speculation scheduling for high performance processors
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Fast speculative search engine on the highly parallel computer EM-X
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Value prediction in VLIW machines
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Storageless value prediction using prior register values
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Increasing effective IPC by exploiting distant parallelism
ICS '99 Proceedings of the 13th international conference on Supercomputing
Clustered speculative multithreaded processors
ICS '99 Proceedings of the 13th international conference on Supercomputing
Data threaded microarchitecture
ACM SIGARCH Computer Architecture News
Improving branch predictors by correlating on data values
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Compiler-directed dynamic computation reuse: rationale and initial results
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Dynamic memory disambiguation in the presence of out-of-order store issuing
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Read-after-read memory dependence prediction
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Optimizations and oracle parallelism with dynamic translation
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Limits of Data Value Predictability
International Journal of Parallel Programming
Extending Value Reuse to Basic Blocks with Compiler Support
IEEE Transactions on Computers
HLS: combining statistical and symbolic simulation to guide microprocessor designs
Proceedings of the 27th annual international symposium on Computer architecture
On the value locality of store instructions
Proceedings of the 27th annual international symposium on Computer architecture
Reconfigurable caches and their application to media processing
Proceedings of the 27th annual international symposium on Computer architecture
Speculative Memory Cloaking and Bypassing
International Journal of Parallel Programming - Special issue on the 30th annual ACM/IEEE international symposium on microarchitecture, part II
Overcoming the challenges to feedback-directed optimization (Keynote Talk)
DYNAMO '00 Proceedings of the ACM SIGPLAN workshop on Dynamic and adaptive compilation and optimization
Efficient and flexible value sampling
ACM SIGPLAN Notices
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Compiler controlled value prediction using branch predictor based confidence
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Performance improvement with circuit-level speculation
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Optimizations Enabled by a Decoupled Front-End Architecture
IEEE Transactions on Computers
Efficient and flexible value sampling
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Architectural support for fast symmetric-key cryptography
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Focusing processor policies via critical-path prediction
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
CryptoManiac: a fast flexible architecture for secure communication
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Rapid profiling via stratified sampling
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
On Table Bandwidth and Its Update Delay for Value Prediction on Wide-Issue ILP Processors
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Reducing Memory Latency via Read-after-Read Memory Dependence Prediction
IEEE Transactions on Computers
Silent Stores and Store Value Locality
IEEE Transactions on Computers
Static load classification for improving the value predictability of data-cache misses
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Latency and energy aware value prediction for high-frequency processors
ICS '02 Proceedings of the 16th international conference on Supercomputing
The predictability of load address
ACM SIGARCH Computer Architecture News
Exploiting speculative value reuse using value prediction
CRPIT '02 Proceedings of the seventh Asia-Pacific conference on Computer systems architecture
Exploiting Value Locality to Exceed the Dataflow Limit
International Journal of Parallel Programming
An Exploration of Instruction Fetch Requirement in Out-of-Order Superscalar Processors
International Journal of Parallel Programming
IEEE Transactions on Computers
On Augmenting Trace Cache for High-Bandwidth Value Prediction
IEEE Transactions on Computers
Modeling Value Speculation: An Optimal Edge Selection Problem
IEEE Transactions on Computers
Putting Data Value Predictors to Work in Fine-Grain Parallel Processors
HiPC '01 Proceedings of the 8th International Conference on High Performance Computing
Using Dataflow Based Contextfor Accurate Branch Prediction
HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
Influence of Compiler Optimizations on Value Prediction
HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Predicting Conditional Branches With Fusion-Based Hybrid Predictors
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Quantifying Instruction Criticality
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Independent Hashing as Confidence Mechanism for Value Predictors in Microprocessors
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Implementation of Hybrid Context Based Value Predictors Using Value Sequence Classification
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
A Prolog Tailoring Technique on an Epilog Tailored Procedure
PSI '02 Revised Papers from the 4th International Andrei Ershov Memorial Conference on Perspectives of System Informatics: Akademgorodok, Novosibirsk, Russia
Value Prediction as a Cost-Effective Solution to Improve Embedded Processors Performance
VECPAR '00 Selected Papers and Invited Talks from the 4th International Conference on Vector and Parallel Processing
Exploiting data-width locality to increase superscalar execution bandwidth
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Predicate prediction for efficient out-of-order execution
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Enhancing memory level parallelism via recovery-free value prediction
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Exploring Microprocessor Architectures for Gigascale Integration
ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
Dynamic Data Dependence Tracking and its Application to Branch Prediction
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Hybridizing and Coalescing Load Value Predictors
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
A Power Perspective of Value Speculation for Superscalar Microprocessors
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Detecting global stride locality in value streams
Proceedings of the 30th annual international symposium on Computer architecture
Balancing Reuse Opportunities and Performance Gains with Subblock Value Reuse
IEEE Transactions on Computers
Address-free memory access based on program syntax correlation of loads and stores
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on the 2001 international conference on computer design (ICCD)
Efficient spill code for SDRAM
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Using Interaction Costs for Microarchitectural Bottleneck Analysis
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Exploiting Value Locality in Physical Register Files
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Circuit-aware architectural simulation
Proceedings of the 41st annual Design Automation Conference
VPC3: a fast and effective trace-compression algorithm
Proceedings of the joint international conference on Measurement and modeling of computer systems
A Content Aware Integer Register File Organization
Proceedings of the 31st annual international symposium on Computer architecture
Proceedings of the 31st annual international symposium on Computer architecture
An Efficient Value Predictor Dynamically Using Loop and Locality Properties
The Journal of Supercomputing
International Journal of Parallel Programming - Special issue: Workshop on application specific processors (WASP)
Interaction cost and shotgun profiling
ACM Transactions on Architecture and Code Optimization (TACO)
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Automatic Generation of High-Performance Trace Compressors
Proceedings of the international symposium on Code generation and optimization
Improving Energy-Efficiency by Bypassing Trivial Computations
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 11 - Volume 12
Journal of Systems Architecture: the EUROMICRO Journal
Enhancing Memory-Level Parallelism via Recovery-Free Value Prediction
IEEE Transactions on Computers
Fuzzy Memoization for Floating-Point Multimedia Applications
IEEE Transactions on Computers
Whole execution traces and their applications
ACM Transactions on Architecture and Code Optimization (TACO)
Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
The VPC Trace-Compression Algorithms
IEEE Transactions on Computers
Chip multi-processor scalability for single-threaded applications
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Revised Stride Data Value Predictor Design
HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
Profiling over Adaptive Ranges
Proceedings of the International Symposium on Code Generation and Optimization
TCgen 2.0: a tool to automatically generate lossless trace compressors
ACM SIGARCH Computer Architecture News
Dynamic reuse of subroutine results
Journal of Systems Architecture: the EUROMICRO Journal
Speculative trivialization point advancing in high-performance processors
Journal of Systems Architecture: the EUROMICRO Journal
Hiding the misprediction penalty of a resource-efficient high-performance processor
ACM Transactions on Architecture and Code Optimization (TACO)
SoftSig: software-exposed hardware signatures for code analysis and optimization
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Spice: speculative parallel iteration chunk execution
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Formulating and implementing profiling over adaptive ranges
ACM Transactions on Architecture and Code Optimization (TACO)
Compiler and hardware support for reducing the synchronization of speculative threads
ACM Transactions on Architecture and Code Optimization (TACO)
Early detection and bypassing of trivial operations to improve energy efficiency of processors
Microprocessors & Microsystems
Reexecution and Selective Reuse in Checkpoint Processors
Transactions on High-Performance Embedded Architectures and Compilers II
Checkpoint allocation and release
ACM Transactions on Architecture and Code Optimization (TACO)
Reducing register file size through instruction pre-execution enhanced by value prediction
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Limits for a feasible speculative trace reuse implementation
International Journal of High Performance Systems Architecture
The potential of using dynamic information flow analysis in data value prediction
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
An input-centric paradigm for program dynamic optimizations
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Data value prefetching method based on Markov model
ICCOMP'06 Proceedings of the 10th WSEAS international conference on Computers
Neural confidence estimation for more accurate value prediction
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
Leveraging Strength-Based Dynamic Information Flow Analysis to Enhance Data Value Prediction
ACM Transactions on Architecture and Code Optimization (TACO)
Making power-efficient data value predictions
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Exploiting inter-sequence correlations for program behavior prediction
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Cachetor: detecting cacheable data to remove bloat
Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering
Aggressive Value Prediction on a GPU
International Journal of Parallel Programming
Hi-index | 0.04 |
For decades, the serialization constraints induced by true data dependences have been regarded as an absolute limit--the dataflow limit--on the parallel execution of serial programs. This paper proposes a new technique--value prediction--for exceeding that limit that allows data dependent instructions to issue and execute in parallel without violating program semantics. This technique is built on the concept of value locality, which describes the likelihood of the recurrence of a previously-seen value within a storage location inside a computer system. Value prediction consists of predicting entire 32- and 64-bit register values based on previously-seen values. We find that such register values being written by machine instructions are frequently predictable. Furthermore, we show that simple micro- architectural enhancements to a modern microprocessor implementation based on the PowerPC 620 that enable value prediction can effectively exploit value locality to collapse true dependences, reduce average result latency, and provide performance gains of 4.5%-23% (depending on machine model) by exceeding the dataflow limit.