Focusing processor policies via critical-path prediction

Authors: Brian Fields, Shai Rubin, Rastislav Bodík
Affiliation (all authors): Computer Sciences Department, University of Wisconsin-Madison
Venue: ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Year: 2001

Abstract

Although some instructions hurt performance more than others, current processors typically apply scheduling and speculation as if each instruction were equally costly. Instruction cost can be naturally expressed through the critical path: if we could predict it at run-time, egalitarian policies could be replaced with cost-sensitive strategies that will grow increasingly effective as processors become more parallel.

This paper introduces a hardware predictor of instruction criticality and uses it to improve performance. The predictor is both effective and simple in its hardware implementation. The effectiveness at improving performance stems from using a dependence-graph model of the microarchitectural critical path that identifies execution bottlenecks by incorporating both data and machine-specific dependences. The simplicity stems from a token-passing algorithm that computes the critical path without actually building the dependence graph.

By focusing processor policies on critical instructions, our predictor enables a large class of optimizations. It can (i) give priority to critical instructions for scarce resources (functional units, ports, predictor entries); and (ii) suppress speculation on non-critical instructions, thus reducing “useless” misspeculations. We present two case studies that illustrate the potential of the two types of optimization. We show that (i) critical-path-based dynamic instruction scheduling and steering in a clustered architecture improves performance by as much as 21% (10% on average); and (ii) focusing value prediction only on critical instructions improves performance by as much as 5%, due to removing nearly half of the misspeculations.
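
As a rough illustration of the dependence-graph idea in the abstract, the following Python sketch finds the critical path as the longest latency-weighted path through a small dependence graph. It is an offline software sketch only, not the paper's method: it builds the graph explicitly, which the token-passing hardware predictor is said to avoid. The function name `critical_path`, the edge format `(src, dst, latency)`, and the four-instruction example with its latencies are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def critical_path(num_nodes, edges):
    """Return (length, path) for the longest path in a dependence DAG.

    Assumption: nodes are numbered in program order, so every edge
    (src, dst, latency) points forward (src < dst).
    """
    preds = defaultdict(list)
    for src, dst, latency in edges:
        if not src < dst:
            raise ValueError("edges must point forward in program order")
        preds[dst].append((src, latency))

    dist = [0] * num_nodes     # longest latency-weighted distance to each node
    last = [None] * num_nodes  # predecessor on that longest path

    # Program order is a topological order, so one forward pass suffices.
    for node in range(num_nodes):
        for src, latency in preds[node]:
            if dist[src] + latency > dist[node]:
                dist[node] = dist[src] + latency
                last[node] = src

    # Walk back from the node that finishes last to recover the path.
    end = max(range(num_nodes), key=lambda n: dist[n])
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = last[node]
    return dist[end], path[::-1]


# Hypothetical example: four instructions, latencies in cycles.
edges = [
    (0, 1, 3),  # data dependence: long-latency producer feeds instruction 1
    (0, 2, 1),  # data dependence
    (2, 3, 1),  # data dependence
    (1, 3, 1),  # machine-specific dependence (e.g. a resource constraint)
]

length, path = critical_path(4, edges)
critical = set(path)
print(length, path)                                  # 4 [0, 1, 3]
print([n for n in range(4) if n not in critical])    # [2]
```

In this toy graph, instructions 0, 1, and 3 lie on the critical path, while instruction 2 has slack. A criticality-aware policy in the spirit of the abstract would prioritize the former for scarce resources and could skip speculation on the latter.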