Slack: maximizing performance under technological constraints

Authors:
Brian Fields;Rastislav Bodík;Mark D. Hill
Affiliations:
University of Wisconsin-Madison;University of Wisconsin-Madison;University of Wisconsin-Madison
Venue:
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Year:
2002

Abstract

Many emerging processor microarchitectures seek to manage technological constraints (e.g., wire delay, power, and circuit complexity) by resorting to non-uniform designs that provide resources at multiple quality levels (e.g., fast/slow bypass paths, multi-speed functional units, and grid architectures). In such designs, the constraint problem becomes a control problem, and the challenge becomes designing a control policy that mitigates the performance penalty of the non-uniformity. Given the increasing importance of non-uniform control policies, we believe it is appropriate to examine them in their own right.

To this end, we develop slack for use in creating control policies that match program execution behavior to machine design. Intuitively, the slack of a dynamic instruction i is the number of cycles i can be delayed with no effect on execution time. This property makes slack a natural candidate for hiding non-uniform latencies.

We make three contributions in our exploration of slack. First, we formally define slack, distinguish three variants (local, global, and apportioned), and perform a limit study to show that slack is prevalent in our SPEC2000 workload. Second, we show how to predict slack in hardware. Third, we illustrate how to create a control policy based on slack for steering instructions among fast (high power) and slow (lower power) pipelines.
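
The intuitive definition of slack in the abstract can be illustrated with a small sketch. This is not the authors' model or their hardware predictor: it assumes a toy data-dependence graph with fixed instruction latencies and applies an ordinary critical-path calculation (latest start minus earliest start) to approximate global slack. The names `global_slack`, `steer`, and `slow_penalty`, and the four-instruction example, are hypothetical.

```python
from collections import defaultdict

def global_slack(latency, deps):
    """Approximate per-instruction global slack on a toy dependence graph.

    latency: dict mapping instruction name -> latency in cycles, listed in
             a valid topological (program) order. (Assumed input format.)
    deps:    list of (producer, consumer) data-dependence pairs.
    Returns (slack dict, total execution time in cycles).
    """
    order = list(latency)
    preds, succs = defaultdict(list), defaultdict(list)
    for producer, consumer in deps:
        preds[consumer].append(producer)
        succs[producer].append(consumer)

    # Forward pass: earliest cycle at which each instruction can start.
    earliest = {}
    for n in order:
        earliest[n] = max((earliest[p] + latency[p] for p in preds[n]), default=0)
    total = max(earliest[n] + latency[n] for n in order)

    # Backward pass: latest start that still leaves total time unchanged.
    latest = {}
    for n in reversed(order):
        latest_finish = min((latest[s] for s in succs[n]), default=total)
        latest[n] = latest_finish - latency[n]

    return {n: latest[n] - earliest[n] for n in order}, total

def steer(slack, slow_penalty):
    """Hypothetical policy: use the slow pipeline when slack covers its extra latency."""
    return {n: "slow" if s >= slow_penalty else "fast" for n, s in slack.items()}

# Toy example: load feeds both add and mul; both feed store.
latency = {"load": 3, "add": 1, "mul": 4, "store": 1}
deps = [("load", "add"), ("load", "mul"), ("add", "store"), ("mul", "store")]

slack, total = global_slack(latency, deps)
pipes = steer(slack, slow_penalty=2)
print(slack, total, pipes)
```

In this example only `add` has slack (3 cycles), so it is the only instruction steered to the slow pipeline. Each value here is computed as if that instruction alone were delayed; when several instructions on the same path share slack, delaying all of them can lengthen execution, which is presumably the kind of situation the apportioned variant mentioned in the abstract is meant to handle.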