Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Load latency tolerance in dynamically scheduled processors
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Focusing processor policies via critical-path prediction
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
A design space evaluation of grid processor architectures
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Reducing power with dynamic critical path information
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Micro-architecture design and control speculation for energy reduction
Power aware computing
The Non-Critical Buffer: Using Load Latency Tolerance to Improve Data Cache Efficiency
ICCD '99 Proceedings of the 1999 IEEE International Conference on Computer Design
Dynamic Prediction of Critical Path Instructions
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Quantifying Instruction Criticality
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Hierarchical Scheduling Windows
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Instruction fetch deferral using static slack
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Compiler managed micro-cache bypassing for high performance EPIC processors
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Quantifying instruction criticality for shared memory multiprocessors
Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures
Region-based hierarchical operation partitioning for multicluster processors
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor
Proceedings of the 30th annual international symposium on Computer architecture
Using Interaction Costs for Microarchitectural Bottleneck Analysis
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Application adaptive energy efficient clustered architectures
Proceedings of the 2004 international symposium on Low power electronics and design
Interaction cost and shotgun profiling
ACM Transactions on Architecture and Code Optimization (TACO)
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
A Criticality Analysis of Clustering in Superscalar Processors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
A Dynamic Compilation Framework for Controlling Microprocessor Energy and Performance
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Journal of VLSI Signal Processing Systems
A case for a complexity-effective, width-partitioned microarchitecture
ACM Transactions on Architecture and Code Optimization (TACO)
Stall cycle redistribution in a transparent fetch pipeline
Proceedings of the 2006 international symposium on Low power electronics and design
Serialization-Aware Mini-Graphs: Performance with Fewer Resources
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 34th annual international symposium on Computer architecture
Accurate critical path prediction via random trace construction
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Fetch-Criticality Reduction through Control Independence
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
The performance of pollution control victim cache for embedded systems
Proceedings of the 21st annual symposium on Integrated circuits and system design
Slack analysis in the system design loop
CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
End-to-end performance forecasting: finding bottlenecks before they happen
Proceedings of the 36th annual international symposium on Computer architecture
Aérgia: exploiting packet latency slack in on-chip networks
Proceedings of the 37th annual international symposium on Computer architecture
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Characterizing the performance and energy attributes of scientific simulations
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part I
Non-uniform instruction scheduling
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Bottleneck identification and scheduling in multithreaded applications
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
An integrated partitioning and scheduling based branch decoupling
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Efficiently tolerating timing violations in pipelined microprocessors
Proceedings of the 50th Annual Design Automation Conference
Hi-index | 0.00 |
Many emerging processor microarchitectures seek to manage technological constraints (e.g., wire delay, power, and circuit complexity) by resorting to non-uniform designs that provide resources at multiple quality levels (e.g., fast/slow, bypass paths, multi-speed functional units, and grid architectures). In such designs, the constraint problem becomes a control problem, and the challenge becomes designing a control policy that mitigates the performance penalty of the non-uniformity. Given the increasing importance of non-uniform control policies, we believe it is appropriate to examine them in their own right.To this end, we develop slack for use in creating control policies that match program execution behavior, to machine design. Intuitively, the slack of a dynamic instruction i is the number of cycles i can be delayed with no effect on execution time. This property makes slack a natural candidate for hiding non-uniform latencies.We make three contributions in our exploration of slack. First, we formally define slack, distinguish three variants (local, global and apportioned), and perform a limit study to show that slack is prevalent in our SPEC2000 workload. Second, we show how to predict slack in hardware. Third, we illustrate how to create a control policy based on slack for steering instructions among fast (high power) and slow (lower power) pipelines.