The superblock: an effective technique for VLIW and superscalar compilation
The Journal of Supercomputing - Special issue on instruction-level parallelism
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Exploiting a new level of DLP in multimedia applications
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Efficient dynamic scheduling through tag elimination
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A scalable instruction queue design using dependence chains
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Characterizing and predicting value degree of use
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Reducing register ports for higher speed and lower energy
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Dynamic binary translation for accumulator-oriented architectures
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Exploiting Basic Block Value Locality with Block Reuse
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
The Dynamic Trace Memorization Reuse Technique
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Proceedings of the 30th annual international symposium on Computer architecture
The Limits of Speculative Trace Reuse on Deeply Pipelined Processors
SBAC-PAD '03 Proceedings of the 15th Symposium on Computer Architecture and High Performance Computing
Macro-op Scheduling: Relaxing Scheduling Loop Constraints
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 31st annual international symposium on Computer architecture
Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Dataflow Mini-Graphs: Amplifying Superscalar Capacity and Bandwidth
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Impact of Software Bypassing on Instruction Level Parallelism and Register File Traffic
SAMOS '08 Proceedings of the 8th international workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Hi-index | 0.00 |
Modern embedded processors are designed to maximize execution efficiency—the amount of performance achieved per unit of energy dissipated while meeting minimum performance levels. To increase this efficiency, we propose utilizing static strands, dependence chains without fan-out, which are exposed by a compiler pass. These dependent instructions are resequenced to be sequential and annotated to communicate their location to the hardware. Importantly, this modified application is binary compatible and functionally identical to the original, allowing transparent execution on a baseline processor. However, these static strands can be easily collapsed and optimized by simple processor modifications, significantly reducing the workload energy. Results show that over 30% of MediaBench and Spec2000int dynamic instructions can be collapsed, reducing issue logic energy by 20%, bypass energy 19%, and register file energy 14%. In addition, by increasing the effective capactity of pipeline resources by almost a third, average IPC can be improved up to 15%. This performance gain can then be traded in for a lower clock frequency to maintain a basline level of performance, further reducing energy.