A systematic approach for optimized bypass configurations for application-specific embedded processors

Authors:
Thorsten Jungeblut;Boris Hübener;Mario Porrmann;Ulrich Rückert
Affiliations:
Bielefeld University, Germany;Bielefeld University, Germany;University of Paderborn, Germany;Bielefeld University, Germany
Venue:
ACM Transactions on Embedded Computing Systems (TECS) - Special issue on application-specific processors
Year:
2013



Abstract

The diversity of today's mobile applications requires embedded processor cores with high resource efficiency: the devices should deliver high performance at low area requirements and power consumption. The fine-grained parallelism supported by the multiple functional units of VLIW architectures offers high throughput at reasonably low clock frequencies compared to single-core RISC processors. To utilize the processor pipeline efficiently, common system architectures have to cope with data hazards caused by data dependencies between consecutive operations. On the one hand, such hazards can be resolved by complex forwarding circuits (i.e., a pipeline bypass) that forward intermediate results to a subsequent instruction. On the other hand, the pipeline bypass can strongly affect or even dominate the total resource requirements and degrade the maximum clock frequency. In this work, the CoreVA VLIW architecture is used to develop and analyze application-specific bypass configurations. It is shown that many paths of a comprehensive bypass system are rarely used and may not be required for certain applications. For this reason, several strategies have been implemented that enhance the efficiency of the total system by introducing application-specific bypass configurations. The configuration can be carried out statically, by implementing only the required paths, or at runtime, by dynamically reconfiguring the hardware. An algorithm is proposed that derives an optimized configuration by iteratively disabling single bypass paths. Adopting these application-specific bypass configurations reduces the critical path by 26%. As a result, execution time and energy requirements can be reduced by up to 21.5%. Combining Dynamic Frequency Scaling (DFS) with dynamic deactivation and reactivation of bypass paths allows the bypass system to be reconfigured at runtime. This ensures the highest efficiency while processing varying applications.
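
The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of what "iteratively disabling single bypass paths" could look like, not the authors' implementation. It assumes a greedy search that minimizes execution time, modelled as cycle count times clock period. The function names, the path labels (`EX->EX`, `MEM->EX`, `WB->EX`), and all numbers in the toy cost model are hypothetical and chosen purely for illustration.

```python
from typing import Callable, FrozenSet, Iterable

Paths = FrozenSet[str]


def prune_bypass_paths(
    all_paths: Iterable[str],
    cycles: Callable[[Paths], int],
    period_ps: Callable[[Paths], int],
) -> Paths:
    """Greedily disable one bypass path per iteration.

    In each round, the path whose removal gives the lowest modelled
    execution time (cycles * clock period) is disabled. The search stops
    as soon as no single removal improves on the current configuration.
    """
    enabled: Paths = frozenset(all_paths)
    best_time = cycles(enabled) * period_ps(enabled)
    while enabled:
        # Evaluate every configuration that differs by one disabled path.
        trials = [
            (cycles(enabled - {p}) * period_ps(enabled - {p}), p)
            for p in sorted(enabled)
        ]
        trial_time, path = min(trials)
        if trial_time >= best_time:
            break  # no single removal helps any more
        enabled, best_time = enabled - {path}, trial_time
    return enabled


# Hypothetical toy model (assumption): how often each path forwards a
# value in the profiled application. A disabled path costs one stall
# cycle per use; every enabled path lengthens the clock period.
USES = {"EX->EX": 300, "MEM->EX": 20, "WB->EX": 5}
BASE_CYCLES = 1000
BASE_PERIOD_PS = 2000
DELAY_PER_PATH_PS = 200


def toy_cycles(enabled: Paths) -> int:
    stalls = sum(n for path, n in USES.items() if path not in enabled)
    return BASE_CYCLES + stalls


def toy_period_ps(enabled: Paths) -> int:
    return BASE_PERIOD_PS + DELAY_PER_PATH_PS * len(enabled)


kept = prune_bypass_paths(USES, toy_cycles, toy_period_ps)
print(sorted(kept))  # prints ['EX->EX']
```

In this toy model the rarely used paths are dropped because the shorter clock period outweighs the few extra stall cycles, while the heavily used path is kept. A real flow would presumably replace the two cost functions with cycle counts from simulation and timing figures from synthesis, and the additive delay model used here is a simplification.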