Dynamic configuration of application-specific implicit instructions for embedded pipelined processors

Authors:
Martino Sykora;Giovanni Agosta;Cristina Silvano
Affiliations:
Politecnico di Milano, Milano, Italy;Politecnico di Milano, Milano, Italy;Politecnico di Milano, Milano, Italy
Venue:
Proceedings of the 2008 ACM symposium on Applied computing
Year:
2008

Citing 14

Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation

Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture

Lx: a technology platform for customizable VLIW embedded processing
Proceedings of the 27th annual international symposium on Computer architecture

Microprocessor Architectures: From VLIW to TTA

The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture

Using Dynamic Binary Translation to Fuse Dependent Instructions
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization

Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture

Dataflow Mini-Graphs: Amplifying Superscalar Capacity and Bandwidth
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture

Dynamic coalescing for 16-bit instructions
ACM Transactions on Embedded Computing Systems (TECS)

Frequent Loop Detection Using Efficient Nonintrusive On-Chip Hardware
IEEE Transactions on Computers

MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop

Reducing the cost of conditional transfers of control by using comparison specifications
Proceedings of the 2006 ACM SIGPLAN/SIGBED conference on Language, compilers, and tool support for embedded systems

Trace Scheduling: A Technique for Global Microcode Compaction
IEEE Transactions on Computers

Low-power data forwarding for VLIW embedded architectures
IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Cited 1

Architecture Optimization of Application-Specific Implicit Instructions
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on CAPA'09, Special Section on WHS'09, and Special Section VCPSS'09

Abstract

In this paper, we propose the dynamic configuration of application-specific implicit instructions for pipelined processors to better exploit the available instruction-level parallelism. Given the target application, the compiler selects a set of candidate instructions to be executed implicitly, i.e., their execution is controlled through a data-driven model that avoids explicit instruction fetch. Consequently, the clock cycles usually required to issue these instructions explicitly are saved, which improves performance and reduces code size. The compiler generates the reconfiguration operations needed to set up the data-path properly. The processor pipeline has been optimized to support the parallel execution of implicitly issued instructions with limited hardware overhead. The proposed technique has a negligible impact on the processor ISA (only reconfiguration instructions are added), which also shortens compiler development time, since the optimization can be added almost seamlessly to an existing compilation tool-chain. The proposed approach has been applied to DSP and multimedia kernel loops, and its performance has been compared with that of two baseline architectures: a scalar MIPS processor and a 4-issue VLIW processor of the LX family provided by STMicroelectronics [5]. Experimental results show a speedup ranging from 10% to 35% and an average code size reduction of 19%.
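The data-driven execution model described in the abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the authors' pipeline or compiler: the `Instr` type, the `run` and `dot_product` functions, the rule that an implicit instruction fires when a single "trigger" register is written, and the cost of one issue slot per explicit instruction (plus one hypothetical reconfiguration instruction before the loop) are all invented here for illustration.

```python
from dataclasses import dataclass
from operator import add, mul
from typing import Callable, Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Instr:
    """Three-operand register instruction: dest = op(src1, src2)."""
    dest: str
    op: Callable[[int, int], int]
    src1: str
    src2: str


def run(program: Sequence[Instr], regs: Dict[str, int],
        implicit: Optional[Dict[str, Instr]] = None) -> Tuple[Dict[str, int], int]:
    """Execute `program` on a copy of `regs`.

    `implicit` maps a trigger register to an instruction that fires, without
    being fetched or issued, whenever that register is written (assumed
    model; the configuration is assumed to be acyclic).
    Returns the final registers and the number of explicitly issued instructions.
    """
    implicit = implicit or {}
    regs = dict(regs)
    issued = 0
    for instr in program:
        issued += 1  # every explicit instruction costs one issue slot
        regs[instr.dest] = instr.op(regs[instr.src1], regs[instr.src2])
        written = instr.dest
        # Data-driven firing: follow the chain of configured implicit instructions.
        while written in implicit:
            imp = implicit[written]
            regs[imp.dest] = imp.op(regs[imp.src1], regs[imp.src2])
            written = imp.dest
    return regs, issued


def dot_product(xs: Sequence[int], ys: Sequence[int],
                use_implicit: bool) -> Tuple[int, int]:
    """Multiply-accumulate kernel loop; returns (result, explicit issue count)."""
    mul_i = Instr("t", mul, "a", "b")
    acc_i = Instr("acc", add, "acc", "t")
    body = [mul_i] if use_implicit else [mul_i, acc_i]
    implicit = {"t": acc_i} if use_implicit else {}
    # Assumption: one reconfiguration instruction is issued before the loop.
    total_issued = 1 if use_implicit else 0
    regs = {"a": 0, "b": 0, "t": 0, "acc": 0}
    for x, y in zip(xs, ys):
        regs["a"], regs["b"] = x, y  # operand loads are not modelled
        regs, issued = run(body, regs, implicit)
        total_issued += issued
    return regs["acc"], total_issued


if __name__ == "__main__":
    print(dot_product([1, 2, 3], [4, 5, 6], use_implicit=False))
    print(dot_product([1, 2, 3], [4, 5, 6], use_implicit=True))
```

In this toy model both variants compute the same result, while the implicit variant issues fewer explicit instructions and has a shorter loop body. The counts say nothing about the speedup or code size figures reported in the abstract, which come from the authors' experiments on real kernels.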