From Sequences of Dependent Instructions to Functions: An Approach for Improving Performance without ILP or Speculation

Authors:
Sami Yehia; Olivier Temam
Affiliations:
LRI, Paris XI University, France; LRI, Paris XI University, France
Venue:
Proceedings of the 31st annual international symposium on Computer architecture
Year:
2004

Abstract

In this article, we present an approach for improving the performance of sequences of dependent instructions. We observe that many sequences of instructions can be interpreted as functions. Unlike sequences of instructions, functions can be translated into very fast but exponentially costly two-level combinational circuits. We present an approach that exploits this principle: it speeds up programs through circuit-level parallelism and redundancy, but avoids the exponential costs.

We analyze the potential of this approach, and then we propose an implementation that consists of a superscalar processor with a large specific functional unit associated with specific back-end transformations. On the SpecInt2000 benchmarks and selected programs from the Olden and MiBench benchmark suites, performance improves on average by 2.4% to 12%, depending on the latency of the functional units, and by up to 39.6%. More precisely, the performance of the optimized code sections improves on average by 3.5% to 19%, and by up to 49%.
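To make the principle in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes a hypothetical three-instruction dependent chain (`chain`: add, then AND, then XOR) on tiny `width`-bit inputs. Because the chain is a pure function of its inputs, each output bit can be written in canonical two-level sum-of-products form by enumerating the truth table (`build_sop`) and then evaluated as one AND level followed by one OR level (`eval_sop`), so its depth does not grow with the length of the chain (ignoring gate fan-in limits). The price is size: the canonical form can need up to 2^(3·width) product terms, which is the exponential cost the abstract refers to. All names and bit widths here are illustrative assumptions.

```python
from itertools import product


def chain(a, b, c, width):
    """Hypothetical sequence of three dependent instructions on width-bit values."""
    mask = (1 << width) - 1
    t1 = (a + b) & mask  # add, truncated to width bits
    t2 = t1 & c          # and: depends on t1
    t3 = t2 ^ a          # xor: depends on t2
    return t3


def to_bits(a, b, c, width):
    """Flatten (a, b, c) into a tuple of 3*width input bits, LSB first."""
    return tuple((v >> i) & 1 for v in (a, b, c) for i in range(width))


def build_sop(bit, width):
    """Canonical sum-of-products for output bit `bit` of chain().

    Each product term is the full input-bit pattern of one minterm, i.e. one
    input combination for which the output bit is 1.
    """
    values = range(1 << width)
    return [to_bits(a, b, c, width)
            for a, b, c in product(values, repeat=3)
            if (chain(a, b, c, width) >> bit) & 1]


def eval_sop(terms, inputs):
    """Two-level evaluation: AND of matching literals per term, then OR over terms."""
    return int(any(all(x == t for x, t in zip(inputs, term)) for term in terms))


if __name__ == "__main__":
    for w in range(1, 5):
        msb = w - 1
        terms = build_sop(msb, w)
        rows = 2 ** (3 * w)
        print(f"width={w}: {len(terms)} product terms (of {rows} rows) for bit {msb}")
```

The enumeration is deliberately brute force: it shows why a direct two-level translation is fast to evaluate but quickly becomes too large to build, which is the tradeoff the proposed approach is designed to avoid.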