Architecture and compiler tradeoffs for a long instruction word microprocessor

Authors:
Robert Cohn; Thomas Gross; Monica Lam
Affiliations:
Carnegie Mellon Univ., Pittsburgh, PA; Carnegie Mellon Univ., Pittsburgh, PA; Carnegie Mellon Univ., Pittsburgh, PA
Venue:
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Year:
1989

Abstract

A very long instruction word (VLIW) processor exploits parallelism by controlling multiple operations in a single instruction word. This paper describes the architecture and compiler tradeoffs in the design of iWarp, a VLIW single-chip microprocessor developed in a joint project with Intel Corp. The iWarp processor is capable of specifying up to nine operations in an instruction word and has a peak performance of 20 million floating-point operations and 20 million integer operations per second. An optimizing compiler has been constructed and used as a tool to evaluate the different architectural proposals in the development of iWarp. We present here the analysis and compiler optimizations for those architectural features that address two key issues in the design of a VLIW microprocessor: code density and a streamlined execution cycle. We support the results of our analysis with performance data for the Livermore Loops and a selection of programs from the LINPACK library.