A VLIW Architecture for a Trace Scheduling Compiler

Authors:
Robert P. Colwell; Robert P. Nix; John J. O'Donnell; David B. Papworth; Paul K. Rodman
Venue:
IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
Year:
1988


Abstract

A VLIW (very long instruction word) machine called the TRACE has been built along with its companion Trace Scheduling compacting compiler. The machine comes in three hardware configurations, capable of executing 7, 14, or 28 operations simultaneously. The 'seven-wide' configuration achieves a performance improvement of a factor of five or six over a wide range of scientific code, compared to machines of higher cost and faster chip implementation technology (such as the VAX 8700). The TRACE extends some basic reduced-instruction-set computer (RISC) precepts: the architecture is load/store, the microarchitecture is exposed to the compiler, there is no microcode, and there is almost no hardware devoted to synchronization, arbitration, or interlocking of any kind (the compiler has sole responsibility for run-time resource usage). The authors discuss the design of this machine and present some initial performance results.
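
To make concrete the idea that the compiler, not the hardware, decides which operations issue together, the following is a minimal sketch of static compaction into fixed-width instruction words. It is not the Trace Scheduling algorithm and not the TRACE compiler's method: it is a plain greedy list scheduler that assumes every operation takes one cycle and that any slot can hold any operation. The function name `pack_into_wide_words`, the operation names, and the dependency map are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def pack_into_wide_words(deps: Dict[str, Set[str]], width: int = 7) -> List[List[str]]:
    """Greedily pack operations into instruction words of at most `width` slots.

    `deps` maps each operation name to the set of operations it depends on.
    Assumption (not from the source): unit latency and interchangeable slots.
    """
    remaining = dict(deps)      # operations not yet placed in a word
    done: Set[str] = set()      # operations issued in earlier words
    words: List[List[str]] = []
    while remaining:
        # An operation is ready once all of its inputs were issued in earlier words.
        ready = sorted(op for op, needs in remaining.items() if needs <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        word = ready[:width]    # fill at most `width` slots in this word
        words.append(word)
        done.update(word)
        for op in word:
            del remaining[op]
    return words


# Hypothetical dependency graph for c = a * b + 1, stored back to memory.
example_deps = {
    "load_a": set(),
    "load_b": set(),
    "mul": {"load_a", "load_b"},
    "add": {"mul"},
    "store": {"add"},
}
```

Calling `pack_into_wide_words(example_deps, width=7)` places both loads in the first word and the three dependent operations in one word each, while `width=1` serializes everything. Because the schedule is fixed before execution, no run-time interlock is needed in this toy model, which is the property the abstract attributes to the TRACE.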