A VLIW architecture for a trace scheduling compiler

Authors:
Robert P. Colwell;Robert P. Nix;John J. O'Donnell;David B. Papworth;Paul K. Rodman
Affiliations:
Multiflow Computer, Branford, CT;Multiflow Computer, Branford, CT;Multiflow Computer, Branford, CT;Multiflow Computer, Branford, CT;Multiflow Computer, Branford, CT
Venue:
ASPLOS II Proceedings of the second international conference on Architectural support for programming languages and operating systems
Year:
1987

Abstract

Very Long Instruction Word (VLIW) architectures were promised to deliver far more than the factor of two or three that current architectures achieve from overlapped execution. Using a new type of compiler which compacts ordinary sequential code into long instruction words, a VLIW machine was expected to provide from ten to thirty times the performance of a more conventional machine built of the same implementation technology.

Multiflow Computer, Inc., has now built a VLIW called the TRACE™ along with its companion Trace Scheduling™ compacting compiler. This new machine has fulfilled the performance promises that were made. Using many fast functional units in parallel, this machine extends some of the basic Reduced-Instruction-Set precepts: the architecture is load/store, the microarchitecture is exposed to the compiler, there is no microcode, and there is almost no hardware devoted to synchronization, arbitration, or interlocking of any kind (the compiler has sole responsibility for runtime resource usage).

This paper discusses the design of this machine and presents some initial performance results.
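
To make the idea of compacting sequential code into long instruction words more concrete, the following is a minimal sketch of greedy compaction within a single straight-line block. It is not the Trace Scheduling algorithm and not Multiflow's compiler: it ignores branches, memory dependences, and operation latencies, and the names `Op`, `depends`, and `compact`, the register names, and the `width` parameter are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One sequential operation (hypothetical model: registers only)."""
    name: str          # label used only for display
    dest: str          # register written
    srcs: tuple = ()   # registers read


def depends(later: Op, earlier: Op) -> bool:
    """True if `later` must stay after `earlier` (conservative check)."""
    raw = earlier.dest in later.srcs   # read after write
    waw = earlier.dest == later.dest   # write after write
    war = later.dest in earlier.srcs   # write after read
    return raw or waw or war


def compact(ops, width):
    """Pack ops, in program order, into words of at most `width` ops.

    Assumes unit latency: an op may issue one word after any op it
    depends on. The compiler alone enforces this ordering; nothing in
    the sketch models a hardware interlock.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    words = []    # words[i] is the list of ops issued together in word i
    placed = []   # (op, word index) for every op scheduled so far
    for op in ops:
        earliest = 0
        for prev, idx in placed:
            if depends(op, prev):
                earliest = max(earliest, idx + 1)
        slot = earliest
        # Skip words that already use every slot.
        while slot < len(words) and len(words[slot]) >= width:
            slot += 1
        while len(words) <= slot:
            words.append([])
        words[slot].append(op)
        placed.append((op, slot))
    return words


if __name__ == "__main__":
    program = [
        Op("load a", "r1"),
        Op("load b", "r2"),
        Op("add", "r3", ("r1", "r2")),
        Op("load c", "r4"),
        Op("mul", "r5", ("r3", "r4")),
    ]
    for i, word in enumerate(compact(program, width=4)):
        print(i, [op.name for op in word])
```

With these five hypothetical operations and `width=4`, the sketch packs the three independent loads into the first word and needs three words in total, whereas `width=1` degenerates to one operation per word. The dependence chain, not the number of slots, limits the result here, which is why a compiler that looks only inside one basic block finds little to overlap.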