Very Long Instruction Word architectures and the ELI-512

Authors:
Joseph A. Fisher
Affiliations:
-
Venue:
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
Year:
1983

Citing 9
Cited 128

The Organization of Microprogram Stores

ACM Computing Surveys (CSUR)
MIPS: A microprocessor architecture

MICRO 15 Proceedings of the 15th annual workshop on Microprogramming
Optimizing delayed branches

MICRO 15 Proceedings of the 15th annual workshop on Microprogramming
Monte Carlo techniques in code optimization

MICRO 15 Proceedings of the 15th annual workshop on Microprogramming
Using an oracle to measure potential parallelism in single instruction stream programs

MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
2n-way jump microinstruction hardware and an effective instruction binding method

MICRO 13 Proceedings of the 13th annual workshop on Microprogramming
Towards an efficient, machine-independent language for microprogramming

MICRO 12 Proceedings of the 12th annual workshop on Microprogramming
A technique of global optimization of microprograms

MICRO 11 Proceedings of the 11th annual workshop on Microprogramming
Principles of Compiler Design (Addison-Wesley series in computer science and information processing)

Principles of Compiler Design (Addison-Wesley series in computer science and information processing)

A computer with low-level parallelism QA-2: its applications to 3-D graphics and Prolog/Lisp machines

ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Highly concurrent scalar processing

ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
A study of scalar compilation techniques for pipelined supercomputers

ASPLOS II Proceedings of the second international conference on Architectural support for programming languages and operating systems
A VLIW architecture for a trace scheduling compiler

ASPLOS II Proceedings of the second international conference on Architectural support for programming languages and operating systems
The ZS-1 central processor

ASPLOS II Proceedings of the second international conference on Architectural support for programming languages and operating systems
A VLIW architecture for a trace Scheduling Compiler

IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
The performance potential of multiple functional unit processors

ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Toward a dataflow/von Neumann hybrid architecture

ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
The white dwarf: a high-performance application-specific processor

ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
A two-tier memory architecture for high-performance multiprocessor systems

ICS '88 Proceedings of the 2nd international conference on Supercomputing
A method for asynchronous parallelization

ICSE '88 Proceedings of the 10th international conference on Software engineering
The Cydra 5 Departmental Supercomputer: Design Philosophies, Decisions, and Trade-Offs

Computer
Organization of array data for concurrent memory access

MICRO 21 Proceedings of the 21st annual workshop on Microprogramming and microarchitecture
Architecture and compiler tradeoffs for a long instruction word processor

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Tradeoffs in instruction format design for horizontal architectures

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Overlapped loop support in the Cydra 5

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
SIMP (Single Instruction stream/Multiple instruction Pipelining): a novel high-speed single-processor architecture

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
A preceding activation scheme with graph unfolding for the parallel processing system-array

Proceedings of the 1989 ACM/IEEE conference on Supercomputing
A study of scalar compilation techniques for pipelined supercomputers

ACM Transactions on Mathematical Software (TOMS)
A variable instruction stream extension to the VLIW architecture

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Parallelization of loops with exits on pipelined architectures

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Architecture and implementation of a VLIW supercomputer

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
GT-EP: a novel high-performance real-time architecture

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
A parallel pipelined processor with conditional instruction execution

ACM SIGARCH Computer Architecture News - Symposium on parallel algorithms and architectures
Exploiting multi-way branching to boost superscalar processor performance

ACM SIGPLAN Notices
Architecture synthesis of high-performance application-specific processors

DAC '90 Proceedings of the 27th ACM/IEEE Design Automation Conference
An instruction-level performance analysis of the Multiflow TRACE 14/300

MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Comparing static and dynamic code scheduling for multiple-instruction-issue processors

MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Implementation optimization techniques for architecture synthesis of application-specific processors

MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Instruction-level parallelism in Prolog: analysis and architectural support

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Concurrency Extraction Via Hardware Methods Executing the Static Instruction Stream

IEEE Transactions on Computers
Predicting conditional branch directions from previous runs of a program

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
A new approach to schedule operations across nested-ifs and nested-loops

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
High-level synthesis of scalable architectures for IIR filters using multichip modules

DAC '93 Proceedings of the 30th international Design Automation Conference
Shared memory consistency conditions for non-sequential execution: definitions and programming strategies

SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
Instruction scheduling in the TOBEY compiler

IBM Journal of Research and Development
Design at the system level with VLSI CMOS

IBM Journal of Research and Development - Special issue: IBM CMOS technology
Reduced instruction set computers

Communications of the ACM - Special section on computer architecture
Critical path reduction for scalar programs

Proceedings of the 28th annual international symposium on Microarchitecture
Spert-II: A Vector Microprocessor System

Computer - Special issue: neural computing: companion issue to Spring 1996 IEEE Computational Science & Engineering
Exploiting dual data-memory banks in digital signal processors

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Strategic directions in computer architecture

ACM Computing Surveys (CSUR) - Special ACM 50th-anniversary issue: strategic directions in computing research
Custom-fit processors: letting applications define architectures

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Hardware implementation of a general multi-way jump mechanism

MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture
The 16-fold way: a microparallel taxonomy

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
An analysis of dynamic scheduling techniques for symbolic applications

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Scalable instruction-level parallelism through tree-instructions

ICS '97 Proceedings of the 11th international conference on Supercomputing
Performance analysis of tree VLIW architecture for exploiting branch ILP in non-numerical code

ICS '97 Proceedings of the 11th international conference on Supercomputing
Exploiting instruction level parallelism in processors by caching scheduled groups

Proceedings of the 24th annual international symposium on Computer architecture
Simulation/evaluation environment for a VLIW processor architecture

IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
Exploiting idle floating-point resources for integer execution

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
A programmable hardware accelerator for compiled electrical simulation

DAC '88 Proceedings of the 25th ACM/IEEE Design Automation Conference
Maps: a compiler-managed memory system for raw machines

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
On local register allocation

Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms
Exploiting ILP in page-based intelligent memory

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Evon: an extended von Neumann model for parallel processing

ACM '86 Proceedings of 1986 ACM Fall joint computer conference
An investigation of static versus dynamic scheduling

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
The impact of synchronization and granularity on parallel systems

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Design Alternatives of Multithreaded Architecture

International Journal of Parallel Programming
A low-complexity issue logic

Proceedings of the 14th international conference on Supercomputing
Polygon rendering on a stream architecture

HWWS '00 Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware
Reducing the complexity of the issue logic

ICS '01 Proceedings of the 15th international conference on Supercomputing
Parallel processing: a smart compiler and a dumb machine

SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Compiler-Assisted Multiple Instruction Word Retry for VLIW Architectures

IEEE Transactions on Parallel and Distributed Systems
Compiler Support for Scalable and Efficient Memory Systems

IEEE Transactions on Computers
A design space evaluation of grid processor architectures

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Modulo scheduling with integrated register spilling for clustered VLIW architectures

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Architectural differences of efficient sequential and parallel computers

Journal of Systems Architecture: the EUROMICRO Journal
Baring It All to Software: Raw Machines

Computer
Simulating Multimedia Systems with MVPSIM

IEEE Design & Test
Organization of the Motorola 88110 Superscalar RISC Microprocessor

IEEE Micro
Exploiting Instruction-Level Parallelism for Integrated Control-Flow Monitoring

IEEE Transactions on Computers
Generalized Multiway Branch Unit for VLIW Microprocessors

IEEE Transactions on Parallel and Distributed Systems
Increasing and Detecting Memory Address Congruence

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Compiler optimization on VLIW instruction scheduling for low power

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Indirect VLIW memory allocation for the ManArray multiprocessor DSP

ACM SIGARCH Computer Architecture News
Region-based hierarchical operation partitioning for multicluster processors

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
A high performance factoring machine

ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
A model of clocked micro-architectures for firmware engineering and design automation applications

MICRO 17 Proceedings of the 17th annual workshop on Microprogramming
Cheap Out-of-Order Execution Using Delayed Issue

ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Banked multiported register files for high-frequency superscalar microprocessors

Proceedings of the 30th annual international symposium on Computer architecture
Partitioned Schedules for Clustered VLIW Architectures

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Matrix bidiagonalization: implementation and evaluation on the Trident processor

Neural, Parallel & Scientific Computations
Controlling the data space of tree structured computations

Information and Computation
Parallel processing: a smart compiler and a dumb machine

ACM SIGPLAN Notices - Best of PLDI 1979-1999
Synthesizable HDL generation method for configurable VLIW processors

Proceedings of the 2004 Asia and South Pacific Design Automation Conference
Static Placement, Dynamic Issue (SPDI) Scheduling for EDGE Architectures

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
A Speculative Control Scheme for an Energy-Efficient Banked Register File

IEEE Transactions on Computers
A Simulation and Exploration Technology for Multimedia-Application-Driven Architectures

Journal of VLSI Signal Processing Systems
RPU: a programmable ray processing unit for realtime ray tracing

ACM SIGGRAPH 2005 Papers
Instruction-level parallelism

Encyclopedia of Computer Science
A Distributed Control Path Architecture for VLIW Processors

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Experimentation with a two-level microprogrammed multiprocessor computer

ACM SIGMICRO Newsletter
Software and hardware techniques to optimize register file utilization in VLIW architectures

International Journal of Parallel Programming
Compiler-directed Data Partitioning for Multicluster Processors

Proceedings of the International Symposium on Code Generation and Optimization
Hybrid multi-core architecture for boosting single-threaded performance

ACM SIGARCH Computer Architecture News
Efficient design space exploration for application specific systems-on-a-chip

Journal of Systems Architecture: the EUROMICRO Journal
Code and data partitioning for fine-grain parallelism

Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
An Analytical Approach to Scheduling Code for Superscalar and VLIW Architectures

ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
VLIW-DLX simulator for educational purposes

WCAE '07 Proceedings of the 2007 workshop on Computer architecture education
Multimedia terminal system-on-chip design and simulation

EURASIP Journal on Applied Signal Processing
A GaAs-Based Microprocessor Architecture for Real-Time Applications

IEEE Transactions on Computers
Measuring the Parallelism Available for Very Long Instruction Word Architectures

IEEE Transactions on Computers
A highly efficient implementation of back propagation algorithm using matrix instruction set architecture

Neural, Parallel & Scientific Computations
The revolution inside the box

Communications of the ACM - Web science
A highly efficient implementation of a backpropagation learning algorithm using matrix ISA

Journal of Parallel and Distributed Computing
Reducing complexity of multiobjective design space exploration in VLIW-based embedded systems

ACM Transactions on Architecture and Code Optimization (TACO)
Approximating the buffer allocation problem using epochs

Journal of Parallel and Distributed Computing
Trend and Challenge on System-on-a-Chip Designs

Journal of Signal Processing Systems
Configurable emulated shared memory architecture for general purpose MP-SOCs and NOC regions

NOCS '09 Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip
Dynamic Malicious Code Detection Based on Binary Translator

CloudCom '09 Proceedings of the 1st International Conference on Cloud Computing
A VLIW vector media coprocessor with cascaded SIMD ALUs

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Performance evaluation of efficient multi-objective evolutionary algorithms for design space exploration of embedded computer systems

Applied Soft Computing
Task superscalar: using processors as functional units

HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
Artificial neural networks in hardware: A survey of two decades of progress

Neurocomputing
RTRAM: reconfigurable and testable multi-bit RAM design

ITC'88 Proceedings of the 1988 international conference on Test: new frontiers in testing
Exploiting dynamic reconfiguration techniques: the 2D-VLIW approach

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Codevelopment of multi-level instruction set architecture and hardware for an efficient matrix processor

Neural, Parallel & Scientific Computations
Automatic OpenCL device characterization: guiding optimized kernel design

Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
FPGA implementation of variable-precision floating-point arithmetic

APPT'11 Proceedings of the 9th international conference on Advanced parallel processing technologies
Integrated Code Generation for Loops

ACM Transactions on Embedded Computing Systems (TECS)
Mat-core: a decoupled matrix core extension for general-purpose processors

Neural, Parallel & Scientific Computations
DRMA: dynamically reconfigurable MPSoC architecture

Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI
Design, implementation, and evaluation of a low-complexity vector-core for executing scalar/vector instructions

Journal of Parallel and Distributed Computing
VLIW coprocessor for IEEE-754 quadruple-precision elementary functions

ACM Transactions on Architecture and Code Optimization (TACO)
A systematic approach for optimized bypass configurations for application-specific embedded processors

ACM Transactions on Embedded Computing Systems (TECS) - Special issue on application-specific processors
Modular multi-ported SRAM-based memories

Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays

Abstract

By compiling ordinary scientific applications programs with a radical technique called trace scheduling, we are generating code for a parallel machine that will run these programs faster than an equivalent sequential machine—we expect 10 to 30 times faster. Trace scheduling generates code for machines called Very Long Instruction Word architectures. In Very Long Instruction Word machines, many statically scheduled, tightly coupled, fine-grained operations execute in parallel within a single instruction stream. VLIWs are more parallel extensions of several current architectures. These current architectures have never cracked a fundamental barrier. The speedup they get from parallelism is never more than a factor of 2 to 3. Not that we couldn't build more parallel machines of this type; but until trace scheduling we didn't know how to generate code for them. Trace scheduling finds sufficient parallelism in ordinary code to justify thinking about a highly parallel VLIW. At Yale we are actually building one. Our machine, the ELI-512, has a horizontal instruction word of over 500 bits and will do 10 to 30 RISC-level operations per cycle [Patterson 82]. ELI stands for Enormously Longword Instructions; 512 is the size of the instruction word we hope to achieve. (The current design has a 1200-bit instruction word.) Once it became clear that we could actually compile code for a VLIW machine, some new questions appeared, and answers are presented in this paper. How do we put enough tests in each cycle without making the machine too big? How do we put enough memory references in each cycle without making the machine too slow?
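The idea of statically packing independent fine-grained operations into one wide instruction word can be illustrated with a minimal sketch. This is not the trace scheduling algorithm from the paper: it is a simple greedy scheduler over straight-line code, and the function name `pack_into_long_words`, the operation names, the unit-latency assumption, and the `width` parameter are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set


def pack_into_long_words(deps: Dict[str, Set[str]], width: int) -> List[List[str]]:
    """Greedily pack operations into instruction words of at most `width` ops.

    `deps` maps each operation name to the set of operations it depends on.
    An operation is placed only in a word strictly after all of its
    dependencies (unit latency is an assumption of this sketch).
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    remaining = dict(deps)
    done: Set[str] = set()
    words: List[List[str]] = []
    while remaining:
        # Operations whose dependencies have all been scheduled in earlier words.
        ready = sorted(op for op, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        word = ready[:width]  # fill one long instruction word
        words.append(word)
        done.update(word)
        for op in word:
            del remaining[op]
    return words


# Hypothetical straight-line fragment: store(a + b, c * 2)
ops = {
    "load_a": set(),
    "load_b": set(),
    "load_c": set(),
    "add_ab": {"load_a", "load_b"},
    "mul_c2": {"load_c"},
    "store": {"add_ab", "mul_c2"},
}
schedule = pack_into_long_words(ops, width=4)
for cycle, word in enumerate(schedule):
    print(cycle, word)
```

With `width=4` the six operations fit in three words, while `width=1` degenerates to six sequential words. Within a single basic block like this one the available parallelism stays small, which is consistent with the abstract's point that a technique such as trace scheduling is needed to find enough parallelism for a very wide machine.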