Communications of the ACM - Special section on computer architecture
Reduced instruction set computer architectures for VLSI
Reduced instruction set computer architectures for VLSI
Bulldog: a compiler for VLSI architectures
Bulldog: a compiler for VLSI architectures
Global register allocation at link time
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
An Instruction Issuing Approach to Enhancing Performance in Multiple Functional Unit Processors
IEEE Transactions on Computers
Performance of the VAX-11/780 translation buffer: simulation and measurement
ACM Transactions on Computer Systems (TOCS)
Parallel processing: a smart compiler and a dumb machine
SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Computer Structures: Principles and Examples
Computer Structures: Principles and Examples
Very Long Instruction Word architectures and the ELI-512
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
ASPLOS I Proceedings of the first international symposium on Architectural support for programming languages and operating systems
Design of a Computer—The Control Data 6600
Design of a Computer—The Control Data 6600
The performance potential of multiple functional unit processors
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
The white dwarf: a high-performance application-specific processor
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
An integrated environment for development and execution of real-time programs
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Compile-time techniques for efficient utilization of parallel memories
PPEALS '88 Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems
Overlapped loop support in the Cydra 5
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Achieving high instruction cache performance with an optimizing compiler
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Cost-effective design of application specific VLIW processors using the SCARCE framework
MICRO 22 Proceedings of the 22nd annual workshop on Microprogramming and microarchitecture
VISA: A variable instruction set architecture
ACM SIGARCH Computer Architecture News
Employing register channels for the exploitation of instruction level parallelism
PPOPP '90 Proceedings of the second ACM SIGPLAN symposium on Principles & practice of parallel programming
A variable instruction stream extension to the VLIW architecture
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Parallelization of loops with exits on pipelined architectures
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
GT-EP: a novel high-performance real-time architecture
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Pseudo-randomly interleaved memory
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
IMPACT: an architectural framework for multiple-instruction-issue processors
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
An instruction-level performance analysis of the Multiflow TRACE 14/300
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Comparing static and dynamic code scheduling for multiple-instruction-issue processors
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Data access microarchitectures for superscalar processors with compiler-assisted data prefetching
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
SPIRE: streaming processing with instructions release element
ACM SIGARCH Computer Architecture News
Tolerating data access latency with register preloading
ICS '92 Proceedings of the 6th international conference on Supercomputing
An architectural framework for migration from CISC to higher performance platforms
ICS '92 Proceedings of the 6th international conference on Supercomputing
Register requirements of pipelined processors
ICS '92 Proceedings of the 6th international conference on Supercomputing
Software support for speculative loads
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Predicting conditional branch directions from previous runs of a program
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Sentinel scheduling for VLIW and superscalar processors
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Efficient superscalar performance through boosting
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Executing compressed programs on an embedded RISC architecture
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Exploiting instruction-level parallelism: the multithreaded approach
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Compiler code transformations for superscalar-based high performance systems
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Sentinel scheduling: a model for compiler-controlled speculative execution
ACM Transactions on Computer Systems (TOCS)
Increasing the instruction fetch rate via multiple branch prediction and a branch address cache
ICS '93 Proceedings of the 7th international conference on Supercomputing
Speculative disambiguation: a compilation technique for dynamic memory disambiguation
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Dynamic memory disambiguation using the memory conflict buffer
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Unconstrained speculative execution with predicated state buffering
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Memory bandwidth limitations of future microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Custom-fit processors: letting applications define architectures
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
The 16-fold way: a microparallel taxonomy
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Tolerating latency in multiprocessors through compiler-inserted prefetching
ACM Transactions on Computer Systems (TOCS)
Media architecture: general purpose vs. multiple application-specific programmable processor
DAC '98 Proceedings of the 35th annual Design Automation Conference
Integrated predicated and speculative execution in the IMPACT EPIC architecture
Proceedings of the 25th annual international symposium on Computer architecture
IMPACT: an architectural framework for multiple-instruction-issue processors
25 years of the international symposia on Computer architecture (selected papers)
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
PipeRench: a co/processor for streaming multimedia acceleration
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Power efficient mediaprocessors: design space exploration
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
Designing power efficient hypermedia processors
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Multiple instruction issue in the NonStop cyclone processor
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Lx: a technology platform for customizable VLIW embedded processing
Proceedings of the 27th annual international symposium on Computer architecture
Journal of VLSI Signal Processing Systems - Special issue on VLSI on custom computing technology
Exploring Hypermedia Processor Design Space
Journal of VLSI Signal Processing Systems - Special issue on multimedia signal processing
Compiler-Assisted Multiple Instruction Word Retry for VLIW Architectures
IEEE Transactions on Parallel and Distributed Systems
A code decompression architecture for VLIW processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Limited Bandwidth to Affect Processor Design
IEEE Micro
Exploiting Instruction-Level Parallelism for Integrated Control-Flow Monitoring
IEEE Transactions on Computers
The Importance of Prepass Code Scheduling for Superscalar and Superpipelined Processors
IEEE Transactions on Computers
Three Architectural Models for Compiler-Controlled Speculative Execution
IEEE Transactions on Computers
Compile-Time Based Performance Prediction
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Performance Issues in Parallel Processing Systems
Performance Evaluation: Origins and Directions
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
An Architectural Overview of the Programmable Multimedia Processor, TM-1
COMPCON '96 Proceedings of the 41st IEEE International Computer Conference
Automatic Architectural Synthesis of VLIW and EPIC Processors
Proceedings of the 12th international symposium on System synthesis
Exploiting Value Locality in Physical Register Files
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Software pipelining: an effective scheduling technique for VLIW machines
ACM SIGPLAN Notices - Best of PLDI 1979-1999
An Analytical Approach to Scheduling Code for Superscalar and VLIW Architectures
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Processor Description Languages
Processor Description Languages
Programmable and Scalable Architecture for Graphics Processing Units
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Code compression for embedded VLIW processors using variable-to-fixed coding
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Single thread program parallelism with dataflow abstracting thread
ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II
Hi-index | 0.01 |
Very Long Instruction Word (VLIW) architectures were promised to deliver far more than the factor of two or three that current architectures achieve from overlapped execution. Using a new type of compiler which compacts ordinary sequential code into long instruction words, a VLIW machine was expected to provide from ten to thirty times the performance of a more conventional machine built of the same implementation technology.Multiflow Computer, Inc., has now built a VLIW called the TRACETM along with its companion Trace SchedulingTM compacting compiler. This new machine has fulfilled the performance promises that were made. Using many fast functional units in parallel, this machine extends some of the basic Reduced-Instruction-Set precepts: the architecture is load/store, the microarchitecture is exposed to the compiler, there is no microcode, and there is almost no hardware devoted to synchronization, arbitration, or interlocking of any kind (the compiler has sole responsibility for runtime resource usage).This paper discusses the design of this machine and presents some initial performance results.