Communications of the ACM - Special section on computer architecture
Reduced instruction set computer architectures for VLSI
Reduced instruction set computer architectures for VLSI
Bulldog: a compiler for VLSI architectures
Bulldog: a compiler for VLSI architectures
Global register allocation at link time
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
An Instruction Issuing Approach to Enhancing Performance in Multiple Functional Unit Processors
IEEE Transactions on Computers
The hardware architecture of the CRISP microprocessor
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Performance of the VAX-11/780 translation buffer: simulation and measurement
ACM Transactions on Computer Systems (TOCS)
ACM Computing Surveys (CSUR)
Parallel processing: a smart compiler and a dumb machine
SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Very Long Instruction Word architectures and the ELI-512
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
A critique of multiprocessing von Neumann style
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
A critique of multiprocessing von Neumann style
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
ASPLOS I Proceedings of the first international symposium on Architectural support for programming languages and operating systems
Design of a Computer—The Control Data 6600
Design of a Computer—The Control Data 6600
Architecture and compiler tradeoffs for a long instruction wordprocessor
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Tradeoffs in instruction format design for horizontal architectures
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Micro-optimization of floating-point operations
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Exploitation of APL data parallelism on a shared-memory MIMD machine
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Architecture and implementation of a VLIW supercomputer
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
The design of a RISC based multiprocessor chip
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
A Theory of Reduced and Minimal Procedural Dependencies
IEEE Transactions on Computers
OHMEGA: a VLSI superscalar processor architecture for numerical applications
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Single instruction stream parallelism is greater than two
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
The effect of employing advanced branching mechanisms in superscalar processors
ACM SIGARCH Computer Architecture News
Exploiting multi-way branching to boost superscalar processor performance
ACM SIGPLAN Notices
Eiffel Linda: an object-oriented Linda dialect
ACM SIGPLAN Notices
Architecture synthesis of high-performance application-specific processors
DAC '90 Proceedings of the 27th ACM/IEEE Design Automation Conference
Executing loops on a fine-grained MIMD architecture
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Computer Architecture in the 1990s
Computer
Distributed Instruction Set Computer Architecture
IEEE Transactions on Computers
Limits of control flow on parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The expandable split window paradigm for exploiting fine-grain parallelsim
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Processor coupling: integrating compile time and runtime scheduling for parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Concurrency Extraction Via Hardware Methods Executing the Static Instruction Stream
IEEE Transactions on Computers
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
StaCS: a Static Control Superscalar architecture
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Partitioned register files for VLIWs: a preliminary analysis of tradeoffs
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Avoidance and suppression of compensation code in a trace scheduling compiler
ACM Transactions on Programming Languages and Systems (TOPLAS)
Height reduction of control recurrences for ILP processors
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Compiler transformations for high-performance computing
ACM Computing Surveys (CSUR)
Scheduling and mapping: software pipelining in the presence of structural hazards
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Cache miss heuristics and preloading techniques for general-purpose programs
Proceedings of the 28th annual international symposium on Microarchitecture
Disjoint eager execution: an optimal form of speculative execution
Proceedings of the 28th annual international symposium on Microarchitecture
A software pipelining based VLIW architecture and optimizing compiler
MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture
A fine-grained MIMD architecture based upon register channels
MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture
Hardware implementation of a general multi-way jump mechanism
MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture
Realistic scheduling: compaction for pipelined architectures
MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture
Dynamically scheduled VLIW processors
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Predictability of load/store instruction latencies
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Control flow prediction for dynamic ILP processors
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Scalable instruction-level parallelism through tree-instructions
ICS '97 Proceedings of the 11th international conference on Supercomputing
Performance analysis of tree VLIW architecture for exploiting branch ILP in non-numerical code
ICS '97 Proceedings of the 11th international conference on Supercomputing
Parallelizing nonnumerical code with selective scheduling and software pipelining
ACM Transactions on Programming Languages and Systems (TOPLAS)
Exploiting idle floating-point resources for integer execution
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
An out-of-order execution technique for runtime binary translators
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Evaluation of Design Options for the Trace Cache Fetch Mechanism
IEEE Transactions on Computers - Special issue on cache memory and related problems
Exploiting ILP in page-based intelligent memory
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Software-Directed Register Deallocation for Simultaneous Multithreaded Processors
IEEE Transactions on Parallel and Distributed Systems
Boosting beyond static scheduling in a superscalar processor
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Optimization of high-performance superscalar architectures for energy efficiency
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
Code size minimization and retargetable assembly for custom EPIC and VLIW instruction formats
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Inherently Lower-Power High-Performance Superscalar Architectures
IEEE Transactions on Computers
Heads and tails: a variable-length instruction format supporting parallel fetch and decode
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Handling irreducible loops: optimized node splitting versus DJ-graphs
ACM Transactions on Programming Languages and Systems (TOPLAS)
Branch Effect Reduction Techniques
Computer
The MAP1000A VLIW Mediaprocessor
IEEE Micro
Introducing the IA-64 Architecture
IEEE Micro
Instruction Window Size Trade-Offs and Characterization of Program Parallelism
IEEE Transactions on Computers
A Performance and Cost Analysis of Applying Superscalar Method to Mainframe Computers
IEEE Transactions on Computers
IEEE Transactions on Computers
Compile-Time Techniques for Improving Scalar Access Performance in Parallel Memories
IEEE Transactions on Parallel and Distributed Systems
Pipelining and Bypassing in a VLIW Processor
IEEE Transactions on Parallel and Distributed Systems
A Method for Register Allocation to Loops in Multiple Register File Architectures
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
A Register File Architecture and Compilation Scheme for Clustered ILP Processors
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Parallel Queue Processor Architecture Based on Produced Order Computation Model
The Journal of Supercomputing
Clustered Loop Buffer Organization for Low Energy VLIW Embedded Processors
IEEE Transactions on Computers
Effects of program compression
Journal of Systems Architecture: the EUROMICRO Journal
Dynamic Barrier Architecture for Multi-Mode Fine-Grain Parallelism Using Conventional Processors
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Partitioning of Variables for Multiple-Register-File VLIW Architectures
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
VLIW-DLX simulator for educational purposes
WCAE '07 Proceedings of the 2007 workshop on Computer architecture education
Code compression for VLIW embedded systems using a self-generating table
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Dictionary-based program compression on customizable processor architectures
Microprocessors & Microsystems
Effects of program compression
SAMOS'06 Proceedings of the 6th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Template vertical dictionary-based program compression scheme on the TTA
PATMOS'07 Proceedings of the 17th international conference on Integrated Circuit and System Design: power and timing modeling, optimization and simulation
High performance FFT on SGI Altix 3700
HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
Hi-index | 0.03 |
A VLIW (very long instruction word) architecture machine called the TRACE has been built along with its companion Trace Scheduling compacting compiler. This machine has three hardware configurations, capable of executing 7, 14, or 28 operations simultaneously. The 'seven-wide' achieves a performance improvement of a factor of five or six for a wide range of scientific code, compared to machines of higher cost and fast chip implementation technology (such as the VAX 8700). The TRACE extends some basic reduced-instruction-set computer (RISC) precepts: the architecture is load/store, the microarchitecture is exposed to the compiler, there is no microcode, and there is almost no hardware devoted to synchronization, arbitration, or interlocking of any kind (the compiler has sole responsibility for run-time resource usage). The authors discuss the design of this machine and present some initial performance results.