Measuring the Parallelism Available for Very Long Instruction Word Architectures

Authors:
A. Nicolau; J. A. Fisher
Affiliations:
Department of Computer Science, Cornell University; -
Venue:
IEEE Transactions on Computers
Year:
1984

Abstract

Long instruction word architectures, such as attached scientific processors and horizontally microcoded CPUs, are a popular means of obtaining code speedup via fine-grained parallelism. The falling cost of hardware holds out the hope of using these architectures for much more parallelism. But this hope has been diminished by experiments measuring how much parallelism is available in the code to start with. These experiments implied that even if we had infinite hardware, long instruction word architectures could not provide a speedup of more than a factor of 2 or 3 on real programs.
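
To make the notion of "available parallelism with infinite hardware" concrete, the following is a minimal sketch of one common way to estimate it. It is not taken from the paper and is not claimed to be the authors' method. It assumes straight-line code with no branches, unit latency for every operation, and unlimited registers, so that only true data dependences constrain the schedule. The function name `available_parallelism` and the `(dest, sources)` operation encoding are hypothetical, chosen for illustration. The estimated speedup is the number of operations divided by the length of the longest dependence chain.

```python
from collections import defaultdict


def available_parallelism(ops):
    """Estimate speedup on idealized hardware with unlimited functional units.

    ops: list of (dest, sources) pairs in sequential program order.
    Assumptions (illustrative only): unit latency, no branches, and only
    true (read-after-write) dependences limit when an operation can issue.
    Returns (operation_count, critical_path_length, speedup).
    """
    ready_at = defaultdict(int)  # name -> step at which its latest value exists
    critical_path = 0
    for dest, sources in ops:
        # An operation may start once every input it reads has been produced.
        start = max((ready_at[s] for s in sources), default=0)
        finish = start + 1  # unit latency
        ready_at[dest] = finish
        critical_path = max(critical_path, finish)
    count = len(ops)
    speedup = count / critical_path if critical_path else 0.0
    return count, critical_path, speedup


# Six operations whose longest dependence chain (a -> c -> e) has length 3.
ops = [
    ("a", ["x", "y"]),
    ("b", ["x", "z"]),
    ("c", ["a", "b"]),
    ("d", ["y", "z"]),
    ("e", ["c", "d"]),
    ("f", ["x", "x"]),
]
print(available_parallelism(ops))  # (6, 3, 2.0)
```

In this toy example the six operations would take six steps sequentially but three steps with unlimited hardware, giving a speedup of 2. A fully serial chain, where each operation reads the previous result, yields a speedup of 1 under the same assumptions.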