The parallel execution of DO loops
Communications of the ACM
A preliminary architecture for a basic data-flow processor
ISCA '75 Proceedings of the 2nd annual symposium on Computer architecture
Very Long Instruction Word architectures and the ELI-512
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
A multiple processor data flow machine that supports generalized procedures
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Principles of Compiler Design (Addison-Wesley series in computer science and information processing)
Principles of Compiler Design (Addison-Wesley series in computer science and information processing)
Time and Parallel Processor Bounds for Linear Recurrence Systems
IEEE Transactions on Computers
Trace Scheduling: A Technique for Global Microcode Compaction
IEEE Transactions on Computers
High-Speed Multiprocessors and Compilation Techniques
IEEE Transactions on Computers
Time and Parallel Processor Bounds for Fortran-Like Loops
IEEE Transactions on Computers
Detection and Parallel Execution of Independent Instructions
IEEE Transactions on Computers
IEEE Transactions on Computers
The Inhibition of Potential Parallelism by Conditional Jumps
IEEE Transactions on Computers
Percolation of Code to Enhance Parallel Dispatching and Execution
IEEE Transactions on Computers
Computer
Processor Allocation for Horizontal and Vertical Parallelism and Related Speedup Bounds
IEEE Transactions on Computers
SAMP: a general purpose processor based on a self-timed VLIW structure
ACM SIGARCH Computer Architecture News
Exploiting parallel microprocessor microarchitectures with a compiler code generator
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Code generation schema for modulo scheduled loops
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Limits and Graph Structure of Available Instruction-Level Parallelism (Research Note)
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Fred: An Architecture for a Self-Timed Decoupled Computer
ASYNC '96 Proceedings of the 2nd International Symposium on Advanced Research in Asynchronous Circuits and Systems
An Architecture-Independent Workload Characterization Model for Parallel Computer Architectures
PAS '97 Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis
Selective Guarded Execution Using Profiling on a Dynamically Scheduled Processor
IWIA '99 Proceedings of the 1999 International Workshop on Innovative Architecture
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
On the potential of latency tolerant execution in speculative multithreading
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
Towards achieving reliable and high-performance nanocomputing via dynamic redundancy allocation
ACM Journal on Emerging Technologies in Computing Systems (JETC)
Dynamic trace-based analysis of vectorization potential of applications
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Disjoint out-of-order execution processor
ACM Transactions on Architecture and Code Optimization (TACO)
Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 14.98 |
Long instruction word architectures, such as attached scientific processors and horizontally microcoded CPU's, are a popular means of obtaining code speedup via fine-grained parallelism. The falling cost of hardware holds out the hope of using these architectures for much more parallelism. But this hope has been diminished by experiments measuring how much parallelism is available in the code to start with. These experiments implied that even if we had infinite hardware, long instruction word architectures could not provide a speedup of more than a factor of 2 or 3 on real programs.