Instruction-level parallel processing: history, overview, and perspective

Authors:
B. Ramakrishna Rau;Joseph A. Fisher
Venue:
The Journal of Supercomputing - Special issue on instruction-level parallelism
Year:
1993

Citing 0
Cited 81

Lifetime-sensitive modulo scheduling

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Interprocedural may-alias analysis for pointers: beyond k-limiting

PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Theoretical modeling of superscalar processor performance

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Minimum register requirements for a modulo schedule

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Minimizing register requirements under resource-constrained rate-optimal software pipelining

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Improving resource utilization of the MIPS R8000 via post-scheduling global instruction distribution

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
A comparison of two pipeline organizations

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Compiler transformations for high-performance computing

ACM Computing Surveys (CSUR)
Scheduling and mapping: software pipelining in the presence of structural hazards

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Software pipelining

ACM Computing Surveys (CSUR)
Resource-Constrained Software Pipelining

IEEE Transactions on Parallel and Distributed Systems
Optimum modulo schedules for minimum register requirements

ICS '95 Proceedings of the 9th international conference on Supercomputing
Increasing superscalar performance through multistreaming

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Petri net versus modulo scheduling for software pipelining

Proceedings of the 28th annual international symposium on Microarchitecture
Unrolling-based optimizations for modulo scheduling

Proceedings of the 28th annual international symposium on Microarchitecture
Software pipelining showdown: optimal vs. heuristic methods in a production compiler

PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Increasing cache port efficiency for dynamic superscalar microprocessors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Modulo scheduling of loops in control-intensive non-numeric programs

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Software pipelining loops with conditional branches

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
A Framework for Resource-Constrained Rate-Optimal Software Pipelining

IEEE Transactions on Parallel and Distributed Systems
GPMB—software pipelining branch-intensive loops

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Scalable instruction-level parallelism through tree-instructions

ICS '97 Proceedings of the 11th international conference on Supercomputing
Designing high bandwidth on-chip caches

Proceedings of the 24th annual international symposium on Computer architecture
Wavesched: a novel scheduling technique for control-flow intensive behavioral descriptions

ICCAD '97 Proceedings of the 1997 IEEE/ACM international conference on Computer-aided design
Evaluation of scheduling techniques on a SPARC-based VLIW testbed

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Parallelizing nonnumerical code with selective scheduling and software pipelining

ACM Transactions on Programming Languages and Systems (TOPLAS)
Simulation/evaluation environment for a VLIW processor architecture

IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
Optimal Modulo Scheduling Through Enumeration

International Journal of Parallel Programming
New tiling techniques to improve cache temporal locality

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Modulo scheduling for the TMS320C6x VLIW DSP architecture

Proceedings of the ACM SIGPLAN 1999 workshop on Languages, compilers, and tools for embedded systems
Probabilistic Loop Scheduling for Applications with Uncertain Execution Time

IEEE Transactions on Computers
Design and test space exploration of transport-triggered architectures

DATE '00 Proceedings of the conference on Design, automation and test in Europe
Data Dependence Analysis of Assembly Code

International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, part 2
Two-level hierarchical register file organization for VLIW processors

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Cost-Conscious Strategies to Increase Performance of Numerical Programs on Aggressive VLIW Architectures

IEEE Transactions on Computers
On the Boosting of Instruction Scheduling by Renaming

The Journal of Supercomputing
Enhanced Co-Scheduling: A Software Pipelining Method Using Modulo-Scheduled Pipeline Theory

International Journal of Parallel Programming
Backtracking-Based Instruction Scheduling to Fill Branch Delay Slots

International Journal of Parallel Programming
Compilers for Instruction-Level Parallelism

Computer
Superscalar Instruction Issue

IEEE Micro
SH-5: The 64-Bit SuperH Architecture

IEEE Micro
The Design Space of Register Renaming Techniques

IEEE Micro
An Advanced Optimizer for the IA-64 Architecture

IEEE Micro
Generalized Multiway Branch Unit for VLIW Microprocessors

IEEE Transactions on Parallel and Distributed Systems
A finite state machine based formal model of software pipelined loops with conditions

Progress in computer research
Formal Verification of Explicitly Parallel Microprocessors

CHARME '99 Proceedings of the 10th IFIP WG 10.5 Advanced Research Working Conference on Correct Hardware Design and Verification Methods
Software pipelining: A Genetic Algorithm Approach

PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Architectural Considerations for Application-Specific Counterflow Pipelines

ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
Reconfigurable Pipelines in VLIW Execution Units

FCCM '99 Proceedings of the Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Co-Scheduling Hardware and Software Pipelines

HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
A C++ compiler for FPGA custom execution units synthesis

FCCM '95 Proceedings of the IEEE Symposium on FPGA's for Custom Computing Machines
Register-Sensitive Software Pipelining

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
An Enhanced Co-Scheduling Method using Reduced MS-State Diagrams

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Predicated Software Pipelining Technique for Loops with Conditions

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Code size reduction technique and implementation for software-pipelined DSP applications

ACM Transactions on Embedded Computing Systems (TECS)
Register allocation for optimal loop scheduling

CASCON '93 Proceedings of the 1993 conference of the Centre for Advanced Studies on Collaborative research: distributed computing - Volume 2
Code Generation for Single-Dimension Software Pipelining of Multi-Dimensional Loops

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Single-Dimension Software Pipelining for Multi-Dimensional Loops

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP

ACM Transactions on Architecture and Code Optimization (TACO)
Extended Split-Issue: Enabling Flexibility in the Hardware Implementation of NUAL VLIW DSPs

Proceedings of the 31st annual international symposium on Computer architecture
An approach for integrating basic retiming and software pipelining

Proceedings of the 4th ACM international conference on Embedded software
Data cache management on EPIC architecture: optimizing memory access for image processing

MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications, systems and architecture
Tabu Search Algorithms for Cyclic Machine Scheduling Problems

Journal of Scheduling
A Simulation and Exploration Technology for Multimedia-Application-Driven Architectures

Journal of VLSI Signal Processing Systems
Instruction-level parallelism

Encyclopedia of Computer Science
Software and hardware techniques to optimize register file utilization in VLIW architectures

International Journal of Parallel Programming
Generic software pipelining at the assembly level

SCOPES '05 Proceedings of the 2005 workshop on Software and compilers for embedded systems
Single-dimension software pipelining for multidimensional loops

ACM Transactions on Architecture and Code Optimization (TACO)
mhz: anatomy of a micro-benchmark

ATEC '98 Proceedings of the annual conference on USENIX Annual Technical Conference
Multimedia terminal system-on-chip design and simulation

EURASIP Journal on Applied Signal Processing
Acceleration-based Dopplerlet transform-Part II: Implementations and applications to passive motion parameter estimation of moving sound source

Signal Processing
Register allocation for software pipelined multidimensional loops

ACM Transactions on Programming Languages and Systems (TOPLAS)
On the exploitation of loop-level parallelism in embedded applications

ACM Transactions on Embedded Computing Systems (TECS)
A Practical Approach to Hardware Performance Monitoring Based Dynamic Optimizations in a Production JVM

Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Compilation strategies for reducing code size on a VLIW processor with variable length instructions

HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Compilers, architectures and synthesis for embedded computing: retrospect and prospect

CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
How many threads to spawn during program multithreading?

LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Hardware support for multithreaded execution of loops with limited parallelism

PCI'05 Proceedings of the 10th Panhellenic conference on Advances in Informatics
A dynamic data dependence analysis approach for software pipelining

NPC'05 Proceedings of the 2005 IFIP international conference on Network and Parallel Computing
Computer assisted source-code parallelisation

ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part V
Constraint-Based register allocation and instruction scheduling

CP'12 Proceedings of the 18th international conference on Principles and Practice of Constraint Programming

Quantified Score

Hi-index	0.01
