An Instruction Issuing Approach to Enhancing Performance in Multiple Functional Unit Processors
IEEE Transactions on Computers
HPSm, a high performance restricted data flow architecture having minimal functionality
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
HPS, a new microarchitecture: rationale and introduction
MICRO 18 Proceedings of the 18th annual workshop on Microprogramming
Processor Allocation for Horizontal and Vertical Parallelism and Related Speedup Bounds
IEEE Transactions on Computers
Guided self-scheduling: A practical scheduling scheme for parallel supercomputers
IEEE Transactions on Computers
Incremental performance contributions of hardware concurrency extraction techniques
Proceedings of the 1st International Conference on Supercomputing
Utilizing Multidimensional Loop Parallelism on Large Scale Parallel Processor Systems
IEEE Transactions on Computers
ACM Computing Surveys (CSUR)
A Survey of Parallel Machine Organization and Programming
ACM Computing Surveys (CSUR)
Structure of Computers and Computations
Structure of Computers and Computations
Representation and detection of concurrency using ordering-matrices.
Representation and detection of concurrency using ordering-matrices.
Speedup of ordinary programs
Hardware extraction of low-level concurrency from sequential instruction streams (parallelism, implementation, architecture, dependencies, semantics)
On program restructuring, scheduling, and communication for parallel processor systems
On program restructuring, scheduling, and communication for parallel processor systems
Compiler Optimizations for Enhancing Parallelism and Their Impact on Architecture Design
IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
Requirements for optimal execution of oops with tests
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Modeling the effects of instruction queue loading on a static instruction stream micro-architecture
MICRO 21 Proceedings of the 21st annual workshop on Microprogramming and microarchitecture
A model for microarchitecture structure evaluation
MICRO 22 Proceedings of the 22nd annual workshop on Microprogramming and microarchitecture
A Theory of Reduced and Minimal Procedural Dependencies
IEEE Transactions on Computers
Comparing static and dynamic code scheduling for multiple-instruction-issue processors
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Concurrency Extraction Via Hardware Methods Executing the Static Instruction Stream
IEEE Transactions on Computers
Extraction of massive instruction level parallelism
ACM SIGARCH Computer Architecture News
Ideograph/Ideogram: framework/hardware for eager evaluation
MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture
Requirements for Optimal Execution of Loops with Tests
IEEE Transactions on Parallel and Distributed Systems
Hi-index | 0.01 |
It has been shown that parallelism is a very promising alternative for enhancing computer performance. Parallelism, however, introduces much complexity to the programming effort. This has lead to the development of automatic concurrency extraction techniques. Prior work has demonstrated that static program restructuring via compiler based techniques provides a large degree of parallelism to the target machine. Purely hardware based extraction techniques (without software preprocessing) have also demonstrated significant (but lesser) degrees of parallelism. This paper considers the performance effects of the combination of both hardware and software techniques. The concurrency extracted from a given set of benchmarks by each technique separately, and together, is determined via simulations and/or analysis. The “common parallelism” extracted by the two methods is thus also considered, using new metrics. The analytic techniques for predicting the performance of specific programs are also described.